Words and spans

Some important searchable text elements contain non-alphanumeric characters, for example, email addresses (jfd@recoll.org), proper names (O'Brien) or internet addresses (192.168.4.1).

If we treat the special characters as white space in this situation, the only way to search for these terms with a reasonable degree of precision would be to use phrase searches ("jfd recoll org").

However, phrase searches are computationally expensive and generally slower than single-term searches. This was especially true with older Xapian versions.

Recoll processes these elements specially and refers to them as spans. The characters that link the parts of a span are referred to as span glue below.

When indexing a span like jfd@recoll.org, Recoll generates both regular individual terms (jfd, recoll, org) and multiword terms linked by span glue: jfd@recoll.org, jfd@recoll, recoll.org.
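The indexing step can be illustrated with a minimal Python sketch. It assumes a hypothetical glue set of @, . and ' (the characters in the examples above), and it assumes that every contiguous run of two or more words becomes a multiword term. These assumptions reproduce the jfd@recoll.org example, but Recoll's actual splitter has its own rules and may generate a different set of terms for longer spans.

```python
import re

# Hypothetical span glue set, taken from the examples above
# ('@' and '.' in addresses, "'" in proper names). Recoll's real rules differ.
SPAN_GLUE = "@.'"

def index_terms(span: str) -> list[str]:
    """Return the individual words of a span, then the multiword terms
    formed by every contiguous run of two or more words (an assumption)."""
    # The capturing group keeps the glue characters, e.g.
    # "jfd@recoll.org" -> ["jfd", "@", "recoll", ".", "org"]
    parts = re.split("([" + re.escape(SPAN_GLUE) + "])", span)
    words = parts[0::2]
    terms = [w for w in words if w]
    n = len(words)
    for start in range(n):
        for end in range(start + 2, n + 1):
            # Rejoin words[start:end] together with their original glue
            terms.append("".join(parts[2 * start : 2 * end - 1]))
    return terms

print(index_terms("jfd@recoll.org"))
# ['jfd', 'recoll', 'org', 'jfd@recoll', 'jfd@recoll.org', 'recoll.org']
```

Under the same assumptions, O'Brien yields O, Brien and O'Brien.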

When searching, only the largest term (the complete span: jfd@recoll.org) is used, so that Xapian executes a regular single-term search instead of a phrase search.
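The difference at search time can be sketched with a toy positional index in Python. The `Index` type, the two functions and the sample documents are hypothetical stand-ins for a Xapian database, not Xapian's API. A phrase search has to intersect the posting lists of every word and then check positions for adjacency. A span that was indexed as a single term needs only one lookup.

```python
# Toy positional index: term -> {doc_id: [word positions]}.
# A hypothetical stand-in for a Xapian database, not Xapian's API.
Index = dict[str, dict[int, list[int]]]

def phrase_search(index: Index, words: list[str]) -> set[int]:
    """Docs where the words occur at consecutive positions (costly path)."""
    # Candidate docs must contain every word...
    docs = set(index.get(words[0], {}))
    for w in words[1:]:
        docs &= set(index.get(w, {}))
    result = set()
    for d in docs:
        # ...and then each candidate's positions are checked for adjacency.
        for p in index[words[0]][d]:
            if all(p + i in index[w][d] for i, w in enumerate(words)):
                result.add(d)
                break
    return result

def span_search(index: Index, span: str) -> set[int]:
    """The span was indexed as one term: a single posting-list lookup."""
    return set(index.get(span, {}))

# Doc 1 contains "jfd@recoll.org"; doc 2 has the words, but not adjacent.
index: Index = {
    "jfd": {1: [1], 2: [0]},
    "recoll": {1: [2], 2: [3]},
    "org": {1: [3], 2: [4]},
    "jfd@recoll.org": {1: [1]},
}

print(phrase_search(index, ["jfd", "recoll", "org"]))  # {1}
print(span_search(index, "jfd@recoll.org"))            # {1}
```

Both searches find the same document, but the span lookup skips the intersection and position checks entirely.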