The term explorer tool

Recoll automatically manages the expansion of search terms to their derivatives (e.g.: plural/singular, verb inflections). But there are other cases where the exact search term is not known. For example, you may not remember the exact spelling, or only know the beginning of the name.

The search will only propose replacement terms with spelling variations when no matching documents were found. In some cases, both proper spellings and misspellings are present in the index, and it may be interesting to look for them explicitly.

The term explorer tool (started from the toolbar icon or from the Term explorer entry of the Tools menu) can be used to search the full index terms list or, as a later addition, to display statistics and other index information. It has several modes of operation:

Wildcard

In this mode of operation, you can enter a search string with shell-like wildcards (*, ?, []). For example, xapi* would display all index terms beginning with xapi. (More about wildcards here).

Regular expression

This mode will accept a regular expression as input. Example: word[0-9]+. The expression is implicitly anchored at the beginning. For example, press will match pression but not expression. You can use .*press to match the latter, but be aware that this will cause a full index term list scan, which can take quite a long time.

Stem expansion

This mode will perform the usual stem expansion normally done as part of user input processing. As such it is probably mostly useful to demonstrate the process.

Spelling/Phonetic

In this mode, you enter the term as you think it is spelled, and Recoll will do its best to find index terms that sound like your entry. This mode uses the Aspell spelling application, which must be installed on your system for things to work (if your documents contain non-ASCII characters, Recoll needs an Aspell version newer than 0.60 for UTF-8 support). The language used to build the dictionary out of the index terms (which is done at the end of an indexing pass) is the one defined by your NLS environment. Weird things will probably happen if languages are mixed up.

Show index statistics

This will print a long list of boring numbers about the index.

List files which could not be indexed

This will show the files which caused errors, usually because recollindex could not translate their format into text.

Note that in cases where Recoll does not know the beginning of the string to search for (e.g. a wildcard expression like *coll), the expansion can take quite a long time because the full index term list will have to be processed. The expansion is currently limited to 10000 results for wildcards and regular expressions. It is possible to change the limit in the configuration file.
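
The following Python sketch illustrates why a known beginning makes the expansion cheap. It is not Recoll's actual code: the term list, the function names (literal_prefix, regex_prefix, expand) and the use of a sorted in-memory list are assumptions made for the example. When the pattern starts with literal characters, only the slice of the sorted term list sharing that prefix is examined; otherwise every term is tested. Regular expressions are applied with re.match, which anchors at the beginning only, so press matches pression but not expression.

```python
import bisect
import fnmatch
import re

# Hypothetical stand-in for the index term list, kept sorted.
TERMS = sorted([
    "expression", "press", "pression", "recoll", "word1",
    "word22", "wordy", "xapian", "xapians", "xenon",
])


def literal_prefix(pattern, specials):
    """Return the leading part of pattern that precedes any special character."""
    for i, ch in enumerate(pattern):
        if ch in specials:
            return pattern[:i]
    return pattern


def regex_prefix(pattern):
    """Conservative literal prefix of a regular expression (may be empty)."""
    if "|" in pattern:
        return ""  # alternation: no single guaranteed prefix
    i = 0
    while i < len(pattern) and pattern[i].isalnum():
        i += 1
    # A following *, ? or {m,n} may make the last literal character optional.
    if i < len(pattern) and pattern[i] in "*?{":
        i = max(i - 1, 0)
    return pattern[:i]


def expand(terms, pattern, mode="wildcard", limit=10000):
    """Return (matching terms, number of terms examined), capped at limit."""
    if mode == "wildcard":
        prefix = literal_prefix(pattern, "*?[")

        def is_match(term):
            return fnmatch.fnmatchcase(term, pattern)
    elif mode == "regexp":
        prefix = regex_prefix(pattern)
        compiled = re.compile(pattern)

        def is_match(term):
            # re.match anchors at the beginning of the term only.
            return compiled.match(term) is not None
    else:
        raise ValueError("unknown mode: " + mode)

    results = []
    examined = 0
    # Jump to the first term >= prefix; an empty prefix means a full scan.
    start = bisect.bisect_left(terms, prefix)
    for i in range(start, len(terms)):
        term = terms[i]
        if not term.startswith(prefix):
            break  # past the range of terms sharing the prefix
        examined += 1
        if is_match(term):
            results.append(term)
            if len(results) >= limit:
                break
    return results, examined


if __name__ == "__main__":
    for pattern, mode in [("xapi*", "wildcard"), ("*coll", "wildcard"),
                          ("press", "regexp"), (".*press", "regexp")]:
        print(mode, pattern, expand(TERMS, pattern, mode))
```

With this toy list, xapi* and press each examine only the two terms that share the prefix, while *coll and .*press examine all ten terms. The limit parameter plays the role of the result cap mentioned above.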

Double-clicking on a term in the result list will insert it into the simple search entry field. You can also copy/paste between the result list and any entry field (line endings will be taken care of).