Configurations, multiple indexes

Recoll supports defining multiple indexes, each defined by its own configuration directory. A configuration directory contains several files which describe what should be indexed and how.

When recoll or recollindex is first executed, it creates a default configuration directory. This configuration is the one used for indexing and querying when no specific configuration is specified. It is located in $HOME/.recoll/ on Unix-like systems and in %LOCALAPPDATA%/Recoll on Windows (typically C:/Users/[me]/AppData/Local/Recoll).

All configuration parameters have defaults, defined in system-wide files. Without further customisation, the default configuration will process your complete home directory, with a reasonable set of defaults. It can be adjusted to process a different area of the file system, select files in different ways, and many other things.

In some cases, it may be useful to create additional configuration directories, for example, to separate personal and shared indexes, or to take advantage of the organization of your data to improve search precision.

To do this, create an empty directory in a location of your choice, then instruct recoll or recollindex to use it by setting either a command line option (-c /some/directory) or an environment variable (RECOLL_CONFDIR=/some/directory). Any modification performed by the commands (e.g. configuration customisation or searches by recoll, or index creation by recollindex) then applies to the new directory and not to the default one.

Once multiple indexes are created, you can use each of them separately by setting the -c option or the RECOLL_CONFDIR environment variable when starting a command, to select the desired index.
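As a minimal sketch of the steps above, the following shell fragment creates an empty configuration directory and defines small wrappers that select it, either with -c or with RECOLL_CONFDIR. The directory location and the function names (index_shared, search_shared, index_shared_env) are hypothetical and chosen only for illustration. The commands are wrapped in functions so that nothing is indexed until you call one of them; remember that a configuration left empty uses the defaults, which process your complete home directory.

```shell
# Hypothetical location for the additional configuration directory.
# Any empty directory of your choice would do.
SHARED_CONF="${TMPDIR:-/tmp}/recoll-shared-conf"
mkdir -p "$SHARED_CONF"

# Option 1: select the configuration with the -c command line option.
index_shared() { recollindex -c "$SHARED_CONF" "$@"; }
search_shared() { recoll -c "$SHARED_CONF" "$@"; }

# Option 2: select it with the RECOLL_CONFDIR environment variable,
# set here for the duration of a single command only.
index_shared_env() { RECOLL_CONFDIR="$SHARED_CONF" recollindex "$@"; }
```

Calling index_shared would then create or update the index belonging to that directory, and search_shared would start the GUI on it, leaving the default configuration untouched.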

It is also possible to instruct one configuration to query one or several other indexes in addition to its own, by using the External index function in the recoll GUI, or some equivalent in the command line and programming tools.

A plausible usage scenario for the multiple index feature would be for a system administrator to set up a central index for shared data, which you can choose to search, or not, in addition to your personal data. Of course, there are other possibilities. For example, there are many cases where you know the subset of files that should be searched, and where narrowing the search can improve the results. You can achieve approximately the same effect by using a directory filter clause in a search, but multiple indexes may have better performance and may be worth the trouble in some cases.

A more advanced use case would be to use multiple indexes to improve indexing performance, by updating several indexes in parallel (using multiple CPU cores and disks, or possibly several machines), and then merging them, or querying them in parallel.
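The parallel update part of this idea could be sketched as follows. This is only an illustration under stated assumptions: the function name update_all and the directory names in the usage note are hypothetical, each configuration directory is assumed to exist already and to describe a distinct part of the data, and the merging or parallel querying steps are not shown.

```shell
# Start one recollindex process per configuration directory given as
# an argument, run them concurrently, then wait for all of them.
# Each process works on the index that belongs to its own configuration.
update_all() {
    for conf in "$@"; do
        recollindex -c "$conf" &
    done
    wait
}
```

It might be used as `update_all "$HOME/.recoll-projects" "$HOME/.recoll-archive"`. Whether this is actually faster than a single update depends on how the data is spread over disks and on the number of available CPU cores.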

See the section about configuring multiple indexes for more detail.