Multiple Recoll indexes can be created by using several configuration directories, which would typically be set up to index different areas of the file system.
A plausible usage scenario for the multiple index feature would be for a system administrator to set up a central index for shared data, which you can choose to search or not in addition to your personal data. Of course, there are other possibilities. For example, there are many cases where you know the subset of files that should be searched, and where narrowing the search can improve the results. You can achieve approximately the same effect by using a directory filter clause in a search, but multiple indexes may perform better and may be worth the trouble with huge data sets.
A more advanced use case would be to use multiple indexes to improve indexing performance, by updating several indexes in parallel (using multiple CPU cores and disks, or possibly several machines), and then either merging them, or querying them in parallel.
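The parallel update described above could be driven by a small script. The following is only a sketch: it assumes each index has its own configuration directory, selected with the -c option of recollindex, and the helper names run_recollindex and update_all are made up for the illustration. It does not cover merging the resulting indexes.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_recollindex(confdir):
    """Run one indexer on one configuration and return its exit status."""
    return subprocess.run(["recollindex", "-c", confdir]).returncode


def update_all(confdirs, runner=run_recollindex, max_workers=None):
    """Update every configuration concurrently and return {confdir: status}.

    `runner` is injectable so the scheduling can be tried out without
    actually starting recollindex.
    """
    confdirs = list(confdirs)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        statuses = list(pool.map(runner, confdirs))
    return dict(zip(confdirs, statuses))


# Hypothetical usage, with made-up directory names:
# update_all(["/home/me/.recoll-mail", "/home/me/.recoll-docs"])
```

Threads are sufficient here because the indexing work happens in the separate recollindex processes; the Python side only waits for them.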
A specific configuration can be selected by setting the RECOLL_CONFDIR environment variable or giving the -c option to recoll and recollindex.
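As an illustration of the two selection methods, the sketch below builds the argument list and environment for either of them without running anything. The function name recoll_command and its parameters are hypothetical; only the -c option and the RECOLL_CONFDIR variable come from the text above.

```python
import os


def recoll_command(program, confdir, use_env=False, base_env=None):
    """Return (argv, env) for running `program` against `confdir`.

    With use_env=False the configuration is passed with -c; with
    use_env=True it is passed through RECOLL_CONFDIR instead.
    """
    env = dict(os.environ if base_env is None else base_env)
    if use_env:
        env["RECOLL_CONFDIR"] = confdir
        argv = [program]
    else:
        argv = [program, "-c", confdir]
    return argv, env


argv, env = recoll_command("recollindex", "/some/directory")
print(argv)
```

The returned pair can be handed to subprocess.run(argv, env=env) when the program should actually be started.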
The recollindex program, used for creating or updating indexes, always works on a single index. The different configurations are entirely independent (no parameters are ever shared between configurations when indexing).
All the search interfaces (recoll, recollq, the Python API, etc.) operate with a main configuration, from which both configuration and index data are used, and can also query data from multiple additional indexes. Only the index data from the additional indexes is used; their configuration parameters are ignored. This implies that some parameters should be consistent among index configurations which are to be used together.
When searching, the current main index (defined by RECOLL_CONFDIR or -c) is always active. If this is undesirable, you can set up your base configuration to index an empty directory.
Index configuration parameters can be set either by using a text editor on the files, or, for most parameters, by using the recoll index configuration GUI. In the latter case, the configuration directory for which parameters are modified is the one which was selected by RECOLL_CONFDIR or the -c parameter, and there is no way to switch configurations within the GUI.
See the configuration section for a detailed description of the parameters.
Some configuration parameters must be consistent among a set of multiple indexes used together for searches. Most importantly, all indexes to be queried concurrently must have the same option concerning character case and diacritics stripping, but there are other constraints. Most of the relevant parameters affect the term generation.
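A consistency check of this kind can be scripted. The sketch below is a simplified illustration, not Recoll's own parser: it assumes each configuration directory holds a recoll.conf file made of "name = value" lines, reads only the values set before the first bracketed section, and ignores continuation lines. The parameter name indexStripChars in the usage comment is assumed to be the one controlling case and diacritics stripping, and the function names are invented for the example.

```python
import os


def read_param(confdir, name, default=None):
    """Return the last top-level value of `name` in confdir/recoll.conf."""
    value = default
    path = os.path.join(confdir, "recoll.conf")
    if not os.path.exists(path):
        return value
    with open(path, encoding="utf-8") as conf:
        for line in conf:
            line = line.strip()
            if line.startswith("["):
                break  # stop at the first section; only global values are read
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, val = line.partition("=")
            if key.strip() == name:
                value = val.strip()
    return value


def inconsistent(confdirs, name, default=None):
    """Group configurations by their value of `name`.

    Returns an empty dict when all configurations agree, otherwise a
    dict mapping each distinct value to the directories using it.
    """
    groups = {}
    for confdir in confdirs:
        groups.setdefault(read_param(confdir, name, default), []).append(confdir)
    return groups if len(groups) > 1 else {}


# Hypothetical usage:
# inconsistent(["/home/me/.recoll", "/srv/recoll-shared"], "indexStripChars")
```

A configuration that does not set the parameter is reported with the caller-supplied default, so the default passed in should match the one documented for the parameter.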
Using multiple configurations requires a small amount of command line or file manager work. The user must explicitly create additional configuration directories; the GUI will not do it. This is to avoid mistakenly creating additional directories when an argument is mistyped. Also, the GUI or the indexer must be launched with a specific option or environment to work on the right configuration.
To start a new configuration, you need to create an empty directory in a location of your choice, and then instruct recoll or recollindex to use it by setting either a command line option (-c /some/directory), or an environment variable (RECOLL_CONFDIR=/some/directory). Any modification performed by the commands (e.g. configuration customisation or searches by recoll or index creation by recollindex) would then apply to the new directory and not to the default one.
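The two steps above (create an empty directory, then point a command at it) might be scripted as follows. This is a minimal sketch; the function name new_configuration is hypothetical, and the function only prepares the command rather than running it.

```python
import os


def new_configuration(confdir):
    """Create an empty configuration directory and return the indexer command for it."""
    # exist_ok=False: fail rather than silently reuse an existing directory,
    # which guards against a mistyped path pointing at another configuration.
    os.makedirs(confdir, exist_ok=False)
    return ["recollindex", "-c", confdir]


# Hypothetical usage:
# command = new_configuration("/some/directory")
# subprocess.run(command)  # would create the index for the new configuration
```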

