Indexing is the process by which the set of documents is analyzed and the resulting data is
entered into the database. Recoll indexing is normally incremental: documents are only processed
if they have been modified since the last run. On the first execution, all documents need
processing. A full index build can be forced later by passing an option to the indexing
command (recollindex -z or -Z).
recollindex skips files which caused an error during a previous
pass. This is a performance optimization. The command line option -k
can be used to retry failed files, for example after updating an input handler.
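The decision described above can be illustrated with a minimal Python sketch. This is not Recoll's actual implementation: the IndexedEntry record, the needs_processing function and its force_all and retry_failed parameters are hypothetical names that only mirror the behaviour of a full rebuild (-z or -Z) and of the retry option (-k). The sketch also assumes that a modification-time comparison is enough to detect changes, and that a previously failed file is skipped whether or not it has changed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedEntry:
    """Hypothetical record of what the index remembers about one file."""
    mtime: float   # modification time seen during the last indexing pass
    failed: bool   # True if processing the file raised an error last time


def needs_processing(current_mtime: float,
                     entry: Optional[IndexedEntry],
                     force_all: bool = False,
                     retry_failed: bool = False) -> bool:
    """Decide whether a file should be (re)processed during this pass."""
    # A forced full build, or a file never seen before, is always processed.
    if force_all or entry is None:
        return True
    # Files that failed before are skipped unless a retry was requested
    # (simplifying assumption: this holds even if the file has changed).
    if entry.failed:
        return retry_failed
    # Incremental case: only process files modified since the last run.
    return current_mtime > entry.mtime
```

With an entry recorded at modification time 100.0, calling needs_processing(100.0, entry) returns False, while a newer modification time or force_all=True returns True.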
When a file has been deleted, recollindex removes the corresponding data from the index. The exact moment when this happens depends on the indexing mode. There are provisions to avoid deleting data for an unmounted removable volume.
The following sections give an overview of different aspects of the indexing processes and configuration, with links to detailed sections.
Depending on your data, temporary files may be created during indexing, some of them
possibly quite big. You can set the RECOLL_TMPDIR environment variable to
determine where they are created. If RECOLL_TMPDIR is not set, Recoll will fall
back to other locations depending on the system. On Unix-like systems and macOS, TMPDIR,
TMP and TEMP will be tried before falling back to
/tmp/. On Windows, Recoll will call the GetTempPath() function. Using the
system's normal mechanism instead of RECOLL_TMPDIR has the nice property that the
auxiliary commands executed by recollindex should then create their own
temporary files in the same location.
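The fallback order on Unix-like systems can be sketched in Python as follows. This is an illustration of the lookup described above, not Recoll's own code: resolve_tmpdir is a hypothetical helper, the variables are assumed to be tried in the order in which they are listed, and an empty value is assumed to count as unset. The Windows case (GetTempPath()) is not covered.

```python
import os
from typing import Mapping, Optional


def resolve_tmpdir(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the directory where temporary files would be created."""
    # Use the real process environment unless a mapping is supplied.
    if env is None:
        env = os.environ
    # RECOLL_TMPDIR takes precedence, then the usual system variables.
    for name in ("RECOLL_TMPDIR", "TMPDIR", "TMP", "TEMP"):
        value = env.get(name)
        if value:  # assumption: an empty string is treated as unset
            return value
    # Last resort on Unix-like systems.
    return "/tmp/"
```

Passing an explicit mapping makes the order easy to check: resolve_tmpdir({"TMP": "/a", "TEMP": "/b"}) returns "/a", and an empty mapping yields "/tmp/".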

