Introduction

Indexing is the process by which the set of documents is analyzed and the data entered into the database. Recoll indexing is normally incremental: documents will only be processed if they have been modified since the last run. On the first execution, all documents will need processing. A full index build can be forced later by specifying an option to the indexing command (recollindex -z or -Z).

recollindex skips files which caused an error during a previous pass. This is a performance optimization, and the command line option -k can be set to retry failed files, for example after updating an input handler.
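As an illustration, the options mentioned so far (-z or -Z to force a full index build, -k to retry failed files) could be assembled by a small wrapper script. This is only a sketch: the helper names recollindex_argv and run_recollindex are hypothetical and not part of Recoll, and the sketch assumes that recollindex is found on the PATH.

```python
import shutil
import subprocess


def recollindex_argv(full_rebuild_flag=None, retry_failed=False):
    """Build an argument list for recollindex (hypothetical helper).

    full_rebuild_flag: None for a normal incremental run, or "-z" / "-Z"
    to force a full index build.
    retry_failed: add -k so that previously failed files are retried.
    """
    if full_rebuild_flag not in (None, "-z", "-Z"):
        raise ValueError("full_rebuild_flag must be None, '-z' or '-Z'")
    argv = ["recollindex"]
    if full_rebuild_flag:
        argv.append(full_rebuild_flag)
    if retry_failed:
        argv.append("-k")
    return argv


def run_recollindex(argv):
    """Run the command and return its exit status, or None if the
    executable cannot be found on the PATH (assumption of this sketch)."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode
```

For example, run_recollindex(recollindex_argv(retry_failed=True)) would start an incremental pass which also retries the files that failed previously, which can be useful after updating an input handler.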

When a file has been deleted, recollindex removes the corresponding data from the index. The exact moment when this happens depends on the indexing mode. There are provisions to avoid deleting data for an unmounted removable volume.

The following sections give an overview of different aspects of the indexing processes and configuration, with links to detailed sections.

Depending on your data, temporary files may be created during indexing, some of them possibly quite big. You can set the RECOLL_TMPDIR environment variable to determine where they are created. If RECOLL_TMPDIR is not set, Recoll will fall back to other locations depending on the system. On Unix-like systems and macOS, the TMPDIR, TMP and TEMP environment variables will be tried, in this order, before falling back to /tmp/. On Windows, Recoll will call the GetTempPath() function. Using the system's normal mechanism instead of RECOLL_TMPDIR has the nice property that the auxiliary commands executed by recollindex should then create their own temporary files in the same location.
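The fallback order on Unix-like systems can be sketched as follows. This is an illustration of the lookup described above, not Recoll's actual code: the function name resolve_tmpdir is hypothetical, and treating an empty variable as unset is an assumption of this sketch. The Windows case (GetTempPath()) is not covered.

```python
import os


def resolve_tmpdir(env=None):
    """Return the temporary directory chosen by the documented
    Unix-like fallback order (hypothetical helper)."""
    env = os.environ if env is None else env
    # RECOLL_TMPDIR wins, then the usual system variables are tried in order.
    for name in ("RECOLL_TMPDIR", "TMPDIR", "TMP", "TEMP"):
        value = env.get(name)
        if value:  # assumption: an empty value counts as unset
            return value
    # Last resort when none of the variables is set.
    return "/tmp"
```

Calling resolve_tmpdir() with no argument reads the current process environment, so it can be used to check where temporary files would be expected before starting a large indexing run.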