Indexing is the process by which the set of documents is analyzed and the data entered
into the database. Recoll indexing is normally incremental: documents will only be processed
if they have been modified since the last run. On the first execution, all documents will
need processing. A full index build can be forced later by specifying an option to the
indexing command (recollindex -z or -Z).
recollindex skips files which caused an error during a previous
pass. This is a performance optimization. Use the -k command line option
to retry failed files, for example after updating an input handler.
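The command line options mentioned above can be gathered in a small wrapper. This is only a sketch: the reindex function, its mode names and the DRY_RUN variable are hypothetical and not part of Recoll. The comments on -z and -Z follow the recollindex manual page, which you should check for your version.

```shell
#!/bin/sh
# reindex MODE: hypothetical wrapper mapping a mode name to a recollindex option.
# Set DRY_RUN=1 to print the command instead of executing it.
reindex() {
    case "$1" in
        incremental) opt="" ;;    # only documents modified since the last run
        rebuild)     opt="-z" ;;  # erase the index data, then index everything
        inplace)     opt="-Z" ;;  # keep the index, but reprocess all documents
        retry)       opt="-k" ;;  # incremental pass, also retrying failed files
        *)
            echo "usage: reindex incremental|rebuild|inplace|retry" >&2
            return 2
            ;;
    esac
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # $opt is left unquoted on purpose so that an empty value adds no argument
        echo recollindex $opt
    else
        recollindex $opt
    fi
}
```

For example, after updating an input handler, reindex retry would run recollindex -k. With DRY_RUN=1 set, the same call only prints that command.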
When a file has been deleted, recollindex removes the corresponding data from the index. The exact moment when this happens depends on the indexing mode. There are provisions to avoid deleting data for an unmounted removable volume.
The following sections give an overview of different aspects of the indexing processes and configuration, with links to detailed sections.
Depending on your data, temporary files may be needed during indexing, some of them
possibly quite big. You can use the RECOLL_TMPDIR or TMPDIR environment
variables to determine where they are created (the default is to use /tmp).
Using TMPDIR has the nice property that it may also be taken into account by
auxiliary commands executed by recollindex.
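One possible way to apply this is to set both variables for the duration of a single indexing run. The sketch below assumes a POSIX shell. The with_tmpdir helper is hypothetical, and the directory passed to it is up to you.

```shell
#!/bin/sh
# with_tmpdir DIR COMMAND...: hypothetical helper that runs COMMAND with
# both RECOLL_TMPDIR and TMPDIR pointing at DIR, creating DIR if needed.
with_tmpdir() {
    dir=$1
    shift
    mkdir -p "$dir" || return 1
    # The assignments only affect the environment of the command being run,
    # not the calling shell.
    RECOLL_TMPDIR="$dir" TMPDIR="$dir" "$@"
}
```

A call such as with_tmpdir /data/scratch recollindex (where /data/scratch is a placeholder for a directory on a volume with enough free space) would then run the indexer with its temporary files directed there. Because TMPDIR is set as well, auxiliary commands that honor it may follow.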