There are a number of potential issues with indexing that may need investigation, such as:

  • A file can’t be found by searching even though it seems that it should have been indexed (either because the file was not selected for indexing at all, or because a filter program crashed).

  • The indexing process gets stuck and never finishes.

  • The indexing process ends up with an error.

  • The indexing process seems to be using too many system resources.

But first, here are a few things to check if a file is not indexed. Possible causes can be found among the configuration parameters:

  • compressedfilemaxkbs is the maximum size of a compressed file (.gz/.bz2/.xz), in kilobytes. recollindex will not try to uncompress bigger files. The default is 50000 (about 50 megabytes).

  • skippedNames and skippedPaths exclude files by name and by path, respectively; see the manual.

  • Other reasons are possible; see especially the document selection section of the manual.
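To make the first two checks more concrete, here is a rough Python approximation of how these parameters might exclude a given file. It is a hypothetical helper, not Recoll's code: it assumes that skippedNames patterns are shell wildcards tested against file and directory names, that skippedPaths patterns are tested against full paths (so an excluded directory hides everything below it), that compressedfilemaxkbs counts units of 1024 bytes, and that a negative compressedfilemaxkbs means "no limit".

```python
import fnmatch
import posixpath

# Hypothetical helper: a rough approximation of the skippedNames,
# skippedPaths and compressedfilemaxkbs checks, NOT Recoll's actual code.
COMPRESSED_SUFFIXES = (".gz", ".bz2", ".xz")


def self_and_parents(path):
    """'/a/b/c' -> ['/a/b/c', '/a/b', '/a']"""
    chain = []
    while path not in ("/", ""):
        chain.append(path)
        path = posixpath.dirname(path)
    return chain


def skip_reason(path, size_bytes, skipped_names, skipped_paths,
                compressedfilemaxkbs=50000):
    """Return why `path` would probably be skipped, or None."""
    for p in self_and_parents(path):
        name = posixpath.basename(p)
        # skippedNames: wildcard patterns tested against file/directory names.
        for pat in skipped_names:
            if fnmatch.fnmatchcase(name, pat):
                return f"skippedNames: {name!r} matches {pat!r}"
        # skippedPaths: wildcard patterns tested against full paths; a match
        # on a parent directory excludes everything below it.
        for pat in skipped_paths:
            if fnmatch.fnmatchcase(p, pat):
                return f"skippedPaths: {p!r} matches {pat!r}"
    # compressedfilemaxkbs: assumed to be in units of 1024 bytes; a negative
    # value is treated here as "no limit" (an assumption).
    if path.endswith(COMPRESSED_SUFFIXES) and compressedfilemaxkbs >= 0:
        if size_bytes / 1024 > compressedfilemaxkbs:
            return f"compressed file exceeds {compressedfilemaxkbs} KB"
    return None


print(skip_reason("/home/me/build/main.o", 2048, ["*.o"], []))
# -> skippedNames: 'main.o' matches '*.o'
print(skip_reason("/home/me/logs.tar.gz", 60_000 * 1024, [], []))
# -> compressed file exceeds 50000 KB
```

When the sketch returns None for a file that still cannot be found, the cause is more likely elsewhere (document selection rules, a failing filter), which is what the log-based steps below help to establish.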

When investigating an indexing issue, it is preferable to set up a separate indexer log and, if running Linux or Mac OS, to log to stderr in order to catch error messages from external commands.

See log file setup for details about setting up the message logs.

Also set the indexer to single-threaded mode: this produces a more readable message log. The Windows indexer is single-threaded anyway. On Linux or Mac OS, edit $HOME/.recoll/recoll.conf and add:

thrQSizes = -1 -1 -1
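Put together, a temporary debugging section in recoll.conf might look like the fragment below. This is a sketch only: idxlogfilename and idxloglevel are assumed to be the indexer-specific log variables covered in the log file setup section, and level 4 is an assumed choice, picked because the skipping message shown in the path gotcha note appears to carry a 4 in its prefix. Check the manual for your Recoll version before relying on these names.

```
# Indexer-only logging (assumed variable names; see the log file setup section)
idxlogfilename = stderr
idxloglevel = 4
# Single-threaded indexing, for a more readable log
thrQSizes = -1 -1 -1
```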

You should then check that no recoll or recollindex process is currently running, and kill any you find.

Then, if the issue concerns an identified file, try indexing only that file:

recollindex -e -i /path/to/myunfindablefile.xxx > /tmp/myindexlog 2>&1

If this is a general issue with indexing (e.g. the process does not finish properly), just start a full indexing run:

recollindex > /tmp/myindexlog 2>&1

Usually, a look at the trace will show what is wrong (e.g. a configuration issue or a missing filter), so that you can solve the problem.
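With a verbose log, it can help to pull out the serious messages and the skipped files automatically. The Python sketch below assumes a line layout of `:LEVEL:SOURCE_FILE:SOURCE_LINE:MESSAGE`, inferred from the single sample message shown in the path gotcha note, and assumes that lower levels are more serious (2 being errors). Lines that do not match the layout, such as output from external filter commands captured via stderr, are kept separately. The names `parse_line` and `triage` are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed line layout, inferred from a single sample message:
#   :LEVEL:SOURCE_FILE:SOURCE_LINE:MESSAGE
LOG_LINE = re.compile(r"^:(\d+):([^:]+):(\d+):(.*)$")
SKIPPED = re.compile(r"skipping \[(.*?)\]")


class LogEntry(NamedTuple):
    level: int
    source: str
    line: int
    message: str


def parse_line(text: str) -> Optional[LogEntry]:
    """Parse one log line, or return None if it does not fit the layout."""
    m = LOG_LINE.match(text.rstrip("\n"))
    if m is None:
        return None  # e.g. output from an external filter command
    return LogEntry(int(m[1]), m[2], int(m[3]), m[4])


def triage(lines: Iterable[str], max_level: int = 2):
    """Split a log into unparsed lines, serious entries and skipped paths."""
    other, serious, skipped = [], [], []
    for text in lines:
        if not text.strip():
            continue  # ignore blank lines
        entry = parse_line(text)
        if entry is None:
            other.append(text.rstrip("\n"))
            continue
        if entry.level <= max_level:  # lower level = more serious (assumed)
            serious.append(entry)
        m = SKIPPED.search(entry.message)
        if m:
            skipped.append(m[1])
    return other, serious, skipped
```

To use it on the log produced by the commands above, open /tmp/myindexlog (with `errors="replace"`, since filter output may not be valid UTF-8) and pass the file object to `triage`.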

In case of indexer misbehaviour (e.g. crashing or using too much memory), you should run tail -f on the log to see what is going on.

If this is not enough, please open a tracker issue and attach or link to the log data, or just email me (jfd at recoll.org).

recollindex and recollindex -i usually apply the same criteria to decide whether a file is included (but see the path gotcha note below). They may occasionally behave differently, so it can be useful to run a full recollindex even when investigating a specific file, at the cost of a big log file.

When you are done, it is better to reset the verbosity to a reasonable level (e.g. 2: errors only; 3: information, including the list of indexed files).

Note: the path gotcha

recollindex -i will only index files under the directories defined by the topdirs configuration variable (your home directory by default). Unfortunately, the test is done on the file path text, ignoring possible symbolic links. If you give a simple (relative) file name as a parameter to recollindex -i and there are symbolic links inside the topdirs entries, the comparison may fail. For example, if your home directory is '/home/me/' and '/home/' is a link to '/usr/home/', running recollindex -i somefilename from your home directory will actually try to index '/usr/home/me/somefilename', and fail (because '/usr/home/me/' is not a subdirectory of '/home/me/'). This will show up in the log as a message like the following:

:4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)

If this happens, give a full path consistent with what is found in the configuration file (e.g.: recollindex -i /home/me/somefile).
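The gotcha can be reproduced with a purely textual prefix test, sketched below in Python. The function `under_topdirs` is a hypothetical illustration of the behaviour described in this note, not Recoll's implementation. The relative name presumably ends up as '/usr/home/me/...' because the current directory is obtained in its symlink-free form (as getcwd() returns it), while topdirs still says '/home/me'.

```python
import posixpath

# Hypothetical sketch of a purely textual "is this file under topdirs?"
# test: symbolic links are NOT resolved, as described in the note.


def under_topdirs(path, topdirs):
    path = posixpath.normpath(path)
    for top in topdirs:
        top = posixpath.normpath(top)
        # Compare on a directory boundary so that /home/me2 is not
        # mistaken for a subdirectory of /home/me.
        prefix = top if top.endswith("/") else top + "/"
        if path == top or path.startswith(prefix):
            return True
    return False


topdirs = ["/home/me"]
# A relative name resolved from the symlink-free current directory fails:
print(under_topdirs("/usr/home/me/somefile", topdirs))  # False
# The full path, written as in the configuration, succeeds:
print(under_topdirs("/home/me/somefile", topdirs))      # True
```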

File system occupation

One of the possible reasons for failed indexing is a maxfsoccup parameter set too low. This is the file system occupation percentage (not the free space) above which indexing will stop. It is set from the GUI indexing configuration or by editing recoll.conf. A value of 0 disables the check, but a low non-zero value will simply prevent indexing.
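To see how the current occupation compares with maxfsoccup, the rough Python check below may help. It assumes that the percentage is computed like the "Use%" column of df, i.e. used / (used + available), that indexing stops once occupation reaches the limit, and that the relevant file system is the one holding your home directory (where the index lives by default). The helper names are hypothetical.

```python
import os
import shutil

# Rough re-creation of a maxfsoccup-style test. Assumptions: the value is a
# percentage computed like df's "Use%" (used / (used + available)), indexing
# stops once occupation reaches it, and 0 disables the check.


def occupation_percent(used, free):
    return 100.0 * used / (used + free)


def indexing_allowed(used, free, maxfsoccup):
    if maxfsoccup == 0:
        return True  # 0 means "no checking"
    return occupation_percent(used, free) < maxfsoccup


# File system holding the home directory (assumed location of the index).
usage = shutil.disk_usage(os.path.expanduser("~"))
print(f"occupation: {occupation_percent(usage.used, usage.free):.1f}%")
print("allowed at maxfsoccup=95:", indexing_allowed(usage.used, usage.free, 95))
```

If the printed occupation is already above your maxfsoccup value, indexing will not proceed until space is freed or the parameter is raised (or set to 0).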