Using the log file to investigate indexing issues

All Recoll processes print trace messages. By default these go to the standard error output, and you may never see them (for example, when the recoll GUI is started from the desktop interface).
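If you need to see these messages from the recoll GUI, one simple approach is to start it from a terminal, optionally redirecting the output to a file (the path below is just an example):

recoll > /tmp/recollgui.log 2>&1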

There are a number of potential issues with indexing that may need investigation, such as:

  • A file can’t be found by searching even though it appears that it should have been indexed (this could happen because the file is not selected at all or because a filter program crashes).

  • The indexing process gets stuck and never finishes.

  • The indexing process ends up with an error.

  • The indexing process seems to be using too much system capacity.

The best way to approach these problems is to use the recollindex command line tool (instead of the recoll GUI), and to set up the trace log to provide information about what indexing is actually doing.

But first, a few things to check if a file is not indexed. Several configuration parameters are possible causes:

  • compressedfilemaxkbs is the maximum size, in kilobytes, of compressed files (.gz/.bz2/.xz) that recollindex will try to uncompress. The default is 50000 (50 megabytes).

  • skippedNames and skippedPaths exclude files by name or location, see the manual (an illustrative snippet follows this list).

  • Other reasons are possible, see especially the document selection section of the manual.
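As a reference, here is what these parameters could look like inside '~/.recoll/recoll.conf'. The values are purely illustrative, not recommendations:

compressedfilemaxkbs = 100000
skippedNames = *.o *.so core
skippedPaths = /home/me/tmp /home/me/.cache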

Trace log parameters can be set either from the GUI Preferences→Indexing Configuration→Global Parameters panel, or by editing the configuration file '~/.recoll/recoll.conf'.

When investigating problems, it’s often preferable to edit the file. Set the following parameters to log to stderr, at maximum verbosity, and to turn multithreading off.

idxloglevel = 5
idxlogfilename = stderr
thrQSizes = -1 -1 -1

We use stderr instead of an actual file in order to capture direct filter messages (such as a python stack trace) along with normal recollindex messages.

The last line sets recollindex for single-threaded operation, which will make the log much more readable.

You should then check that no recoll or recollindex process is currently running, and kill any you find.
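For example, using standard tools (a sketch, the exact commands may vary with your system):

pgrep -l recoll
pkill recollindex

If the recoll GUI shows up in the list, just quit it normally.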

Then, if this is an issue about an identified file, try indexing it only:

recollindex -e -i /path/to/myunfindablefile.xxx > /tmp/myindexlog 2>&1

If this is a general issue with indexing (process not finishing properly), just start it:

recollindex > /tmp/myindexlog 2>&1

Usually, having a look at the trace will let you see what is wrong (e.g. a configuration issue or a missing filter), and solve the problem.
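To quickly scan a big log for problems, simple text searches are usually enough. The patterns below are just examples (error-level messages normally carry a ':2:' prefix, similar to the ':4:' prefix visible in the sample log line further down):

grep -iE 'error|exception|traceback' /tmp/myindexlog
grep '^:2:' /tmp/myindexlog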

In case of indexer misbehaviour (e.g. using too much memory), you should run tail -f on the log to see what is going on.
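For example, using the log file path from the commands above (the memory check is optional and assumes a Linux-style top):

tail -f /tmp/myindexlog
top -p $(pgrep -n recollindex)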

If this is not enough, please open a tracker issue and attach or link to the log data, or just email me (jfd at recoll.org).

recollindex and recollindex -i usually apply the same criteria for including a file (but see the path gotcha note below). They may occasionally behave differently, so it can sometimes be useful to run a full recollindex even for a specific file, though this will produce a big log file.

When you are done, it is better to reset the verbosity to a reasonable level (e.g. 2: errors only, or 3: informational, listing indexed files).
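For example, you could set the following in 'recoll.conf' when done, and remove or comment out the thrQSizes line to restore the default multithreading behaviour (the log file path is just an example):

idxloglevel = 3
idxlogfilename = /home/me/.recoll/idxlog.txt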

Note: the path gotcha

recollindex -i will only index files under the directories defined by the topdirs configuration variable (your home directory by default). Unfortunately, the test is performed on the file path text, without resolving possible symbolic links. If you give a simple file name as a parameter to recollindex -i and there are symbolic links inside the topdirs entries, the comparison may fail. For example, if your home directory is '/home/me/' and '/home/' is a link to '/usr/home/', running recollindex -i somefilename from your home directory will actually try to index '/usr/home/me/somefilename', and fail (because '/usr/home/me/' is not, textually, a subdirectory of '/home/me/'). This will manifest itself in the log by a message like the following.

:4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)

If this happens, give a full path consistent with what is found in the configuration file (e.g.: recollindex -i /home/me/somefile).
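If in doubt about which paths the comparison uses, check the topdirs value in the active configuration (assuming the default configuration directory), and spell the argument the same way:

grep topdirs ~/.recoll/recoll.conf
recollindex -i /home/me/somefile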

File system occupation

One of the possible reasons for failed indexing is a maxfsoccup parameter set too low. This parameter defines the file system occupation (not the amount of free space) above which indexing will stop. It is set from the GUI indexing configuration or by editing 'recoll.conf'. A value of 0 disables the check, but a very low non-zero value will simply prevent indexing.
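For example, to let indexing proceed until the partition holding the index is 90% full (the value is a percentage, and 90 is only an example):

maxfsoccup = 90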