Query data access for external indexers

Recoll has built-in methods to access document data for its internal (filesystem) indexer. An external indexer must provide its own data access methods if it needs integration with the GUI (e.g. the preview function) or support for the rclextract module.

The index data and the access method are linked by the rclbes (recoll backend storage) Doc field. You should set this to a short string value identifying your indexer. For example, the filesystem indexer uses either "FS" or an empty value, and the Web history indexer uses "BGL".

The link is actually performed inside a backends configuration file (stored in the configuration directory). This file defines the commands to execute to access data from the specified indexer. For example, the mbox indexing sample found in the Recoll source (which sets rclbes="MBOX") would use:

[MBOX]
fetch = /path/to/recoll/src/python/samples/rclmbox.py fetch
makesig = /path/to/recoll/src/python/samples/rclmbox.py makesig

fetch and makesig define the commands to execute to retrieve the document text and to compute the document signature, respectively. The example implementation uses the same rclmbox.py script for both operations, with a different first parameter for each, but this is in no way mandatory.

The scripts are called with three additional arguments: udi, url, ipath. These were set by the indexer and stored with the document by the addOrUpdate() call described above. Not all arguments are needed in all cases; the script uses what it needs to perform the requested operation. The caller expects the result data on stdout.
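
To make the calling convention concrete, here is a minimal sketch of what such a script could look like. It is not the rclmbox.py sample. It assumes a hypothetical indexer which stores one document per file and records plain file:// URLs, so only the url argument is used. It also assumes that the operation name comes first, followed by udi, url and ipath in that order, and that a string built from the file size and modification time is an acceptable signature. The function names are illustrative only.

```python
#!/usr/bin/env python3
"""Sketch of a data access script for a hypothetical external indexer."""
import os
import sys
from urllib.parse import unquote, urlparse


def _local_path(url):
    # Assumption: the indexer stored plain file:// URLs with the documents.
    return unquote(urlparse(url).path)


def fetch(udi, url, ipath):
    """Return the document data as bytes."""
    # Assumption: one document per file, so udi and ipath are not needed.
    with open(_local_path(url), "rb") as f:
        return f.read()


def makesig(udi, url, ipath):
    """Return a string which changes when the document changes."""
    # Assumption: file size plus modification time is a good enough signature.
    st = os.stat(_local_path(url))
    return "%d:%d" % (st.st_size, int(st.st_mtime))


def main(args):
    # Assumed argument order: operation name, then udi, url, ipath.
    if len(args) != 4 or args[0] not in ("fetch", "makesig"):
        print("usage: fetch|makesig udi url ipath", file=sys.stderr)
        return 2
    op, udi, url, ipath = args
    # The caller reads the result from stdout.
    if op == "fetch":
        sys.stdout.buffer.write(fetch(udi, url, ipath))
    else:
        sys.stdout.write(makesig(udi, url, ipath))
    sys.stdout.flush()
    return 0


if __name__ == "__main__":
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv[1:]))
    print("usage: fetch|makesig udi url ipath", file=sys.stderr)
```

An indexer which stores several documents per container (as an mbox file does) would additionally use ipath to locate the document inside the file designated by url.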