Recoll has internal methods to access document data for its internal (filesystem)
indexer. An external indexer needs to provide its own data access methods if it needs
integration with the GUI (e.g. the preview function) or support for the rclextract
module.
The index data and the access method are linked by the rclbes (Recoll backend storage)
Doc field. You should set this to a short string value identifying your indexer (e.g.
the filesystem indexer uses either "FS" or an empty value, and the Web history indexer
uses "BGL").
The link is actually performed inside a backends configuration file (stored in the
configuration directory). This file defines the commands to execute to access data from
the specified indexer. For example, for the mbox indexing sample found in the Recoll
source (which sets rclbes="MBOX"):
    [MBOX]
    fetch = /path/to/recoll/src/python/samples/rclmbox.py fetch
    makesig = /path/to/recoll/src/python/samples/rclmbox.py makesig
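The lookup from a document's rclbes value to a command can be pictured as follows. This is a minimal sketch, not Recoll's own code: it assumes the backends file parses as a standard INI file with Python's configparser (Recoll uses its own configuration parser), and `backend_command` is a hypothetical helper name. "FS" and an empty value are treated as the filesystem case, which Recoll handles internally.

```python
import configparser
import shlex

# Backends file content, adapted from the MBOX example above.
BACKENDS = """
[MBOX]
fetch = /path/to/recoll/src/python/samples/rclmbox.py fetch
makesig = /path/to/recoll/src/python/samples/rclmbox.py makesig
"""

def backend_command(config_text, rclbes, operation):
    """Return the argv list configured for `operation` ("fetch" or
    "makesig") of the backend named by a document's rclbes value.
    Returns None for the filesystem indexer ("FS" or empty value),
    which is handled internally, or if nothing is configured."""
    if rclbes in ("", "FS"):
        return None
    # interpolation=None: treat "%" in commands literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # has_option() is False if the section or the option is missing.
    if not parser.has_option(rclbes, operation):
        return None
    # Split into program path plus its fixed first parameter.
    return shlex.split(parser.get(rclbes, operation))

print(backend_command(BACKENDS, "MBOX", "fetch"))
```

Splitting the command with shlex keeps the fixed first parameter (fetch or makesig) as a separate argument, after which the per-document arguments can be appended.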
fetch and makesig define the two commands used, respectively, to retrieve the document
text and to compute the document signature. The example implementation uses the same
rclmbox.py script with different first parameters for both operations, but this is in
no way mandatory.
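A single script serving both commands can simply dispatch on its first parameter. The following is a hypothetical skeleton, not the actual rclmbox.py: the fetch and makesig bodies are placeholders, and it assumes the calling convention described in the next paragraph (operation name, then udi, url and ipath, with the result written to stdout).

```python
import sys

def fetch(udi, url, ipath):
    # Placeholder (hypothetical): a real backend would locate the document,
    # e.g. the message selected by ipath inside the mbox file at url,
    # and return its text. ipath may be empty for a top-level document.
    return f"text of {url} [{ipath}]"

def makesig(udi, url, ipath):
    # Placeholder (hypothetical): a real backend would return a string that
    # changes whenever the stored document changes (e.g. built from the
    # container file's size and modification time).
    return f"sig:{udi}"

# The first command-line parameter selects the operation.
OPERATIONS = {"fetch": fetch, "makesig": makesig}

def handle(argv):
    """Return the result for argv = [script, op, udi, url, ipath],
    or None if the command line is not valid."""
    if len(argv) != 5 or argv[1] not in OPERATIONS:
        return None
    op, udi, url, ipath = argv[1:5]
    return OPERATIONS[op](udi, url, ipath)

def main(argv):
    data = handle(argv)
    if data is None:
        sys.stderr.write(f"usage: {argv[0]} fetch|makesig udi url ipath\n")
        return 1
    # The caller reads the result data from standard output.
    sys.stdout.write(data)
    return 0

# Dispatch only when given arguments (the backend is always called with them).
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Keeping the argument parsing in `handle()` separate from the output in `main()` makes the dispatch logic easy to exercise without spawning a process.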
The scripts are called with three additional arguments: udi, url and ipath. These were
set by the indexer and stored with the document by the addOrUpdate() call described
above. Not all arguments are needed in all cases: the script uses whatever it needs to
perform the requested operation. The caller expects the result data on stdout.
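When developing a backend script, it can help to call it the way described above: the configured command with udi, url and ipath appended, reading the result from stdout. The sketch below does this with subprocess. `run_backend` is a hypothetical test helper, not part of Recoll, and the details of how Recoll itself spawns the command (environment, timeouts, error handling) are not covered here; the timeout value is an arbitrary assumption.

```python
import subprocess
import sys

def run_backend(command, udi, url, ipath, timeout=30):
    """Run a backend command (argv list, as configured in the backends
    file) with udi, url and ipath appended; return its stdout as bytes."""
    result = subprocess.run(
        command + [udi, url, ipath],
        stdout=subprocess.PIPE,
        check=True,       # raise CalledProcessError on a non-zero exit status
        timeout=timeout,  # hypothetical safety limit, in seconds
    )
    return result.stdout

if __name__ == "__main__":
    # Stand-in backend: an inline Python script echoing its arguments.
    echo = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"]
    print(run_backend(echo + ["fetch"], "udi1", "file:///tmp/mbox", "3"))
```

Returning raw bytes leaves decoding of the document text to the caller, since a backend may emit text in any encoding.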