Recoll used to have no support for indexing removable volumes (portable disks, USB keys, etc.). Recent versions have improved the situation and support indexing removable volumes in two different ways:
By indexing the volume in the main, fixed, index, and ensuring that the volume data is not purged if the indexing runs while the volume is not mounted (since Recoll 1.25.2).
By storing a volume index on the volume itself (since Recoll 1.24).
As of version 1.25.2, Recoll provides a simple way to ensure that the index data for an absent volume will not be purged. Two conditions must be met:
The volume mount point must be a member of the topdirs list.
The mount directory must be empty (when the volume is not mounted).
If recollindex finds that one of the topdirs is empty when starting up, any existing data for the tree will be preserved by the indexing pass (no purge for this area).
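The second condition can be checked before an indexing run. The following is a minimal sketch and not part of Recoll: dir_is_empty is a hypothetical helper name, and /media/me/somelabel in the usage comment is only an example mount point, assumed to be listed in topdirs.

```sh
#!/bin/sh
# Hypothetical helper, not part of Recoll.
# Succeeds only if DIR exists and holds no entries at all (hidden ones
# included), which is what the mount directory of an absent volume
# should look like.
dir_is_empty()
{
    [ -d "$1" ] && [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# Example use (illustrative path):
#   if dir_is_empty /media/me/somelabel; then
#       echo "volume absent: existing index data for this tree will be kept"
#   fi
```

A non-empty result does not by itself tell whether the volume is mounted or whether stray files were left in the mount directory, so this check only covers the "volume absent" case.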
As of Recoll 1.24, it has become possible to build self-contained datasets including a Recoll configuration directory and index together with the indexed documents, and to move such a dataset around (for example copying it to a USB drive), without having to adjust the configuration for querying the index.
Note
This is a query-time feature only. The index must only be updated in its original location. If an update is necessary in a different location, the index must be reset.
The principle of operation is that the configuration stores the location of the original configuration directory, which must reside on the movable volume. If the volume is later mounted elsewhere, Recoll adjusts the paths stored inside the index by the difference between the original and current locations of the configuration directory.
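The path adjustment can be pictured with a small shell sketch. This only illustrates the principle and is not Recoll's actual implementation: rebase_path is a hypothetical name, and the sketch assumes that the "difference" is obtained by dropping the trailing path components that the original and current configuration directories have in common, then swapping the remaining prefixes.

```sh
#!/bin/sh
# Illustration only, not Recoll code.
# rebase_path ORIG_CONFDIR CUR_CONFDIR DOC_PATH
# Prints DOC_PATH translated from the original location to the current one.
rebase_path()
{
    rp_org=$1
    rp_cur=$2
    rp_doc=$3
    # Drop identical trailing components (e.g. "recoll-confdir") so that
    # only the differing prefixes remain.
    while [ -n "${rp_org##*/}" ] && [ "${rp_org##*/}" = "${rp_cur##*/}" ]; do
        rp_org=${rp_org%/*}
        rp_cur=${rp_cur%/*}
    done
    case $rp_doc in
        "$rp_org"/*)
            # Path lies under the original prefix: replace that prefix.
            rp_rest=${rp_doc#"$rp_org"}
            printf '%s\n' "$rp_cur$rp_rest"
            ;;
        *)
            # Path is outside the moved tree: leave it unchanged.
            printf '%s\n' "$rp_doc"
            ;;
    esac
}

# Example use (illustrative paths):
#   rebase_path /home/me/mydata/recoll-confdir \
#               /media/me/somelabel/recoll-confdir \
#               /home/me/mydata/docs/report.txt
#   prints /media/me/somelabel/docs/report.txt
```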
To make a long story short, here follows a script to create a Recoll configuration and index under a given directory (given as single parameter). The resulting data set (files + recoll directory) can later be moved to a CDROM or thumb drive. Longer explanations come after the script.
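#!/bin/sh

# Print an error message and exit.
fatal()
{
    echo "$*" >&2; exit 1
}
usage()
{
    fatal "Usage: init-recoll-volume.sh <top-directory>"
}

test $# = 1 || usage
topdir=$1
test -d "$topdir" || fatal "$topdir should be a directory"

confdir="$topdir/recoll-config"
test ! -d "$confdir" || fatal "$confdir should not exist"

mkdir "$confdir" || fatal "could not create $confdir"

# Make both paths absolute. confdir is rebuilt from the absolute topdir
# so that a relative <top-directory> argument also works.
cd "$topdir" || fatal "cannot change directory to $topdir"
topdir=`pwd`
confdir="$topdir/recoll-config"

# Write the minimal configuration: what to index, and where the
# configuration directory originally lives.
(echo "topdirs = \"$topdir\""; \
 echo "orgidxconfdir = $confdir") > "$confdir/recoll.conf"

# Build the index inside the new configuration directory.
recollindex -c "$confdir"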
The examples below will assume that you have a dataset under /home/me/mydata/, with the index configuration and data stored inside /home/me/mydata/recoll-confdir.
In order to be able to run queries after the dataset has been moved, you must ensure the following:
The main configuration file must define the orgidxconfdir variable to be the original location of the configuration directory (orgidxconfdir=/home/me/mydata/recoll-confdir must be set inside /home/me/mydata/recoll-confdir/recoll.conf in the example above).
The configuration directory must exist with the documents, somewhere under the directory which will be moved. E.g. if you are moving /home/me/mydata around, the configuration directory must exist somewhere below this point, for example /home/me/mydata/recoll-confdir, or /home/me/mydata/sub/recoll-confdir.
You should keep the default locations for the index elements which are relative to the configuration directory by default (principally dbdir). Only the paths referring to the documents themselves (e.g. topdirs values) should be absolute (in general, they are only used when indexing anyway).
Only the first point needs explicit user action; the Recoll defaults are compatible with the third one, and the second is natural.
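The three conditions can be checked mechanically before moving a dataset. The following is a rough sketch under stated assumptions, not a Recoll tool: check_movable is a hypothetical name, the checks rely on a simple text match of "name = value" lines in recoll.conf, and only dbdir is examined for the third condition.

```sh
#!/bin/sh
# Hypothetical pre-move check, not part of Recoll.
# check_movable DATASET_ROOT CONFDIR
# Returns 0 if all three conditions appear to hold, 1 otherwise,
# 2 if one of the directories cannot be entered.
check_movable()
{
    cm_root=$(cd "$1" && pwd) || return 2
    cm_conf=$(cd "$2" && pwd) || return 2
    cm_file="$cm_conf/recoll.conf"
    cm_status=0

    # 1. orgidxconfdir must be defined in the main configuration file.
    if ! grep -q '^[[:space:]]*orgidxconfdir[[:space:]]*=' "$cm_file" 2>/dev/null; then
        echo "orgidxconfdir is not set in $cm_file"
        cm_status=1
    fi

    # 2. The configuration directory must live below the dataset root.
    case $cm_conf in
        "$cm_root"/*) ;;
        *) echo "$cm_conf is not below $cm_root"; cm_status=1 ;;
    esac

    # 3. dbdir should keep its default, relative location.
    if grep -q '^[[:space:]]*dbdir[[:space:]]*=[[:space:]]*/' "$cm_file" 2>/dev/null; then
        echo "dbdir is set to an absolute path in $cm_file"
        cm_status=1
    fi

    return $cm_status
}

# Example use (paths from the example dataset):
#   check_movable /home/me/mydata /home/me/mydata/recoll-confdir
```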
If, after the move, the configuration directory needs to be copied out of the dataset (for example because the thumb drive is too slow), you can set the curidxconfdir variable inside the copied configuration to define the location of the moved one. For example, if /home/me/mydata is now mounted onto /media/me/somelabel, but the configuration directory and index have been copied to /tmp/tempconfig, you would set curidxconfdir to /media/me/somelabel/recoll-confdir inside /tmp/tempconfig/recoll.conf. orgidxconfdir would still be /home/me/mydata/recoll-confdir in the original and the copy.
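The copy described here could be scripted along the following lines. This is a minimal sketch, not a Recoll tool: copy_confdir_out is a hypothetical name, and the sketch assumes that a setting written at the very top of recoll.conf, before any bracketed section header, applies globally.

```sh
#!/bin/sh
# Hypothetical helper, not part of Recoll.
# copy_confdir_out MOVED_CONFDIR NEW_CONFDIR
# Copies the whole configuration directory (index included) and records
# the location of the moved one in the copy.
copy_confdir_out()
{
    cco_src=$(cd "$1" && pwd) || return 1
    if [ -e "$2" ]; then
        echo "$2 already exists" >&2
        return 1
    fi
    cp -R "$cco_src" "$2" || return 1
    # Files copied from a read-only medium may not be writable.
    chmod -R u+w "$2" || return 1
    {
        # Written first so that it sits in the global part of the file.
        printf 'curidxconfdir = %s\n' "$cco_src"
        # Keep everything else (orgidxconfdir included), minus any
        # previous curidxconfdir definition.
        sed '/^[[:space:]]*curidxconfdir[[:space:]]*=/d' "$cco_src/recoll.conf"
    } > "$2/recoll.conf"
}

# Example use (paths from the text):
#   copy_confdir_out /media/me/somelabel/recoll-confdir /tmp/tempconfig
```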
If you are regularly copying the configuration out of the dataset, it will be useful to write a script to automate the procedure. This can't really be done inside Recoll because there are probably many possible variants. One example would be to copy the configuration to make it writable, but keep the index data on the medium because it is too big - in this case, the script would also need to set dbdir in the copied configuration.
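As one possible shape for such a script, the sketch below handles the variant where the index stays on the medium. It is an illustration under stated assumptions, not a Recoll tool: copy_conf_keep_index is a hypothetical name, the index is assumed to live in the default xapiandb subdirectory of the configuration directory, and settings written at the top of recoll.conf are assumed to apply globally.

```sh
#!/bin/sh
# Hypothetical helper, not part of Recoll.
# copy_conf_keep_index MOVED_CONFDIR NEW_CONFDIR
# Copies the configuration without the index data, then points the copy
# at the moved configuration directory and at the index left on the medium.
copy_conf_keep_index()
{
    cki_src=$(cd "$1" && pwd) || return 1
    mkdir "$2" || return 1
    # Copy every entry except the index (assumed name: xapiandb).
    for cki_entry in "$cki_src"/* "$cki_src"/.[!.]*; do
        [ -e "$cki_entry" ] || continue
        [ "${cki_entry##*/}" = xapiandb ] && continue
        cp -R "$cki_entry" "$2/" || return 1
    done
    # Files copied from a read-only medium may not be writable.
    chmod -R u+w "$2" || return 1
    {
        printf 'curidxconfdir = %s\n' "$cki_src"
        printf 'dbdir = %s\n' "$cki_src/xapiandb"
        # Keep the other settings, minus previous definitions of the two above.
        sed -e '/^[[:space:]]*curidxconfdir[[:space:]]*=/d' \
            -e '/^[[:space:]]*dbdir[[:space:]]*=/d' "$cki_src/recoll.conf"
    } > "$2/recoll.conf"
}

# Example use (illustrative paths):
#   copy_conf_keep_index /media/me/somelabel/recoll-confdir /tmp/tempconfig
```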
The same set of modifications (Recoll 1.24) has also made it possible to run queries from a read-only configuration directory (with slightly reduced functionality of course, such as not recording the query history).