Removable volumes

Recoll used to have no support for indexing removable volumes (portable disks, USB keys, etc.). Recent versions have improved the situation and support indexing removable volumes in two different ways:

  • By indexing the volume in the main, fixed, index, and ensuring that the volume data is not purged if the indexing runs while the volume is mounted. (since Recoll 1.25.2).

  • By storing a volume index on the volume itself (since Recoll 1.24).

Indexing removable volumes in the main index

As of version 1.25.2, Recoll provides a simple way to ensure that the index data for an absent volume will not be purged. Two conditions must be met:

  • The volume mount point must be a member of the topdirs list.

  • The mount directory must be empty (when the volume is not mounted).

If recollindex finds that one of the topdirs is empty when starting up, any existing data for the tree will be preserved by the indexing pass (no purge for this area).

Self contained volumes

As of Recoll 1.24, it has become possible to build self-contained datasets including a Recoll configuration directory and index together with the indexed documents, and to move such a dataset around (for example copying it to an USB drive), without having to adjust the configuration for querying the index.

Note

This is a query-time feature only. The index must only be updated in its original location. If an update is necessary in a different location, the index must be reset.

The principle of operation is that the configuration stores the location of the original configuration directory, which must reside on the movable volume. If the volume is later mounted elsewhere, Recoll adjusts the paths stored inside the index by the difference between the original and current locations of the configuration directory.

To make a long story short, here follows a script to create a Recoll configuration and index under a given directory (given as single parameter). The resulting data set (files + recoll directory) can later to be moved to a CDROM or thumb drive. Longer explanations come after the script.

#!/bin/sh

fatal()
{
echo $*;exit 1
}
usage()
{
fatal "Usage: init-recoll-volume.sh <top-directory>"
}

test $# = 1 || usage
topdir=$1
test -d "$topdir" || fatal $topdir should be a directory

confdir="$topdir/recoll-config"
test ! -d "$confdir" || fatal $confdir should not exist

mkdir "$confdir"
cd "$topdir"
topdir=`pwd`
cd "$confdir"
confdir=`pwd`

(echo topdirs = '"'$topdir'"'; \
echo orgidxconfdir = $topdir/recoll-config) > "$confdir/recoll.conf"

recollindex -c "$confdir"

The examples below will assume that you have a dataset under /home/me/mydata/, with the index configuration and data stored inside /home/me/mydata/recoll-confdir.

In order to be able to run queries after the dataset has been moved, you must ensure the following:

  • The main configuration file must define the orgidxconfdir variable to be the original location of the configuration directory (orgidxconfdir=/home/me/mydata/recoll-confdir must be set inside /home/me/mydata/recoll-confdir/recoll.conf in the example above).

  • The configuration directory must exist with the documents, somewhere under the directory which will be moved. E.g. if you are moving /home/me/mydata around, the configuration directory must exist somewhere below this point, for example /home/me/mydata/recoll-confdir, or /home/me/mydata/sub/recoll-confdir.

  • You should keep the default locations for the index elements which are relative to the configuration directory by default (principally dbdir). Only the paths referring to the documents themselves (e.g. topdirs values) should be absolute (in general, they are only used when indexing anyway).

Only the first point needs an explicit user action, the Recoll defaults are compatible with the third one, and the second is natural.

If, after the move, the configuration directory needs to be copied out of the dataset (for example because the thumb drive is too slow), you can set the curidxconfdir, variable inside the copied configuration to define the location of the moved one. For example if /home/me/mydata is now mounted onto /media/me/somelabel, but the configuration directory and index has been copied to /tmp/tempconfig, you would set curidxconfdir to /media/me/somelabel/recoll-confdir inside /tmp/tempconfig/recoll.conf. orgidxconfdir would still be /home/me/mydata/recoll-confdir in the original and the copy.

If you are regularly copying the configuration out of the dataset, it will be useful to write a script to automate the procedure. This can't really be done inside Recoll because there are probably many possible variants. One example would be to copy the configuration to make it writable, but keep the index data on the medium because it is too big - in this case, the script would also need to set dbdir in the copied configuration.

The same set of modifications (Recoll 1.24) has also made it possible to run queries from a readonly configuration directory (with slightly reduced function of course, such as not recording the query history).