OCR for image documents

As of Recoll 1.43.3, the alternate Python handler rclimg.py can run OCR on image files. The default image handler, the Perl-based rclimg script, has not been OCR-enabled. To perform image OCR, you therefore need to tell Recoll to use the alternate handler and also to enable OCR by setting the imgocr variable.

If you are running an older Recoll release, you can get an up-to-date copy of rclimg.py from the git repository. Copy it to the Recoll filters/ directory and make it executable. The script must run from the installation directory because of the way it invokes the OCR script.
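The copy-and-make-executable step could look like the following sketch. The helper name install_rclimg and both paths in the example call are assumptions for illustration: the location of the filters/ directory depends on how Recoll was installed (/usr/share/recoll/filters is a common location on Linux), and writing there usually requires administrator rights.

```shell
#!/bin/sh
# Sketch only: copy a downloaded rclimg.py into the Recoll filters/ directory
# and make it executable. install_rclimg is a hypothetical helper, not part of Recoll.
install_rclimg() {
    src=$1          # path to the downloaded rclimg.py
    filters_dir=$2  # Recoll filters/ directory (location depends on the installation)
    cp "$src" "$filters_dir/rclimg.py" || return 1
    chmod a+rx "$filters_dir/rclimg.py"
}

# Example call (both paths are assumptions, adjust to your system):
# install_rclimg "$HOME/Downloads/rclimg.py" /usr/share/recoll/filters
```

Installing into the real filters/ directory, rather than pointing Recoll at a copy elsewhere, keeps the script next to the other handlers, which matches the requirement that it run from the installation directory.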

Example configuration:

In $RECOLL_CONFDIR/mimeconf (e.g. ~/.recoll/mimeconf):

[index]
image/gif = execm rclimg.py
image/jp2 = execm rclimg.py
image/jpeg = execm rclimg.py
image/png = execm rclimg.py
image/tiff = execm rclimg.py
image/x-nikon-nef = execm rclimg.py
image/x-xcf = execm rclimg.py

You can of course list only a subset of the image types.

In $RECOLL_CONFDIR/recoll.conf:

ocrprogs = tesseract
tesseractlang = eng
[/path/to/my/images/directory]
imgocr = 1
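
Before reindexing, it can help to confirm that the OCR program named in ocrprogs is on the PATH and that the language given in tesseractlang is installed. The following is a minimal sketch, assuming a POSIX shell. have_tesseract_lang is a hypothetical helper name, not part of Recoll; it relies on the tesseract --list-langs option, which prints one language code per line after a header line.

```shell
#!/bin/sh
# Sketch only: check that tesseract is available and that a given language
# pack is installed. have_tesseract_lang is a hypothetical helper name.
# Returns 0 if the language is listed, 1 if it is not, 2 if tesseract is missing.
have_tesseract_lang() {
    lang=$1
    command -v tesseract >/dev/null 2>&1 || return 2
    # Some tesseract versions print the list on stderr, so merge both streams.
    tesseract --list-langs 2>&1 | grep -qx -- "$lang"
}

# Example: check the language used in the configuration above.
# have_tesseract_lang eng && echo "tesseract can OCR English text"
```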