As of Recoll 1.43.3, the alternate Python handler, rclimg.py, can run OCR
on image files. The default image handler, the Perl-based rclimg script, is
not OCR-enabled. To OCR images, you therefore need to tell Recoll to use the
alternate handler and to enable OCR by setting the imgocr variable.
If you are running an older Recoll release, you can get an up-to-date copy of
rclimg.py from the git repository. You will have to copy it to the Recoll
filters/ directory and make it executable. The script must run from the
installation directory because of the way it invokes the OCR script.
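The copy-and-make-executable step can be sketched in Python as follows. This is a minimal sketch, not part of Recoll: the function name install_handler is hypothetical, and the location of the filters/ directory depends on your installation, so it is passed in as a parameter rather than guessed.

```python
import shutil
import stat
from pathlib import Path


def install_handler(script: Path, filters_dir: Path) -> Path:
    """Copy a handler script into filters_dir and make it executable.

    Hypothetical helper: filters_dir must be the filters/ directory of
    your own Recoll installation; this sketch does not try to locate it.
    """
    if not filters_dir.is_dir():
        raise NotADirectoryError(f"not a directory: {filters_dir}")
    target = filters_dir / script.name
    shutil.copy2(script, target)
    # Add the execute bits for user, group and others, keeping the rest.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

You would call it with the path of the downloaded rclimg.py and the path of your filters/ directory, probably with sufficient privileges to write there.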
Example configuration:
In $RECOLL_CONFDIR/mimeconf (e.g. ~/.recoll/mimeconf):
    [index]
    image/gif = execm rclimg.py
    image/jp2 = execm rclimg.py
    image/jpeg = execm rclimg.py
    image/png = execm rclimg.py
    image/tiff = execm rclimg.py
    image/x-nikon-nef = execm rclimg.py
    image/x-xcf = execm rclimg.py
You can, of course, list only a subset of the image types.
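If you want to produce the [index] section for a chosen subset of types, a small Python sketch along these lines may help. The function name render_index_section is hypothetical; it only builds the text and leaves it to you to place it in mimeconf.

```python
def render_index_section(mime_types, handler="rclimg.py"):
    """Return the text of a mimeconf [index] section for the given types.

    Hypothetical helper: it formats one "type = execm handler" line per
    MIME type, in the same form as the example configuration.
    """
    lines = ["[index]"]
    for mime_type in mime_types:
        lines.append(f"{mime_type} = execm {handler}")
    return "\n".join(lines) + "\n"


# Example: only JPEG and PNG images are routed to the alternate handler.
print(render_index_section(["image/jpeg", "image/png"]), end="")
```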
In $RECOLL_CONFDIR/recoll.conf:
    ocrprogs = tesseract
    tesseractlang = eng
    [/path/to/my/images/directory]
    imgocr = 1
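Before reindexing, it can be worth checking that the OCR program is actually installed. The following Python sketch assumes that the tesseract setting relies on an executable named tesseract that is found through PATH; that is an assumption about your setup, and the function name missing_programs is hypothetical.

```python
import shutil


def missing_programs(programs):
    """Return the names from programs that cannot be found on PATH."""
    return [name for name in programs if shutil.which(name) is None]


# Assumption: the OCR setup needs an executable called "tesseract".
missing = missing_programs(["tesseract"])
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all OCR programs found")
```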