Note that if modifying the files (or a copy of them) is acceptable, using OCRmyPDF to add a text layer to the PDF itself is a better solution than the Recoll OCR feature. For example, it allows Recoll to position the PDF viewer on the search target when opening the document, and it permits secondary searches in the native tool.
Recoll OCR is enabled by the pdfocr configuration variable, and will only be executed if the processed file has no text content.
Example configuration fragment in recoll.conf:
    pdfocr = 1
    ocrprogs = tesseract
    tesseractlang = eng
The pdfocr variable can be set globally or for specific subtrees.
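As an illustration of a per-subtree setting, the following sketch keeps OCR disabled by default and turns it on for a single directory only. It relies on the usual recoll.conf convention of introducing subtree-specific values with a bracketed directory path. The path /home/me/scanned is a hypothetical example, assumed to lie under one of the indexed directories.

```
# Global section: OCR stays off by default.
pdfocr = 0
ocrprogs = tesseract
tesseractlang = eng

# Hypothetical subtree: enable OCR only for PDFs found below this directory.
[/home/me/scanned]
pdfocr = 1
```

The global values must come before the first bracketed section, because every assignment that follows a section header applies to that subtree only.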