Parameters for OCR processing

ocrprogs

OCR modules to try. The top-level OCR script tries to load the listed modules in order and uses the first one that reports being able to perform OCR on the input file. The standard distribution includes modules for tesseract (tesseract) and ABBYY FineReader (abbyy). For compatibility with the previous version, if this parameter is not defined at all, the default value is "tesseract". Use an explicit empty value if you do not want this default. A value of "abbyy tesseract" tries both standard modules, ABBYY first.
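The selection rule can be sketched in Python as follows. This is a minimal illustration, not the code of the actual top-level OCR script. The `modules` dictionary, which maps a module name to a hypothetical capability-check function, is an assumption that stands in for however the real script loads its modules. The sketch also shows the difference between an undefined value (default "tesseract") and an explicit empty value (nothing is tried).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a loaded OCR module: a function that says
# whether the module can perform OCR on the given input file.
CapabilityCheck = Callable[[str], bool]


def parse_ocrprogs(value: Optional[str]) -> List[str]:
    """Turn the ocrprogs value into an ordered list of module names."""
    if value is None:
        # Parameter not defined at all: compatibility default.
        return ["tesseract"]
    # Defined (possibly empty): whitespace-separated names, "" gives [].
    return value.split()


def select_ocr_module(value: Optional[str],
                      modules: Dict[str, CapabilityCheck],
                      path: str) -> Optional[str]:
    """Return the first listed module that can handle path, or None."""
    for name in parse_ocrprogs(value):
        check = modules.get(name)
        if check is None:
            continue  # module could not be loaded: try the next one
        if check(path):
            return name
    return None


if __name__ == "__main__":
    # Pretend ABBYY is installed but declines this file.
    available = {"abbyy": lambda p: False, "tesseract": lambda p: True}
    print(select_ocr_module("abbyy tesseract", available, "scan.pdf"))  # prints "tesseract"
```

Modules that fail to load are skipped rather than treated as errors, so listing a module that is not installed only costs a lookup.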

ocrcachedir

Location for caching OCR data. If this is empty or undefined, the cached OCR data is stored under $RECOLL_CONFDIR/ocrcache.
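The default rule amounts to a simple fallback, sketched here in Python. The function name is hypothetical, and it assumes the configuration directory (the value of $RECOLL_CONFDIR) is already known and passed in as `confdir`.

```python
import os
from typing import Optional


def resolve_ocr_cache_dir(ocrcachedir: Optional[str], confdir: str) -> str:
    """Return the directory used for cached OCR data (hypothetical helper)."""
    if not ocrcachedir:
        # Empty or undefined: use the ocrcache subdirectory of the config dir.
        return os.path.join(confdir, "ocrcache")
    return ocrcachedir
```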

tesseractlang

Language to assume for tesseract OCR. Setting it correctly is important for OCR accuracy. It can also be set through the contents of a file in the directory currently being processed; see the rclocrtesseract.py script. Example values: eng, fra... See the tesseract documentation.
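A per-directory override of this kind could look like the following Python sketch. It is not taken from rclocrtesseract.py. The file name `LANG_FILE_NAME` is a hypothetical placeholder (check the script for the name it actually uses), and the assumption that the directory file takes precedence over the configured tesseractlang value is also this sketch's own.

```python
import os
from typing import Optional

# Hypothetical file name: rclocrtesseract.py defines the real one.
LANG_FILE_NAME = ".ocrlang-example"


def resolve_tesseract_lang(doc_path: str, configured: Optional[str]) -> Optional[str]:
    """Pick the OCR language for doc_path (assumed precedence: dir file, then config)."""
    lang_file = os.path.join(os.path.dirname(doc_path), LANG_FILE_NAME)
    try:
        with open(lang_file, encoding="utf-8") as f:
            value = f.read().strip()
        if value:
            return value  # non-empty per-directory setting wins
    except OSError:
        pass  # no readable per-directory file: fall back to the config value
    return configured or None
```

Placing the file next to the documents lets one directory of, say, French scans use a different language without changing the global configuration.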

tesseractcmd

Path for the tesseract command. Do not quote the value. This is mostly useful on Windows, or for specifying a non-default tesseract command. Example for Windows:

tesseractcmd = C:/ProgramFiles(x86)/Tesseract-OCR/tesseract.exe
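The following Python sketch shows how a tesseractcmd value and a tesseract language might be turned into a command line. It is an illustration, not the invocation used by rclocrtesseract.py. It relies on the standard tesseract command-line form `tesseract imagename outputbase -l lang`, where the output base `stdout` sends the recognized text to standard output. The function names are hypothetical.

```python
import subprocess
from typing import List, Optional


def tesseract_argv(tesseractcmd: Optional[str], image: str,
                   lang: Optional[str]) -> List[str]:
    """Build the argument list for a tesseract run (hypothetical helper)."""
    # The configured path is used verbatim as one argument, so spaces in it
    # need no quoting; without a setting, rely on "tesseract" being in PATH.
    cmd = tesseractcmd or "tesseract"
    argv = [cmd, image, "stdout"]  # "stdout" as output base: print the text
    if lang:
        argv += ["-l", lang]
    return argv


def run_tesseract(tesseractcmd: Optional[str], image: str,
                  lang: Optional[str]) -> str:
    """Run tesseract and return the recognized text (raises on failure)."""
    result = subprocess.run(tesseract_argv(tesseractcmd, image, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

Passing the command as a list element rather than through a shell string is why the configured path must not carry its own quotes: any quote characters would become part of the file name.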

abbyylang

Language to assume for ABBYY OCR. Setting it correctly is important for OCR accuracy. It can also be set through the contents of a file in the directory currently being processed; see the rclocrabbyy.py script. Typical values: English, French... See the ABBYY documentation.

abbyyocrcmd

Path for the ABBYY command. The ABBYY directory is usually not in the PATH, so you should set this.