pdfocr
Attempt OCR of PDF files with no text content. This parameter can be set for specific subdirectories. The default is off because OCR is very slow.
pdfoutline
Extract outlines and bookmarks from PDF documents (needs pdftohtml). This is not enabled by default because it is rarely needed, and the extra command takes a little time.
pdfattach
Enable PDF attachment extraction by executing pdftk (if available). This is normally disabled, because it does slow down PDF indexing a bit even if not one attachment is ever found.
pdfextrameta
Extract text from selected XMP metadata tags. This is a space-separated list of qualified XMP tag names. Each element can also include a translation to a Recoll field name, separated by a '|' character. If the second element is absent, the tag name is used as the Recoll field name. You will also need to add specifications to the "fields" file to direct processing of the extracted data.
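The value syntax described above can be illustrated with a short Python sketch. This is not Recoll's own parsing code: the function name parse_pdfextrameta and the sample tag names are hypothetical, chosen only to show how a 'tag|field' element differs from a bare tag.

```python
def parse_pdfextrameta(value):
    """Map each qualified XMP tag name to the field name it would feed.

    Illustration only: this mirrors the documented syntax, not the
    indexer's actual implementation.
    """
    mapping = {}
    for element in value.split():
        # 'tag|field' carries an explicit translation; a bare 'tag' does not.
        tag, _, field = element.partition("|")
        # With no translation, the tag name itself is used as the field name.
        mapping[tag] = field if field else tag
    return mapping


# Hypothetical value: two tags with a translation, one without.
sample = "bibtex:location|location bibtex:booktitle bibtex:pages|pages"
fields = parse_pdfextrameta(sample)
```

With this sample, bibtex:location and bibtex:pages are translated to location and pages, while bibtex:booktitle keeps its tag name as the field name.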
pdfextrametafix
Name of the XMP field editing script. The script is loaded to edit XMP field values. It should define a 'MetaFixer' class with a metafix() method, which will be called with the qualified tag name and value of each selected field, for editing or erasing. A new instance is created for each document, so that the object can keep state, e.g. for eliminating duplicate values.
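A minimal sketch of such a script, assuming it is written in Python, might look like the following. The class and method names come from the description above. Everything else is an assumption made for illustration: that metafix() returns the edited value, that returning an empty string erases a value, and the tag name dc:subject used in the comments.

```python
class MetaFixer(object):
    """Sketch of an XMP field editor; one instance exists per document."""

    def __init__(self):
        # (tag, value) pairs already seen in the current document.
        self._seen = set()

    def metafix(self, tag, value):
        # Collapse runs of whitespace so that near-identical values compare equal.
        value = " ".join(value.split())
        key = (tag, value)
        if key in self._seen:
            # Assumption: an empty return value erases the duplicate.
            return ""
        self._seen.add(key)
        return value
```

Because the state lives in the instance, a value repeated under the same tag (for example two identical dc:subject entries) is dropped only within one document, and the next document starts from an empty set.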