Fields and metadata

Apart from the main text content, documents usually carry other data elements, such as author names, dates, or abstracts. These are usually called metadata elements because they qualify or describe the data rather than being part of it. Recoll uses the slightly more general notion of a field to mean any named piece of data associated with a document.

Fields are extracted by the document handlers when processing a document and further used by Recoll for searching or displaying results.

Some fields, such as a file modification time, have a strict and predefined usage. For most fields, though, the processing is entirely configurable and is defined in the fields configuration file.

Fields have two main processing options (at least one of which will be set if they are processed at all):

  • Their content can be indexed. This makes them searchable.

  • Their content can be stored in the index as document attribute data. This makes them displayable as part of a result list entry.

These options are preset in the default fields file for common elements like a title or an author name.

The terms from indexed fields are stored in the inverted index with a specific prefix, which makes them searchable by specifying the field name (e.g. author:Balzac). The terms can optionally also be used for the main index section to provide hits for non-prefixed searches. This is decided by an attribute in the fields file.
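
The prefixing scheme can be pictured with a toy inverted index. This is only a sketch of the general idea, not Recoll's implementation: the prefix letters, the FIELD_PREFIXES and ALSO_MAIN tables, and both function names are hypothetical, chosen for the illustration.

```python
from collections import defaultdict

# Hypothetical field-to-prefix map; the real prefixes come from the fields file.
FIELD_PREFIXES = {"author": "A", "title": "S"}
# Hypothetical set of fields whose terms are also copied to the main (unprefixed) section.
ALSO_MAIN = {"title"}


def index_document(index, doc_id, text, fields):
    """Add the main text and the field terms of one document to the toy index."""
    for term in text.lower().split():
        index[term].add(doc_id)
    for name, value in fields.items():
        prefix = FIELD_PREFIXES.get(name)
        if prefix is None:
            continue  # this field is not indexed
        for term in value.lower().split():
            # Terms are lowercased, so an uppercase prefix cannot collide with main terms.
            index[prefix + term].add(doc_id)
            if name in ALSO_MAIN:
                index[term].add(doc_id)


def search(index, query):
    """Look up 'field:term' in the prefixed section, or a bare term in the main section."""
    if ":" in query:
        name, term = query.split(":", 1)
        prefix = FIELD_PREFIXES.get(name)
        if prefix is None:
            return set()
        return set(index.get(prefix + term.lower(), set()))
    return set(index.get(query.lower(), set()))


index = defaultdict(set)
index_document(index, 1, "a novel about ambition",
               {"author": "Balzac", "title": "Lost Illusions"})
index_document(index, 2, "an essay mentioning balzac",
               {"author": "Proust", "title": "Essays"})

print(search(index, "author:Balzac"))  # {1}: matches the prefixed author term only
print(search(index, "balzac"))         # {2}: author terms are not in the main section here
print(search(index, "illusions"))      # {1}: title terms are also copied to the main section
```

In this sketch the author field is searchable only through its prefix, while title terms also produce hits for non-prefixed searches, which mirrors the per-field choice described above.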

In most cases, field data is provided by the document itself, for example by HTML <meta> elements. It can also be obtained from other sources, as described in the following section.
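
As an illustration of the HTML <meta> case, the following minimal sketch collects name/content pairs with Python's standard html.parser module. It is not the Recoll HTML handler; the MetaFieldParser class and extract_meta_fields function are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip <meta> elements that do not carry a named value (e.g. charset).
        if name and content:
            self.fields[name.lower()] = content


def extract_meta_fields(html_text):
    """Return a dict mapping field names to values found in <meta> elements."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    return parser.fields


sample = """<html><head>
<meta name="author" content="Balzac">
<meta name="keywords" content="novel, france">
<meta charset="utf-8">
</head><body>text</body></html>"""

print(extract_meta_fields(sample))
# {'author': 'Balzac', 'keywords': 'novel, france'}
```

Each extracted pair corresponds to one named field, which could then be indexed, stored, or both, depending on the configuration.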