Apart from the main text content, documents usually aggregate other data elements, such
as author names, dates, abstracts, etc. These are usually called metadata
elements because they qualify or describe the data rather than being part of it. Recoll has a
slightly more general notion of field, meaning any named piece of data associated with a
document.
Fields are extracted by the document handlers when processing a document and further used by Recoll for searching or displaying results.
Some fields, such as a file modification time, have a strict and predefined
usage. For most fields though, the processing is entirely configurable and defined in the
fields configuration file.
Fields have two main processing options (at least one of which will be set if they are processed at all):
Their content can be indexed. This makes them searchable.
Their content can be stored in the index as document attribute data. This makes them displayable as part of a result list entry.
These options are preset in the default fields file for common elements like a title or an
author name.
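The two options can be pictured with a small Python sketch. This is only a toy model under stated assumptions, not Recoll's implementation: the FieldOptions class, the split_fields function and the option table are hypothetical names introduced for illustration, and the real settings live in the fields configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldOptions:
    """Hypothetical per-field switches mirroring the two options above."""
    indexed: bool = False  # content is searchable
    stored: bool = False   # content is kept as a displayable document attribute


def split_fields(fields, options):
    """Separate extracted fields into searchable and displayable sets.

    A field with neither option set is simply not processed.
    """
    searchable, displayable = {}, {}
    for name, value in fields.items():
        opts = options.get(name, FieldOptions())
        if opts.indexed:
            searchable[name] = value
        if opts.stored:
            displayable[name] = value
    return searchable, displayable


# Assumed example settings, chosen only to show the three possible outcomes.
EXAMPLE_OPTIONS = {
    "author": FieldOptions(indexed=True, stored=True),
    "abstract": FieldOptions(stored=True),
}
```

With these assumed settings, an author field would be both searchable and displayable, an abstract would only be displayable, and any field missing from the table would be ignored.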
The terms from indexed fields are stored in the inverted index with a specific prefix,
which makes them searchable by specifying the field name (e.g. author:Balzac). The terms can
optionally also be used for the main index section to provide hits for non-prefixed searches.
This is decided by an attribute in the fields file.
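The prefixing scheme can be illustrated with a minimal Python sketch of an inverted index. Everything here is an assumption made for the example: the prefix letters, the ALSO_IN_MAIN set and the function names are hypothetical, and the code does not reflect Recoll's actual index layout or query parser.

```python
from collections import defaultdict

# Hypothetical field-to-prefix mapping; real prefixes come from the fields file.
FIELD_PREFIXES = {"author": "A", "title": "S"}
# Hypothetical set of fields whose terms are also added to the main index section.
ALSO_IN_MAIN = {"title"}


def index_document(index, doc_id, body, fields):
    """Add body terms unprefixed and field terms with their field prefix."""
    for term in body.lower().split():
        index[term].add(doc_id)
    for name, value in fields.items():
        prefix = FIELD_PREFIXES.get(name)
        if prefix is None:
            continue  # field is not indexed
        for term in value.lower().split():
            index[prefix + term].add(doc_id)
            if name in ALSO_IN_MAIN:
                index[term].add(doc_id)  # also hit by non-prefixed searches


def search(index, query):
    """Resolve 'field:word' to a prefixed term, anything else to a plain term."""
    field, sep, word = query.partition(":")
    if sep and field in FIELD_PREFIXES:
        return index.get(FIELD_PREFIXES[field] + word.lower(), set())
    return index.get(query.lower(), set())


index = defaultdict(set)
index_document(index, 1, "a long novel", {"author": "Balzac", "title": "Goriot"})
index_document(index, 2, "an essay mentioning balzac", {"author": "Hugo"})
```

In this sketch the prefixes are uppercase while all terms are lowercased, so a prefixed term can never collide with a plain body term. A search for author:Balzac only finds document 1, whereas the plain word balzac only finds document 2, because the author field is not copied to the main section here.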
In most cases, field data is provided by the document itself, for example by HTML <meta>
elements. It can also be obtained from other sources, as described in the following section.
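As an illustration of field data coming from the document itself, the following Python sketch collects name/content pairs from HTML <meta> elements using the standard html.parser module. It is not Recoll's HTML handler; the MetaFieldParser class and the extract_meta_fields function are hypothetical names, and lowercasing the field names is an assumption of this example.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collect <meta name="..." content="..."> pairs as candidate fields."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip <meta> elements without both attributes (e.g. charset declarations).
        if name and content:
            self.fields[name.lower()] = content


def extract_meta_fields(html_text):
    """Return a dict mapping lowercased meta names to their content."""
    parser = MetaFieldParser()
    parser.feed(html_text)
    return parser.fields
```

Fed a page whose head contains a meta element named Author with content Balzac, the function would return a dictionary with an author entry, which a handler could then pass on as a field.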