Fields are named pieces of information in or about documents, like title, author, or abstract.
The field values for documents can appear in several ways during indexing: output by input handlers as meta fields in the HTML header section, extracted from file extended attributes, added as attributes of the Doc object when using the API, or synthesized internally by Recoll.
The Recoll query language allows searching for text in a specific field.
Recoll defines a number of default fields. Additional ones can be output by handlers, and described in the fields configuration file.
Fields can be:
indexed, meaning that their terms are separately stored in inverted lists (with a specific prefix), and that a field-specific search is possible.
stored, meaning that their value is recorded in the index data record for the document, and can be returned and displayed with search results.
A field can be indexed, stored, or both. This and other aspects of field handling are defined in the fields configuration file.
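As an illustration of the indexed/stored distinction, here is a minimal Python sketch that reads a fields-like configuration text and reports which fields would be indexed, stored, or both. It assumes that indexed fields are listed with their prefix in a [prefixes] section and stored fields in a [stored] section; check the fields file itself for the authoritative layout. The pdfpages field, the XYPDFP prefix and the classify_fields helper are hypothetical names used only for this example.

```python
import configparser

# Illustrative configuration text. The [prefixes] and [stored] section names
# are an assumption about the fields file layout; "pdfpages" and "XYPDFP"
# are made-up names, and the prefix values are only examples.
SAMPLE_FIELDS = """
[prefixes]
author = A
pdfpages = XYPDFP

[stored]
pdfpages =
abstract =
"""


def classify_fields(text):
    """Return {field_name: (is_indexed, is_stored)} for a fields-like config."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # A field with a prefix is indexed; a field listed under [stored] is stored.
    indexed = set(parser["prefixes"]) if parser.has_section("prefixes") else set()
    stored = set(parser["stored"]) if parser.has_section("stored") else set()
    return {name: (name in indexed, name in stored)
            for name in sorted(indexed | stored)}


field_roles = classify_fields(SAMPLE_FIELDS)
for name, (is_indexed, is_stored) in field_roles.items():
    print(f"{name}: indexed={is_indexed} stored={is_stored}")
```

In this sample, author is only indexed, abstract is only stored, and pdfpages is both, so it could be searched by field and also shown in result lists.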
Some fields may also be designated as supporting range queries, meaning that results may be selected for an interval of the field's values. See the configuration section for more details.
The sequence of events for field processing is as follows:
During indexing, recollindex scans all meta fields in HTML documents (most document types are transformed into HTML at some point). It compares the name for each element to the configuration defining what should be done with fields (the fields file).
If the name for the meta element matches one for a field that should be indexed, the contents are processed and the terms are entered into the index with the prefix defined in the fields file.
If the name for the meta element matches one for a field that should be stored, the content of the element is stored with the document data record, from which it can be extracted and displayed at query time.
At query time, if a field search is performed, the index prefix is computed and the match is only performed against appropriately prefixed terms in the index.
At query time, the field can be displayed inside the result list by using the appropriate directive in the definition of the result list paragraph format. All fields are displayed on the fields screen of the preview window (which you can reach through the right-click menu). This is independent of whether the search which produced the results used the field.
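The sequence above can be sketched with a toy model in Python. This is not Recoll code and does not use the Recoll API: MetaCollector, ToyIndex, FIELD_PREFIXES and STORED_FIELDS are hypothetical names, the prefix values are illustrative, and the real index handles terms and prefixes in a more elaborate way. The sketch only shows the general idea: meta elements are scanned, indexed fields produce prefixed terms, stored fields go into the document record, and a field search only looks at terms carrying the field's prefix.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical stand-ins for what the fields file would define.
FIELD_PREFIXES = {"author": "A", "pdfpages": "XYPDFP"}  # indexed fields
STORED_FIELDS = {"author", "pdfpages"}                  # stored fields


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.fields[name.lower()] = content


class ToyIndex:
    """Toy inverted index with per-field term prefixes and stored records."""

    def __init__(self):
        self.postings = defaultdict(set)  # prefixed term -> document ids
        self.records = {}                 # document id -> stored field values

    def add_document(self, doc_id, html):
        collector = MetaCollector()
        collector.feed(html)
        record = {}
        for name, value in collector.fields.items():
            prefix = FIELD_PREFIXES.get(name)
            if prefix is not None:
                # Indexed field: enter each term with the field prefix.
                for term in value.lower().split():
                    self.postings[prefix + term].add(doc_id)
            if name in STORED_FIELDS:
                # Stored field: keep the raw value in the document record.
                record[name] = value
        self.records[doc_id] = record

    def field_search(self, field, term):
        # Compute the prefix, then match only against prefixed terms.
        prefix = FIELD_PREFIXES.get(field)
        if prefix is None:
            return set()
        return set(self.postings.get(prefix + term.lower(), set()))


index = ToyIndex()
index.add_document(
    "doc1",
    '<html><head><meta name="author" content="Jane Doe">'
    '<meta name="pdfpages" content="12">'
    '<meta name="keywords" content="ignored"></head><body>text</body></html>',
)
index.add_document(
    "doc2",
    '<html><head><meta name="author" content="John Smith"></head></html>',
)

hits = index.field_search("author", "doe")
print(hits, index.records["doc1"])
```

In this model the keywords meta element is neither indexed nor stored because it appears in neither table, which mirrors the role the fields file plays in deciding what happens to each meta element.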
You can find more information in the section about the fields file, or in comments inside the file.
You can also have a look at the example in the FAQs area, detailing how one could add a page count field to PDF documents for display inside result lists.