General syntax

The following sample query illustrates the main parts of the syntax, which are explained below:

        author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
      

This searches for all documents with john doe appearing as a phrase in the author field (what this field contains depends on the document type, for example the From: header for an email message), and containing either beatles or lennon, and either live or unplugged, but not potatoes (in any part of the document).

An element is composed of an optional field specification and a value, separated by a colon (the field separator is the last colon in the element). Examples:

  • Eugenie
  • author:balzac
  • dc:title:grandet
  • dc:title:"eugenie grandet"
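
The "last colon" rule can be illustrated with a minimal Python sketch. This is not Recoll code: the split_element name is hypothetical, and the sketch assumes that the value itself contains no colon.

```python
def split_element(element):
    """Split a query element into (field, value) at its last colon.

    Illustration only: assumes the value itself contains no colon.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        # No colon at all: the element is a bare value with no field.
        return None, element
    # Drop the surrounding double quotes of a phrase value, if any.
    return field, value.strip('"')


for example in ['Eugenie', 'author:balzac', 'dc:title:"eugenie grandet"']:
    print(split_element(example))
```

With the examples above, dc:title:grandet yields the field dc:title and the value grandet, because only the last colon separates the two.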

The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).

All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally, in capital letters, and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3), not (word1 AND word2) OR word3.
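
The precedence rule can be made concrete with a short Python sketch that groups a flat list of words into OR groups, which are then implicitly AND'ed. It is an illustration of the rule as stated here, not the actual Recoll parser: the group_query name is hypothetical, and the sketch ignores quoting, field prefixes, negation and parentheses.

```python
def group_query(words):
    """Group words into OR lists that are implicitly AND'ed together.

    Illustration only: OR binds tighter than the implicit AND, and only
    the literal upper-case token "OR" is treated as the operator.
    """
    groups = []
    join_previous = False
    for word in words:
        if word == "OR":
            # Only meaningful if there is already a group to extend.
            join_previous = bool(groups)
            continue
        if join_previous:
            groups[-1].append(word)
            join_previous = False
        else:
            groups.append([word])
    return groups


print(group_query("word1 word2 OR word3".split()))
print(group_query("Beatles OR Lennon Live OR Unplugged".split()))
```

Each inner list is one OR group, and the outer list is the AND of these groups, so the first call separates word1 from the group made of word2 and word3.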

You can use parentheses to group elements (from version 1.21), which will sometimes make things clearer, and may allow expressing combinations which would have been difficult otherwise.

An element preceded by a - specifies a term that should not appear.

By default, words inside double-quotes define a phrase search (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result. This can be changed by using modifiers.

Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wildcard on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.

To save you some typing, Recoll versions 1.20 and later interpret a field value given as a comma-separated list of terms as an AND list and a slash-separated list as an OR list. No white space is allowed. So

author:john,lennon

will search for documents with john and lennon inside the author field (in any order), and

author:john/ringo

would search for john or ringo. This behaviour is only triggered by a field prefix: without it, comma- or slash-separated input will produce a phrase search. However, you can use the text field name to search the main text this way, as an alternative to using an explicit OR, e.g. text:napoleon/bonaparte would generate a search for napoleon or bonaparte in the main text body.
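
As a rough illustration, the shorthand can be thought of as the following expansion. This Python sketch is an assumption-laden model, not Recoll's implementation: the expand_field_list name is hypothetical, and because the behaviour of a value mixing commas and slashes is not described here, the sketch simply rejects it.

```python
def expand_field_list(element):
    """Expand field:a,b (AND list) or field:a/b (OR list).

    Illustration only. Returns (operator, terms). Without a field prefix
    the separated words are reported as a phrase, as described above.
    """
    field, sep, value = element.rpartition(":")
    if "," in value and "/" in value:
        raise ValueError("mixed separators are not covered by this sketch")
    if "," in value:
        operator, terms = "AND", value.split(",")
    elif "/" in value:
        operator, terms = "OR", value.split("/")
    else:
        operator, terms = "AND", [value]
    if not sep:
        # No field prefix: separated input becomes a phrase search.
        return ("PHRASE" if len(terms) > 1 else operator), terms
    return operator, [f"{field}:{term}" for term in terms]


print(expand_field_list("author:john,lennon"))
print(expand_field_list("author:john/ringo"))
print(expand_field_list("john,lennon"))
```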

Modifiers can be set on a double-quoted value, for example to specify a proximity search (unordered). See the modifier section. There must be no space between the closing double-quote and the modifiers, e.g. "two one"po10

Recoll currently manages the following default fields:

  • title, subject or caption are synonyms which specify data to be searched for in the document title or subject.

  • author or from for searching the document's originators.

  • recipient or to for searching the document's recipients.

  • keyword for searching the document-specified keywords (few documents actually have any).

  • filename for the document's file name. You can use the shorter fn alias. This value is not set for all documents: internal documents contained inside a compound one (for example an EPUB section) no longer inherit the container file name. This was replaced by an explicit field (see next). Sub-documents can still have a filename, if it is implied by the document format, for example the attachment file name for an email attachment.

  • containerfilename, aliased as cfn. This is set for all documents, both top-level and contained sub-documents, and is always the name of the filesystem file which contains the data. The terms from this field can only be matched by an explicit field specification (as opposed to terms from filename which are also indexed as general document content). This avoids getting matches for all the sub-documents when searching for the container file name.

  • ext specifies the file name extension (for example ext:html).

  • rclmd5 is the MD5 checksum for the document. This is used for displaying the duplicates of a search result (when querying with the option to collapse duplicate results). Incidentally, this could be used to find the duplicates of any given file by computing its MD5 checksum and executing a query with just the rclmd5 value.
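
The duplicate lookup mentioned for rclmd5 could be prepared along these lines. This Python sketch only builds the query string. The rclmd5_query name is hypothetical, and the sketch assumes that the indexed value is the lower-case hexadecimal MD5 of the file contents, which may not hold for every document type.

```python
import hashlib


def rclmd5_query(path, chunk_size=65536):
    """Return an rclmd5: query string for the file at path.

    Assumption: the index stores the lower-case hexadecimal MD5 of the
    raw file contents. Reading in chunks keeps memory use low.
    """
    digest = hashlib.md5()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return "rclmd5:" + digest.hexdigest()
```

The resulting string can then be entered as the whole query, for example in the search entry.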

You can define aliases for field names, in order to use your preferred denomination or to save typing (e.g. the predefined fn and cfn aliases for filename and containerfilename). See the section about the fields file.

Document input handlers can create other fields with arbitrary names, and aliases may be defined in the configuration, so the exact set of searchable fields may be different on your system if the configuration was customised.