Here is a sample query, explained below:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a phrase in the author field (exactly what this is would depend on the document type, e.g. the From: header for an email message), and containing either beatles or lennon and either live or unplugged, but not potatoes (in any part of the document).
An element is composed of an optional field specification, and a value, separated by a colon (the field separator is the last colon in the element). Examples:
Eugenie
author:balzac
dc:title:grandet
dc:title:"eugenie grandet"
The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).
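The "last colon" rule can be illustrated with a small sketch. This is not Recoll's parser: the function name split_element is hypothetical, and the sketch assumes that the separator is only looked for before a quoted value.

```python
from typing import Optional, Tuple


def split_element(element: str) -> Tuple[Optional[str], str]:
    """Split a query element into (field, value); field is None when absent."""
    quote = element.find('"')
    # Assumption of this sketch: colons inside a quoted value are not separators,
    # so only the part before the first double-quote is examined.
    head = element if quote == -1 else element[:quote]
    field, colon, rest = head.rpartition(":")  # rpartition splits at the LAST colon
    if not colon:
        return None, element
    value = rest if quote == -1 else element[quote:]
    return field, value


for sample in ["Eugenie", "author:balzac", "dc:title:grandet", 'dc:title:"eugenie grandet"']:
    print(sample, "->", split_element(sample))
```

With the examples above, dc:title:grandet yields the field dc:title and the value grandet, because everything before the last colon belongs to the field specification.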
All elements in the search entry are normally combined with an implicit AND. It is possible to specify that elements be OR'ed instead, as in Beatles OR Lennon. The OR must be entered literally (capitals), and it has priority over the AND associations: word1 word2 OR word3 means word1 AND (word2 OR word3), not (word1 AND word2) OR word3.
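The precedence rule can be made concrete with a short sketch that groups whitespace-separated terms into an AND list of OR groups. The function name group_terms is hypothetical, and the sketch deliberately ignores quotes, parentheses, field prefixes and negation; it only shows how OR binds more tightly than the implicit AND.

```python
from typing import List


def group_terms(query: str) -> List[List[str]]:
    """Return a list of OR groups; the groups themselves are AND'ed together."""
    groups: List[List[str]] = []
    join_next = False
    for token in query.split():
        if token == "OR":  # only the literal, capitalized OR is an operator
            join_next = bool(groups)
        elif join_next:
            groups[-1].append(token)  # attach to the previous term's OR group
            join_next = False
        else:
            groups.append([token])  # start a new AND'ed group
    return groups


print(group_terms("word1 word2 OR word3"))
print(group_terms("Beatles OR Lennon Live OR Unplugged"))
```

The first call groups word2 with word3 and leaves word1 on its own, which corresponds to word1 AND (word2 OR word3).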
You can use parentheses to group elements (from version 1.21), which will sometimes make things clearer, and may allow expressing combinations which would have been difficult otherwise.
An element preceded by a - specifies a term that should not appear.
By default, words inside double-quotes define a phrase search (the order of words is significant), so that title:"prejudice pride" is not the same as title:prejudice title:pride, and is unlikely to find a result. This can be changed by using modifiers.
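The difference between a phrase and an AND of separate terms can be sketched as follows. This is a simplified illustration, not how Recoll matches documents: both function names are hypothetical, and the sketch ignores stemming, modifiers and any index-level detail.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """True if the words of phrase appear consecutively and in order in text."""
    words = text.lower().split()
    wanted = phrase.lower().split()
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))


def contains_all(text: str, terms: str) -> bool:
    """True if every term appears somewhere in text, in any order."""
    words = set(text.lower().split())
    return all(term.lower() in words for term in terms.split())


title = "Pride and Prejudice"
print(contains_phrase(title, "prejudice pride"))  # word order matters for a phrase
print(contains_all(title, "prejudice pride"))     # order is irrelevant for AND
```

For this title the phrase test fails while the AND test succeeds, which is why title:"prejudice pride" is unlikely to find a result.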
Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wildcard on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.
To save you some typing, Recoll versions 1.20 and later interpret a field value given as a comma-separated list of terms as an AND list and a slash-separated list as an OR list. No white space is allowed. So author:john,lennon will search for documents with john and lennon inside the author field (in any order), and author:john/ringo would search for john or ringo. This behaviour is only triggered by a field prefix: without it, comma- or slash-separated input will produce a phrase search. However, you can use a text field name to search the main text this way, as an alternate to using an explicit OR, e.g. text:napoleon/bonaparte would generate a search for napoleon or bonaparte in the main text body.
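The list shorthand can be summarized by a small sketch that classifies an unquoted element. The function name expand_list and the returned labels are hypothetical. The text above does not say what happens when commas and slashes are mixed in one value, so the sketch refuses that case rather than guess.

```python
import re
from typing import List, Optional, Tuple


def expand_list(element: str) -> Tuple[str, Optional[str], List[str]]:
    """Classify an unquoted element as an AND list, OR list, phrase or single term."""
    field, colon, value = element.rpartition(":")
    has_comma, has_slash = "," in value, "/" in value
    if not colon:
        # Without a field prefix, separated input is treated as a phrase.
        if has_comma or has_slash:
            return ("PHRASE", None, re.split(r"[,/]", value))
        return ("TERM", None, [value])
    if has_comma and has_slash:
        raise ValueError("mixed separators are not covered by this sketch")
    if has_comma:
        return ("AND", field, value.split(","))
    if has_slash:
        return ("OR", field, value.split("/"))
    return ("TERM", field, [value])


print(expand_list("author:john,lennon"))
print(expand_list("text:napoleon/bonaparte"))
print(expand_list("john,lennon"))
```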
Modifiers can be set on a double-quoted value, for example to specify a proximity search (unordered). See the modifier section. There must be no space between the final double-quote and the modifiers value, e.g. "two one"po10
Recoll currently manages the following default fields:
title, subject or caption are synonyms which specify data to be searched for in the document title or subject.
author or from for searching the document's originators.
recipient or to for searching the document's recipients.
keyword for searching the document-specified keywords (few documents actually have any).
filename for the document's file name. You can use the shorter fn alias. This value is not set for all documents: internal documents contained inside a compound one (for example an EPUB section) no longer inherit the container file name; this was replaced by an explicit field (see next). Sub-documents can still have a filename if it is implied by the document format, for example the attachment file name for an email attachment.
containerfilename, aliased as cfn. This is set for all documents, both top-level and contained sub-documents, and is always the name of the filesystem file which contains the data. The terms from this field can only be matched by an explicit field specification (as opposed to terms from filename, which are also indexed as general document content). This avoids getting matches for all the sub-documents when searching for the container file name.
ext specifies the file name extension (e.g. ext:html).
rclmd5 the MD5 checksum for the document. This is used for displaying the duplicates of a search result (when querying with the option to collapse duplicate results). Incidentally, this could be used to find the duplicates of any given file by computing its MD5 checksum and executing a query with just the rclmd5 value.
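The duplicate-finding idea mentioned for rclmd5 could be sketched as below. The function names are hypothetical. The sketch assumes that the checksum is computed over the raw file contents and written as lowercase hexadecimal; neither detail is stated above, so check it against your own index before relying on it.

```python
import hashlib


def md5_hex(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 checksum of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicates_query(path: str) -> str:
    """Build a query string containing only the rclmd5 value for the file."""
    # Assumption: the field value is the lowercase hex digest.
    return "rclmd5:" + md5_hex(path)
```

The resulting string can then be entered as a query like any other field search.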
You can define aliases for field names, in order to use your preferred denomination or to save typing (e.g. the predefined fn and cfn aliases defined for filename and containerfilename). See the section about the fields file.
The document input handlers have the possibility to create other fields with arbitrary names, and aliases may be defined in the configuration, so that the exact field search possibilities may be different for you if someone took care of the customisation.