As of Recoll version 1.18 you have a choice of building an index with terms stripped of character case and diacritics, or one with raw terms. For a source term of Résumé, the former will store resume, the latter Résumé.
Both types of index allow searches that are insensitive to case and diacritics. With a raw index, the user entry is expanded to match all the case and diacritics variations present in the index. With a stripped index, the search term is stripped before searching.
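The two insensitive-search strategies above can be illustrated with a minimal Python sketch. The function names `strip_term` and `expand_for_raw_index` are hypothetical, and the stripping uses Python's `unicodedata` module as an approximation; Recoll's own stripping code is not shown in this section and may differ in detail.

```python
import unicodedata

def strip_term(term: str) -> str:
    """Approximate stripping: drop combining marks, then lowercase."""
    decomposed = unicodedata.normalize("NFD", term)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.lower()

def expand_for_raw_index(user_term: str, index_terms: set[str]) -> list[str]:
    """Raw index: find every stored variation that strips to the same key."""
    key = strip_term(user_term)
    return sorted(t for t in index_terms if strip_term(t) == key)

# Hypothetical terms stored in a raw index.
raw_terms = {"Résumé", "resume", "RESUME", "result"}

# Stripped index: the search term itself is stripped before lookup.
print(strip_term("Résumé"))                       # resume

# Raw index: the user entry is expanded to the variations actually present.
print(expand_for_raw_index("resume", raw_terms))  # ['RESUME', 'Résumé', 'resume']
```

With a stripped index a single lookup is enough, whereas with a raw index each user term becomes a list of stored terms to search for.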
A raw index allows using case and diacritics to discriminate between terms, e.g., returning different results when searching for US and us or resume and résumé. Read the section about search case and diacritics sensitivity for more details.
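As a rough illustration of why only a raw index can tell US from us, here is a toy inverted index in Python. The `build_index` helper and the sample documents are hypothetical, and for brevity the sketch strips case only, not diacritics; it is not how Recoll stores its data.

```python
from collections import defaultdict

def build_index(docs: dict[int, str], strip: bool) -> dict[str, set[int]]:
    """Map each term to the ids of the documents containing it."""
    index: defaultdict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            # A stripped index folds case before storing the term.
            term = word.lower() if strip else word
            index[term].add(doc_id)
    return dict(index)

docs = {1: "US economy", 2: "join us"}

raw = build_index(docs, strip=False)
stripped = build_index(docs, strip=True)

print(raw["US"], raw["us"])   # {1} {2}
print(stripped["us"])         # {1, 2}
print("US" in stripped)       # False
```

In the stripped index the distinction is lost at indexing time, so no query can recover it later.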
The type of index to be created is controlled by the indexStripChars configuration variable, which can only be changed by editing the configuration file. Any change requires an index reset, which Recoll does not perform automatically, and all indexes used in a search must be set in the same way, which Recoll does not check either.
Recoll creates a stripped index by default if indexStripChars is not set.
The added capability has a cost: a raw index will be slightly bigger than a stripped one (around 10%). Searches will also be more complex, so probably slightly slower, and the feature is relatively little used, so a certain amount of odd behaviour cannot be excluded.
One of the most adverse consequences of using a raw index is that some phrase and proximity searches may become impossible: because each term needs to be expanded and all combinations searched for, the multiplicative expansion may become unmanageable.
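The multiplicative growth can be made concrete with a small Python sketch. The per-term expansion lists below are invented for illustration, and the helper name `phrase_combinations` is hypothetical; how Recoll actually builds and limits these combinations is not described in this section.

```python
from itertools import product
from math import prod

def phrase_combinations(expansions: list[list[str]]) -> list[tuple[str, ...]]:
    """Enumerate every phrase obtained by picking one variation per term."""
    return list(product(*expansions))

# Hypothetical case/diacritics variations found in a raw index for a 3-term phrase.
expansions = [
    ["resume", "Resume", "Résumé", "résumé"],
    ["for", "For", "FOR"],
    ["us", "US", "Us"],
]

combos = phrase_combinations(expansions)
print(len(combos))                          # 36
print(prod(len(e) for e in expansions))     # 36, the product of the list sizes
print(combos[0])                            # ('resume', 'for', 'us')
```

Three terms with a handful of variations each already yield 36 candidate phrases; longer phrases, or terms with many stored variations, multiply this further.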