Search case and diacritics sensitivity

For Recoll versions 1.18 and later, and when working with a raw index (not the default), searches can be sensitive to character case and diacritics. Whether a given search is sensitive depends on configuration variables and on how the search terms are entered.

The general default is that searches entered without upper-case or accented characters are insensitive to case and diacritics. An entry of resume will match any of Resume, RESUME, résumé, Résumé etc.
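The default matching described above can be pictured as comparing terms after folding away case and diacritics. The following Python sketch is only an illustration of that idea, not Recoll's actual implementation; the fold function and the candidates list are hypothetical names introduced for the example.

```python
import unicodedata


def fold(term: str) -> str:
    """Return the term with diacritics removed and case folded (illustrative only)."""
    # Decompose accented characters into base character + combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, keeping only the base characters.
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return no_marks.casefold()


# Hypothetical list of indexed words, used only for the demonstration.
candidates = ["Resume", "RESUME", "résumé", "Résumé", "resumes"]

# An insensitive search for "resume" keeps every word that folds to the same form.
matches = [word for word in candidates if fold(word) == fold("resume")]
print(matches)
```

Note that resumes is not matched by this comparison: reaching it would be the job of stem expansion, which is a separate mechanism.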

Two configuration variables can automate switching on sensitivity (they were documented but actually did nothing until Recoll 1.22):

autodiacsens

If this is set, search sensitivity to diacritics will be turned on as soon as an accented character appears in a search term. When the variable is set to true, resume will start a diacritics-insensitive search, but résumé will be matched exactly. The default value is false.

autocasesens

If this is set, search sensitivity to character case will be turned on as soon as a search term contains an upper-case character anywhere except in first position. When the variable is set to true, us or Us will start a case-insensitive search, but US will be matched exactly. The default value is true (contrary to autodiacsens).

As in the past, capitalizing the first letter of a word will turn off its stem expansion and have no effect on case-sensitivity.
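The two automatic rules can be summarized as a small decision function. This is a minimal Python sketch of the behavior as described in this section, not code taken from Recoll; auto_sensitivity and has_diacritics are hypothetical helper names, and the keyword defaults simply mirror the documented default values of the two variables.

```python
import unicodedata


def has_diacritics(term: str) -> bool:
    """True if any character of the term carries a combining mark once decomposed."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def auto_sensitivity(term: str, autocasesens: bool = True, autodiacsens: bool = False):
    """Return (case_sensitive, diac_sensitive) for a single search term."""
    # Only upper-case characters after the first one switch case sensitivity on.
    case_sensitive = autocasesens and any(ch.isupper() for ch in term[1:])
    # Any accented character switches diacritics sensitivity on, if enabled.
    diac_sensitive = autodiacsens and has_diacritics(term)
    return case_sensitive, diac_sensitive


for term in ("us", "Us", "US", "résumé"):
    print(term, auto_sensitivity(term))
```

With the default values, only US triggers a sensitive search in this sketch; résumé becomes diacritics-sensitive only when autodiacsens is passed as True.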

You can also explicitly activate case and diacritics sensitivity by using modifiers with the query language. C will make the term case-sensitive, and D will make it diacritics-sensitive. Examples:

        "us"C
      

will search for the term us exactly (Us will not be a match).

        "resume"D
      

will search for the term resume exactly (résumé will not be a match).

When either case or diacritics sensitivity is activated, stem expansion is turned off. Combining stem expansion with an exact match on case or diacritics would not make much sense.
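The interaction between the C and D modifiers, first-letter capitalization, and stem expansion can be sketched as follows. This Python model is an assumption-laden illustration of the rules stated in this section only: term_flags is a hypothetical function, it ignores the automatic sensitivity variables, and it does not account for any other query language modifiers.

```python
def term_flags(term: str, modifiers: str = "") -> dict:
    """Model the effect of the C and D modifiers on one query term (illustrative only)."""
    case_sensitive = "C" in modifiers
    diac_sensitive = "D" in modifiers
    # Stem expansion is off when the first letter is capitalized
    # or when either kind of sensitivity is active.
    stem_expansion = not (case_sensitive or diac_sensitive or term[:1].isupper())
    return {
        "case_sensitive": case_sensitive,
        "diac_sensitive": diac_sensitive,
        "stem_expansion": stem_expansion,
    }


print(term_flags("us", "C"))      # models the query "us"C
print(term_flags("resume", "D"))  # models the query "resume"D
print(term_flags("Resume"))       # capitalized: no stemming, still case-insensitive
```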