Installing over an older version and other notes
Note
Ubuntu commands installed as snap packages can’t create arbitrary files under /tmp. If some of
the external commands used for indexing are snaps, for best results, set TMPDIR to a location which
belongs to you (e.g. inside your home, with something like export TMPDIR=~/tmp in your shell
startup script). Recoll could conceivably work around the problem all by itself, but I find it in
bad taste to create temporary files in an arbitrary location inside your home.
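As a minimal sketch of the TMPDIR setup described in this note, the lines below could go in a shell startup script. The directory name ~/tmp is the example used above; the restrictive permissions are an added assumption, not a Recoll requirement.

```shell
# Create a private temporary directory inside the home directory
# (no error if it already exists).
mkdir -p "$HOME/tmp"
# Assumption: keep it readable by the owner only, as is usual for temp data.
chmod 700 "$HOME/tmp"
# Programs which honor TMPDIR, including snap-packaged helpers,
# will now create their temporary files here.
export TMPDIR="$HOME/tmp"
```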
1.20 to 1.43 indexes are fully backward compatible. No need to reindex when upgrading.
Always reset the index if you do not know by which version it was created (e.g.: you’re not sure
it’s at least 1.18). The best method is to quit all Recoll programs and delete the index directory
(rm -rf ~/.recoll/xapiandb), then start recoll or recollindex.
recollindex -z will do the same in most, but not all, cases. It’s better to use the rm method,
which will also ensure that no debris from older releases remain (e.g.: old stemming files which are
not used any more).
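The rm method above could be wrapped as follows. This is only a sketch: reset_recoll_index is a hypothetical helper name, and it assumes that the index lives in the default xapiandb subdirectory of the configuration directory (it will not if dbdir was changed in recoll.conf). Quit all Recoll programs before running it.

```shell
# Hypothetical helper: delete the index directory, then rebuild the index.
# Usage: reset_recoll_index [configuration directory]
reset_recoll_index() {
    # Default configuration directory, as used in the text above.
    confdir="${1:-$HOME/.recoll}"
    # Remove the whole Xapian index directory, including leftover files
    # from older releases.
    rm -rf "$confdir/xapiandb"
    if command -v recollindex >/dev/null 2>&1; then
        # -c selects the configuration directory to index.
        recollindex -c "$confdir"
    else
        echo "recollindex not found in PATH: start recoll or recollindex by hand" >&2
    fi
}
```

Calling reset_recoll_index with no argument acts on ~/.recoll.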
On Windows, the index is located by default in C:/Users/[yourlogin]/AppData/Local/Recoll/xapiandb
Case/diacritics sensitivity is off by default. It can be turned on only by editing recoll.conf (see the manual). If you do so, you must then reset the index.
Changes in Recoll 1.43.0
-
Allow suspending the real time indexer when the system is running on battery power. This is controlled by the
suspendonbattery configuration parameter.
-
Use maxmemberkbs to limit the size of documents returned by exec handlers (it was already used for execm).
-
Use textfilemaxmbs to limit the size of an email body text (for avoiding issues with pathological email archives).
-
Added two safeguard parameters for the max sizes of stored text and metadata: maxdbdatarecordkbs and maxdbstoredtextmbs.
-
Add option to produce num-dash spans like 123-456-789.
-
GUI: clean up qss and css style processing. Now always base + optional dark + optional user.
-
Allow escaping wildcard characters in some cases.
-
MacOS webarchive format support.
-
Fix result table csv export: it exported too few documents.
-
Fix backslashasletter.
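To show how the new parameters from the list above might look in a configuration file, the sketch below writes an illustrative fragment to a scratch file instead of the live recoll.conf. The parameter names come from the list. Every value is a placeholder chosen for illustration, and treating suspendonbattery as a 0/1 switch is an assumption: check the manual before copying anything.

```shell
# Scratch location for the example fragment (not a real configuration file).
fragment="${TMPDIR:-/tmp}/recoll-1.43-example.conf"

cat > "$fragment" <<'EOF'
# Illustrative values only, not recommended settings.
# Assumption: 1 suspends the real time indexer while on battery power.
suspendonbattery = 1
# Size limit (KB) for documents returned by exec and execm handlers.
maxmemberkbs = 50000
# Size limit (MB), now also applied to email body text.
textfilemaxmbs = 20
# Safeguards for the sizes of stored metadata and stored text.
maxdbdatarecordkbs = 1000
maxdbstoredtextmbs = 10
EOF

# Show the result.
cat "$fragment"
```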
Minor releases at a glance
-
1.43.16
-
Windows: the test for file system occupation was inverted.
-
When using soffice to extract text, use a separate UserInstallation directory to avoid interfering with interactive sessions.
-
GUI: the result list date format preference was not saved.
-
GUI: add file size filter tool, and option to not show it.
-
GUI: fix simple search mode start preference (was not actually saved).
-
GUI: add option to open the query fragments tool when starting.
-
GUI: add help menu entry to show the query language cheatsheet.
-
Fix the OpenAI whisper interface module.
-
appimage: arrange to allow running recollindex instead of the GUI (RECOLL_APPIMAGE_COMMAND environment variable).
-
Add configuration parameter with list of MIME wildcard exprs for MIME types which should use the stored text for previewing instead of re-extracting the text.
-
-
1.43.14
-
Python: queries would lose one Db object when iternext hit the end of results. Long running processes like the WEBUI would end up getting random open errors when the process ran out of descriptors.
-
rcldoc.py ms-word handler: retry misidentified wordperfect documents.
-
Image OCR: tesseract: add gif to supported formats.
-
-
1.43.13
-
GUI: interface languages: add English so that it can be forced in non-english locales.
-
GUI: result table: add progress indicator when saving big result lists to CSV.
-
GUI: semantic query interface code is enabled by default, menu entries appear at run time depending on local configuration.
-
Indexer: optimize getting entries attributes while walking a directory. Marginal performance improvement.
-
Mbox: also build the mbox cache during preview. Helps with external indexes, for which the local cache does not exist.
-
Tar handler: use filename and modification date from the tar entries.
-
Windows: use %USERPROFILE% as initial value of topdirs list. Less confusing than ~
-
Windows: use the Perl rclimg.exe instead of rclimg.py: much faster. The Python version is only used for image OCR, which is rarely needed.
-
Windows: add help menu entry to check the up to date version on the Recoll Web site (by starting a browser on the page).
-
GSSP 1.1.4: got rid of useless additional .desktop file which appeared in app searches. Now uses Recoll’s.
-
-
1.43.12
-
Python API: qresultstore: use surrogateescape for decoding urls.
-
rclaudio: avoid duplicating values when adding data to fields.
-
GUI: configuration switching: use a more standard dialog.
-
A version 1.43.11 briefly existed, with an rclpdf.py bug.
-
-
1.43.10
-
GUI: fix crash occurring on certain platforms (linux/qt6) when clicking one of the "Next" links inside the result list.
-
GUI: fixed bug which sometimes prevented using a phrase as search input to the external viewer.
-
Query language: accept empty query with "all documents" meaning. Avoids the "mime:*" hack when used from the python module or recollq.
-
rclrunsoffice.py: don’t generate exception when soffice is not found, just exit in error. Avoids system popups.
-
Indexer: avoid a hard dependency on the Perl JSON module, it’s only necessary for image OCR which is not enabled by default (rclimgp.py)
-
Merged the interface to the LLM embedding test scripts as inactive code. Use meson -Dsemantic=true to activate. See this page for more details.
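The empty query change in the list above can be illustrated from the command line with recollq. This is a sketch: list_all_documents is a hypothetical helper name, and the -b (URLs only) and -n (result count) options are used on the assumption that they behave as in current recollq versions.

```shell
# Hypothetical helper: print the URLs of the first 10 documents in the index.
list_all_documents() {
    if ! command -v recollq >/dev/null 2>&1; then
        echo "recollq not found in PATH" >&2
        return 1
    fi
    # With 1.43.10 or later, an empty query string means "all documents".
    recollq -b -n 10 ''
    # Older releases needed the workaround instead:
    #   recollq -b -n 10 'mime:*'
}
```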
-
-
1.43.9
-
A version 1.43.8 existed very briefly, and was replaced because of a spurious debugging log message at error level.
-
Indexer: fix thread safety issue leading to rare crashes.
-
opendoc-flat files: fix image hex strings being indexed by excluding <office:binary-data>.
-
Indexer: .ods: very big spreadsheets could cause huge memory usage in the XML parser, possibly leading to crashes. The configuration has been changed so that libreoffice spreadsheets bigger than 3000 KB (adjustable) are now processed by the soffice command, if present, else, not indexed.
-
Configuration: new "sofficecmd" parameter to point to the program if it is not found in the PATH.
-
Configuration: tesseract: allow adding default parameters to the command string.
-
Configuration: the parameter names are now case-insensitive. Only sections defined by paths remain case-sensitive on unix-like systems.
-
Configuration: Accept wildcards in indexedmimetypes and excludedmimetypes lists.
-
Windows: enable using OCR on image (non-PDF) files.
-
GUI: Linux: improve the real time indexing autostart management tool to work on non-default configuration directories.
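The configuration changes in the list above might be used as in the following sketch, which writes an example fragment to a scratch file and not to the live recoll.conf. The soffice path is a made-up location, and the wildcard patterns assume simple glob-style matching of MIME types in a space-separated list: both are assumptions for illustration.

```shell
# Scratch location for the example fragment (not a real configuration file).
fragment="${TMPDIR:-/tmp}/recoll-1.43.9-example.conf"

cat > "$fragment" <<'EOF'
# Hypothetical install location, for when soffice is not in the PATH.
sofficecmd = /opt/libreoffice/program/soffice
# Assumed glob-style wildcards: skip all image and video types.
excludedmimetypes = image/* video/*
EOF

# Show the result.
cat "$fragment"
```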
-
-
1.43.7
-
GUI: updated SingleApplication to 3.5.4 for license change necessitated for Debian inclusion.
-
GUI: menu entry "query→ext dialog" was showing the wrong tab.
-
GUI: preview: work around webengine setHtml() 2Mb limit.
-
openxml indexing: properly process whitespace in Word docs with revisions/tracks.
-
-
1.43.6
-
Improve the xslt-based code for xml-based formats so that it handles multiple body or meta members: this allows processing pptx and vsdx, and accessing numbers in multiple xlsx sheets.
-
recollindex --stop and --help options.
-
Fix crash caused by write beyond buffer when listing extended attributes on xxBSD.
-
GUI: view actions editor: memorize the window and column sizes.
-
KDE krunner: remove spurious stderr messages.
-
-
1.43.5
-
GUI: Improve speed of building directory side filter contents.
-
Use locked global libmagic instance. Better performance and fixes crash on Mac OS ARM.
-
New configuration variable to add timestamps to log lines.
-
GUI preview: use the common highlight color.
-
GUI webengine: improve blocking of chromium engine possible external requests.
-
-
1.43.4
-
GUI: HTML preview in dark mode: do not use a dark background by default. There is no way that this can work with many documents' CSS, and it often results in dark text on dark background. Add option to restore the previous behaviour.
-
MS-Word handler: when antiword fails on a small .doc, fall back to soffice, then only to wvWare, which is obsolete. Very few files trigger this anyway. Another possibility would have been to use abiword itself, which might be faster.
-
Added code to process Mac OS webarchive format on other platforms. Thanks to Mansour Alghamdi. Needs the plistlib and bs4 Python3 packages.
-
Add .jp2 to the image formats considered for OCR (when enabled).
-
Fixed/improved the computation of the directory side filter entries.
-
PDF: added option to force OCR on files which are not pure images.
-
Handle unencoded utf8 email headers.
-
-
1.43.3
-
GUI: viewer execution: add %S to substitute with phrase if the query was a simple single phrase. Else substitutes as %s.
-
GUI: preferences: fix duplication of language list in interface language choice.
-
GUI: add alt+S shortcut to open the advanced search dialog (use ^W to close)
-
GUI: change the "Open parent" operation to desktop default instead of Dolphin. Using Dolphin when available allows highlighting the doc entry, but it was of course an issue on non KDE desktops.
-
Zip handler: detect encoding of metadata element when possible. Uses chardet.
-
GUI: result table save to CSV: improve performance for very big lists.
-
Repurpose the currently unused rclimg.py to perform OCR on image files. If rclimg.py is specified for image files in mimeconf (instead of the Perl+exiftool default rclimg), and if the imgocr recoll.conf parameter is true for the current location, OCR will be performed and the output used as document main text.
-
-
1.43.2
-
Fix static initialisation order bug causing an immediate crash with some gcc 15 builds.
-
-
1.43.1
-
Change PDF attachment indexing to using pdfdetach from
poppler-utils. No more need for the java-based pdftk. PDF attachment indexing is now enabled by default.
-
Windows: automatically install VC_redist if needed.
-
Apple pages handler: tweak options to allow running headless.
-
Fix broken "remember sort state" feature.
-
Add default configuration for processing LISP files (as text).
-
Fix preview text mode bad formatting after the first chunk.
-
