The Recoll WebUI offers an alternative, web-based interface for querying a Recoll index.
The koniu repository on GitHub has not been updated for some time: you should now use the git clone on framagit.org.
The WebUI can be quite useful for extending the use of a shared index to multiple workstations, without the need for a local Recoll installation or shared data storage on each of them.
The Recoll WebUI is based on the Bottle Python framework.
The default setup of the standalone script now relies on the waitress Python WSGI server, which can handle several simultaneous requests and will probably perform acceptably in most cases. This requires the waitress Python 3 module.
It is still possible to run the WebUI on the Bottle internal HTTP server by editing the startup script. However, the built-in server handles only one request at a time, which is a problem in multi-user situations, especially because some requests, such as extracting a result list into a CSV file, can take a significant amount of time.
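As a rough sketch of what editing the startup script amounts to: Bottle selects its HTTP server through the `server` argument of `bottle.run()`, and both "waitress" and "wsgiref" (the single-threaded built-in server) are standard Bottle adapter names. The `run_options()` helper, the `webui` import and the host/port values are assumptions for illustration, not the contents of the real standalone script.

```python
from typing import Any, Dict


def run_options(concurrent: bool = True,
                host: str = "localhost",
                port: int = 8080) -> Dict[str, Any]:
    """Keyword arguments for bottle.run() (hypothetical helper).

    "waitress" needs the waitress module and serves several requests at
    once; "wsgiref" is Bottle's built-in server, one request at a time.
    """
    return {
        "server": "waitress" if concurrent else "wsgiref",
        "host": host,
        "port": port,
    }


def main() -> None:
    import bottle
    # Assumption: importing the WebUI module registers its routes on
    # Bottle's default application; check what the real script does.
    import webui  # noqa: F401  (hypothetical import)
    # With no app argument, bottle.run() serves the default application.
    bottle.run(**run_options(concurrent=True))
```

Calling `main()` at the end of such a script starts the server; passing `concurrent=False` would switch back to the built-in server.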
In multi-user situations, you may get better performance and ease of use from the Recoll WebUI by running it under Apache rather than as a standalone process. With this approach, a few requests per second can easily be handled even in the presence of long-running ones, and you can use the Apache access control features.
However, neither Recoll nor the WebUI is optimized for high multi-user loads, and it would probably be unwise to use them as the search interface of a busy web site.
The instructions about using the WebUI under Apache given in the repository README are a bit terse and miss a few details.
Here follow the synopses of two WebUI installations, on initially Apache-less Ubuntu (14.04) and DragonFly BSD systems. The first should extend easily to other Debian-based systems, the second at least to FreeBSD. RPM-based systems are left as an exercise for the reader, at least for now…
I do not check these instructions very often, and you may have to adjust some details related to package version numbers.
Caution
THE CONFIGURATIONS DESCRIBED HAVE NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.
Access control is feasible, but no instructions are given here, as I am not competent on the subject.
Apache
On a Debian/Ubuntu system
Install recoll
sudo apt-get install recoll python3-recoll
Configure the indexing and check that the normal search works. (I spent quite a lot of time trying to understand why the WebUI did not work, when in fact the normal Recoll configuration was broken and the regular search did not work either.)
Make sure that you are logged in as the user who will run the web search while you do this.
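To check the search outside of the WebUI, you can also use the Recoll Python module installed by python3-recoll. The sketch below follows the connect/query/execute/fetchmany pattern of the Recoll Python API; the `check_search()` name and its defaults are hypothetical. Run it as the user who will run the WebUI.

```python
from typing import List, Optional, Tuple


def check_search(query_string: str,
                 confdir: Optional[str] = None,
                 limit: int = 10) -> Tuple[int, List[Tuple[str, str]]]:
    """Run a Recoll query; return the hit count and (url, title) pairs.

    confdir=None lets Recoll use its default configuration directory
    (normally ~/.recoll for the current user).
    """
    # Imported here so that this function can be defined even where the
    # python3-recoll package is not installed.
    from recoll import recoll

    db = recoll.connect(confdir=confdir)
    query = db.query()
    nres = query.execute(query_string)
    docs = query.fetchmany(limit)
    return nres, [(doc.url, doc.title) for doc in docs]
```

If `check_search("some term")` raises an exception or finds nothing for a term you know is indexed, the problem is in the Recoll configuration, not in the WebUI.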
Install the WebUI
Clone the git repository (preferably the framagit.org clone mentioned at the top), or extract the master tar archive, and move it to '/var/www/recoll-webui-master/'. Make sure that your user has read and execute access to it.
Install Apache and mod-wsgi
sudo apt-get install apache2 libapache2-mod-wsgi-py3
I then got the following message:
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using 127.0.1.1. Set the 'ServerName' directive globally to suppress this message
To clear it, I added a ServerName directive to the Apache configuration; you may not need it. Edit '/etc/apache2/sites-available/000-default.conf' and add the following at the top (outside of any section, so that it applies globally). Things work without this fix anyway: it only suppresses the message. You will probably need to adjust the address or use a real host name:
ServerName 192.168.4.6
Edit '/etc/apache2/mods-enabled/wsgi.conf', add the following at the end of the "IfModule" section.
Change the user ('dockes' in the example), making sure that it is the user who owns the index ('.recoll' is in their home directory).
WSGIDaemonProcess recoll user=dockes group=dockes \
    threads=1 processes=5 display-name=%{GROUP} \
    python-path=/var/www/recoll-webui-master
WSGIScriptAlias /recoll /var/www/recoll-webui-master/webui-wsgi.py
<Directory /var/www/recoll-webui-master>
    WSGIProcessGroup recoll
    Require all granted
</Directory>
With Apache 2.2, the Require line would instead have been:
Order allow,deny
Allow from all
Again: please do not take any hint about security from this document.
You can use SetEnv directives with the RECOLL_CONFDIR and RECOLL_EXTRACONFDIRS variable names inside <Directory> sections to serve different indexes on different URLs, or to query additional indexes from a single one. You need WebUI code from 2022-06-01 or newer for this to work.
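For background: mod_wsgi passes SetEnv values to the application in the per-request WSGI environ dictionary, not in os.environ, so code that only reads os.environ does not see them. A minimal sketch of bridging the two, assuming a WSGI callable to wrap; the `with_recoll_env()` wrapper is hypothetical and not necessarily how the WebUI itself does it.

```python
import os
from typing import Any, Callable, Dict, Iterable

WSGIApp = Callable[[Dict[str, Any], Callable[..., Any]], Iterable[bytes]]

# The variable names used in the text; other variables are left alone.
RECOLL_VARS = ("RECOLL_CONFDIR", "RECOLL_EXTRACONFDIRS")


def with_recoll_env(app: WSGIApp) -> WSGIApp:
    """Copy the Recoll variables set by SetEnv into os.environ per request.

    Only safe with threads=1 on the WSGIDaemonProcess line, because
    os.environ is shared by all the threads of a process.
    """
    def wrapper(environ: Dict[str, Any],
                start_response: Callable[..., Any]) -> Iterable[bytes]:
        for name in RECOLL_VARS:
            value = environ.get(name)
            if value is not None:
                os.environ[name] = value
            else:
                # Clear a value left over from a request for another URL.
                os.environ.pop(name, None)
        return app(environ, start_response)
    return wrapper
```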
For example, to set the configuration directory with RECOLL_CONFDIR inside your Directory section:
<Directory /var/www/recoll-webui-master>
    WSGIProcessGroup recoll
    Require all granted
    SetEnv RECOLL_CONFDIR /path/to/my/configdir
</Directory>
All the directories in the path must be accessible to the user/group Apache uses, which may not be the case if you are using your own configuration directory ($HOME is usually not browsable by "other").
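A quick way to find which directory blocks access is to check the "other" execute (traverse) bit on every component of the path. The helper below is a hypothetical sketch that only looks at mode bits: it ignores ACLs and does not account for the group Apache runs as.

```python
import os
import stat
from pathlib import Path
from typing import List


def untraversable_by_other(path: str) -> List[str]:
    """Return the directories on `path` (itself included) that lack the
    'other' execute bit, in top-down order."""
    target = Path(path).resolve()
    # Path.parents runs from the nearest parent up to '/', so reverse it.
    candidates = [*reversed(target.parents), target]
    return [
        str(d) for d in candidates
        if d.is_dir() and not os.stat(d).st_mode & stat.S_IXOTH
    ]
```

Running it on your configuration directory typically points at the home directory; an empty list means the mode bits are not the problem.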
Another possibility, which works with all WebUI versions, is to set the corresponding os.environ values by editing webui-wsgi.py (see the comments in that file).
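As a sketch, the lines to add near the top of webui-wsgi.py could look like this, assuming they run before the WebUI module is imported. The paths are placeholders, and the separator to use for several extra directories is an assumption to check against the comments in the file.

```python
import os

# Placeholder paths: replace with your own configuration directories.
os.environ["RECOLL_CONFDIR"] = "/path/to/my/configdir"
# Additional index to query; for several directories, check the comments
# in webui-wsgi.py for the expected separator (assumption).
os.environ["RECOLL_EXTRACONFDIRS"] = "/path/to/other/configdir"
```

Unlike SetEnv inside a <Directory> section, these values apply to every URL served by the process group.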
Note
The Recoll WebUI application is mostly single-threaded, so it is of little use (and may actually be counter-productive in some cases) to specify multiple threads on the WSGIDaemonProcess line. Specify multiple processes instead, to put multiple CPUs to work on simultaneous requests.
Then run the following to restart Apache:
sudo apachectl restart
The Recoll WebUI should now be accessible on 'http://my.server.com/recoll/'.
Note
You need a '/' at the end of the URL used to access the search (use 'http://my.server.com/recoll/', not 'http://my.server.com/recoll'). Otherwise, files other than the script itself are not found: the page looks weird and the search does not work.
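One way to spare users this trap is a small WSGI wrapper that redirects the slash-less URL. Under WSGIScriptAlias /recoll, a request for '/recoll' arrives with SCRIPT_NAME set to '/recoll' and an empty PATH_INFO. The `add_trailing_slash()` wrapper below is a hypothetical sketch, not part of the WebUI, and assumes an ASCII mount point.

```python
from typing import Any, Callable, Dict, Iterable

WSGIApp = Callable[[Dict[str, Any], Callable[..., Any]], Iterable[bytes]]


def add_trailing_slash(app: WSGIApp) -> WSGIApp:
    """Redirect '/recoll' to '/recoll/' before the WebUI sees the request."""
    def wrapper(environ: Dict[str, Any],
                start_response: Callable[..., Any]) -> Iterable[bytes]:
        if environ.get("PATH_INFO", "") == "":
            # Absolute-path Location: the client keeps scheme and host.
            location = environ.get("SCRIPT_NAME", "") + "/"
            if environ.get("QUERY_STRING"):
                location += "?" + environ["QUERY_STRING"]
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Type", "text/plain")])
            return [b"Moved Permanently\n"]
        return app(environ, start_response)
    return wrapper
```

Assuming webui-wsgi.py defines the WSGI callable `application` (the name mod_wsgi looks for by default), the wrapper could be applied with `application = add_trailing_slash(application)` at the end of the file.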
Caution
THERE IS NO ACCESS CONTROL. ANYONE WITH ACCESS TO THE NETWORK WHERE THE SERVER IS LOCATED CAN RETRIEVE ANY DOCUMENT.
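If the page does not come up, it can help to check the mod_wsgi side on its own first. The throwaway script below (a hypothetical check-wsgi.py, saved in the WebUI directory so that the WSGIProcessGroup recoll setting applies) reports the user it runs as, which should be the index owner ('dockes' in the example), and the process group, which should be 'recoll'. `application` is the callable name mod_wsgi looks for by default; `mod_wsgi.process_group` is an environ key set by mod_wsgi.

```python
import os
import pwd
import sys
from typing import Any, Callable, Dict, List


def application(environ: Dict[str, Any],
                start_response: Callable[..., Any]) -> List[bytes]:
    """Report which user, Python and mod_wsgi process group run this code."""
    uid = os.geteuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:  # uid without a passwd entry
        user = str(uid)
    # Empty string in mod_wsgi embedded mode, absent outside mod_wsgi.
    group = environ.get("mod_wsgi.process_group", "(not under mod_wsgi)")
    body = (f"user: {user}\n"
            f"python: {sys.version.split()[0]}\n"
            f"process group: {group!r}\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

It could be mounted with an extra directive such as `WSGIScriptAlias /wsgicheck /var/www/recoll-webui-master/check-wsgi.py` next to the existing one, and removed once things work, since it exposes details about the server.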
Apache Variant for BSD/ports
Packages
As root:
pkg install recoll
Do what you need to do to configure the indexing and check that the normal search works.
Make sure that you are logged in as the user who will run the web search while you do this.
Then install apache. You may have to adjust the version number.
pkg install apache24
Add apache24_enable="YES" to /etc/rc.conf.
pkg install www/mod_wsgi
pkg install git
The package may be named ap24-mod_wsgi4 depending on the system.
On FreeBSD, you can also use:
cd /usr/ports/www/mod_wsgi4/ && make install clean
Thanks to D.Gessel for pointing out the errors in the previous version of this document.
Clone the webui repository
cd /usr/local/www/apache24/
git clone https://github.com/koniu/recoll-webui.git recoll-webui-master
As noted at the top, the framagit.org clone is now preferred over this GitHub repository.
Important: most input handler helper applications (e.g. 'pdftotext') are installed in '/usr/local/bin' which is not in the PATH as seen by Apache (at least on DragonFly). The simplest way to fix this is to modify the launcher module for the webui app so that it fixes the PATH.
Edit 'recoll-webui-master/webui-wsgi.py' and add the following line after the 'import os' line:
os.environ['PATH'] = os.environ['PATH'] + ':' + '/usr/local/bin'
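A slightly more defensive variant of the same line, as a sketch: it does not fail if PATH is unset, and it does not append /usr/local/bin a second time if the script is loaded again. The `append_to_path()` helper is hypothetical.

```python
import os


def append_to_path(directory: str) -> None:
    """Append `directory` to PATH unless it is already listed."""
    current = os.environ.get("PATH", "")
    parts = current.split(os.pathsep) if current else []
    if directory not in parts:
        parts.append(directory)
    os.environ["PATH"] = os.pathsep.join(parts)


append_to_path("/usr/local/bin")
```

Calling `shutil.which("pdftotext")` from inside the application afterwards is a quick way to confirm that the helper is now found.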
Configure Apache
Edit /usr/local/etc/apache24/modules.d/270_mod_wsgi.conf
Uncomment the LoadModule line, and add the directives to alias /recoll/ to the webui script.
Change the user (dockes in the example), making sure that it is the user who owns the index (.recoll is in their home directory).
Contents of the file:
## $FreeBSD$
## vim: set filetype=apache:
##
## module file for mod_wsgi
##
## PROVIDE: mod_wsgi
## REQUIRE:

LoadModule wsgi_module libexec/apache24/mod_wsgi.so

WSGIDaemonProcess recoll user=dockes group=dockes \
    threads=1 processes=5 display-name=%{GROUP} \
    python-path=/usr/local/www/apache24/recoll-webui-master/
WSGIScriptAlias /recoll /usr/local/www/apache24/recoll-webui-master/webui-wsgi.py

<Directory /usr/local/www/apache24/recoll-webui-master>
    WSGIProcessGroup recoll
    Require all granted
</Directory>
Restart Apache
As root:
apachectl restart