The connect() function connects to one or several Recoll
index(es) and returns a Db object.
This call initializes the recoll module, and it should always be performed before any other call or object creation.
confdir designates the main index configuration directory. The usual system-dependent defaults apply if the value is empty. extra_dbs is a list of additional external indexes (Xapian directories). These will be queried, but supply no configuration values. writable decides if we can index new data through this connection.
Example:

    from recoll import recoll

    # Opening the default db
    db = recoll.connect()

    # Opening the default db and a pair of additional indexes
    db = recoll.connect(extra_dbs=["/home/me/.someconfdir/xapiandb", "/data/otherconf/xapiandb"])

A Db object is created by a connect()
call and holds a connection to a Recoll index.
- Db.query(), Db.cursor()
These (synonym) methods return a blank
Query object for this index.
- Db.getdoc(udi, idxidx=0)
Retrieve a document given its unique document identifier, and its index if external indexes are in use. The main index is always index 0. The
udi value could have been obtained from an earlier query as doc.rcludi, or would be known because the application is the indexer and generates the values.
- Db.termMatch(match_type, expr, field='', maxlen=-1, casesens=False, diacsens=False, lang='english')
Expand an expression against the index term list. This performs the same basic function as the GUI term explorer tool.
match_type can be one of wildcard, regexp or stem. field, if set, restricts the matches to the contents of the specified metadata field. Returns a list of terms expanded from the input expression.
- Db.setAbstractParams(maxchars, contextwords)
Set the parameters used to build snippets (sets of keywords in context text fragments).
maxchars defines the maximum total size of the abstract. contextwords defines how many terms are shown around the keyword.
- Db.close()
Closes the connection. You can't do anything with the
Db object after this. If the index was opened as writable, this commits any pending changes.
- Db.setSynonymsFile(path)
Set the synonyms file used when querying.
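The Db methods above can be combined into small helpers. The sketch below is an illustration, not part of the recoll module: expand_prefix is a hypothetical name, and it assumes termMatch() accepts field and maxlen as keyword arguments, as its signature suggests. It takes any object with a compatible termMatch() method, which also makes it testable without an index.

```python
def expand_prefix(db, prefix, field="", maxlen=20):
    """Return the index terms matching `prefix` followed by anything.

    `db` is expected to be a Db object from recoll.connect(); any object
    with a compatible termMatch() method will do (e.g. a test double).
    `maxlen` is forwarded unchanged to termMatch().
    """
    # A trailing '*' turns the prefix into a wildcard expression.
    # A non-empty field restricts the expansion to one metadata field.
    terms = db.termMatch("wildcard", prefix + "*", field=field, maxlen=maxlen)
    return list(terms)
```

For instance, with db = recoll.connect(), expand_prefix(db, "recol") would list indexed terms starting with "recol"; calling db.close() afterwards releases the connection.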
A Query object (equivalent to a cursor in the Python DB API)
is created by a Db.query() call. It is used to execute index
searches.
- Query.sortby(fieldname, ascending=True)
Set the sort order for future searches, using
fieldname, in ascending or descending order. Must be called before executing the search.
- Query.execute(query_string, stemming=1, stemlang="english", fetchtext=False, collapseduplicates=False)
Start a search for
query_string, a Recoll search language string. If the index stores the document texts and fetchtext is True, the Doc objects in the query result will store the document extracted text in doc.text. Otherwise, the doc.text fields will be empty. If collapseduplicates is true, only one of multiple identical documents (defined by having the same MD5 hash) will appear in the result list.
- Query.executesd(SearchData, fetchtext=False, collapseduplicates=False)
Starts a search for the query defined by the
SearchData object. See above for a description of the other parameters.
- Query.fetchmany(size=query.arraysize)
Fetch the next size
Doc objects from the current search result list, and return them as an array. size defaults to the value of the arraysize data member.
- Query.fetchone()
Fetch the next
Doc object from the current search result list. Raises a StopIteration exception if there are no results left.
- Query.__iter__() and Query.next()
So that things like
for doc in query: will work. Example:

    from recoll import recoll

    db = recoll.connect()
    q = db.query()
    nres = q.execute("some query")
    for doc in q:
        print("%s" % doc.title)

- Query.close()
Close the query. The object is unusable after the call.
- Query.scroll(value, mode='relative')
Adjust the position in the current result set.
mode can be relative or absolute.
- Query.getgroups()
Retrieve the expanded query terms as a list of pairs. Meaningful only after execute() or executesd(). In each pair, the first entry is a list of user terms (of size one for simple terms, or more for group and phrase clauses), the second a list of query terms derived from the user terms and used in the Xapian query.
- Query.getxquery()
Return the Xapian query description as a Unicode string. Meaningful only after execute() or executesd().
- Query.highlight(text, ishtml = 0, methods = object)
Will insert
<span class="rclmatch"> and </span> tags around the match areas in the input text and return the modified text. ishtml can be set to indicate that the input text is HTML and that HTML special characters should not be escaped. methods, if set, should be an object having methods startMatch(i) and endMatch(), which will be called for each match and should return a begin and an end tag. Example:

    class MyHighlighter:
        def startMatch(self, idx):
            return "<span style='color:red;background:yellow;'>"

        def endMatch(self):
            return "</span>"

- Query.makedocabstract(doc, methods = object)
Create a snippets abstract for
doc (a Doc object) by selecting text around the match terms. If methods is set, will also perform highlighting. See the highlight() method.
- Query.getsnippets(doc, maxoccs = -1, ctxwords = -1, sortbypage=False, methods=object)
Return a list of extracts from the result document by selecting text around the match terms. Each entry in the result list is a triple: page number, term, text. By default, the most relevant snippets appear first in the list. Set
sortbypage to sort by page number instead. If methods is set, the fragments will be highlighted (see the highlight() method). If maxoccs is set, it defines the maximum result list length. ctxwords allows adjusting the individual snippet context size.
- Query.arraysize
(r/w). Default number of records processed by
fetchmany().
- Query.rowcount
Number of records returned by the last execute.
- Query.rownumber
Next index to be fetched from results. Normally increments after each
fetchone() call, but can be set/reset before the call to effect seeking (equivalent to using scroll()). Starts at 0.
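As an illustration of how rowcount, scroll(), fetchmany() and getsnippets() fit together, here is a minimal paging sketch. fetch_page and snippet_lines are hypothetical helper names; the code assumes execute() has already been run on the Query object, that scroll() accepts mode="absolute" as a keyword, and that fetchmany() returns fewer documents when the end of the result list is reached.

```python
def fetch_page(query, page, page_size=10):
    """Return the Doc objects on 0-based result page `page`.

    Assumes execute() (or executesd()) has already been called on `query`.
    """
    start = page * page_size
    if start >= query.rowcount:
        # Past the end of the result list: nothing to fetch.
        return []
    # Jump to the first result of the page, then read up to page_size docs.
    query.scroll(start, mode="absolute")
    return list(query.fetchmany(page_size))


def snippet_lines(query, doc, maxoccs=5):
    """Format getsnippets() triples (page, term, text) as display lines."""
    lines = []
    # sortbypage=True lists extracts in page order instead of by relevance.
    for page, term, text in query.getsnippets(doc, maxoccs=maxoccs, sortbypage=True):
        lines.append("%s [%s]: %s" % (page, term, text))
    return lines
```

For example, after q.execute("some query"), fetch_page(q, 1) would return results 10 to 19, and snippet_lines(q, doc) would give page-sorted extracts for one of them.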
A Doc object contains index data for a given document. The
data is extracted from the index when searching, or set by the indexer program when
updating.
Please note that a Doc should never be instantiated by its
constructor but instead by calling db.doc() or some other API method
returning a Doc object. Otherwise, the object will lack some necessary references.
The Doc object has many attributes to be read or set by its user. It mostly
matches the Rcl::Doc C++ object. Some of the attributes are predefined; others can be set
freely, especially when indexing, and their names will be processed as field names by
the indexing configuration. Inputs can be specified as Unicode or strings. Outputs are
Unicode objects. All dates are specified as Unix timestamps, printed as strings. Please
refer to the rcldb/rcldoc.cpp C++ file for a full description of the
predefined attributes. Here follows a short list.
- url: the document URL, but see also getbinurl().
- ipath: the document ipath for embedded documents.
- fbytes, dbytes: the document file and text sizes.
- fmtime, dmtime: the document file and document times.
- xdocid: the document Xapian document ID. This is useful if you want to access the document through a direct Xapian operation.
- mtype: the document MIME type.
- text: holds the document processed text, if the index itself is configured to store it (true by default) and if the fetchtext option of the query execute() call was true. See also the rclextract module for accessing document contents.

Other fields stored by default:
author, filename, keywords, recipient
At query time, only the fields that are defined as stored either
by default or in the fields configuration file will be meaningful in
the Doc object.
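Since dates are stored as Unix timestamps in string form, converting them is simple, but the attribute may be empty when the field is not stored. The doc_time helper below is a hypothetical sketch: it assumes the timestamp string holds a whole number of seconds, and it returns an aware UTC datetime (use astimezone() if local time is wanted).

```python
from datetime import datetime, timezone


def doc_time(doc, attr="fmtime"):
    """Convert a Doc date attribute to an aware UTC datetime, or None.

    Recoll dates are Unix timestamps held as strings; the attribute may be
    empty or missing when the field was not stored in the index.
    """
    value = getattr(doc, attr, None)
    if not value:
        return None
    # Assumption: the string holds a whole number of seconds since the epoch.
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

With a query result doc, doc_time(doc) gives the file modification time and doc_time(doc, "dmtime") the document's own date, when present.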
- get(key), [] operator
Retrieve the named document attribute. You can also use
getattr(doc, key) or doc.key.
- doc.key = value
Set the named document attribute. You can also use
setattr(doc, key, value).
- getbinurl()
Retrieve the URL in byte array format (no transcoding), for use as a parameter to a system call. This is useful for the filesystem indexer
file:// URLs, which are stored unencoded, as binary data.
- setbinurl(url)
Set the URL in byte array format (no transcoding).
- items()
Return a dictionary of doc object keys/values.
- keys()
Return a list of doc object keys (attribute names).
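The byte-oriented URL accessor matters when a file name is not valid UTF-8. The sketch below (doc_local_path and doc_exists are hypothetical names) strips the file:// prefix from getbinurl() to obtain a bytes path usable with the os module. It assumes the URL is a plain file:// URL with no further encoding; for an embedded document (non-empty ipath) the result is the path of the containing file.

```python
import os

FILE_PREFIX = b"file://"


def doc_local_path(doc):
    """Return the bytes filesystem path of a file:// document, or None.

    getbinurl() returns the URL without transcoding, so file names that are
    not valid UTF-8 are preserved byte for byte.
    """
    url = doc.getbinurl()
    if not url.startswith(FILE_PREFIX):
        return None
    # For an embedded document (non-empty ipath), this is the container file.
    return url[len(FILE_PREFIX):]


def doc_exists(doc):
    """Return True if the document's file is still present on disk."""
    path = doc_local_path(doc)
    return path is not None and os.path.exists(path)
```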
A SearchData object allows building a query by combining clauses,
for execution by Query.executesd(). It can be used instead of
the query language approach. The interface is going to change a little, so there is no detailed
documentation for now...
- addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', qstring=string, slack=0, field='', stemming=1, subSearch=SearchData)
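Because the SearchData interface is documented as subject to change, the following is only a hedged sketch of clause building. add_all_terms is a hypothetical helper; it assumes the clause type can be passed as the first positional argument of addclause() and that qstring and field are accepted as keywords, as listed above. The sd argument would be a SearchData object obtained from the recoll module (for example recoll.SearchData(), assuming its default constructor), and the result would then be passed to Query.executesd().

```python
def add_all_terms(sd, terms, exclude=(), field=""):
    """Add 'and' clauses for `terms` and 'excl' clauses for `exclude` to `sd`.

    `sd` is assumed to be a SearchData object from the recoll module.
    Returns `sd` so it can be handed straight to Query.executesd().
    """
    for term in terms:
        # Each required term becomes its own AND clause.
        sd.addclause("and", qstring=term, field=field)
    for term in exclude:
        # Excluded terms must not appear in matching documents.
        sd.addclause("excl", qstring=term, field=field)
    return sd
```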

