As explained in the Indexes page, the eXtremeDB , Trigram (
trigram) indexes are ideal for text searches when the exact spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e., near matches. A classicaltreeindex allows exact match queries (eg.x='qqq'), range queries (eg.x >= 10 and x <= 100) and prefix match queries (eg.x >= 'abc’).However, for more complex regular expressions like
“%xyz%”atreeindex would require a sequential search, which is significantly slower. In this case atrigramindex can be optimal.The idea of
trigramindex is rather simple. Assume that we have a string key'qwerty'. The string is split into trigrams: sequences of three subsequent characters:"qwe", "wer", "ert", "rty". These trigrams are stored in a normaltreeindex. Now consider a query looking for the substring"wert". The query condition is also split into trigrams:"wer"and"ert". The search then is looking for these trigrams in the index. The result of the lookup is two lists of objects -- so called inverse lists. The next step is to join these lists: i.e. to find objects which are present in both lists.There is no guarantee that the objects that were found through this algorithm actually contain the
"wert"substring yet. For example, it could be"werrert". So the next step in the lookup process is to recheck that the object really matches the given regular expression.The main limitation of the
trigramindex is clear from the description above: it cannot be used for substrings shorter than 3. For example the pattern'%01%'requires a sequential search even if thetrigramindex is present.The
trigramindex can be used to locate objects containing a specified substring and regular expression matching; similar to theLIKEoperator in SQL. In the case of theLIKEoperator the runtime just extracts the longest substring from the pattern excluding wildcards and performs a search for this substring, and then performs a regular expression match for all selected objects. It is also possible to use thetrigramindex for an exact match lookup, especially if there are no other indexes declared for the data set.