As explained in the Indexes page, the eXtremeDB , Trigram (
trigram
) indexes are ideal for text searches when the exact spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e., near matches. A classicaltree
index allows exact match queries (eg.x='qqq'
), range queries (eg.x >= 10 and x <= 100
) and prefix match queries (eg.x >= 'abc’
).However, for more complex regular expressions like
“%xyz%”
atree
index would require a sequential search, which is significantly slower. In this case atrigram
index can be optimal.The idea of
trigram
index is rather simple. Assume that we have a string key'qwerty'
. The string is split into trigrams: sequences of three subsequent characters:"qwe", "wer", "ert", "rty"
. These trigrams are stored in a normaltree
index. Now consider a query looking for the substring"wert"
. The query condition is also split into trigrams:"wer"
and"ert"
. The search then is looking for these trigrams in the index. The result of the lookup is two lists of objects -- so called inverse lists. The next step is to join these lists: i.e. to find objects which are present in both lists.There is no guarantee that the objects that were found through this algorithm actually contain the
"wert"
substring yet. For example, it could be"werrert"
. So the next step in the lookup process is to recheck that the object really matches the given regular expression.The main limitation of the
trigram
index is clear from the description above: it cannot be used for substrings shorter than 3. For example the pattern'%01%'
requires a sequential search even if thetrigram
index is present.The
trigram
index can be used to locate objects containing a specified substring and regular expression matching; similar to theLIKE
operator in SQL. In the case of theLIKE
operator the runtime just extracts the longest substring from the pattern excluding wildcards and performs a search for this substring, and then performs a regular expression match for all selected objects. It is also possible to use thetrigram
index for an exact match lookup, especially if there are no other indexes declared for the data set.