eXtremeDB supports hash indexes, and tree indexes of the following types:
tree
(b-tree),patricia
(Patricia Trie),rtree
(R-tree spatial),kdtree
(kd-tree multi-dimensional),trigram
(text substring) anduser-defined
(“custom”).Hash
andunique tree
indexes can also be used to uniquely identify objects in the database. Also, specialized hash indexes of typeoid
andautoid
are provided as unique object identifiers. Unlike oids and autoids, however,hash
andtree
index values are only required to be unique within a class; thenonunique
declaration can be used to allow duplicate values forhash
ortree
indexes.(See the Indexes and Cursors page for detailed descriptions and implementation details for each of the index types described below.)
Tree Index
An eXtremeDB
tree
index is a tree-like structure in the application’s memory that contains a collection of keys. A key can be simple (based on a single field) or compound (containing more than one field). The tree-index stores the keys in sorted order. In order to support sorting, the tree-index must have a compare function. Regardless of the key type (simple or compound), the compare function must be able to compare two keys and determine whether the first key is less than (compare returns -1), equal to (0) or greater than (1) the second key. The tree-index algorithm sorts the keys based on this comparison.Search Methods
Search APIs are generated by the
mcocomp
schema compiler to locate desired objects or groups of objects by unique identifier or by index. While exact match lookups by unique identifier (unique tree
orhash
index,oid
andautoid
) using the generated_find()
functions are extremely efficient for locating individual objects, the rich set of specialized indexes mentioned above employ cursors to navigate a group of objects as an ordered or unordered result set.Find
By definition, an exact match lookup on a unique index returns exactly one result or zero if no match is found. So for C and C++ applications, the mcocomp schema compiler generates a
_find()
function for each unique index. Java and C# applications use the Cursor.Find() method.Search
A cursor is essentially an iterator over the collection of objects in a result set. For each index declared for a class in the schema definition, the DDL compiler generates functions to instantiate a cursor, position it based on some value(s), and obtain a handle to the database object from the cursor.
List
andnonunique
hash-based cursors allow navigation in sequential order (first to last, or last to first), though the order of the sequence is not defined; it is simply a mechanism to iterate over the unordered list of objects of a class.For C or C++ applications, mcocomp generates a
_search()
function for eachnonunique
index in the schema. Java and C# applications use the Cursor.Search() method. The search function is used whenever it is necessary to establish:
- a starting position in a sorted list with a known starting value and optionally retrieve subsequent results in ascending or descending sorted order.
- a starting position in a sorted list when only part of the starting value is known, find the closest match, and optionally retrieve subsequent results in ascending or descending sorted order.
- a starting position as above, iterate over the sorted list in ascending or descending sorted order until a lower/upper bound is reached, using the _compare() method to determine when the range limit is reached.
- a starting position with a known starting value, iterate over the list to retrieve each duplicate, using the
_compare()
method to determine when the last duplicate was retrieved.Pattern Search
eXtremeDB also supports "wild card" pattern matching ability. This is the capability to search tree index entries matching patterns specified with wild card characters for single character and multiple character matches. By default, the question mark “?” will match any single character in the specified position within the pattern, and the asterisk “*” will match any combination of characters (including no characters) in that position. If a match on the characters “?” or “*” is desired, the wild card characters themselves can be modified by specifying different characters in the pattern search policy (see below).
For example, “G*e*” would return “Graves” and “Gorine”, while “Gr?ve*” would match “Graves”, “Grove”, “Grover” and so on... In this example, ‘*’ matches zero, one, or more characters, while ‘?’ matches exactly one character. Further, the pattern “G*E*” would match all uppercase entries like “GRAVES”, “GORINE”. However, because the standard compare functions used to match index values with search keys use case-sensitive compare functions, the case specified in the search pattern will affect the search results.
Key-Value-Inclusive Indexes
eXtremeDB allows a “key-value-inclusive” or “cache-oblivious” implementation for in-memory (
transient
) database classes. This expedites access for certain access patterns. In particular experiments with very large B-Tree indexes show significant performance improvements. Also, the trigram index (described below) is implemented via the “key-value-inclusive” tree because it is much more efficient to keep 3 extra bytes (trigrams) on the index page than to simply store an 8-byte object address which requires further access steps.Covering Indexes
If all the fields requested in an index search are available in the index tree node, then the runtime doesn't have to look up the object covered by the index again. This can significantly increase the performance of searches. Since all the requested fields are available within the index, the index is called a “covering index”. In addition, a “covering” index can be also declared inclusive to leverage CPU cache usage.
Patricia Trie Index
The eXtremeDB
Patricia
index uses a Patriciatrie
structure that is based on a radix tree using a radix of two. The term "trie" is derived from the word "retrieval". Patricia stands for "Practical Algorithm to Retrieve Information Coded as Alphanumeric”. This algorithm is optimized to perform quick IP address prefix matching for IP subnet, network or routing table lookups. So Patricia Trie indexes are particularly useful for network and telecommunications applications.A Patricia index can be declared over scalar and boolean data types as well as arrays and vectors of those types. In fact, the boolean data type was introduced to facilitate Patricia index implementation where bit arrays are used to store IP addresses.
RTree Index
RTree
indexes are commonly used to speed spatial searches, for example, find the rectangle that contains this point, find all rectangles that overlap this rectangle, or find all rectangles in the neighborhood of this point.All manner of shapes can be stored and searched with the
rtree
index. For example, a point is represented as a rectangle with width and height = 1 and a line that has starting and ending coordinates of 15, 844 and 0, 3647 is stored as rectangle with its upper left corner at 15, 844 and its lower right corner at 0, 3647.To determine if two lines intersect, or if a point is within a given area (described by a circle, rectangle, etc.), an
rtree
search is performed to find all overlapping rectangles.Kd-Tree Index
A
kdtree
is a data structure for organizing points in a k-dimensional space.kdtrees
are a useful data structure for several applications, such as lookups that involve a multidimensional search key. Thekdtree
is a binary tree in which every node is a k-dimensional point. Every non-leaf node generates a splitting hyperplane that divides the space into two subspaces. Points left of the hyperplane represent the left sub-tree of that node and the points right of the hyperplane represent the right sub-tree. The hyperplane direction is chosen in the following way: every node split to sub-trees is associated with one of the k-dimensions, such that the hyperplane is perpendicular to that dimension vector.The
kdtree
uses a Query-By-Example approach to locate objects that match a given search condition. The application creates pattern object(s) and assigns values to the fields that are included in the search criteria. Thekdtree
supports simple exact matches as well as range lookups. In the latter case, two pattern objects will be specified: one for the lower and one for the upper bounds of the search condition. If a field value is defined only for one boundary, it is considered an open interval that corresponds to a greater-than-or-equal-to or less-than-or-equal-to search condition.Trigram Index
Trigram search is a method of searching for text when the exact syntax or spelling of the target object is not precisely known. It finds objects which match the maximum number of three-character strings in the entered search terms, i.e. near matches. A classical B-Tree index allows exact match queries (eg. x='qqq'), range queries (eg. “x >= 10 and x <= 100”) and prefix match queries (eg. “x >= 'abc’).
However, for more complex regular expressions like “%xyz%” a B-Tree index would require a sequential search, which is significantly slower. In this case a trigram index can be ideal.
The idea of
trigram
index is rather simple. Assume that we have a string key 'qwerty'. The string is split into trigrams: sequences of three subsequent characters: "qwe", "wer", "ert", "rty". These trigrams are stored in a normal B-Tree index. Now consider a query looking for the substring "wert". The query condition is also split into trigrams: "wer" and "ert". The search then is looking for these trigrams in the index. The result of the lookup is two lists of objects -- so called inverse lists. The next step is to join these lists: i.e. to find objects which are present in both lists.OID
An eXtremeDB
oid
is a unique object identifier which is implemented internally as a unique hash index. Generally, anoid
is a user-defined structure which has fixed-length fields corresponding to identifying characteristics of a principle database class. C and C++ applications can useoids
to quickly retrieve the object it identifies.Autoid
An
autoid
is similar tooid
in that it is a unique object identifier implemented internally as a unique hash index. However it differs in that its value is determined by the eXtremeDB runtime, and the value is unique for the class, not the entire database. Autoids are 8-byte signed integer values which means that the value can be incremented every microsecond for nearly 300,000 years without overflowing.LIST
A sequential navigation method is available for unordered lists of objects by specifying a
list
index. This allows applications to iterate over the objects of a class without regard to any particular order.