Hash and Autoid Indexes

As explained in the Indexes page, the eXtremeDB hash indexes are more efficient for exact match searches than tree indexes within in-memory databases. eXtremeDB provides hash index algorithms modified for efficient operations in memory when the associated class is transient. Hash indexes will normally exhibit better average performance, for both insert and lookup operations, than a tree index, but this can depend on the initial hash table size and on key distribution.

In the ideal case, the hash table will be sized to contain one entry (sometimes called a “bucket”) for each database object being indexed which means that a single hash table lookup (a simple hash function calculation) will suffice to access each database object. But in reality, regardless of the table size, the hash function does not guarantee a unique value and consequently multiple index values can map to the same bucket (called a “conflict”) and a linked list of object pointers (called a “collision chain”) will result.

So if the number of database objects is known beforehand, this number can be specified in the schema declaration for the index and will result in optimal performance. But if the number of database objects cannot be known with a reasonable degree of confidence, the eXtremeDB runtime will dynamically allocate additional hash table space when needed.

When declared unique (which is the default), hash indexes can only be used for exact match searches or unordered list retrieval. Exact match searches are performed with the find operation for the chosen API. However, hash indexes can be declared nonunique to allow duplicate values to be stored with the same hash table entry. When declared nonunique, search operations can be performed to select all objects with the specified hash value.

 

Memory Consumption

Memory consumption is comparable for tree and hash indexes of transient classes. A rough estimate for a tree index is 10 bytes per entry (exact size depends on the order of insertions or deletions); and H + 8 bytes per entry for a hash index, where the constant H is fixed size space taken by the hash table and can be calculated as E / 5 * 4 where E is the estimated number of hash entries provided by you in the database schema and 5 is a constant hash factor used by eXtremeDB. If reallocation of the hash table is necessary, then the size will be H * 2.

 

Autoid indexes

The autoid declaration specifies that a class is stored with a system-generated unique identifier. The autoid is a guaranteed unique value generated by the eXtremeDB runtime and stored in an internal hash index. The autoid value for a new database object is inserted into the index hash table at the moment it is created and is never changed. (Note that this is different from other types of indexes. Normally new values are inserted into the index (hash table or tree structure) when the transaction within which these values are created or updated is committed or when a transaction checkpoint is performed.)

The autoid field is based on an unsigned 64-bit integer. Initially, during the database creation phase, all counters are initialized with zeros. During the database lifetime the counters increase each time a class object is created. eXtremeDB never decreases autoid values and never re-uses the same value. However technically it is possible for the counter to overflow and in this case, the counter would start to repeat the values. However note that if autoids are created every millisecond, the counter will not rollover for 6,405,119,470,038 days (over 17 million years)!

Object Identifier indexes

The C API allows definition of unique Object Identifier (oid) structures that can be highly efficient. Please refer to the C API Indexes page for further details.