
eXtremeDB Definitions

A database is a collection of related data organized for efficient storage and retrieval. Beyond this, any attempt to more specifically describe a database inevitably involves individual features of one or more specific database implementations. The following definitions describe the eXtremeDB implementation.

Each database has groupings of elements. We call the definition of a group of elements a class; other terms commonly used are “table” and “record definition”. Instances of the class stored in the database are called objects, analogous to records and rows. As will be explained in detail in this documentation, an eXtremeDB class is much more than a relational database table or other database record definitions. Our purpose here is not to equate eXtremeDB elements to those of relational or other databases, but rather to outline the hierarchy of elements and contrast them with elements of familiar database architectures.

The term class is used in most object-oriented languages, such as C++, Java or C#. The class defines the properties of the object and the methods used to control the object’s behavior. This definition is correct for eXtremeDB classes as well - a database class defines object fields and access methods.

Elements are called fields in eXtremeDB. Other common terms are “attribute” and “column”. Fields have a type property. The type determines whether the element holds character, integer, real, or binary data. The eXtremeDB data types reference gives the complete list of supported types. eXtremeDB also supports arbitrarily large fields through blob and vector types, and complex fields through the structure type.

A blob is an arbitrarily large stream of bytes; untyped “opaque” data. From the eXtremeDB perspective it has no structure. Classic examples are audio streams (.wav files), video streams (.mpg files), graphic files (.jpg files), and streams of text larger than 64 K.

A vector is an arbitrarily large stream of typed data (vector elements), such as a stream of 2-byte integers, strings or structures. In eXtremeDB you can define vectors of any type except blob. Vectors are useful when describing real-world complex objects such as tree-like data structures.

In addition to grouping fields into a class, eXtremeDB allows a sub-grouping called a structure. A structure declaration names a type and specifies elements of the structure that can have different types. Structures and simple types are building blocks to construct class definitions. Structures can be used as elements of other structures (i.e. nested structures). Like other element types, you can have a vector of structures.

Generally speaking, fields are either simple or complex. Simple fields are atomic types such as char, integer, string, and so on. Complex fields can be vectors of simple types, structures (which may in turn contain structures and vectors), vectors of structures, and blobs.

Note: eXtremeDB structures, in contrast to C or C++ structures, cannot be instantiated separately; they exist only as a part of an object of some class.
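The hierarchy of class, structure, vector, and blob described above can be pictured with a small conceptual model. The sketch below uses plain Python dataclasses and is not eXtremeDB code; the names Sensor, Reading, readings, and calibration are hypothetical and chosen only for illustration. Unlike an eXtremeDB structure, the Python Reading type can be instantiated on its own, so the analogy holds only for the nesting of fields.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reading:
    """Stands in for a structure: a named sub-grouping of typed elements."""
    timestamp: int   # simple field
    value: float     # simple field


@dataclass
class Sensor:
    """Stands in for a class; each instance corresponds to an object."""
    sensor_id: int                                          # simple field
    name: str                                               # simple field
    readings: List[Reading] = field(default_factory=list)   # vector of structures
    calibration: bytes = b""                                # blob: opaque bytes


# Build one object whose complex field holds two structure elements.
sensor = Sensor(sensor_id=7, name="boiler")
sensor.readings.append(Reading(timestamp=1000, value=21.5))
sensor.readings.append(Reading(timestamp=1001, value=21.7))
```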

In the relational or hierarchical data models, records are constructed from basic data type fields. The collection of built-in data types and built-in operations was motivated by the needs of business data processing applications. However, in many engineering or scientific applications this collection of types is not adequate. For example, in a scientific application, a requirement could be to describe a time series and store and access it with appropriate operations. Another common example is tree-like structures that are widely used in engineering applications such as routing tables or “electronic program guide” implementations for set-top boxes. Historically, complex data types and operations on them have been simulated using the basic data types and operations provided by the DBMS, with substantial inefficiency and added complexity. Complex objects are represented by multiple basic tables or records, with relationships defined between them.

When working with traditional relational or hierarchical data models, application developers represent their objects as records or rows in tables. In many cases objects cannot be represented with one record, and developers are forced to store parts of an object in different tables and define relationships between the object’s parts (in DBMS jargon, these steps are known as normalization and object-relational mapping). However, objects are entities whose parts work together as a whole. Consequently, developers usually introduce their own APIs to store and retrieve objects. These APIs shield the application from the inner complexity of objects but, at the same time, introduce extra layers of application code that must be written, debugged and executed.

It is possible to use eXtremeDB in the same manner as described in the previous paragraph. Alternatively, because eXtremeDB employs an object-oriented approach to database development, objects can have a more complex nature than merely a one-dimensional collection of fields. eXtremeDB fields can have complex structure themselves, for example, a field may be a dynamic array of structures; structures in this vector can also hold other structures or arrays. eXtremeDB class access methods allow for simple but highly efficient access to objects without extra layers of navigation code.

eXtremeDB extends C and C++ data types with ACID-compliant data access methods (see http://en.wikipedia.org/wiki/ACID) to provide developers with a simple, clean, efficient, and easily adjustable approach to database development. Similarly, Java and .NET Framework developers can add eXtremeDB persistence to Java and C# classes by simple annotation.

eXtremeDB, like other databases, uses indexes to provide access to objects based on key values. An index definition consists of any combination of fields, structure elements or vector elements from a given class. Indexes can be of various types, the most common of which are hash and tree (BTree). Tree indexes can have a mix of ascending and descending components and, in addition to exact-match searches, can be used for sorting and range-based retrieval. Hash indexes are used for fast storage and retrieval but without sorting or range-based access. Both tree and hash indexes can be declared unique or nonunique to exclude or allow duplicate values.
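The practical difference between the two common index types can be shown with a small conceptual model. This is a sketch in plain Python, not the eXtremeDB API: a dictionary stands in for a hash index and a sorted list searched with bisect stands in for a tree index. The class names HashIndex and TreeIndex and the sample objects are hypothetical.

```python
import bisect


class HashIndex:
    """Exact-match lookup only; a dict stands in for the hash table."""

    def __init__(self, unique=True):
        self.unique = unique
        self._buckets = {}

    def insert(self, key, obj):
        bucket = self._buckets.setdefault(key, [])
        if self.unique and bucket:
            raise KeyError(f"duplicate key: {key!r}")
        bucket.append(obj)

    def find(self, key):
        return list(self._buckets.get(key, []))


class TreeIndex:
    """Keeps keys ordered, so it supports sorted and range retrieval.
    A sorted list stands in for a B-tree here."""

    def __init__(self):
        self._keys = []
        self._objs = []

    def insert(self, key, obj):
        pos = bisect.bisect_right(self._keys, key)
        self._keys.insert(pos, key)
        self._objs.insert(pos, obj)

    def range(self, low, high):
        """Return objects with low <= key <= high, in ascending key order."""
        lo = bisect.bisect_left(self._keys, low)
        hi = bisect.bisect_right(self._keys, high)
        return self._objs[lo:hi]


by_id = HashIndex(unique=True)
by_temp = TreeIndex()
for obj in [{"id": 1, "temp": 20}, {"id": 2, "temp": 35}, {"id": 3, "temp": 27}]:
    by_id.insert(obj["id"], obj)
    by_temp.insert(obj["temp"], obj)

exact = by_id.find(2)             # exact-match search through the hash index
in_range = by_temp.range(20, 30)  # range retrieval needs the ordered index
```

The hash model has no range method at all, which mirrors the point in the text: without an ordering on keys there is nothing to scan between two bounds.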

eXtremeDB also provides a number of specialized indexes that optimize access to records and groups of records for particular types of applications, such as the following:

Patricia Trie: Particularly useful for network and telecommunications applications, these are often used to quickly perform IP address prefix matching for IP subnet, network or routing table lookups.

R-Tree: These indexes are commonly used to index latitude/longitude data and to speed up spatial searches; for example, to find the rectangle that bounds a given point, or all rectangles that overlap a specified rectangle.

Kd-Tree: Using a data structure for organizing points in a k-dimensional space, kd-trees speed lookups that involve a multidimensional search key, and are most commonly used in query-by-form and query-by-example cases.
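As an illustration of the prefix matching mentioned for the Patricia trie, the following sketch performs longest-prefix lookups over IPv4 routes. It is a plain, uncompressed binary trie written in Python, not a Patricia trie (which additionally collapses single-child paths) and not eXtremeDB code. The class name PrefixTrie and the route labels are hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Uncompressed binary trie over IPv4 address bits (illustration only)."""

    def __init__(self):
        # Each node is a dict with optional "0"/"1" children and a "value".
        self._root = {}

    def insert(self, cidr, value):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self._root
        for b in bits:
            node = node.setdefault(b, {})
        node["value"] = value

    def longest_match(self, address):
        """Return the value stored at the longest prefix covering address."""
        bits = format(int(ipaddress.ip_address(address)), "032b")
        node = self._root
        best = node.get("value")  # a /0 default route lives at the root
        for b in bits:
            node = node.get(b)
            if node is None:
                break
            if "value" in node:
                best = node["value"]
        return best


routes = PrefixTrie()
routes.insert("0.0.0.0/0", "default")
routes.insert("10.0.0.0/8", "core")
routes.insert("10.1.0.0/16", "branch")

hop_a = routes.longest_match("10.1.2.3")     # covered by /16, /8 and /0
hop_b = routes.longest_match("10.9.9.9")     # covered by /8 and /0
hop_c = routes.longest_match("192.168.1.1")  # covered by /0 only
```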

Indexes can also be used to establish relationships between objects of different classes (in addition to the use of oid and autoid, which are discussed later), as in relational databases, which employ the concept of primary and foreign keys. By definition, a primary key is a column or combination of columns that uniquely identifies a row in a table. Correspondingly, a foreign key is a column or combination of columns whose values match the primary key of some other table. In eXtremeDB a primary key is normally implemented as a unique index on a field or combination of fields within a class, while a foreign key is normally implemented as a non-unique index (allowing duplicates) whose elements (fields) correspond to a primary key of some other class in the database.
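The primary key and foreign key arrangement can be sketched as a pair of indexes. The example below is a conceptual model in plain Python, not the eXtremeDB API; the department and employee data and both helper functions are hypothetical. A unique index on dept_id plays the primary key role, and a non-unique index on the same field in the other collection plays the foreign key role.

```python
from collections import defaultdict

departments = [
    {"dept_id": 10, "name": "R&D"},
    {"dept_id": 20, "name": "Sales"},
]
employees = [
    {"emp_id": 1, "dept_id": 10, "name": "Ada"},
    {"emp_id": 2, "dept_id": 10, "name": "Linus"},
    {"emp_id": 3, "dept_id": 20, "name": "Grace"},
]


def build_unique_index(objects, key):
    """Primary-key style index: each key value maps to exactly one object."""
    index = {}
    for obj in objects:
        if obj[key] in index:
            raise ValueError(f"duplicate value for unique index: {obj[key]!r}")
        index[obj[key]] = obj
    return index


def build_nonunique_index(objects, key):
    """Foreign-key style index: each key value maps to a list of objects."""
    index = defaultdict(list)
    for obj in objects:
        index[obj[key]].append(obj)
    return index


dept_by_id = build_unique_index(departments, "dept_id")
emps_by_dept = build_nonunique_index(employees, "dept_id")

# Navigate in both directions across the relationship.
ada_dept = dept_by_id[employees[0]["dept_id"]]["name"]
rnd_staff = [e["name"] for e in emps_by_dept[10]]
```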

In addition to indexes, classes in eXtremeDB can be declared to have an object identifier, or oid. These are different from primary keys in that the composition of an oid is the same for every class in the database. A class can also refer to objects of another class by declaring a field of data type ref. A ref is a reference in one object to the oid of another object.

At first, it may seem strange that oids must have an identical composition for every class in the database. In actuality, this models the real world in many application environments. For example, a system that receives data from some automated source will receive objects that have an identifier already supplied by the source. An example could be a network of sensors for which the oid of every sensor class is sensor-type + sensor-id + measurement-timestamp. Note that not every class is required to have an oid.

Oids and refs are often better alternatives to indexes for establishing inter-object relationships. An object can have a vector of references, as one means to implement a one-to-many relationship across classes. Indexes can and should be used to implement fast random access to objects by one or more key fields, for sorted access, and for range retrieval.
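A vector of references used for a one-to-many relationship can be pictured as follows. This is a conceptual sketch in plain Python, not eXtremeDB code. The oid layout follows the sensor example given earlier (sensor type, sensor id, measurement timestamp); the names Oid, Measurement, Report, store, put, and deref are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, NamedTuple


class Oid(NamedTuple):
    """One oid composition shared by every class that declares an oid."""
    sensor_type: str
    sensor_id: int
    timestamp: int


@dataclass
class Measurement:
    oid: Oid
    value: float


@dataclass
class Report:
    title: str
    # Vector of refs: each element holds the oid of another object.
    measurement_refs: List[Oid] = field(default_factory=list)


store: Dict[Oid, Measurement] = {}  # stands in for lookup of objects by oid


def put(measurement: Measurement) -> None:
    store[measurement.oid] = measurement


def deref(ref: Oid) -> Measurement:
    return store[ref]


m1 = Measurement(Oid("temp", 7, 1000), 21.5)
m2 = Measurement(Oid("temp", 7, 1001), 21.7)
put(m1)
put(m2)

report = Report("boiler check", [m1.oid, m2.oid])
values = [deref(ref).value for ref in report.measurement_refs]
```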

eXtremeDB also provides the autoid type. An autoid is similar to what other DBMS variously call sequence, serial, auto-increment, etc. Autoid is similar to oid, except that the structure and value of autoid fields are determined by the eXtremeDB runtime system. To create a reference in one object to the autoid value of another object, C and C++ applications use the autoid_t typedef to declare program variables that store autoid values, and Java and C# applications use the AutoID attribute. Autoid and autoid_t can be used as alternatives to oid and ref respectively whenever a natural oid does not exist, is deemed too cumbersome, or an automatically incrementing identifier is desired.
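The idea of an identifier assigned by the runtime rather than by the application can be sketched as below. This is a conceptual model in plain Python, not eXtremeDB code; the class ToyRuntime, its methods, and the choice of a counter starting at 1 are assumptions made only for illustration and say nothing about how eXtremeDB actually generates autoid values.

```python
import itertools


class ToyRuntime:
    """Assigns each new object an identifier the caller does not choose."""

    def __init__(self):
        self._next_id = itertools.count(1)  # assumed starting value
        self._objects = {}

    def create(self, **fields):
        autoid = next(self._next_id)
        self._objects[autoid] = dict(fields, autoid=autoid)
        return autoid

    def by_autoid(self, autoid):
        return self._objects[autoid]


rt = ToyRuntime()
dept_id = rt.create(name="R&D")
# The employee stores the department's autoid, much as a ref stores an oid.
emp_id = rt.create(name="Ada", dept_ref=dept_id)

ada_dept_name = rt.by_autoid(rt.by_autoid(emp_id)["dept_ref"])["name"]
```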

To express the content and organization of a database for C and C++ projects, the database designer uses some or all of these components in a data definition language (DDL) to create a database schema. The schema is a textual description of the data model. It is processed by the eXtremeDB schema compiler, which ensures that the schema is syntactically correct and then generates the application programming interface (API) header (.h) and implementation (.c) files. When the application is compiled the database dictionary for the database is produced from the implementation file. The database dictionary is simply a binary form of the schema that the database runtime can use more efficiently.

In contrast, in Java and C# projects, the “schema” consists of native language classes that are marked with the Persistent attribute, and the database dictionary is generated at run-time when the attributes of database objects are obtained through the reflection mechanism. N.B.: “Persistent” in this context only means that objects are stored in an eXtremeDB database, which might be in-memory or on a file system. The Persistent annotation serves only to distinguish these classes from ordinary transient Java or C# classes whose objects are not stored in the database.

Note: It is also possible to build eXtremeDB application architectures that combine Java or C# class implementations with modules or libraries that interface with an eXtremeDB database through the C or C++ API. In this case the schema file needed by the C or C++ modules must be generated from the persistent Java or C# classes. Calling the Database method GenerateMcoFile() produces an eXtremeDB data definition language (DDL) schema file that can then be processed by the schema compiler mcocomp.

A C or C++ eXtremeDB application uses the API generated by the schema compiler to store, read, and manipulate objects in an eXtremeDB database. This is in contrast to many database products that offer a static proprietary navigational API, or a static standard API (like SQL). The eXtremeDB API is always tailored to the application, so the integration with the application happens naturally. It is very much like the API would be if a developer wrote a database interface specifically for the needs of the application. And that is what is intended. In the world of embedded systems, the most common alternative to any commercial database product is the “homegrown” database. eXtremeDB offers all the advantages of “homegrown”: optimal performance and small footprint, no excess baggage, and an API that fits seamlessly with the rest of the application. It also delivers the advantages of an off-the-shelf database solution: lower development cost, lower maintenance cost, and shorter time to market.
