Creating and Modifying Database Objects in C

The static, application-independent functions are used for eXtremeDB runtime control, database control (opening, connecting to and closing databases), transaction management and cursor navigation; these are described on their own pages. To create and modify individual database objects, however, C applications use the strongly typed object interfaces generated by the mcocomp schema compiler.

The schema compiler generates the following types of C interface functions, depending on the specific DDL class definitions: _new() and _delete() methods for creating and removing objects, _put() and _get() methods for storing and retrieving field data, oid and autoid access methods, find and search methods for looking up objects through indexes, and event methods for responding to database events.

Some functions operate on entire database objects, while others operate on fields within an object. For object-action functions, the compiler forms the function name from the class name followed by _new(), _delete() or _delete_all(). For field-action functions, it uses the class or structure name, followed by the field name and then the action word, all separated by underscores. The action word can be any of the following: put, get, at, put_range, get_range, alloc, erase, pack and size.
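The naming rule can be expressed as a small string-formatting helper. The generated_name() function below is hypothetical and only illustrates the pattern. The schema compiler emits the actual declarations in the generated header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper reproducing the documented naming rule:
   object actions -> <class>_<action>          e.g. MyClass_new
   field actions  -> <class>_<field>_<action>  e.g. MyClass_buf_put
   Returns 0 on success, -1 if the name does not fit in `out`. */
static int generated_name(char *out, size_t out_size, const char *cls,
                          const char *field, const char *action)
{
    int n = (field != NULL)
        ? snprintf(out, out_size, "%s_%s_%s", cls, field, action)
        : snprintf(out, out_size, "%s_%s", cls, action);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

For example, generated_name(name, sizeof name, "MyClass", NULL, "delete_all") produces "MyClass_delete_all".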

New and Delete

In C applications, objects are created and deleted by calling the generated _new() and _delete() functions. The _new() function reserves initial space for an object and returns a reference (handle) to the newly created object. These functions are generated only for classes. No _new() or _delete() functions are generated for structures, because structures are never instantiated by themselves in the database; they always belong to some class.

For classes declared without an oid:

 
    MCO_RET classname_new ( /*IN*/ mco_trans_h t, /*OUT*/ classname *handle);
     

For classes declared with an oid, the oid must be passed:

 
    MCO_RET classname_new ( /*IN*/ mco_trans_h t, /*IN*/ databasename_oid * oid, /*OUT*/ classname *handle);
     

The _delete() function permanently removes the object whose handle is passed, while the _delete_all() function removes all objects of the class from the database. The storage pages occupied by the deleted object(s) are returned to the storage manager for reuse.

 
    MCO_RET classname_delete (/*IN*/ classname *handle);
     
    MCO_RET classname_delete_all ( /*IN*/ mco_trans_h t );
     

Put and Get

For each field of an object and for each element of a structure declared in the schema file, _put() and _get() functions are generated. The _put() functions are called to update specific field values. Depending on the type of field, the generated _put() function will be one of the following forms.

For scalar type fields it will be of the form:

 
    MCO_RET  classname_fieldname_put( /*IN*/ classname *handle, /*IN*/ <type> value);
     

For char and string fields a pointer and length argument are required:

 
    MCO_RET classname_fieldname_put( /*IN*/ classname *handle, /*IN*/ const char *value,
                        /*IN*/ uint2 len);
                         

It is important to understand how the _put() operation copies data to the database field and any associated indexes. Consider the following schema:

 
    persistent class MyClass
    {
        char<10> buf;
        tree<buf> idx;
    }
     

Now the following code snippet puts “abc” into the char<10> field buf:

 
    MyClass cls;
    char buf[10] = "abc";
    ...
    MyClass_new(t, &cls);
    MyClass_buf_put(&cls, buf, (uint2)strlen(buf));
     

The eXtremeDB runtime will copy the specified number of characters (3) into the field and fill the remaining 7 bytes with the MCO_SPACE_CHAR. The default value of MCO_SPACE_CHAR is \0. This normalizes the value of the unused part of the field for later sort operations. The runtime will not copy the extra null terminator from the input string.

If there is an index on this field, the index node is handled differently depending on whether this class is transient or persistent. For transient classes, no data is copied to the index node. For persistent classes, the entire contents of the field are copied from the field value (not from the input variable) to the index node. So for the example above the bytes [abc\0\0\0\0\0\0\0] will be copied into the field buf first, and then (when the transaction is committed) from the field buf to the index node (because this is a persistent class).

Note that if a 10-character array is copied into field buf, the field contains no null terminator. If the input character array is longer than 10 characters, only the number of characters specified by the length argument (which can be at most 10) is copied.
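The copy-and-pad rule for a char<10> field can be modeled in a few lines of standalone C. In this sketch, char_field_put() is a hypothetical function and SPACE_CHAR stands in for the default MCO_SPACE_CHAR value of \0. Clamping an oversized length to the field size is an assumption of the sketch, not documented runtime behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FIELD_SIZE 10    /* char<10> buf; from the example schema */
#define SPACE_CHAR '\0'  /* stands in for the default MCO_SPACE_CHAR */

/* Models the documented rule: copy exactly `len` bytes (never a terminator)
   and pad the rest of the field with the space character.
   Clamping len to FIELD_SIZE is an assumption of this sketch. */
static void char_field_put(char field[FIELD_SIZE], const char *value, size_t len)
{
    if (len > FIELD_SIZE)
        len = FIELD_SIZE;
    memcpy(field, value, len);
    memset(field + len, SPACE_CHAR, FIELD_SIZE - len);
}
```

After char_field_put(field, "abc", 3), the field holds the bytes [abc\0\0\0\0\0\0\0], which is the value the text describes being copied into buf.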

The _get() functions are called to bind a field of an object to an application variable and the function will be one of the following forms depending on the type of field.

For scalar type fields it will be of the form:

 
    MCO_RET classname_fieldname_get( /*IN*/ classname *handle, /*OUT*/ <type> *value);
     

For fixed-length char fields, the size of the destination buffer must also be specified:

 
    MCO_RET classname_fieldname_get( /*IN*/ classname *handle, /*OUT*/ char *dest,
                        /*IN*/ uint2 dest_size);
                         

If the field is a string then the function takes two extra parameters: the size of the buffer to receive the string, and an OUT parameter to receive the actual number of bytes returned. So the generated function will have the form:

 
    MCO_RET classname_fieldname_get( /*IN*/ classname *handle, /*OUT*/ char *dest, 
                        /*IN*/ uint2 dest_size, /*OUT*/ uint2 *len);
                         

Some things to note about the behavior of _get() on character and string fields (using the example of field) :

Numeric and Decimal generated functions

As stated in the Base Data Types page, the values for database fields of type decimal or numeric are stored internally as integers of a size determined by the specified width:

Width (digits)   Storage type
1-2              signed<1>
3-4              signed<2>
5-9              signed<4>
10-19            signed<8>

For these fields, the standard _put() and _get() functions described above are generated. _put() takes an integer value and _get() takes a pointer to an integer, in each case of the corresponding size.
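The width-to-storage rule in the table can be expressed as a simple lookup. decimal_storage_bytes() is a hypothetical helper for illustration and is not part of the generated interface.

```c
#include <assert.h>

/* Size in bytes of the integer that stores a decimal/numeric field of the
   given width, following the table: 1-2 -> signed<1>, 3-4 -> signed<2>,
   5-9 -> signed<4>, 10-19 -> signed<8>. Returns 0 for widths outside 1..19. */
static int decimal_storage_bytes(int width)
{
    if (width >= 1 && width <= 2)   return 1;
    if (width >= 3 && width <= 4)   return 2;
    if (width >= 5 && width <= 9)   return 4;
    if (width >= 10 && width <= 19) return 8;
    return 0;
}
```

For the decimal<10,3> field used in the next example, the width is 10, so the value is held in an 8-byte integer (int8).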

In addition to the _put() and _get(), the following functions are generated to allow specifying the field value as a character string:

 
    MCO_RET  CLASS_FIELD_put_chars( CLASS *handle, char const* buf);
     
    MCO_RET  CLASS_FIELD_get_chars( CLASS *handle, char* buf, int buf_size);
     

The _put_chars() function converts the input string of characters to an integer value and stores it in the database field. The _get_chars() function extracts the value from the database field and represents it as a string of characters. To facilitate conversion of integer values to character string and vice versa, two helper functions are also generated:

 
    MCO_RET  CLASS_FIELD_to_chars( TYPE scaled_num, char* buf, int buf_size);
     
    MCO_RET  CLASS_FIELD_from_chars( TYPE* scaled_num, char const* buf);
     

Consider a schema that defines a decimal field:

 
    class A {
        ...
        decimal<10,3> dec;
        ...
    };
     

The following code snippet demonstrates how these functions might be used in practice:

 
    A     a;
    int8  i8;
    char  buf[32];      /* character form of the decimal value */
    mco_cursor_t csr;   /* assumed to be positioned on an A record below */
     
    /* Allocate an object */
    A_new ( t, &a );
    ...
 
    /* put int8 value to decimal field */
    A_dec_from_chars( &i8, "123456");
    A_dec_put( &a, i8 );
    ...
     
    /* put char array value to decimal field */
    A_dec_to_chars( 987654321, buf, sizeof(buf));
    A_dec_put_chars( &a, buf );
    ...
     
    A_from_cursor(t, &csr, &a);
    printf("\n\tContents of first record A: \n");
    ...
     
    /* get values from numeric/decimal fields */
    A_dec_get( &a, &i8);
    A_dec_get_chars( &a, buf, sizeof(buf));
    printf("\t\tdec=%lld, chars(%s)\n", (long long)i8, buf );
     
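The _to_chars() and _from_chars() helpers convert between a scaled integer and its character form. The following self-contained sketch models that conversion. scaled_to_chars() and scaled_from_chars() are hypothetical stand-ins, not the generated code. The sketch assumes that the stored integer equals the decimal value times 10^scale, so that 987654321 reads as 987654.321 with a scale of 3.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats `scaled` as "[-]int.frac" with exactly `scale` fraction digits.
   Returns 0 on success, -1 if buf is too small. */
static int scaled_to_chars(int64_t scaled, int scale, char *buf, size_t buf_size)
{
    /* Magnitude in unsigned arithmetic so that INT64_MIN is handled. */
    uint64_t mag = scaled < 0 ? (uint64_t)0 - (uint64_t)scaled : (uint64_t)scaled;
    uint64_t pow10 = 1;
    int n;
    for (int i = 0; i < scale; i++)
        pow10 *= 10;
    if (scale > 0)
        n = snprintf(buf, buf_size, "%s%llu.%0*llu", scaled < 0 ? "-" : "",
                     (unsigned long long)(mag / pow10), scale,
                     (unsigned long long)(mag % pow10));
    else
        n = snprintf(buf, buf_size, "%s%llu", scaled < 0 ? "-" : "",
                     (unsigned long long)mag);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}

/* Parses "[-]digits[.digits]" into a scaled integer. More than `scale`
   fraction digits is rejected (a real implementation might round instead);
   overflow is not checked. Returns 0 on success, -1 on error. */
static int scaled_from_chars(int64_t *scaled, int scale, const char *s)
{
    int neg = 0, seen_dot = 0, frac_digits = 0, any_digit = 0;
    int64_t v = 0;
    if (*s == '-') {
        neg = 1;
        s++;
    }
    for (; *s != '\0'; s++) {
        if (*s == '.' && !seen_dot) {
            seen_dot = 1;
            continue;
        }
        if (!isdigit((unsigned char)*s))
            return -1;
        if (seen_dot && ++frac_digits > scale)
            return -1;
        v = v * 10 + (*s - '0');
        any_digit = 1;
    }
    if (!any_digit)
        return -1;
    for (; frac_digits < scale; frac_digits++)
        v *= 10;                 /* pad missing fraction digits with zeros */
    *scaled = neg ? -v : v;
    return 0;
}
```

Keeping the value as an integer avoids binary floating-point rounding. Only the textual form carries the decimal point.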

Fixed _put() and _get()

Classes often contain many fields, so fetching and storing an object can require a long series of _get() and _put() calls, one per field. To reduce this coding effort, the schema compiler generates a C structure containing all scalar fields and fixed-length arrays of a class. It also generates <classname>_fixed_get() and <classname>_fixed_put() functions, which can significantly reduce the number of function calls required. As the name indicates, however, these functions cover only the fixed-size fields of a class. If a class contains variable-length fields (e.g. string, vector or blob fields), those fields must still be accessed with their individual _get() and _put() functions.

For example, the following schema:

 
    struct B 
    {
        signed<1> i1;
        signed<2> i2;
        signed<4> i4;
 
        char<10>  c10;
 
        float  f;
    };
     
    class A 
    {
        unsigned<1> ui1;
        unsigned<2> ui2;
        unsigned<4> ui4;
        double         d;
        string  s;
 
        B       b;
 
        list;
    };
     

would cause the following “C” structures to be generated:

 
    /* Structures for fixed part of the classes */
    typedef struct B_fixed_ {
        int1 i1;
        int2 i2;
        int4 i4;
        char c10[10];
        float f;
    } B_fixed;
     
    typedef struct A_fixed_ {
        uint1 ui1;
        uint2 ui2;
        uint4 ui4;
        double d;
        B_fixed b;
    } A_fixed;
     

with the following access functions:

 
    MCO_RET  B_fixed_get( B *handle_, B_fixed* dst_ );
     
    MCO_RET  B_fixed_put( B *handle_, B_fixed const* src_ );
     
    MCO_RET  A_fixed_get( A *handle_, A_fixed* dst_ );
     
    MCO_RET  A_fixed_put( A *handle_, A_fixed const* src_ );
     

Using these functions, objects of the A class can be written with two function calls: A_fixed_put() for the fixed size portion and A_s_put() for the variable length field of type string s. Similarly, the objects of this class can be read with two function calls: A_fixed_get() and A_s_get().

The following code snippet illustrates the use of the fixed-length structures and the _fixed_get() function:

 
    int main(int argc, char* argv[])
    {
        MCO_RET rc;
        ...
        mco_trans_h t;
        mco_cursor_t csr; /* cursor to navigate database contents */
        A        a;   /* object handle */
        A_fixed _a;   /* fixed size part of class A */
        char   buf[64];   /* receives the variable-length string field s */
        uint2  len;       /* actual length of s returned by A_s_get() */
        ...
     
        /* Open a READ_ONLY transaction, read object A and display its contents */
        rc = mco_trans_start(connection, MCO_READ_ONLY, MCO_TRANS_FOREGROUND, &t);
        if ( MCO_S_OK == rc ) 
        {
            rc = A_list_cursor(t, &csr);
            if ( MCO_S_OK == rc ) 
            {
                A_from_cursor(t, &csr, &a);
                A_fixed_get(&a, &_a);
                A_s_get( &a, buf, sizeof(buf), &len );
                printf("\n\tContents of record A: s (%s), ui1 = %d, b.i1 = %d\n",
                    buf, _a.ui1, _a.b.i1 );
            }
            /* commit only when the transaction was actually started */
            rc = mco_trans_commit( t );
        }
        return ( MCO_S_OK == rc ) ? 0 : 1;
    }
     
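Because A_fixed and B_fixed are ordinary C structures, the fixed part of an object can be prepared with designated initializers before a single A_fixed_put() call. The sketch below is self-contained. It redeclares the generated structures and maps the int1/uint2/... typedefs onto <stdint.h> types, which is an assumption made only so the code compiles without the generated header. make_a_fixed() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-ins for the integer typedefs used by the generated header
   (assumption of this sketch). */
typedef int8_t   int1;
typedef int16_t  int2;
typedef int32_t  int4;
typedef uint8_t  uint1;
typedef uint16_t uint2;
typedef uint32_t uint4;

/* Same layout as the generated structures shown above. */
typedef struct B_fixed_ { int1 i1; int2 i2; int4 i4; char c10[10]; float f; } B_fixed;
typedef struct A_fixed_ { uint1 ui1; uint2 ui2; uint4 ui4; double d; B_fixed b; } A_fixed;

/* Builds the fixed part of an A object; members not named in the
   initializer are zero. The result is what would be passed to A_fixed_put();
   the string field s is not in A_fixed and still needs A_s_put(). */
static A_fixed make_a_fixed(uint1 ui1, double d, int1 b_i1, const char *name)
{
    A_fixed a = { .ui1 = ui1, .d = d, .b = { .i1 = b_i1 } };
    /* c10 mirrors a char<10> field: at most 10 bytes, unused bytes stay '\0'. */
    strncpy(a.b.c10, name, sizeof a.b.c10);
    return a;
}
```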

Nullable Fields

If a scalar class element (int, float, double) has been declared nullable in the database schema, the following interfaces are generated:

 
    MCO_RET classname_fieldname_indicator_get( classname *handle, uint1 *result );
     

The argument result will have a value of 1 upon return if the field is null, otherwise result will be 0.

     
    MCO_RET  classname_fieldname_indicator_put( classname *handle, uint1 value);
     

Pass a value of 1 to set the null indicator, 0 to clear the null indicator.

Note that setting or clearing the null indicator has no effect on the underlying field’s value. In other words, if a nullable uint2 field has a value of 5 and <classname_fieldname>_indicator_put( h, 1 ) is called for the field, it will still have a value of 5 after the call.
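The separation between a field's value and its null indicator can be illustrated with a small standalone model. nullable_u2 and its helper functions are hypothetical. They mimic the documented behavior of _indicator_put() and _indicator_get() for a nullable uint2 field, not their implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a nullable uint2 field: the value and the null
   indicator are stored separately. */
typedef struct {
    uint16_t value;
    uint8_t  is_null;   /* 1 = null, 0 = not null */
} nullable_u2;

static void field_indicator_put(nullable_u2 *f, uint8_t null_flag)
{
    f->is_null = null_flag ? 1 : 0;   /* the stored value is left unchanged */
}

static void field_indicator_get(const nullable_u2 *f, uint8_t *result)
{
    *result = f->is_null;
}
```

A field holding 5 still holds 5 after its indicator is set to 1, exactly as described above.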

 

For fields of all types, the respective forms of <classname_fieldname>_get() can return MCO_S_NULL_VALUE. <classname>_fixed_get() can also return MCO_S_NULL_VALUE, indicating that one or more constituent fields are null; a further examination with <classname_fieldname>_indicator_get() will be necessary to determine which field(s) are null.

Please see the Nullable Fields and Indexes page for an explanation of the behavior of nullable fields included in indexes.

Checkpoint and Size

When an indexed field is modified within a transaction, the object is removed from all indexes defined for its class, regardless of whether the modified field is part of those other indexes. Once the object has been removed from the indexes, it cannot be located with any search function. The object is put back into the indexes when the transaction commits.

The _checkpoint() API is the only way to put the object back into indexes within the transaction and thus make it visible through search functions.

 
    MCO_RET classname_checkpoint ( /*IN*/ classname *handle);
     

If a unique index constraint is violated, _checkpoint() will return status code MCO_S_DUPLICATE.

For fields of type string and vector, an additional _size() function is generated to return the actual length of the string value or the number of elements in the vector, so that you can allocate space for it:

 
    MCO_RET classname_fieldname_size( /*IN*/ classname *handle, /*OUT*/ uint2 *size);
     

Autocompaction of dynamic objects

If a class contains dynamically extended components (fields of type vector or string), an object of this class that is frequently updated can develop memory holes. To prevent this kind of fragmentation, an autocompaction feature is provided. To enable autocompaction, specify a non-zero value for the database parameter autocompact_threshold in the mco_db_params_t passed into mco_db_open_dev().

If the size of an object exceeds the autocompact_threshold value, the autocompaction algorithm reallocates the object, eliminating its internal fragmentation. However, object compaction is not a cheap operation and should not be performed frequently. A recommended value for this threshold is therefore several kilobytes larger than a normal object’s expected size.

Vectors and Fixed-length Arrays

eXtremeDB vectors are by definition of variable length, whereas arrays are fixed length. For C applications, vectors and fixed-length arrays require a number of special functions. Fixed-length arrays are given the specified number of bytes of static memory in the record layout, but vector fields are initially just references that must be allocated storage at runtime. The _alloc() function reserves space for the vector field’s elements within a data layout. The application must call the _alloc() function to supply the size of the vector before values can be stored in the vector field. Otherwise the vector reference will remain null. Invoking the _alloc() function for a vector field of an existing object will resize the vector. If the new size is less than the current size the vector is truncated to the new size.

 
    MCO_RET classname_fieldname_alloc( /*IN*/ classname *handle, /*IN*/ uint2 size);
     

The functions that operate on vector and array fields require an index argument but are otherwise functionally equivalent to their scalar counterparts. The _put() function for a field declared as a vector or fixed-size array has the form:

 
    MCO_RET classname_fieldname_put( /*IN*/ classname *handle, /*IN*/ uint2 index, /*IN*/ <type> value);
     

Fields declared as vectors of strings have the form:

 
    MCO_RET classname_fieldname_put( /*IN*/ classname *handle, /*IN*/ uint2 index,
                        /*IN*/ const char * value, /*IN*/ uint2 len);
                         

For convenience, _put_range() methods are generated to assign an array of values to a vector or array. (Note that the size of the IN array should be less than or equal to the size of the vector as specified in the vector’s _alloc() call, or the size of the array as defined by the <classname_fieldname>_size constant in the generated header file.)

 
    MCO_RET classname_fieldname_put_range( /*IN*/ classname *handle, /*IN*/ uint2 start_index,
                            /*IN*/ uint2 num, /*IN*/ const <type> *src );
                             

Note that _put_range() methods are only generated for vectors that consist of simple scalar elements. For vectors of structures and vectors of strings this method is not generated. The reason is that for simple type vector elements the schema compiler can generate optimized methods to assign values to them. This optimization is only possible if the size of the vector element is known at compile time. Also note that it is never necessary to use a _put_range() method to set the vector; the _put() function can always be iterated to assign individual vector element values for the desired range.
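The rules for _alloc() and _put_range() can be illustrated with an in-memory model of a vector<uint4> field. Everything in this sketch (vec_u4, vec_alloc(), vec_put_range(), vec_put()) is hypothetical, and zero-filling newly allocated elements is an assumption. The model captures three ordering and bounds rules:

- Nothing can be stored before the first allocation.
- A smaller allocation truncates the vector.
- A range must fit inside the allocated size.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a vector<uint4> field (not eXtremeDB code). */
typedef struct {
    uint32_t *elems;   /* NULL until the first alloc, like an unallocated vector */
    uint16_t  size;
} vec_u4;

/* Sets or changes the vector size; shrinking truncates. Zero-filling new
   elements is an assumption of this sketch. Returns 0 or -1. */
static int vec_alloc(vec_u4 *v, uint16_t size)
{
    uint32_t *p = realloc(v->elems, (size ? size : 1) * sizeof *p);
    if (p == NULL)
        return -1;
    if (size > v->size)
        memset(p + v->size, 0, (size_t)(size - v->size) * sizeof *p);
    v->elems = p;
    v->size = size;
    return 0;
}

/* Assigns num values starting at start_index. The range must fit inside
   the allocated size; otherwise nothing is written and -1 is returned. */
static int vec_put_range(vec_u4 *v, uint16_t start_index, uint16_t num,
                         const uint32_t *src)
{
    if (v->elems == NULL || (uint32_t)start_index + num > v->size)
        return -1;
    memcpy(v->elems + start_index, src, (size_t)num * sizeof *src);
    return 0;
}

/* Single-element put; iterating it over a range gives the same result. */
static int vec_put(vec_u4 *v, uint16_t index, uint32_t value)
{
    return vec_put_range(v, index, 1, &value);
}
```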

To access a specific element of a vector the _at() function is generated. The form of the _at() function will vary depending on the type of elements stored in the vector. For vectors of fixed-length fields it will have the form:

 
    MCO_RET classname_fieldname_at( /*IN*/ classname *handle, /*IN*/ uint2 index, /*OUT*/ <type> *result );
     

If the vector consists of strings or fixed length byte-arrays (char<n>), the _at() function takes two extra parameters: the maximum size of the buffer to receive the string and the actual length of the string returned:

 
    MCO_RET classname_fieldname_at( /*IN*/ classname *handle, /*IN*/ uint2 index,
                        /*OUT*/ char *result, /*IN*/ uint2 bufsize,
                        /*OUT*/ uint2 *len);
                         

When allocating memory (for host variables) for vectors of variable length elements, it may be necessary to first determine the actual size of the vector element. The _at_len() functions are generated for vectors of strings for this purpose:

 
    MCO_RET classname_fieldname_at_len( /*IN*/ classname *handle, /*IN*/ uint2 pos, /*OUT*/ uint2 *retlen);
     

The _get_range() function returns a range of vector elements, for vectors of scalar elements:

 
    MCO_RET classname_fieldname_get_range( /*IN*/ classname *handle, /*IN*/ uint2 startIndex,
                            /*IN*/ uint2 num, /*OUT*/ <type> *dest);
                             

The _erase() function is generated for vectors of structures and vectors of strings, as well as for optional struct fields. It removes an element of a vector from the layout and from all indexes that include the element. Note that the vector size remains unchanged. If an attempt is made to get the erased element, the runtime returns a null pointer and MCO_S_OK. (Also note that _erase() is not generated for vectors of basic scalar types. For such vectors, the application should _put() a recognizable value in the element that it can interpret as null.)

 
    MCO_RET classname_fieldname_erase( /*IN*/ classname *handle, /*IN*/ uint2 index);
     
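The documented _erase() semantics can be modeled with a fixed-capacity vector of structures. item_vec and its functions are hypothetical. The model deliberately leaves the erased element's storage in place, matching the point that the space is only returned to the pool by _pack().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a vector of structures (not eXtremeDB code). */
typedef struct { int id; } item;

typedef struct {
    item   items[8];
    int    erased[8];   /* 1 = hole left by erase; storage still occupied */
    size_t size;
} item_vec;

enum { VEC_OK = 0, VEC_ERR_INDEX = -1 };

/* Marks the element as erased; the vector size does not change. */
static int item_vec_erase(item_vec *v, size_t index)
{
    if (index >= v->size)
        return VEC_ERR_INDEX;
    v->erased[index] = 1;
    return VEC_OK;
}

/* Reading an erased element succeeds but yields a NULL pointer. */
static int item_vec_at(const item_vec *v, size_t index, const item **result)
{
    if (index >= v->size)
        return VEC_ERR_INDEX;
    *result = v->erased[index] ? NULL : &v->items[index];
    return VEC_OK;
}
```

Callers that iterate over such a vector therefore need to check for NULL elements rather than rely on the size alone.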

The use of the _erase() function can leave unused elements (“holes”) in vector fields. For this reason, the _pack() function is generated for vector fields to remove “holes” so that the space occupied by the deleted element is returned to the free database memory pool.

Likewise, if an application had a non-empty string allocated and then modified the string value to null, the _size() function would return 0, but the actual space for the string would not be automatically reclaimed. The application needs to call the generated _pack() function to return that space to the storage pool.

 
    MCO_RET classname_pack ( /*IN*/ classname *handle, /*OUT*/ uint4 *pages_released );
     

Character string collation

The eXtremeDB core and UDA programming interfaces for C applications include support for collations. A collation, as defined in Wikipedia, “is the assembly of written information into a standard order. One common type of collation is called alphabetization, though collation is not limited to ordering letters of the alphabet.”

Collation is implemented as a set of rules for comparing characters in a character set. A character set is a set of symbols with assigned ordinals that determine precise ordering. For example, in the segment of the Italian alphabet consisting of the letters “a, a`, b, c, d, e, e`, f” the letters could be assigned the following ordinals: a=0, a`=1, b=2, c=3, d=4, e=5, e`=6, f=7. This mapping will assure that the letter “a`” (“accented a”) will be sorted after “a” but before “b”, and “e`” will follow “e” but precede “f”.

In some character sets, multiple-character combinations like “AE” (“labor lapsus” in the Danish and Norwegian alphabets or “ash” in Old-English) and “OE” (an alternate form of “Ö” or “O-umlaut” in the German alphabet) are treated as single letters. This poses a collation problem when strings containing these character combinations need to be ordered. Clearly, a collation algorithm to sort strings of these character sets must compare more than a single character at a time.
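The ordinal mapping for the Italian segment can be sketched in plain C, independent of the eXtremeDB collation API. Following the text's notation, an accented letter is written as the letter followed by a backtick. next_ordinal() must therefore read two bytes as one letter, which is the same multi-character problem described for "AE" and "OE". Both function names are hypothetical. Sorting unknown characters after the segment is an assumption of the sketch.

```c
#include <assert.h>
#include <string.h>

#define UNKNOWN_ORDINAL 100   /* characters outside the segment sort last (assumption) */

/* Ordinals for "a, a`, b, c, d, e, e`, f". Reads one letter at *p (one or
   two bytes), advances *p past it and returns its ordinal.
   The caller guarantees that **p is not '\0'. */
static int next_ordinal(const char **p)
{
    static const char base[]     = "abcdef";
    static const int  plain[]    = { 0, 2, 3, 4, 5, 7 };
    static const int  accented[] = { 1, -1, -1, -1, 6, -1 };
    const char *hit = strchr(base, **p);
    int i;

    if (hit == NULL) {
        *p += 1;
        return UNKNOWN_ORDINAL;
    }
    i = (int)(hit - base);
    if ((*p)[1] == '`' && accented[i] >= 0) {   /* two bytes, one letter */
        *p += 2;
        return accented[i];
    }
    *p += 1;
    return plain[i];
}

/* Compares NUL-terminated strings letter by letter by ordinal.
   Returns <0, 0 or >0; a proper prefix sorts before the longer string. */
static int segment_compare(const char *s1, const char *s2)
{
    while (*s1 != '\0' && *s2 != '\0') {
        int o1 = next_ordinal(&s1);
        int o2 = next_ordinal(&s2);
        if (o1 != o2)
            return o1 < o2 ? -1 : 1;
    }
    if (*s1 == '\0' && *s2 == '\0')
        return 0;
    return *s1 == '\0' ? -1 : 1;
}
```

The difference from a byte-wise comparison shows up with "ba`d" and "bad". strcmp() places "ba`d" first because '`' has a lower byte value than 'd'. Under the ordinal collation, "a`" follows "a", so "ba`d" sorts after "bad".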

“Capitalization” is also a collation issue. In some cases strings will be compared in a “case sensitive” manner where for example the letters “a-z” will follow the (uppercase) letter “Z”, while more often strings will be compared in a “case insensitive” manner where “a” follows “A”, “b” follows “B”, etc. This can be easily accomplished by treating uppercase and lowercase versions of each letter as equivalent, by converting upper to lower or vice versa before comparing strings, or by assigning them the same ordinal in a case-insensitive character set.

eXtremeDB allows strings to be compared using a variety of collations. It also allows strings and character arrays with different character sets or collations to be mixed in the same database. Character sets and collations are specified at the application level.

Collation Data Definition Language and API function definitions

As explained in the Custom Collations page, the eXtremeDB DDL provides a collation declaration for tree and hash indexes on string-type fields, as follows:

 
    [unique] tree<string_field_name_1 [collate C1] 
            [, string_field_name_2 [collate C2]], …> index_name;
 
    hash<string_field_name_1 [collate C1]
            [, string_field_name_2 [collate C2]], …> index_name;
             

If a collation is not explicitly specified for an index component, the default collation is used. Based on the DDL declaration, for each collation the DDL compiler will generate the following compare function placeholders for tree indexes and/or hash indexes using this collation:

 
    int2  collation_name_collation_compare( mco_collate_h c1, uint2 len1,
                            mco_collate_h c2, uint2 len2 )
    {
        /* TODO: add your implementation here */
        return 0;
    }
     
    uint4 collation_name_collation_hash (mco_collate_h c, uint2 len)
    {
        /* TODO: add your implementation here */
        return 0;
    }
     

For each defined collation, a separate API is generated. The actual implementation of the compare functions, including the definition of character sets, is the application’s responsibility. To facilitate compare function implementation, eXtremeDB provides the following set of functions:

 
    mco_collate_get_char(mco_collate_h s, char *buf, uint2 len);
    mco_collate_get_nchar(mco_collate_h s, nchar_t *buf, uint2 len);
    mco_collate_get_wchar(mco_collate_h s, wchar_t *buf, uint2 len);
    mco_collate_get_char_range(mco_collate_h s, char *buf,
        uint2 from, uint2 len);
    mco_collate_get_nchar_range(mco_collate_h s, nchar_t *buf,
        uint2 from, uint2 len);
    mco_collate_get_wchar_range(mco_collate_h s, wchar_t *buf,
        uint2 from, uint2 len);
         

Three different versions of the mco_collate_get_*char() and mco_collate_get_*char_range() functions are needed so that the same collation can be used with fields of different character types: the buffer argument must be of the type corresponding to the field being accessed. For fields of type string and char<n>, the *char version (mco_collate_get_char()) is called. For fields of type nstring and nchar<n>, the *nchar version is called. For fields of type wstring and wchar<n>, the *wchar version is called.
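The generated compare and hash stubs must be filled in by the application. The sketch below shows logic that a case-insensitive collation might use, written against plain buffers so that it runs on its own. ci_compare() and ci_hash() are hypothetical. In a real stub, the characters would first be copied out of the mco_collate_h handles with a function such as mco_collate_get_char(). The stubs return int2 and uint4, whereas the sketch uses int and uint32_t. FNV-1a was chosen arbitrarily as the hash.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>

/* Case-insensitive comparison of two length-delimited buffers.
   Returns <0, 0 or >0; a shorter equal prefix sorts first. */
static int ci_compare(const char *s1, uint16_t len1, const char *s2, uint16_t len2)
{
    uint16_t n = len1 < len2 ? len1 : len2;
    for (uint16_t i = 0; i < n; i++) {
        int c1 = tolower((unsigned char)s1[i]);
        int c2 = tolower((unsigned char)s2[i]);
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
    }
    return len1 == len2 ? 0 : (len1 < len2 ? -1 : 1);
}

/* FNV-1a over case-folded bytes, so that strings which compare equal
   under ci_compare() also hash to the same value. */
static uint32_t ci_hash(const char *s, uint16_t len)
{
    uint32_t h = 2166136261u;
    for (uint16_t i = 0; i < len; i++) {
        h ^= (uint32_t)tolower((unsigned char)s[i]);
        h *= 16777619u;
    }
    return h;
}
```

The hash folds case in the same way as the comparison. A hash index can only find a key if values that compare equal also produce the same hash.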

The C application registers user-defined collations via the following function:

 
    mco_db_register_collations(dbname, mydb_get_collations());
     

This function must be called prior to mco_db_connect() or mco_db_connect_ctx() and must be called once for each process that accesses a shared memory database. The second argument mydb_get_collations() is a database specific function similar to mydb_get_dictionary() that is generated by the DDL compiler in the files mydb.h and mydb.c. In addition, the DDL compiler generates the collation compare function stubs in mydb_coll.c. (Note that if the file mydb_coll.c already exists, the DDL compiler will display a warning and generate mydb_coll.c.new instead.)

Please see the Custom Collations page for further details and examples using custom collations.

Blob Support

BLOB fields are useful when it is necessary to keep streaming data, with no known size limits. C applications use the generated _get() function to copy BLOB data to an application’s buffer; it allows specification of a starting offset within the BLOB.

 
    MCO_RET classname_fieldname_get( /*IN*/ classname *handle, /*IN*/ uint4 startOffset,
                        /*OUT*/ char *buf, /*IN*/ uint4 bufsz,
                        /*OUT*/ uint4 *len);
                         

The bufsz parameter is the size of the buffer passed by the application in the buf parameter. The len output parameter is the actual number of bytes copied to the buffer (which will be <= bufsz).

The _size() function returns the size of a BLOB field. This value can be used to allocate sufficient memory to hold the BLOB, prior to calling the _get() function.

     
    MCO_RET classname_fieldname_size( /*IN*/ classname *handle, /*OUT*/ uint4 * result);
     

The _put() function populates a BLOB field, possibly overwriting prior contents. It allocates space and copies data from the application’s buffer; the size of the BLOB must be specified.

 
    MCO_RET classname_fieldname_put( /*IN*/ classname *handle, /*IN*/ const void *from,
                        /*IN*/ uint4 nbytes);
                         

The _append() function is used to append data to an existing BLOB. This method is provided so an application does not have to allocate a single buffer large enough to hold the entire BLOB, but rather can conserve memory by writing the BLOB in manageable pieces.

 
    MCO_RET classname_fieldname_append(/*IN*/ classname *handle, /*IN*/ const void * from,
                        /*IN*/ uint4 nbytes );
                         

To erase (truncate) a BLOB, pass a size of 0 to the _put() method.
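Together, the offset-based _get() and _append() allow a BLOB to be written and read in pieces. The sketch below models those semantics with a plain heap buffer so that it runs on its own. blob_put(), blob_append() and blob_get() are hypothetical stand-ins for the generated classname_fieldname_* functions. The realloc-based storage says nothing about how eXtremeDB actually stores BLOBs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a BLOB field. */
typedef struct {
    unsigned char *data;
    uint32_t       size;
} blob;

/* Replaces the contents; nbytes == 0 truncates the blob to empty. */
static int blob_put(blob *b, const void *from, uint32_t nbytes)
{
    unsigned char *p = NULL;
    if (nbytes > 0) {
        p = malloc(nbytes);
        if (p == NULL)
            return -1;
        memcpy(p, from, nbytes);
    }
    free(b->data);
    b->data = p;
    b->size = nbytes;
    return 0;
}

/* Adds nbytes to the end, so a large value can be written in pieces. */
static int blob_append(blob *b, const void *from, uint32_t nbytes)
{
    unsigned char *p;
    if (nbytes == 0)
        return 0;
    p = realloc(b->data, (size_t)b->size + nbytes);
    if (p == NULL)
        return -1;
    memcpy(p + b->size, from, nbytes);
    b->data = p;
    b->size += nbytes;
    return 0;
}

/* Copies up to bufsz bytes starting at start_offset; *len receives the
   number of bytes actually copied (always <= bufsz). */
static int blob_get(const blob *b, uint32_t start_offset, char *buf,
                    uint32_t bufsz, uint32_t *len)
{
    uint32_t avail = start_offset < b->size ? b->size - start_offset : 0;
    *len = avail < bufsz ? avail : bufsz;
    if (*len > 0)
        memcpy(buf, b->data + start_offset, *len);
    return 0;
}
```

A reader can loop on blob_get(), advancing start_offset by the returned length until that length is 0. Only one small buffer is held in memory at any time.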

Binary Data

Blob fields are useful for large binary data, but they are intended only for large values (greater than 1 KB). For character or binary data smaller than 64 KB, string fields are recommended. String fields can hold arbitrary binary data when they are not used in indexes, because index comparisons rely on a null terminator. Unlike blob fields, binary and varbinary fields can be included in both simple and compound indexes.

Date, Time and Datetime Fields

Please refer to the Datetime Fields page for a detailed description of the eXtremeDB date, time and datetime database field types. The C APIs for determining precision and accessing date, time and datetime fields are described in the Managing Datetime Fields in C page.