Database Concurrency and Transaction Management in C

As explained in the Transaction and Concurrency control pages, applications use transaction blocking for all database access. This allows the eXtremeDB transaction managers to schedule and process all database operations, whether they involve simple READ_ONLY access or READ_WRITE operations that modify database objects.

Transaction blocking

An eXtremeDB transaction block consists of a set of database operations enclosed between a transaction start and a commit or rollback. In C applications, a transaction is started by calling one of two eXtremeDB functions: mco_trans_start() or mco_trans_start_ex(). The second differs only in that it allows setting the isolation level for the transaction. To commit a transaction, call mco_trans_commit(); to discard all database operations performed since the transaction was started, call mco_trans_rollback().

Transaction Managers

As explained in the Fundamental Concepts page, eXtremeDB offers three transaction managers to meet varying application demands and concurrency strategies. The choice of transaction manager can have a significant performance impact on applications. Fortunately, changing transaction managers is simply a matter of changing the linker directive and rebuilding the application. Implementation details of the MURSIW and MVCC transaction managers are described on their respective pages. To link with MURSIW, use the mcotmursiw_debug library for development and mcotmursiw for release versions of your application. To link with MVCC, use the mcotmvcc_debug library for development and mcotmvcc for release versions of your application.

The EXCL(usive) transaction manager is intended only for single-process, single-threaded applications, and thus does not actually manage concurrency. To link with EXCL use the mcotexcl_debug library for development, and mcotexcl for release versions of your application.

Setting the Isolation Level

When mco_trans_start() is called, the transaction isolation level is set to MCO_DEFAULT_ISOLATION_LEVEL, which is by default MCO_SERIALIZABLE for MURSIW and MCO_REPEATABLE_READ for MVCC. (Note that if MURSIW is used, the only possible level is MCO_SERIALIZABLE.) It is possible to redefine the default transaction isolation level for the database session (connection). In C applications this is done by calling mco_trans_set_default_isolation_level(), which returns the previous default isolation level. The application can determine the current isolation level by calling mco_trans_isolation_level(), and can find out which isolation levels the currently running transaction manager supports by calling mco_trans_get_supported_isolation_levels().

Setting the Transaction Priority and Scheduling Policy

As explained in the Transaction Priorities and Scheduling page, applications can adjust the transaction priority and scheduling policy at runtime. The transaction priority is specified in the call to mco_trans_start(). Applications can explicitly define the MURSIW scheduling policy by setting the desired MCO_TRANS_SCHED_POLICY value in the trans_sched_policy field of the mco_db_params_t structure passed into mco_db_open_dev().

MVCC Conflict management

When MVCC is used with an isolation level other than MCO_SERIALIZABLE, MCO_READ_WRITE transactions are executed concurrently. Sometimes concurrent transactions modify the same objects, thus creating transaction conflicts. The transaction manager resolves those conflicts by aborting one of the conflicting transactions and letting the other one commit its updates to the database. When a transaction is aborted, the application receives the MCO_E_CONFLICT error code. It is the application’s responsibility to manage this possibility with logic similar to the following:

 
    do {
        mco_trans_start( db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t);
        ...<update database>...
        rc = mco_trans_commit(t);
    } while ( rc == MCO_E_CONFLICT );
     

Note that when MVCC is used, the application must be able to tolerate transaction rollbacks due to conflicts.
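
The retry loop shown earlier in this section repeats without limit. As a minimal sketch of bounding the retries, the following models one complete start/update/commit attempt as a callback. The status codes and function names (tx_status, run_with_retry, conflict_then_ok) are hypothetical stand-ins, not part of the eXtremeDB API; in a real application the callback would wrap mco_trans_start() and mco_trans_commit() and treat MCO_E_CONFLICT as the retry case.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MCO_RET values; a real application would
 * include mco.h and use MCO_S_OK and MCO_E_CONFLICT instead. */
typedef enum { TX_OK = 0, TX_CONFLICT = 1, TX_ERROR = 2 } tx_status;

/* One full attempt: start the transaction, apply the updates, commit.
 * Returns the status of the commit (or of the first failing call). */
typedef tx_status (*tx_attempt_fn)(void *ctx);

/* Repeat the attempt while it reports a conflict, at most max_attempts
 * times. The number of attempts made is stored in *attempts_made.
 * With max_attempts == 0 nothing is attempted and TX_CONFLICT is returned. */
static tx_status run_with_retry(tx_attempt_fn attempt, void *ctx,
                                unsigned max_attempts, unsigned *attempts_made)
{
    tx_status rc = TX_CONFLICT;
    unsigned n = 0;

    while (n < max_attempts) {
        rc = attempt(ctx);
        n++;
        if (rc != TX_CONFLICT)
            break;              /* success, or an error that is not retried */
    }
    if (attempts_made != NULL)
        *attempts_made = n;
    return rc;
}

/* Demo attempt: reports a conflict until the counter in ctx reaches zero. */
static tx_status conflict_then_ok(void *ctx)
{
    int *conflicts_left = (int *)ctx;

    if (*conflicts_left > 0) {
        (*conflicts_left)--;
        return TX_CONFLICT;
    }
    return TX_OK;
}
```

A caller that still sees the conflict status after the last attempt can report the failure instead of spinning, which is one way to tolerate rollbacks without risking an unbounded loop.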

If the number of conflicts is too high, it could lead to sharp performance degradation due to the need to retry transactions. When this occurs, the transaction manager temporarily changes the isolation level to MCO_SERIALIZABLE. The application can set the conflicts threshold over which the optimistic control is disabled. This is done by calling mco_trans_optimistic_threshold().

If the percentage of transactions that have been aborted because of transaction conflicts exceeds max_conflicts_percent, the transaction isolation level is changed to SERIALIZABLE for disable_period successive transactions. SERIALIZABLE permits a single MCO_READ_WRITE transaction at a time (eliminating the potential for conflicts), which can run in parallel with MCO_READ_ONLY transactions. By default, the optimistic threshold is set to 100, which means “never disable optimistic mode no matter how many conflicts occur”.
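
To make the threshold rule concrete, here is a small model of the behavior described above. It is only an illustration: the runtime's actual bookkeeping is not described on this page, and the sketch assumes the percentage is computed over all transactions finished since optimistic mode was last re-enabled. All names (opt_ctl and its functions) are hypothetical.

```c
/* Hypothetical model of the optimistic threshold rule; not eXtremeDB code. */
typedef struct {
    int      max_conflicts_percent; /* 100 means "never disable" */
    int      disable_period;        /* transactions to run serialized */
    unsigned total;                 /* finished in optimistic mode */
    unsigned aborted;               /* of those, aborted by a conflict */
    int      serialized_left;       /* serialized transactions still to run */
} opt_ctl;

static void opt_ctl_init(opt_ctl *c, int max_conflicts_percent, int disable_period)
{
    c->max_conflicts_percent = max_conflicts_percent;
    c->disable_period = disable_period;
    c->total = 0;
    c->aborted = 0;
    c->serialized_left = 0;
}

/* Returns 1 if the next transaction would run serialized. */
static int opt_ctl_is_serialized(const opt_ctl *c)
{
    return c->serialized_left > 0;
}

/* Record the outcome of one finished transaction. */
static void opt_ctl_record(opt_ctl *c, int was_conflict)
{
    if (c->serialized_left > 0) {
        c->serialized_left--;       /* serialized: no conflict is possible */
        return;
    }
    c->total++;
    if (was_conflict)
        c->aborted++;

    /* aborted / total > max_conflicts_percent / 100, without division */
    if (c->aborted * 100u > (unsigned)c->max_conflicts_percent * c->total) {
        c->serialized_left = c->disable_period;
        c->total = 0;
        c->aborted = 0;
    }
}
```

With a threshold of 100 the comparison can never be true, because the number of aborted transactions cannot exceed the total; this matches the "never disable" meaning of the default.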

Adjusting Transaction Locking

Note that when MVCC is used it is possible to adjust the maximum number of active write transactions when locking of B-Tree indexes is being performed. This threshold is set by parameter index_optimistic_lock_threshold in the mco_db_params_t passed into mco_db_open_dev().

Accelerating MVCC Performance

It has been verified that performance of the MVCC transaction manager can be accelerated by use of an internal bitmap. By default this feature is disabled. An application can enable this feature by specifying a non-zero value for database parameter mvcc_bitmap_size in the mco_db_params_t passed into mco_db_open_dev(). (Note that the value specified should be a power of 2.)
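
Because the value given for mvcc_bitmap_size should be a power of 2, an application that derives it from configuration may want to validate or round it first. The helpers below are a sketch with hypothetical names, and they assume the value fits in a size_t.

```c
#include <stddef.h>

/* Returns 1 if n is a power of two (1, 2, 4, ...), 0 otherwise. */
static int is_power_of_two(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Rounds n up to the nearest power of two.
 * Returns 0 for n == 0 (feature left disabled) and also 0 if the
 * result would not fit in a size_t. */
static size_t round_up_to_power_of_two(size_t n)
{
    size_t p = 1;

    if (n == 0)
        return 0;
    while (p < n) {
        if (p > ((size_t)-1) / 2)
            return 0;           /* doubling would overflow */
        p <<= 1;
    }
    return p;
}
```

For example, a configured size of 1000 would be rounded up to 1024 before being stored in the database parameters.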

Two-phase commit

Some applications require more elaborate control of transaction commit processing; specifically, committing the transaction in two steps (phases). The first phase writes the data into the database, inserts new data into indexes and checks index restrictions (uniqueness) (all together, the “pre-commit”) and returns control to the application. The second phase finalizes the commit.

One example of such an application is the case where multiple eXtremeDB databases need to synchronize the updates performed within a single transaction. Another example could be that the eXtremeDB transaction commit is included in a global transaction that involves other database systems or external storage. In this case, the application coordinates the eXtremeDB transaction with the global transaction between the first phase and the second phase.

To facilitate these and similar application scenarios, eXtremeDB provides the following two API functions for C applications: mco_trans_commit_phase1() and mco_trans_commit_phase2().

To perform the two-phase commit, the application calls the two commit phases sequentially instead of making a single call to mco_trans_commit(). After the first commit phase returns, the application cannot perform any activities against the database except initiating the second commit phase or rolling back the transaction. This process is illustrated in the following code segment:

 
    mco_db_h db;
    mco_trans_h t;
     
    ...
    mco_trans_start(db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t);
    ...
    if ( (mco_trans_commit_phase1(t) == MCO_S_OK) && (global_transaction() == SUCCESS) )
    {
        mco_trans_commit_phase2(t);
    }
    else
    {
        mco_trans_rollback(t);
    }
     

Note that the two-phase commit API is not supported when using the MVCC transaction manager with a persistent database.
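
When several databases or external systems take part in one global transaction, the coordination described in this section can be sketched as follows. The participant structure and its callbacks are hypothetical; for an eXtremeDB participant they would wrap mco_trans_commit_phase1(), mco_trans_commit_phase2() and mco_trans_rollback().

```c
#include <stddef.h>

typedef enum { P_OK = 0, P_FAIL = 1 } p_status;

/* Hypothetical participant in a global transaction. */
typedef struct {
    p_status (*phase1)(void *ctx);   /* pre-commit */
    p_status (*phase2)(void *ctx);   /* finalize the commit */
    void     (*rollback)(void *ctx); /* discard the transaction */
    void      *ctx;
} participant;

/* Run phase 1 on every participant; finalize only if all succeed.
 * If any phase 1 fails, every participant is rolled back. */
static p_status commit_all(participant *p, size_t n)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (p[i].phase1(p[i].ctx) != P_OK) {
            for (j = 0; j < n; j++)
                p[j].rollback(p[j].ctx);
            return P_FAIL;
        }
    }
    for (i = 0; i < n; i++) {
        /* A failure here needs application-specific recovery,
         * which this sketch does not attempt. */
        if (p[i].phase2(p[i].ctx) != P_OK)
            return P_FAIL;
    }
    return P_OK;
}

/* Demo participant. state: 0 open, 1 pre-committed, 2 committed, 3 rolled back */
typedef struct { int fail_phase1; int state; } demo_db;

static p_status demo_phase1(void *ctx)
{
    demo_db *d = (demo_db *)ctx;
    if (d->fail_phase1)
        return P_FAIL;
    d->state = 1;
    return P_OK;
}

static p_status demo_phase2(void *ctx)
{
    ((demo_db *)ctx)->state = 2;
    return P_OK;
}

static void demo_rollback(void *ctx)
{
    ((demo_db *)ctx)->state = 3;
}
```

Rolling back the participant whose first phase failed mirrors the code segment above, where mco_trans_rollback() is called when mco_trans_commit_phase1() does not return MCO_S_OK.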

Transaction Upgrade

After navigating the database to find a desired object or group of objects, sometimes an application will need to update the found object(s). A MCO_READ_WRITE transaction is required to update the database. To allow applications to optimize transaction performance when using the MURSIW transaction manager, a MCO_READ_ONLY transaction can be upgraded to MCO_READ_WRITE by calling mco_trans_upgrade(). This function attempts to elevate a MCO_READ_ONLY transaction to the MCO_READ_WRITE access level.

With MVCC, this call always succeeds.

When MURSIW is used, the upgrade can either succeed, in which case MCO_S_OK is returned, or fail with the MCO_E_UPGRADE_FAIL error code. This error code indicates that the runtime was unable to upgrade the transaction because another upgrade has been requested and granted by another MCO_READ_ONLY transaction - MURSIW only allows a single MCO_READ_WRITE transaction at a time.

In order to guarantee that the upgrade will be successful, the transaction should be started as a MCO_UPDATE transaction ("read with intent to update") instead of MCO_READ_ONLY as demonstrated in the following pseudo-code snippet:

 
    MCO_RET rc;
    mco_trans_h t;  // transaction handle

    rc = mco_trans_start(db, MCO_UPDATE, MCO_TRANS_FOREGROUND, &t);
    read_data(t);
    if (some_condition() == TRUE)
    {
        // it is necessary to make modifications in the database
        rc = mco_trans_upgrade(t);
        if ( MCO_S_OK == rc )
        {
            // the transaction is READ_WRITE now
            write_data(t);
            rc = mco_trans_commit(t);
        }
        else
        {
            // upgrade failed – take appropriate action
            rc = mco_trans_rollback(t);
        }
    }
    else
    {
        // nothing to modify; close the transaction
        rc = mco_trans_rollback(t);
    }
     

The MURSIW transaction manager uses a queue to process transactions. When mco_trans_upgrade() is called in the context of an MCO_UPDATE transaction, the upgrade is guaranteed to succeed. The actual logic for upgrading transactions in MURSIW is quite complex; see the MURSIW transaction manager page for a detailed explanation.

Getting the Transaction Type and Error State

Sometimes an application module that performs an update may be called from different points in the application. If passed a transaction handle, this function might need to first determine if the transaction is MCO_READ_WRITE before proceeding with the update. The function mco_trans_type() serves this purpose.

If an error occurs during a transaction, the transaction enters an error state and subsequent operations within that transaction will return MCO_E_TRANSACT. In this case, to obtain the error code of the operation that initially caused the error condition, C applications call function mco_get_last_error().

Examining the content of a transaction

The mco_trans_iterate() function provides C applications the ability to iterate over all modifications made by a transaction, read the modified objects, and determine what kind of modifications were applied (new, update, delete). The function signature is:

 
    extern MCO_RET mco_trans_iterate(mco_trans_h trans,
                        mco_trans_iterator_callback_t callback,
                        void* user_ctx);
     

The second parameter is an application-defined callback function that inspects each object, and determines whether it is subject to an external transaction, etc. The callback receives a handle to the modified object, the class_id of the object, the opcode of the modification operation and some application-specific context user_ctx (anything that the application needs to pass into the callback). The callback will return MCO_S_OK to indicate that the application can continue iterating through the transaction, or any other value to indicate a problem, in which case the application rolls the transaction back. This function is especially useful when used together with the two-phase commit as demonstrated in the following code snippet:

 
    mco_trans_start(db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &trans);
    ...
    rc = mco_trans_commit_phase1(trans);
    if (rc == MCO_S_OK) 
    {
        /* commit to external database */
        rc = mco_trans_iterate(trans, &my_iterator_callback, my_iterator_context);
        if (rc == MCO_S_OK) 
        {
            /* external commit succeeded */
            mco_trans_commit_phase2(trans);
        } 
        else 
        {
            mco_trans_rollback(trans);
        }
        }
    }
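
The callback contract described in this section (return the success code to continue, anything else to stop) can be illustrated with a self-contained model. The types below (change, change_cb, tally) and the driver iterate_changes() are hypothetical stand-ins and do not reproduce the real mco_trans_iterator_callback_t signature.

```c
#include <stddef.h>

/* Hypothetical model of a transaction's change list; not eXtremeDB types. */
typedef enum { CHG_NEW, CHG_UPDATE, CHG_DELETE } chg_op;
typedef struct { int class_id; chg_op op; } change;

/* Callback: return 0 to continue iterating, non-zero to stop. */
typedef int (*change_cb)(const change *c, void *user_ctx);

/* Visit every change in order; stop at the first non-zero callback result
 * and hand that result back to the caller. */
static int iterate_changes(const change *changes, size_t n,
                           change_cb cb, void *user_ctx)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int rc = cb(&changes[i], user_ctx);
        if (rc != 0)
            return rc;
    }
    return 0;
}

/* Application context: counts per operation, plus a class that must not
 * be propagated to the external system. */
typedef struct { int created, updated, deleted; int forbidden_class; } tally;

static int tally_cb(const change *c, void *user_ctx)
{
    tally *t = (tally *)user_ctx;

    if (c->class_id == t->forbidden_class)
        return -1;              /* signal a problem: caller should roll back */
    switch (c->op) {
    case CHG_NEW:    t->created++; break;
    case CHG_UPDATE: t->updated++; break;
    case CHG_DELETE: t->deleted++; break;
    }
    return 0;
}
```

A non-zero result from the driver corresponds to the rollback branch in the snippet above; a zero result corresponds to proceeding to the second commit phase.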
     

Note that a transaction may read, insert, update or delete a single object or many objects, even thousands, depending on the application’s needs. When processing blocks of objects in a single transaction it might happen that an object is deleted, but then before committing the transaction the same object is accessed again. If this occurs the read (get) or update (put) operation, or any other access such as locating the object in a cursor or generating XML on the object, will cause a fatal error code of MCO_ERR_OBJECT_HANDLE+N where N is a line number in the source code that identifies the exact point where the invalid handle was detected. In order to avoid this fatal error C applications can call the function mco_is_object_deleted() to determine if the object was deleted within the current transaction.

Pseudo-nested Transactions

Nested transactions might be necessary when two different application functions may be called separately or call each other. To facilitate transaction nesting eXtremeDB allows a C application to call mco_trans_start() or mco_trans_start_ex() before the current transaction is committed or aborted. The eXtremeDB runtime maintains an internal counter that is incremented each time mco_trans_start() or mco_trans_start_ex() is called, and decremented by mco_trans_commit() and mco_trans_rollback(). A transaction commit in an inner transaction does not perform any actions except to reduce the nested transaction counter, and the transaction context remains valid until the outer transaction performs a commit or rollback. The runtime will not actually commit the transaction until the counter reaches zero.

If an “inner” transaction calls mco_trans_rollback(), the transaction is put into an error state, and any subsequent calls to modify the database in the scope of the outer-most transaction will return immediately. Object handles become invalid and a subsequent attempt to use them will return an error.
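
The counter and error-state rules above can be modeled in a few lines. This is an illustrative sketch with hypothetical names (nest_tx, nest_start and so on), not the runtime's implementation; in particular, it assumes that an outer commit issued after an inner rollback discards the transaction.

```c
#include <assert.h>

/* Hypothetical model of pseudo-nested transactions; not eXtremeDB code. */
typedef struct {
    int depth;      /* nesting counter */
    int failed;     /* set when an inner block rolls back */
    int commits;    /* real commits performed */
    int rollbacks;  /* real rollbacks performed */
} nest_tx;

/* Models mco_trans_start(): only the counter changes. */
static void nest_start(nest_tx *t)
{
    t->depth++;
}

/* Models mco_trans_commit(). Returns 1 only when data is really committed. */
static int nest_commit(nest_tx *t)
{
    assert(t->depth > 0);
    t->depth--;
    if (t->depth > 0)
        return 0;               /* inner commit: counter only */
    if (t->failed) {            /* assumption: error state discards the work */
        t->failed = 0;
        t->rollbacks++;
        return 0;
    }
    t->commits++;
    return 1;
}

/* Models mco_trans_rollback(). */
static void nest_rollback(nest_tx *t)
{
    assert(t->depth > 0);
    t->depth--;
    if (t->depth > 0) {
        t->failed = 1;          /* inner rollback: enter the error state */
        return;
    }
    t->failed = 0;
    t->rollbacks++;
}
```

In this model two nested start calls followed by two commits produce exactly one real commit, performed by the outermost block.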

Outer and inner transactions will be assigned the stricter transaction type without requiring the application to upgrade the transaction type; each transaction code block should simply call mco_trans_start() with the appropriate transaction type for the operation being performed within its own body. Note, however, that the inner block’s mco_trans_start() might fail in the same manner as described for mco_trans_upgrade() above. The following code snippet illustrates a nested transaction implementation:

 
    /* Schema definition for class ‘BankTransaction’ */
    class BankTransaction 
    {
        unsigned<4> from;
        unsigned<4> to;
 
        hash<from> hFrom[10000];
        nonunique hash<to> hTo[10000];
    };
     
    /* forward declaration: insert_one() is defined after insert_two() */
    MCO_RET insert_one(mco_db_h db, uint4 from, uint4 to);

    /* insert two BankTransaction records */
    MCO_RET insert_two(mco_db_h db, uint4 from1, uint4 to1, uint4 from2, uint4 to2)
    {
        MCO_RET rc;
        mco_trans_h t;
        BankTransaction b2;

        rc = mco_trans_start(db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t);
        if ( MCO_S_OK != rc )
            return rc;

        /* call nested transaction in insert_one() to insert one object */
        rc = insert_one(db, from2, to2);
        if ( MCO_S_OK != rc )
        {
            /* the inner block failed; discard the whole transaction */
            mco_trans_rollback(t);
            return rc;
        }

        /* insert the other object in this (outer) block */
        rc = BankTransaction_new(t, &b2);
        if ( MCO_S_OK != rc )
        {
            mco_trans_rollback(t);
            return rc;
        }

        /* put values in the object created in this block */
        BankTransaction_from_put(&b2, from1);
        BankTransaction_to_put(&b2, to1);

        /* the outer commit brings the nesting counter to zero,
           so both inserts are actually committed here */
        return mco_trans_commit(t);
    }

    /* insert one BankTransaction record within a read-write transaction */
    MCO_RET insert_one(mco_db_h db, uint4 from, uint4 to)
    {
        MCO_RET rc;
        mco_trans_h t;
        BankTransaction b1;

        rc = mco_trans_start(db, MCO_READ_WRITE, MCO_TRANS_FOREGROUND, &t);
        if ( MCO_S_OK != rc )
            return rc;

        rc = BankTransaction_new(t, &b1);
        if ( MCO_S_OK != rc )
        {
            mco_trans_rollback(t);
            return rc;
        }
        BankTransaction_from_put(&b1, from);
        BankTransaction_to_put(&b1, to);

        /* when called from insert_two() this only decrements the counter */
        return mco_trans_commit(t);
    }
 
    int main(int argc, char* argv[])
    {
        MCO_RET rc;
        mco_db_h db;
        ...
        /* perform a simple nested transaction... */
        uint4 from1 = 11, to1 = 16, from2 = 7, to2 = 17;
         
        rc = insert_two(db, from1, to1, from2, to2);
        ...
    }
     

Note that if the transaction type in insert_two() had been MCO_READ_ONLY, the nested transaction in insert_one() would automatically promote the transaction type to MCO_READ_WRITE. The outer transaction would then complete successfully, even though it would otherwise fail on the attempt to instantiate a new object (the line rc = BankTransaction_new(t, &b2) ) within an MCO_READ_ONLY transaction.

Unfortunately the C language provides no safe way of enforcing the scope of a transaction. Consequently applications can make the mistake of not closing transactions, unintentionally creating pseudo-nested transactions. Debugging unclosed transactions can be challenging. For this reason, eXtremeDB can provide two additional libraries to aid developers in tracing transaction start and close calls within their source code. These libraries and methods for using them to aid debugging are available by request from McObject Support.