Database Recovery from Failed Processes in C

As explained in the Database Recovery page, eXtremeDB provides the “sniffer” utility to allow C and C++ applications to detect and remove “dead” connections.

Using the Sniffer API

Because there is no system-independent way to detect when a process has failed, eXtremeDB provides the “sniffer” API function mco_db_sniffer(). Typically mco_db_sniffer() is called periodically from a separate thread, or from specific places in the application, to check for “dead” connections. mco_db_sniffer() in turn calls a user-supplied callback function, which determines whether a given connection is “alive” and, if it is not, reports it as dead so that it can be terminated.

To perform this check, some identifying information (typically the process identifier) is added to each connection context with code like the following:

 
    int pid;
    #ifdef _WIN32
    pid = GetCurrentProcessId();
    #else
    pid = getpid();
    #endif
    ...
 
    mco_db_connect_ctx(dbName, &pid, &db);
     

Note that it is also necessary to specify the size of this connection context in the database parameters passed to mco_db_open_dev(). For example:

 
    db_params.connection_context_size  = sizeof(int);
     

The “sniffer callback” function could then be implemented as follows:

 
    MCO_RET sniffer_callback(mco_db_h db, void* context, mco_trans_counter_t trans_no)
    {
        int pid = *(int*)context;
        #ifdef _WIN32
            HANDLE h = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, pid);
            if (h != NULL) 
            {
                CloseHandle(h);
                return MCO_S_OK;
            }
        #else
            if (kill(pid, 0) == 0) 
            {
                return MCO_S_OK;
            }
        #endif
        printf("Process %d has crashed\n", pid);
        return MCO_S_DEAD_CONNECTION;
    }
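
One detail of the POSIX branch above is worth spelling out: kill(pid, 0) can also fail with EPERM, which means the process exists but the caller is not permitted to signal it. The following is a minimal sketch of a stricter liveness test, assuming a POSIX system; process_is_alive() is a hypothetical helper name and is not part of the eXtremeDB API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper (not an eXtremeDB function).
 * Returns 1 if a process with this ID appears to exist, 0 otherwise.
 * Signal 0 sends nothing; kill() only performs its existence and
 * permission checks. */
int process_is_alive(pid_t pid)
{
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return 0;
    }
    if (kill(pid, 0) == 0) {
        return 1;
    }
    /* EPERM: the process exists but we may not signal it.
     * ESRCH: no such process. */
    return errno == EPERM;
}
```

A callback could return MCO_S_OK when this helper returns 1 and MCO_S_DEAD_CONNECTION otherwise. A process ID alone cannot distinguish the original process from an unrelated one that later reused the same ID, so this remains a best-effort check.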
     

If the user callback function returns MCO_S_DEAD_CONNECTION, recovery is performed for that connection. mco_db_sniffer() iterates through the database connections and calls the user callback function according to the policy specified in its third parameter. The watchdog example below uses the policy MCO_SNIFFER_INSPECT_ACTIVE_CONNECTIONS.

A “watchdog” thread could then be implemented in the application as follows:

 
    THREAD_PROC_DEFINE(sniffer_loop, arg)
    {
        mco_db_h db;
        /* get_pid(): assumed to wrap GetCurrentProcessId()/getpid() as shown earlier */
        int pid = get_pid();
        mco_db_connect_ctx(dbName, &pid, &db);
        while (1) 
        {
            mco_db_sniffer(db, sniffer_callback,
                    MCO_SNIFFER_INSPECT_ACTIVE_CONNECTIONS);
            sleep(SNIFFER_INTERVAL);
        }
        }
        mco_db_disconnect(db);
        THREAD_RETURN(0);
    }
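
The loop above never exits, so its mco_db_disconnect() call is never reached. The following is a minimal sketch of a stoppable polling loop, assuming a C11 compiler and a POSIX system. All names in it (watchdog_stop, watchdog_run, demo_pass) are hypothetical, and the per-pass hook stands in for the mco_db_sniffer() call so that the sketch compiles without the eXtremeDB headers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stop flag; another thread stores 1 to request shutdown. */
atomic_int watchdog_stop = 0;

/* Hypothetical per-pass hook. In a real application it would wrap
 * mco_db_sniffer(db, sniffer_callback, MCO_SNIFFER_INSPECT_ACTIVE_CONNECTIONS). */
typedef void (*watchdog_pass_fn)(void *arg);

/* Sleeps for roughly the given number of milliseconds; a signal may
 * wake it early, which only makes the next pass start sooner. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Runs passes until watchdog_stop becomes nonzero.
 * Returns the number of passes that were made. */
unsigned long watchdog_run(watchdog_pass_fn pass, void *arg, long interval_ms)
{
    unsigned long passes = 0;
    while (!atomic_load(&watchdog_stop)) {
        pass(arg);
        passes++;
        sleep_ms(interval_ms);
    }
    return passes;
}

/* Example hook: counts passes and requests a stop once the limit is reached. */
struct demo_state { int calls; int limit; };

void demo_pass(void *arg)
{
    struct demo_state *s = arg;
    if (++s->calls >= s->limit) {
        atomic_store(&watchdog_stop, 1);
    }
}
```

With this shape, the thread body would connect, call watchdog_run(), and then disconnect once the application sets the stop flag during shutdown.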
     

Recovery consists of two stages. In the first stage, the dead connection is “grabbed”: each connection has private (process-specific) pointers, which must be adjusted so that they can be used in the context of the process performing recovery. In the second stage, internal functions are called to roll back any transactions that might have been in progress and to release the dead connection’s data structures. (Please see SDK sample 19_recovery_sniffer for an example.)

NVRAM Database Support and Recovery

eXtremeDB allows C and C++ applications to reconnect to databases created in non-volatile memory (NVRAM, or battery-backed RAM) after a system restart or a similar event. The database can be created in either conventional or shared memory. If the database is corrupted, the eXtremeDB runtime attempts to recover it based on the content of the memory buffer specified in the call to mco_db_open_dev().

To reconnect to a database in NVRAM, the application passes the memory device to mco_db_open_dev() and sets the MCO_DB_OPEN_EXISTING flag in the mode_mask field of the mco_db_params_t structure. For example:

 
    mco_db_params_t db_params;
    ...
    mco_db_params_init(&db_params);
    ...
    if (...) 
    {
        db_params.mode_mask |= MCO_DB_OPEN_EXISTING;
    }
    ...
    rc = mco_db_open_dev(db_name... , &db_params);
     

The database runtime performs the necessary steps to ensure consistency of the database metadata and the database content. If mco_db_open_dev() returns MCO_S_OK, the application can proceed to connect to the database normally by calling mco_db_connect().

Note that database recovery can fail under certain conditions (such as application errors that corrupt the database runtime metadata). If recovery fails, mco_db_open_dev() returns an error code. (Please refer to the sniffer discussion above for more about eXtremeDB recovery from failed processes. Also refer to the SDK sample 02-open_nvram.)