Scheduling transactions in a real-time database system is not a simple task. The database must guarantee the database's logical consistency, and schedule transactions to meet their deadlines while minimizing the number of transactions that miss their deadlines. There are several scheduling policies that use different criteria to prioritize transactions.
We provide two alternative implementations:
High Priority Earliest Deadline First (EDF) algorithm: transactions are sorted in the queue based first on their priority and then within the same priority - by deadline. The corresponding transaction manager is based on MURSIW.
Priority Inheritance (PI) algorithm: the transaction manager (PI-TM) relies on the real-time operating system's (RTOS) scheduling and provides necessary hints via specific usage of OS synchronization primitives. A fixed-size set of synchronization primitives is allocated at the time a database is created. These synchronization primitives are used as mutexes: Upon the start of a transaction, the thread grabs one or more mutex (one for a Read-Only transaction, chosen randomly; all mutexes for a Read-Write transaction, in the order that prevents deadlocks) and releases them when the transaction ends. Thus, the OS scheduler is fully aware which thread holds a mutex that is needed for a high-priority transaction and may apply "priority inheritance" when needed and if available (for example: in Linux and FreeRTOS™), i.e., temporarily raise the priority of a thread that is "in the way" to let it finish the transaction "faster" so it can release the "highly needed" mutex. See "EDF vs PI-TM" below for more explanation.
The number of synchronization primitives is set by mco_db_params_t.max_pi_readers
and effectively defines the maximal number of Read-Only transactions running in parallel. The default value is 1 meaning that PI-TM will behave as EXCL (mcotexcl
).
It is generally recommended to choose a value for max_pi_readers
no greater than the number of CPU cores.
If many READ-ONLY transactions are started, max_pi_readers
of them will work, the rest of them will wait until at least one working transaction finishes.
PI-TM (mcotmpi
) is available on Linux, LynxOS-178®, FreeRTOS, embOS, INTEGRITY™ and VxWorks™.
Once scheduled, a transaction's execution is controlled by the transaction manager that ensures proper serialization ("read-write" transactions are executed sequentially, "read-only" transactions are executed in parallel while no "read-write" transactions are executing)
The preemption rules are as follows:
A higher priority Read-Write transaction preempts all running lower priority Read-Only transactions (causing them to rollback, first) unless there is also a running Read-Only transaction with even higher priority than this read-write transaction has. In this case, the Read-Write transaction will be placed at the appropriate location in the queue according to its priority and deadline.
[Note: With PI-TM, if the higher priority Read-Only transaction completes before the lower priority Read-Only transactions that are executing in parallel, the higher priority Read-Write transaction will preempt the remaining Read-Only transactions. With MURSIW, once a transaction placed in the queue it will just wait its turn.]
A higher priority Read-Only or Read-Write transaction preempts a lower priority Read-Write transaction (causing it to rollback, first) if the elapsed running time (and, consequently, the time of termination via rollback ) of the running transaction is less than the new transaction's deadline. In other words, if the time required to preempt the running transaction, including rolling it back, would preclude the higher priority transaction being able to commit before its deadline, then the currently running transaction will not be preempted because the result would be two rolled back transactions.
The eXtremeDB/rt transaction manager has many verification checkpoints at which a transaction's elapsed time is tested against the deadline. The frequency of the verifications eliminates the possibility of going beyond the set deadline. If the control point is reached (the transaction used up the allotted time slice), the transaction is assigned a special "transaction interrupted" status (MCO_E_INTERRUPTED) and control is returned to the application. The application is then expected to rollback the transaction. The transaction manager ensures that all database runtime internals are in a "recoverable" condition, and that a subsequent transaction rollback will restore the database to the consistent state that existed prior to the start of the transaction. Furthermore, the transaction manager guarantees that the rollback is completed within the deadline, provided that the application initiates the rollback when signaled to do so by the database runtime. Thus, the transaction would miss the deadline but not be "late", and the internal consistency of the database is preserved.
PI-TM is advantageous in the situation with a high priority thread (let's call it H_DB) that performs transactions with the database, a low priority thread that also performs transactions (L_DB) and a mid-priority thread that does not work with the database (M).
Let's consider the following scenario. L_DB starts a transaction. After that M comes in and (having a higher priority) preempts the L_DB taking it "off the CPU" and putting it on hold. Next, H_DB arrives but is unable to start a transaction because L_DB is in the way. Essentially, with EDF the highest priority thread H_DB will have to wait until the lowest priority thread L_DB completes its transaction. However, with PI_TM the operating system will be able to raise L_DB's priority up to H_DB's priority at the moment when H_DB arrives allowing L_DB to preempt M, allowing L_DB to complete its transaction and free the way for H_DB.
The following heuristics should be taken into consideration for PI-TM used for CPU-intensive transactions:
- The number of initially allocated synchronization primitives should be the same as the number of CPU cores. (It does not make much sense to allow more Read-Only transactions than can be run simultaneously using different cores as this would just make each transaction longer.)
- The greater the number of allocated PI-TM synchronization primitives, the harder it is to start a Read-Write transaction (the Read-Write transaction potentially must wait until every competing Read-Only transaction releases its primitive). Therefore, in practice the PI-TM is applicable to hardware configurations with 1-4 CPU cores.
When the following option is set
db_params.mode_mask |= MCO_DB_PREEMPTIVE;
a higher priority (in the sense of MCO_TRANS_PRIORITY
) transaction will preempt active lower priority transactions terminating them immediately with MCO_E_INTERRUPTED
.
This option is applicable both to EDF and PI transaction managers.