eXtremeSQL Sequence Parallelism

eXtremeSQL provides support for the parallel processing of sequences implemented through a thread pool in the SQL engine that can optimize performance for sequence operations in multi-processor environments. The thread pool allows sequence functions to split sequences into equal parts across threads and perform calculations on the threads in parallel reducing the total execution time.

Thread pool size

The size of the thread pool can be regulated by the environment variable MCO_CPU_NUMBER. If no MCO_CPU_NUMBER environment variable is present, then the total number of CPU cores on the system is used as the pool size. If the hardware allows Hyper-Threading (HT) - and most modern CPUs do - the number of threads in the pool is equal to the number of HT cores (normally two per physical core). So if the machine has 4 real cores, the number of threads is 8. This can actually hurt performance sometimes.

Enabling parallelism

The xSQLconfiguration file option sql_use_connection_pool can disable the use of the thread pool. Note that the default value of this parameter is true (meaning the thread pool is created on startup).

At runtime the SQL function seq_parallelism() can be used to limit the number of threads utilized to run queries. The function accepts a number of threads to use as a parameter and returns the total number of threads in the pool. For example:

     
    XSQL>select seq_parallelism(4);
    #1
    ------------------------------------------------------------------------------
    4
     
    Selected records: 1
     

Note that function seq_parallelism() affects the entire server, not just a single client session. So seq_parallelism() should not be executed from different clients running concurrently. Also note that if the number of parallel threads requested by seq_parallelism() exceeds the size of the pool specified in the environment variable MCO_CPU_NUMBER (the default is the number of available cores), the SQL engine will use the number specified in MCO_CPU_NUMBER.

Thread pool management on Windows

Support for the parallel processing of sequences is implemented through the class ThreadPool, which instantiates a global variable ThreadPool. Thus the destructor for the ThreadPool class is called upon exit from the main() automatically. The destructor obviously frees up all resources associated with the pool: memory, synchronization primitives, and tasks. This implementation presents a problem with the Windows C++ runtime. The Windows C++ runtime is implemented in such a manner that it stops all threads except for the main thread when the application exits from its main(). The C runtime then calls the destructors for all global objects, which in the case of the ThreadPool tasks is too late - those tasks were already stopped by the ThreadPool destructor.

A partial solution has been implemented to simply not call the ThreadPool destructor (only when running on Windows). It is remotely possible to introduce a resource leak in a scenario in which the database is created in the context of the xSQL server (or a .NET Framework or JNI server application) that loads the database DLL (mcoxsql.dll or mcoxsql_debug.dll) dynamically at runtime. If the database makes use of sequences and the server reloads the DLL multiple times, each time a resource leak is created.

A more resilient solution for this problem is forthcoming in a future release, but at present Windows developers should be aware of this potential problem.