Memory Devices

C and C++ applications specify storage devices at runtime through an array of mco_device_t structures, passed as an argument (named devs in most SDK samples) to mco_db_open_dev(). Typically, the array is stack-allocated and initialized before calling mco_db_open_dev(). Each memory device is described by one mco_device_t structure that specifies:

type

The type of memory region; must be one of the following values defined in mco.h:

MCO_MEMORY_CONV Conventional (non-shared) memory
MCO_MEMORY_NAMED Named (shared) memory
MCO_MEMORY_INT_DESC Shared memory addressed through an integer descriptor. This option accommodates operating environments that identify shared memory regions by an integer descriptor rather than by name
MCO_MEMORY_FILE A file device
MCO_MEMORY_MULTIFILE A multi-file device
MCO_MEMORY_RAID A RAID device
MCO_MEMORY_CYCLIC_FILE_BUF A pair of files used to implement a cyclic buffer. This option is intended to overcome the limitation of some file systems that do not provide fsync() or an equivalent file system call to make all changes persistent. It is used with the MCO_COMMIT_DELAYED transaction commit policy, where log_params.max_commit_delay is set according to the persistent media manufacturer’s specifications. (See Writing Data to Persistent Storage for an explanation of these transaction commit policies.)

 

assignment The use for this device; must be one of the following values defined in mco.h:
MCO_MEMORY_ASSIGN_DATABASE A memory region for meta-data and user-data, indexes and other database structures
MCO_MEMORY_ASSIGN_CACHE A memory region for the disk manager cache (page pool)
MCO_MEMORY_ASSIGN_PERSISTENT A persistent storage device (can be file, multi-file or RAID). Note that if a persistent device is defined and the application does not explicitly assign disk_page_size (see below), the runtime will assign it the default value of 4096 bytes. If multiple applications or tasks attempt to assign different disk page sizes for the same database, the runtime returns an error code.
MCO_MEMORY_ASSIGN_LOG A device that contains the database log file
MCO_MEMORY_ASSIGN_HA_ASYNC_BUF An asynchronous buffer device for eXtremeDB High Availability
MCO_MEMORY_ASSIGN_PIPE_BUF A buffer for an eXtremeDB Transaction Logging pipe


size The size of this device (in bytes). 
dev

A union containing a pointer to pre-allocated memory for conventional or shared memory devices, or a filename and flags for persistent storage devices. The union is shown below as part of the complete mco_device_t definition from mco.h:

 
    typedef struct mco_device_t_ {
        unsigned int type;
        unsigned int assignment;
        mco_size_t   size;
        union {
            struct {
                void * ptr;
                int flags;
            } conv;
            struct {
                char name[MCO_MAX_MEMORY_NAME];
                unsigned int flags;
                void * hint;
            } named;
            struct {
                int flags;
                char name[MCO_MAX_FILE_NAME];
            } file;
            struct {
                int flags;
                char name[MCO_MAX_MULTIFILE_NAME];
                mco_offs_t segment_size;
            } multifile;
            struct {
                int flags;
                char name[MCO_MAX_MULTIFILE_NAME];
                int level;
                mco_offs_t offset;
            } raid;
            struct {
                unsigned long handle;
            } idesc;
        } dev;
    } mco_device_t, *mco_device_h;
     

(See the descriptions below for dev parameter settings for different types of memory management. Note that the idesc portion of the struct is reserved for future use, intended for all other custom embedded memory driver implementations. Currently this capability is not used by the runtime.)

Conventional memory parameters

For in-memory (conv) databases, the struct contains a pointer to allocated memory and flags. The ptr element is set to the address of the memory block allocated for the database. The flags element is used internally by the runtime and not used by the application; it can safely be set to zero or ignored by the application.
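As an illustration of the conventional memory settings described above, the following is a minimal sketch of filling one device entry for an all-in-memory database. It is not taken from the SDK samples. So that it compiles without the SDK, it declares a trimmed stand-in for mco_device_t and placeholder constant values; the HAVE_MCO_SDK macro and the init_conv_device() helper are hypothetical names introduced for this example, and the real definitions come from mco.h.

```c
#include <stdlib.h>
#include <string.h>

#ifdef HAVE_MCO_SDK               /* hypothetical build macro, not part of the SDK */
#include "mco.h"
#else
/* Stand-ins so the sketch compiles without the SDK. The numeric values are
   placeholders for illustration; the real definitions live in mco.h. */
#define MCO_MEMORY_CONV            1
#define MCO_MEMORY_ASSIGN_DATABASE 0
typedef size_t mco_size_t;
typedef struct mco_device_t_ {
    unsigned int type;
    unsigned int assignment;
    mco_size_t   size;
    union {
        struct { void *ptr; int flags; } conv;
    } dev;                        /* other union members omitted */
} mco_device_t;
#endif

/* Hypothetical helper: fill one device entry for an in-memory database.
   Returns 0 on success, -1 if the memory block cannot be allocated. */
int init_conv_device(mco_device_t *d, size_t bytes)
{
    memset(d, 0, sizeof *d);      /* also leaves dev.conv.flags at zero */
    d->type         = MCO_MEMORY_CONV;
    d->assignment   = MCO_MEMORY_ASSIGN_DATABASE;
    d->size         = bytes;
    d->dev.conv.ptr = malloc(bytes);   /* block the database will live in */
    return d->dev.conv.ptr ? 0 : -1;
}
```

An entry filled this way would be one element of the devs array passed to mco_db_open_dev(); the application remains responsible for freeing the block after the database is closed.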

Shared memory parameters

For a shared memory (named) device, it is necessary to specify a name, flags and a hint address where the shared memory block is located in the operating system shared memory pool. Setting the dev.named.hint parameter to zero causes eXtremeDB to determine the actual shared memory segment address, but this can fail when called from a second process attempting to open the shared database. In that case it is the application's responsibility to provide a valid hint address. (Please see Shared Memory Runtime Options for a detailed explanation of the shared memory hint address implications on Windows and Unix-Linux systems.)

On Unix-Linux systems using the POSIX mcompsx memory driver, the named.flags can be set to MCO_RT_POSIX_SHM_SHARED (for an “all process accessible” database) or MCO_RT_POSIX_SHM_ANONYMOUS (for a database “private” for the process). For all other memory drivers this element is not used and should be set to zero initially.
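The shared memory settings above can be sketched in the same way. The example below is an assumption-laden illustration, not SDK code: the stand-in struct, the placeholder constant values (including the 16-byte MCO_MAX_MEMORY_NAME), the HAVE_MCO_SDK macro and the init_named_device() helper are all hypothetical, and the real definitions come from mco.h. It sets flags to zero, as the text recommends for drivers that do not use the element, and passes the hint through unchanged so that a caller can supply NULL to let the runtime choose the address.

```c
#include <string.h>

#ifdef HAVE_MCO_SDK               /* hypothetical build macro, not part of the SDK */
#include "mco.h"
#else
/* Stand-ins so the sketch compiles without the SDK; values are placeholders. */
#define MCO_MEMORY_NAMED           2
#define MCO_MEMORY_ASSIGN_DATABASE 0
#define MCO_MAX_MEMORY_NAME        16
typedef size_t mco_size_t;
typedef struct mco_device_t_ {
    unsigned int type;
    unsigned int assignment;
    mco_size_t   size;
    union {
        struct {
            char name[MCO_MAX_MEMORY_NAME];
            unsigned int flags;
            void *hint;
        } named;
    } dev;                        /* other union members omitted */
} mco_device_t;
#endif

/* Hypothetical helper: fill one named (shared) memory device entry.
   Returns 0 on success, -1 if the name does not fit in dev.named.name. */
int init_named_device(mco_device_t *d, const char *name, size_t bytes, void *hint)
{
    if (strlen(name) >= sizeof d->dev.named.name)
        return -1;                /* no room for the name plus its terminator */
    memset(d, 0, sizeof *d);
    d->type            = MCO_MEMORY_NAMED;
    d->assignment      = MCO_MEMORY_ASSIGN_DATABASE;
    d->size            = bytes;
    strcpy(d->dev.named.name, name);  /* length was checked above */
    d->dev.named.flags = 0;       /* unused by most drivers, per the text */
    d->dev.named.hint  = hint;    /* NULL: let the runtime pick the address */
    return 0;
}
```

A second process that fails to attach with a NULL hint could call the same helper again with an explicit address, which is the fallback the text describes.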

Persistent storage parameters

For persistent storage devices a name must be specified. For multifile devices, segment_size and flags may optionally be specified; for raid devices, level and offset must be specified and flags is optional. The file open mode is defined by the flags parameter, which can take the following values (flags can be combined with the bitwise OR operator ‘|’):

MCO_FILE_OPEN_DEFAULT The standard, unified and fail-safe way to open a file in the manner supported by target platform file system. Depending on the file-system wrapper linked into the target executable it does the following:
mcofcblk.c File implementation for the KDSA storage system. The code uses a cblk_open() call to create the file descriptor for IO operations.
mcofkdsa.c File implementation for the KDSA storage system. The code uses xpd_init_device_name(), xpd_connect_via_link() and xpd_set_handle_flags() calls to initialize the device header for successive IO operations.
mcofucos.c File implementation for the uCOS FAT32 file system; uses (FILE_MODE_READ | FILE_MODE_WRITE) for the FILE_OpenShort() call. This opens the file for reading and writing. There is no check for file existence, so if the file exists the content will be lost.

Note the following meanings of the Windows API flags used by the two Windows wrappers below:

  • GENERIC_READ: Read access
  • GENERIC_WRITE: Write access
  • FILE_SHARE_READ: share for reading
  • FILE_SHARE_WRITE: share for writing
  • OPEN_ALWAYS: opens the file if it exists, otherwise creates it
mcofw32.c File implementation for Win32 (NT family). The wrapper calls the CreateFile() routine passing flag combinations (GENERIC_READ|GENERIC_WRITE), (FILE_SHARE_READ|FILE_SHARE_WRITE) and OPEN_ALWAYS. This opens the specified file for read and write access, creating it if it does not exist.
mcofwrt.c File implementation for WinRT. The wrapper calls the CreateFile2() routine passing flag combinations (GENERIC_READ|GENERIC_WRITE), (FILE_SHARE_READ|FILE_SHARE_WRITE) and OPEN_ALWAYS. This opens the specified file for read and write access, creating it if it does not exist.
mcofmem.c "In-memory file"; as this implementation does not open any real files but simulates them with memory, no file system calls are made.
The following file system wrappers share the same POSIX interface to open the requested file. They initialize the file handle by a call to open() with the (O_RDWR|O_CREAT) flags combination. This call creates the file if it does not exist and opens it for reading and writing. Note the following meanings from the POSIX specifications:

  • O_RDWR: open for reading and writing
  • O_RDONLY: open for reading only
  • O_CREAT: create the file if it does not exist
  • O_LARGEFILE: allow files whose sizes cannot be represented in an off_t (but can be represented in an off64_t) to be opened
mcofecos.c File implementation for eCos
mcofintp.c File implementation for Integrity OS (POSIX)
mcofose.c File implementation for OSE Embedded File System. EFS is a file system for the OSE Real Time Kernel.
mcofuni.c File implementation for Unix without pread and pwrite support
mcofvx.c File implementation for VxWorks
The following file system wrappers make the same open() call as above but additionally pass the O_LARGEFILE flag to enable operations on large files (>4G).
mcofu98aio.c Asynchronous file implementation for Unix supporting pread and pwrite (Unix98 standard)
mcofu98.c File implementation for Unix supporting pread and pwrite (Unix98 standard)
mcofu98zip.c Compressed file implementation for Unix supporting pread and pwrite (Unix98 standard)
mcofu98ziplog.c Compressed file implementation for Unix supporting pread and pwrite (Unix98 standard)
MCO_FILE_OPEN_READ_ONLY Sets read-only mode for a file; eXtremeDB will not be allowed to modify the file. Depending on the target file system it does the following:
mcofcblk.c, mcofkdsa.c, mcofmem.c Does nothing, no implementation required.
mcofecos.c, mcofintp.c, mcofose.c, mcofu98aio.c, mcofu98.c, mcofu98zip.c, mcofu98ziplog.c, mcofuni.c, mcofvx.c Adds flag O_RDONLY (POSIX: open for reading only) into the flags combination for the open() call.
mcofucos.c Excludes FILE_MODE_WRITE from the flags combination for the FILE_OpenShort() call (leaves the FILE_MODE_READ flag only).
mcofw32.c, mcofwrt.c Excludes the GENERIC_WRITE flag from the normal (GENERIC_READ | GENERIC_WRITE) flags combination for the CreateFile() or CreateFile2() call.
MCO_FILE_OPEN_TRUNCATE

Instructs eXtremeDB to truncate the file being opened to zero length before use. Depending on the implementation of file system wrapper:

mcofcblk.c Calls cblk_file_truncate() on the file.
mcofkdsa.c Calls kdsa_file_truncate() on the file.
mcofucos.c Calls FILE_Truncate() on the file.
mcofmem.c No calls required.

mcofecos.c, mcofintp.c, mcofose.c, mcofu98aio.c, mcofu98.c, mcofu98zip.c, mcofu98ziplog.c, mcofuni.c, mcofvx.c Adds the O_TRUNC flag (POSIX: truncate size to 0) to the flags combination for the open() call.
mcofw32.c, mcofwrt.c Sets flag CREATE_ALWAYS (WinAPI: always creates a new file) instead of OPEN_ALWAYS (which opens the file if it exists and creates it otherwise) for the CreateFile() / CreateFile2() call.
MCO_FILE_OPEN_NO_BUFFERING

Tells the underlying file system not to cache the file content but to run IO operations immediately. Depending on the file system wrapper implementation, it:

mcofcblk.c, mcofkdsa.c, mcofucos.c, mcofmem.c Does nothing - nothing is required
mcofecos.c, mcofintp.c, mcofose.c, mcofu98aio.c, mcofu98.c, mcofu98zip.c, mcofu98ziplog.c, mcofuni.c, mcofvx.c Includes the O_DIRECT flag (POSIX: Try to minimize the cache effects of the I/O to and from this file) into the flag combination for the open() call. For SunOS: calls the directio( DIRECTIO_ON ) routine (SUNOS: provides advice to the system about the expected behavior of the application when accessing the data in the file).
mcofw32.c, mcofwrt.c Adds flag FILE_FLAG_NO_BUFFERING (WinAPI: The file or device is being opened with no system caching for data reads and writes) into the flags combination for the CreateFile() / CreateFile2() call.
MCO_FILE_OPEN_EXISTING

Requires eXtremeDB to open the file only if it exists; eXtremeDB will not create a new file. Depending on the file system wrapper implementation, it:

mcofcblk.c, mcofkdsa.c, mcofmem.c, mcofucos.c Does nothing - nothing is required
mcofecos.c, mcofintp.c, mcofose.c, mcofu98aio.c, mcofu98.c, mcofu98zip.c, mcofu98ziplog.c, mcofuni.c, mcofvx.c Excludes the O_CREAT flag from the initial flags combination for the open() call. This means that open() will open the file only if it exists. No new file will be created.
mcofw32.c, mcofwrt.c Sets flag OPEN_EXISTING (WinAPI: Opens a file or device, only if it exists) in the flags combination for the CreateFile() / CreateFile2() call.
MCO_FILE_OPEN_TEMPORARY Instructs the underlying file system to open the file as temporary. Depending on the file system wrapper implementation, it does the following:
mcofcblk.c, mcofkdsa.c, mcofmem.c, mcofucos.c, mcofecos.c, mcofintp.c, mcofose.c, mcofu98aio.c, mcofu98.c, mcofu98zip.c, mcofu98ziplog.c, mcofuni.c, mcofvx.c Does nothing; a file marked as MCO_FILE_OPEN_TEMPORARY is handled like a regular file.
mcofw32.c, mcofwrt.c Adds flag FILE_ATTRIBUTE_TEMPORARY (WinAPI: The file is being used for temporary storage) to the flags combination for the CreateFile() / CreateFile2() call.
MCO_FILE_OPEN_FSYNC_FIX Enables the flush-operation fix for UNIX-98 file system wrappers. The fix confirms that the underlying file system (the ext3 file system specifically) actually writes file content modifications to storage. It is required for ext3 only, and in some rare conditions when it is confirmed that the underlying file system has flush-related issues.
MCO_FILE_OPEN_SUBPARTITION Works for the RAID-implementation of database persistent storage. This flag specifies that the file system wrapper must take the value of the offset field of the mco_device_t struct into account for all segments of the RAID.
MCO_FILE_OPEN_FSYNC_AIO_BARRIER Works for the AIO (u98aio) file system wrapper. This flag specifies that the wrapper must execute a barrier for flush operations - all IO operations must be finished before the flush operation.
MCO_FILE_OPEN_COMPRESSED Works for the compressed (u98zip and u98ziplog) file system wrappers. This flag specifies that the file must be compressed. Otherwise the file system wrapper will work with the file transparently.
MCO_FILE_OPEN_LOCK Works for the u98 implementation only. This flag instructs the file system wrapper to set up rules for inter-locking the file. The implementation calls the flock() routine (POSIX: A call to flock() may block if an incompatible lock is held by another process).
MCO_FILE_OPEN_NO_READ_BUFFERING and MCO_FILE_OPEN_NO_WRITE_BUFFERING Work for the u98 implementation only. They control the usage of pre-read and pre-write hints for the underlying file system. The implementation uses posix_fadvise() (POSIX: Programs can use posix_fadvise() to announce an intention to access file data in a specific pattern in the future, thus allowing the kernel to perform appropriate optimizations). Use these flags only if advised to do so by McObject Support.
MAP_HUGETLB Most modern Linux systems allow part of their virtual memory space to be configured for use by huge pages. The huge pages feature enables the Linux kernel to manage large pages of memory in addition to the standard 4KB (on x86 and x86_64) or 16KB (on IA64) page size. When the system needs to access a virtual memory location, it uses the page tables to translate the virtual address to a physical address. Using huge pages means that the system needs to load fewer such mappings into the Translation Lookaside Buffer (TLB), which is the cache of page tables on a CPU that speeds up the translation of virtual addresses to physical addresses. Enabling the huge pages feature allows the kernel to use hugetlb entries in the TLB that point to huge pages. Because each hugetlb entry covers a larger address range, far fewer entries are required to map the memory. On systems with more than 16GB of memory running eXtremeDB databases, enabling the huge pages feature can improve database performance. Specifying the MAP_HUGETLB flag (for mmap) makes it possible to use the huge page feature.
SHM_HUGETLB This flag has the same effect as MAP_HUGETLB but for eXtremeDB shared memory databases (i.e. for shmget).
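The POSIX wrapper behaviour described above for MCO_FILE_OPEN_DEFAULT, MCO_FILE_OPEN_READ_ONLY, MCO_FILE_OPEN_TRUNCATE and MCO_FILE_OPEN_EXISTING can be sketched as a translation from device flags to open() flags. This is an illustration of the described mapping, not the actual wrapper source. The DEMO_OPEN_* bit values and the demo_posix_open_flags() function are hypothetical names for this example; the real MCO_FILE_OPEN_* values are defined in the SDK headers. The sketch also assumes that read-only mode replaces O_RDWR with O_RDONLY rather than combining the two.

```c
#include <fcntl.h>

/* Hypothetical flag bits for this sketch only; they are NOT the values of
   the real MCO_FILE_OPEN_* constants. */
enum {
    DEMO_OPEN_DEFAULT   = 0,
    DEMO_OPEN_READ_ONLY = 1 << 0,
    DEMO_OPEN_TRUNCATE  = 1 << 1,
    DEMO_OPEN_EXISTING  = 1 << 2
};

/* Translate the demo flags into the flags argument for POSIX open(). */
int demo_posix_open_flags(int demo_flags)
{
    /* Default: read/write access, create the file if it does not exist. */
    int oflags = O_RDWR | O_CREAT;

    if (demo_flags & DEMO_OPEN_READ_ONLY)
        oflags = (oflags & ~O_ACCMODE) | O_RDONLY;  /* assumption: replaces O_RDWR */

    if (demo_flags & DEMO_OPEN_TRUNCATE)
        oflags |= O_TRUNC;                          /* truncate to zero length */

    if (demo_flags & DEMO_OPEN_EXISTING)
        oflags &= ~O_CREAT;                         /* never create a new file */

    return oflags;
}
```

For example, combining DEMO_OPEN_TRUNCATE | DEMO_OPEN_EXISTING yields O_RDWR | O_TRUNC: an existing file is emptied, and a missing file makes open() fail instead of being created.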