Modern SoC platforms often include heterogeneous remote processor devices in asymmetric multiprocessing (AMP) configurations. Two widely adopted examples are NXP®'s i.MX 8 QuadMax and i.MX 8 QuadPlus series, which combine varying numbers of ARM® Cortex®-A and Cortex-M CPU cores. Another popular alternative is STMicroelectronics' STM32MP157F line, which pairs a dual-core ARM® Cortex®-A7 with a Cortex®-M4 32-bit RISC core.
This heterogeneous multicore architecture allows offloading critical hard real-time tasks to the Cortex-M cores for extremely low-latency processing, while the Cortex-A cores handle high-performance tasks.
The different CPUs usually run instances of different operating systems, e.g., Linux on the A cores and a real-time OS (FreeRTOS, MICROSAR, etc.) on the M core(s).
The shared memory regions integrated into the AMP hardware are directly accessible, allowing communication and message passing between the "clusters" (a cluster is a set of cores capable of independent instruction execution and running a separate operating system). Traditionally, the shared memory was used for communication and message passing only. However, given the complexity and critical nature of applications utilising hardware such as the i.MX 8, and the amount of external memory available, real-time data can now be shared between the Cortex-A and Cortex-M cores.
A storage subsystem in an AMP system can be organised in two ways. eXtremeDB/rt provides an implementation of the AMP shared memory database approach, which has a number of benefits.
AMP shared memory database storage is described by the MCO_MEMORY_AMP memory device type and the corresponding element of the mco_device_t::dev union:

    struct {
        unsigned int flags;   /* device flags: 0 or MCO_RT_AMP_SHM_NONCACHE (see below) */
        char *address;        /* base address of the shared memory region */
        char *hint;           /* mapping address hint */
    } amp;
The flags field may be set to 0 or to MCO_RT_AMP_SHM_NONCACHE. The latter indicates that the CPU caching mechanism for the database shared memory region should be disabled (if possible). Disabled caching typically affects performance negatively, so by default caching remains on and eXtremeDB/rt invalidates the cache at the beginning and the end of each transaction to maintain data integrity. This is beneficial for long transactions; for short transactions, however, disabling caching via MCO_RT_AMP_SHM_NONCACHE may be preferable.
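As an illustration, the following sketch fills in such a device descriptor. The MCO_MEMORY_AMP type and the amp fields come from the definition above; the mco_device_t layout with type, assignment and size members is assumed to follow the classic eXtremeDB C API, and the base address and region size are platform-specific placeholders, not values to copy verbatim.

    #include "mco.h"   /* eXtremeDB runtime header */

    /* Describe the AMP shared memory region that will hold the database. */
    static void setup_amp_device(mco_device_t *dev)
    {
        dev->type            = MCO_MEMORY_AMP;             /* AMP shared memory device */
        dev->assignment      = MCO_MEMORY_ASSIGN_DATABASE; /* assumed, as in classic eXtremeDB */
        dev->size            = 1024 * 1024;                /* region size (placeholder) */
        dev->dev.amp.flags   = 0;                          /* 0 = leave CPU caching enabled */
        dev->dev.amp.address = (char *)0x80000000;         /* platform-specific base address */
        dev->dev.amp.hint    = 0;                          /* no mapping hint */
    }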
Important note: MCO_DB_AMP_CACHE_SYNC must be added to mco_db_params_t::mode_mask to ensure the aforementioned invalidation when caching is enabled (flags == 0) and there is no hardware-managed cache coherency.
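A corresponding sketch of the database parameters, assuming mco_db_params_init() and mco_db_open_dev() are available as in the classic eXtremeDB C API; the database name and dictionary function are hypothetical placeholders:

    #include "mco.h"

    extern mco_dictionary_h appdb_get_dict(void);  /* hypothetical, generated by the schema compiler */

    static MCO_RET open_amp_db(mco_device_t *dev)
    {
        mco_db_params_t params;

        mco_db_params_init(&params);
        /* Request cache invalidation at transaction boundaries: needed when
           caching is enabled (amp.flags == 0) and the hardware does not keep
           the clusters' caches coherent. */
        params.mode_mask |= MCO_DB_AMP_CACHE_SYNC;

        return mco_db_open_dev("appdb", appdb_get_dict(), dev, 1, &params);
    }

In a typical AMP setup, one cluster would create the database this way while the other clusters attach to the already-initialized shared region; consult the eXtremeDB/rt documentation for the exact attach sequence on each operating system.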
Unlike SMP systems, AMP clusters are controlled by different operating systems. Therefore, from the standpoint of sharing data across clusters, an AMP system can be viewed as a distributed network in which "nodes" communicate with each other via shared memory. It follows that a mechanism is needed to synchronize access to metadata structures, and consequently to the shared storage, that is independent of each node's operating system. We call this mechanism a Distributed Semaphore. The distributed semaphore is the component that synchronizes multiple tasks across the RTOS running atop the Cortex-M cores and Linux running atop the Cortex-A cores.
On Linux, the distributed semaphore is implemented via a kernel module. To build it, change the working directory to target/sal/sync/rpmsg_sem/, adjust the makefile to set the correct paths to the kernel sources and the cross-compiler, and run "make". Load the resulting module via "insmod mco_rpmsg_sem.ko".
On FreeRTOS, the distributed semaphore is implemented via the rpmsg_lite library, which must be linked into the application.
See samples/amp for example code.