Security and the Cortex-M MPU, part 2: MPU multitasking

By Ralph Moore


Micro Digital

November 23, 2016

As noted in my previous blog, Security and the Cortex-M MPU, embedded systems are being drawn more into the Internet of Things (IoT), and consequently, security in the form of protection of critical system resources is becoming increasingly important. Furthermore, effective protection can only be achieved via hardware means.

The Cortex-M Memory Protection Unit (MPU) is difficult to use, but it is the main means of hardware memory protection available for Cortex-M processors. These processors are in widespread use in small- to medium-size embedded systems. Hence, it behooves us to learn to use the Cortex-M MPU effectively in order to achieve the reliability, security, and safety that modern embedded systems require.

This is the second in a series of blogs presenting a software approach to satisfy the need for security using an MPU. A point of emphasis in this approach is to make the MPU simple to use but consistent with strong security (if the MPU is too hard to use, the cure could be worse than the disease). In this blog we examine an effective means to support multitasking with the MPU. Read the previous blog for general orientation and definitions of terms used herein.

Task control

As shown in Figure 1, there is a task table consisting of a task control block (TCB) for every task created. This table is not in a fixed order, but rather reflects the order in which tasks are created.

[Figure 1 | The diagram depicts how task control blocks (TCBs) map to memory protection array (MPA) blocks, which are loaded into the memory protection unit (MPU) when a task starts or resumes, giving each task its own MPU regions.]

Also shown is a memory protection table (mpt[]). This table has a memory protection array (MPA) for each task. The MPAs are in the same order as the TCBs in the task table, and each TCB contains an index into mpt[] to access its MPA. A task’s MPA is loaded into the MPU when the task is started or resumed. Hence, each task has its own regions in the MPU when it is running. This applies to both privileged tasks (ptasks) and unprivileged tasks (utasks).

Each MPA is an array of structures whose size is governed by a configuration constant, MPU_SIZE. Setting MPU_SIZE to less than 8 for Cortex-M0/1/3/4 devices, or less than 16 for a Cortex-M7, reduces memory use and task-switching time when fewer regions are needed. Unused MPU slots are set to 0 during initialization, so they are inactive. Each element of an MPA array is a structure consisting of two 32-bit fields, named rbar and rasr. These are exact copies of the MPU RBAR and RASR registers in each MPU slot, except that the valid flag is set in rbar, but not in RBAR.
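The structures described above can be sketched in C as follows. This is a minimal illustration, not the actual smx source; the names MPA_SLOT, TCB, NUM_TASKS, mpa_index, and task_mpa() are assumptions chosen for clarity:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define MPU_SIZE 4   /* regions per task; configuration constant */

/* One MPA slot mirrors one MPU slot: exact copies of RBAR and RASR,
   except that the valid (V) flag is set in rbar. */
typedef struct {
    u32 rbar;  /* region base address | V flag | region number */
    u32 rasr;  /* region attributes | size index | enable flag */
} MPA_SLOT;

typedef MPA_SLOT MPA[MPU_SIZE];

/* Hypothetical TCB fragment: each TCB carries an index into mpt[],
   so tasks may share one MPA or have their own. */
typedef struct {
    u32 mpa_index;   /* index into mpt[] */
    /* ... other TCB fields ... */
} TCB;

#define NUM_TASKS 3
MPA mpt[NUM_TASKS];  /* one MPA per task, same order as the TCBs */

/* Look up a task's MPA from its TCB. */
MPA_SLOT *task_mpa(const TCB *tcb)
{
    return mpt[tcb->mpa_index];
}
```

Because mpt[] is zero-initialized, every slot starts out with rbar = rasr = 0, which matches the statement above that unused MPU slots are inactive.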

When a task switch occurs, the incoming task’s MPA is copied to the MPU. By using the alias registers of the MPU, copying is done up to four regions at a time. This, plus using a reduced MPU_SIZE, results in a task-switching overhead of about 25 percent. This overhead applies to all task switches, whether or not the MPA changes and whether the task is a utask or a ptask. Normally, each MPA has at least one dynamic region, the task stack; other dynamic regions are planned for the future. Thus, loading the MPU on every task switch is necessary in most cases. In addition, it provides uniformity and universality, which simplifies MPU usage.
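The burst copy described above can be sketched as follows. On real hardware the destination would be the MPU's RBAR/RASR registers and their three alias pairs (RBAR_A1/RASR_A1 through A3); here a plain array stands in for them so the sketch runs on a host. MPU_BLOCK, mpu_load(), and the register layout are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t u32;
#define MPU_SIZE 4

typedef struct { u32 rbar; u32 rasr; } MPA_SLOT;
typedef MPA_SLOT MPA[MPU_SIZE];

/* Host stand-in for the MPU's RBAR/RASR registers plus the three
   alias pairs, which form four contiguous rbar/rasr word pairs. */
typedef struct { u32 rbar_rasr[8]; } MPU_BLOCK;
static MPU_BLOCK mpu;

/* Scheduler step: load the incoming task's MPA into the MPU.
   Because the V flag is set in every rbar, each write selects its
   own region number, so RNR need not be written and up to four
   regions can be copied per pass through the alias registers. */
void mpu_load(const MPA mpa)
{
    memcpy(mpu.rbar_rasr, mpa, sizeof(MPA_SLOT) * MPU_SIZE);
}
```

With MPU_SIZE = 4, one pass through the four alias pairs loads the whole MPA, which is what keeps the switching overhead modest.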

MPA templates

MPA templates determine the initial contents of MPAs. As shown in Figure 1, a template may be shared among MPAs. This would be the case for a group of tasks associated with the same application function (e.g., networking). Alternatively, a task may have its own template, as also shown in Figure 1. An example of this would be a utask that is isolated from all other tasks (i.e., it can access only its own code and data regions).

After a task has been created, its MPA is loaded with an MPA template as follows:


const MPA mpa_temp_ut2a = {
/*0*/ { V | 0, RW_DATA},
/*1*/ {RA("ut2a_data") | V | 1, RW_DATA | RSI("ut2a_data") | EN},
/*2*/ {RA("ut2a_code") | V | 2, UCODE | RSI("ut2a_code") | EN},
/*3*/ {RA("ucom_code") | V | 3, UCODE | RSI("ucom_code") | EN},
};


The macros used above are defined as follows:


#define RA(s)  ((u32)__section_begin(s))               /* region s address */
#define RSI(s) ((30 - __CLZ(__section_size(s))) << 1)  /* region s size index */

The above template has only four regions, because MPU_SIZE = 4. In this array of structures, the first field of each structure is rbar (between the { and the ,). Looking at rbar for region 1 (/*1*/), we see that the region starts at ut2a_data, the V (valid) flag is set, and region 1 is selected (the V flag is necessary to override the MPU’s RNR register in order to perform rapid loading, as described above). The second field is rasr (between the , and the }). Looking at rasr for region 1, we see that it is a RW_DATA region, its size index follows, and EN means that the region is enabled. The size index is the power-of-two exponent of the region size, minus 1 (e.g., 4 for a 32-byte region). The RSI() macro calculates the size index automatically, which helps to reduce errors.

Region 2 is similar, except that it starts at ut2a_code and it is a code region; region 3 is also a code region and it starts at ucom_code. This region contains the system service routine (SSR) shell functions, which will be discussed in a future blog; it also contains other common functions needed by utasks. Region 0 is the protected task stack region. It is loaded dynamically by the scheduler and thus only its attributes are part of the template for region 0 (see the “Task stacks” section below for more discussion of region 0).
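The size-index computation described above can be checked with a small host-side function. The name region_size_index() is an assumption, and __builtin_clz (GCC/Clang) stands in for the target's __CLZ intrinsic; the << 1 shift into the RASR SIZE field position is omitted here to keep the arithmetic visible:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Size index = power-of-two exponent of the region size, minus 1.
   For a 32-bit word, exponent = 31 - clz(size), so index = 30 - clz(size).
   E.g., a 32-byte region: exponent 5, size index 4. */
u32 region_size_index(u32 size_bytes)
{
    return 30u - (u32)__builtin_clz(size_bytes);
}
```

In the RSI() macro this value is shifted left by 1 because the SIZE field occupies bits 5:1 of RASR.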

Before task ut2a starts running, the above template is loaded into the MPU. As a consequence, ut2a can access only the regions shown and only as permitted by the region attributes. If it tries to access an address outside of the four regions, a memory management fault (MMF) will occur. Or, for example, if it tries to execute code from the data region or from the stack (a common malware ploy), an MMF will occur. Handling of MMFs will be discussed in a future blog.

Region attributes are defined as follows:


#define RW_DATA (XN | RW)
#define UCODE   RO

XN, RW, and RO are MPU attributes that mean execute never, read/write, and read-only, respectively. These and other attributes are defined in Yiu. Hence ut2a_data cannot be executed, and ut2a_code cannot be written by the ut2a task. Nor can ut2a access system code or data, nor the code or data of other tasks unless it shares a region with them, such as ucom_code. As a result, system code and mission-critical code are protected from ut2a, which may be running untested code, software of unknown pedigree (SOUP), or code vulnerable to malware, such as networking code. This level of protection is as good as that achievable with a memory management unit (MMU).

ut2a_data, ut2a_code, etc. are block names defined in the linker command file, which will be discussed in the next blog.

MPA template design

Template design is dependent upon the security strategy for the system. For example, the strategy might be to just isolate utasks from all pcode. In that case, a single ucode template for all utasks would suffice, and pcode would use the MPU background mode with possibly a pcode template. On the other hand, it might be desirable to isolate groups of utasks from each other or even individual utasks from other utasks. In both cases, utasks would also be isolated from pcode.

Note that the MPU is of value for ptasks as well as for utasks in that it allows assigning attributes to specific regions. The background mode is considered to have priority -1; hence all MPU regions take priority over it, as long as they do not violate default memory attributes (see Yiu). Even though pcode and ptasks are trusted, they are still susceptible to bugs and to single-event upsets (e.g., from high-energy neutrons produced by cosmic rays).

The foregoing illustrates the need for advanced region planning in new designs, even though MPU support need not be implemented until later in the project. It also illustrates potential problems for porting legacy code to the MPU. These problems may require code restructuring or ingenuity in defining MPA regions. However, these are not necessarily big efforts and they are well worthwhile if increased security is needed.

Task stacks

MPU region 0 is reserved for protected task stacks. Protected task stacks must come from the stack pool so that they can be correctly sized and aligned as required by the MPU. Stack pool stacks are allocated to tasks when they are dispatched by the scheduler. Following stack allocation, MPA[0] is automatically created and loaded by the scheduler. These stacks are not only protected, but also, stack overflows are detected and reported as MMFs as soon as they occur. This can be very helpful while debugging.
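The dispatch-time step described above can be sketched as follows. This is an illustration of the idea, not the smx scheduler itself; V, EN, XN, RW, and mpa_set_stack_region() are assumed names, with the flag positions taken from the Cortex-M MPU register layout, and __builtin_clz standing in for __CLZ:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

#define V   0x10u        /* RBAR valid flag (bit 4) */
#define EN  0x01u        /* RASR region enable (bit 0) */
#define XN  (1u << 28)   /* RASR execute-never */
#define RW  (3u << 24)   /* RASR AP field: full read/write access */

#define STACK_SIZE 256u  /* pool stack size: power of two */

/* Hypothetical sketch of what the scheduler does at dispatch:
   build region 0 (rbar, rasr) from the pool stack just allocated
   to the task. The stack address is already STACK_SIZE-aligned
   because it comes from the aligned stack pool. */
void mpa_set_stack_region(u32 *rbar, u32 *rasr, u32 stack_addr)
{
    u32 size_index = 30u - (u32)__builtin_clz(STACK_SIZE); /* exp - 1 */
    *rbar = stack_addr | V | 0u;              /* region 0, valid */
    *rasr = XN | RW | (size_index << 1) | EN; /* no-execute data */
}
```

Because the region is marked execute-never, attempting to run code from the stack faults immediately, and any access past either end of the region is an MMF, which is how overflows are caught as soon as they occur.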

The stack pool stack size is controlled by STACK_SIZE, which must be a power of two from 32 bytes to 4 GB. Stack alignment is controlled by STACK_ALIGN, which aligns the stack pool and must be a multiple of STACK_SIZE. If a task uses a heap stack instead, the MPU[0] slot is inactive. In this case, the task stack is located in the region containing the heap and is subject to that region’s attributes. Overflow of a heap stack does not result in an MMF, but normal real-time operating system (RTOS) overflow detection applies.
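Since the MPU silently misbehaves if these constants are wrong, the constraints above are worth enforcing at compile time. A minimal sketch, assuming the configuration constants are visible where the checks are placed (the specific values here are examples only):

```c
#include <assert.h>

#define STACK_SIZE  1024u               /* power of two, 32 bytes to 4 GB */
#define STACK_ALIGN (4u * STACK_SIZE)   /* must be a multiple of STACK_SIZE */

/* Catch invalid configurations at build time rather than as MMFs. */
_Static_assert((STACK_SIZE & (STACK_SIZE - 1u)) == 0u,
               "STACK_SIZE must be a power of two");
_Static_assert(STACK_SIZE >= 32u,
               "minimum MPU region size is 32 bytes");
_Static_assert(STACK_ALIGN % STACK_SIZE == 0u,
               "STACK_ALIGN must be a multiple of STACK_SIZE");
```

_Static_assert requires C11; on older toolchains an equivalent negative-array-size trick serves the same purpose.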

Next up

The blogs that will follow are:

Ralph Moore, President and Founder of Micro Digital, graduated with a degree in Physics from Caltech. He spent his early career in computer research, then moved into mainframe design and consulting.

Micro Digital

[email protected]

Ralph Moore, Micro Digital

I am no longer running the daily business at Micro Digital. Instead, I have been involved for the past four years in improving the smx RTOS kernel. smx is a hard-real-time multitasking kernel, which is intended for embedded systems that require high efficiency and high performance.
