MPU Security Part 1: Introduction

By Ralph Moore


Micro Digital

April 17, 2019


MPU Security Part 1: Introduction

This is the first part of a four-part series of articles presenting a unique product MPU?Plus? and a methodology for using the Cortex-M Memory Protection Unit (MPU) to achieve improved MCU security.

This is the first part of a four-part series of articles presenting a unique product MPU‑Plus® and a methodology for using the Cortex-M Memory Protection Unit (MPU) to achieve improved Microcontroller Unit (MCU) security.

MMUs vs. MPUs

For decades, large OSes have used isolated processes on microprocessors with MMUs to achieve separation and system security. Our recently enhanced product, MPU‑Plus adds similar capabilities to SMX RTOS using a microcontroller with an MPU. This has been done for the Cortex-M architecture and will be extended to others in the future.

Memory Management Units (MMUs) are used with full Operating Systems (OSs) to provide isolated virtual memories to processes. Processes are independently compiled and linked, then individually loaded and run by OSs like Windows and Linux. Process to process isolation is good enough that if a process becomes infected by malware or starts malfunctioning for any other reason, the OS can usually shut it down with little or no damage to other processes. Hence, security is good. The use of MMUs is well-studied and well-understood. The downside to MMU usage is that it typically requires high-performance processors and very large memories.

Fast, power-hungry processors and huge memories are not compatible with the requirements for most embedded systems. In addition, full OSs do not have the real-time response times needed for many applications. Consequently, most embedded systems use low-power, moderate-performance Microcontroller Units (MCUs) with Real Time Operating Systems (RTOSs). These systems usually have small memories compared to full OS systems. For security many MCUs offer Memory Protection Units (MPUs), which are not capable of creating virtual address spaces as MMUs do.

In most embedded systems, we deal with partitions rather than processes. The idea is basically the same – partitions include one or more tasks and perform specific functions for a system. Just like processes, it is desirable to isolate partitions from each other so that if one partition is penetrated or begins malfunctioning, it can be stopped with minimal damage to the rest of the system. Unfortunately, this is not as easy to accomplish in embedded systems using MPUs as it is in full OS systems using MMUs. The biggest challenge here is that all MCU software is compiled and linked into a single executable. Also, MPUs impose limitations of their own. The use of MPUs for security is not well-studied, nor well-understood, and there are many complex tradeoffs involved, especially since embedded systems usually have small memories, moderate performance processors, power limitations, and the need for deterministic real-time performance.

Increasing Need for Security

If it is so hard, why bother? This seems to be the approach adopted by the embedded systems industry over the past several decades. Unfortunately, it will not suffice for long because embedded systems are being connected into the Internet of Things (IoT) at a rapid rate. As a consequence, once isolated, defenseless embedded systems are becoming accessible via the Hacker's Highway (AKA the Internet). This means TROUBLE.

Protection Goals

The main goal of protection is to protect trusted, critical software and data from less-trusted, non-critical software, which may become infected with malware or which is buggy. Examples of trusted software are the RTOS, exception handlers, security software (e.g. crypto, authentication, secure boot, secure update, etc.), and mission-critical software. Examples of less-trusted software are code vulnerable to malware attacks such as protocol stacks, device drivers, SOUP, and insufficiently tested code.

The second goal is to detect intrusions and bugs and shut them down so that critical system operation is not imperiled, and sensitive data is not stolen. Dealing with intrusions and bugs may be handled by stopping and restarting penetrated tasks or may require stopping and rebooting the entire system.

The third goal is to minimize the amount of trusted code since it is easier to carefully write small amounts of code. Running more code in unprivileged mode increases partition isolation, thus improving reliability.

The degree of protection that must be implemented depends upon the security and safety requirements of the specific system and the threats to which it is exposed. MPU-Plus provides a range of security services that enable achieving a level of protection appropriate for a given system. In addition, protection can be increased in future releases as a system becomes more widely distributed and therefore more vulnerable. MPU-Plus is designed to foster progressive protection improvement.

MPU-Plus Snapshot

The main purpose of MPU-Plus is to enhance the security of multitasking systems based upon the SMX® RTOS. The main things MPU-Plus does to help achieve better security are to:

  1. allow defining different MPU templates for each task;
  2. handle MPU switching during task switches;
  3. provide an SVC API to allow unprivileged code to call privileged services;
  4. limit which services can be called from unprivileged code and what they can do;
  5. allow allocation of protected blocks and messages;
  6. run the SMX RTOS and system code in privileged mode and middleware and application code in unprivileged mode;
  7. allow tasks to be privileged or unprivileged.
  8. serve as a platform upon which to build security.

MPU-Plus had two principal design goals:

  1. Easier conversion of legacy code to use the MPU and the SVC API.
  2. Allow developers to focus on protection strategies without being tripped up by MPU peculiarities and other hardware details.

Implementing a good protection strategy is difficult enough. Excessive complication at the detailed level is not only frustrating but may result in adopting less-than-ideal system protection.


The Cortex-v7M processor architecture was introduced in 2005 and was intended for medium-size embedded systems. Since then, thousands of different Cortex-v7M Microcontroller Units (MCUs) have been developed by the semiconductor industry; they are used in tens of thousands of products developed by device manufacturers; and billions of chips have been shipped to date. It is by far the dominant MCU architecture (70 percent market share) and hence the one we support. If you are not familiar with it, the following references will be helpful:

  1. The Definitive Guide to ARM Cortex‑M3 and Cortex‑M4 Processors, Memory Protection Unit chapter, by Joseph Yiu, Elsevier Inc, 2014.
  2. ARM v7-M Architecture Reference Manual, Arm Ltd. 2014, Chapter B3.5 Protected Memory System Architecture, PMSAv7.
  3. ARM Platform Security Architecture Overview, Revision 1.2 Arm Ltd. 2018. See for security term definitions and security discussion.

The Cortex‑v7M architecture offers the following security features:

  1. Privileged Mode.
  2. Supervisor Call (SVC) Instruction.
  3. Memory Protection Unit (MPU).

The first is implemented via the three modes of processor operation:

  1. Handler Mode: Privileged mode for ISRs, fault handlers, the SVC handler, and the PendSV handler. This mode can be entered only via an interrupt or an exception.
  2. Privileged Thread Mode: Privileged tasks run in this mode. It can be entered only from handler mode, by setting CONTROL.nPRIV = 0.
  3. Unprivileged Thread Mode: Unprivileged tasks run in this mode. It can be entered from either of the above two modes, by setting CONTROL.nPRIV = 1.

To reduce wordiness in the discussions that follow, we use an abbreviated terminology as follows: the first two modes are collectively called pmode and the third mode is called umode. The p prefix can be interpreted either as privileged or protected; the u prefix can be interpreted as either unprivileged or user. Code and tasks that run in pmode are called pcode and ptasks; code and tasks that run in umode are called ucode and utasks

The critical aspect of the Cortex-M architecture is that pmode can be entered only via an interrupt or an exception. The SVC N instruction causes such an exception. It can be executed from umode with an 8-bit argument, N. This allows making a system call from umode where N specifies the function to call. The called function executes in pmode. SVC can also execute from pmode.

The MPU provides M slots for M regions. Each region has a starting address, a size, and access parameters, such as Read-Only (RO), Read/Write (RW), eXecute Never (XN), etc. If a memory access is not permitted by at least one region in the MPU, a Memory Manage Fault (MMF) is generated. The MMF is an exception that causes the MMF Handler to run, which normally stops the offending task and determines what to do in order to recover.

Figure 1 shows an example MPU with only four slots. (Most MPUs have eight slots, some have 16.) The region stored in MPU[0] is an RO region for task code. The MPU[1] region is an RO region for code that is common with other tasks. The region stored in MPU[2] is a RW and XN region for data. The final region is the task stack, which is also RW and XN. Shown below it is the optional Task Local Storage (TLS) that can be part of the task stack region and which can be used for local task data. Since only a pointer to the TLS is provided, the data must be either a structure or an array. It has the same attributes as the task stack. The white regions of memory are inaccessible to this task.

A serious drawback of the v7 MPU is that region sizes must be powers-of-two, and regions must be aligned on their size boundaries. This requirement is probably the major reason for low usage of the v7 MPU, but it can be overcome. Actually, the biggest drawback of the MPU is insufficient slots. Nearly all Cortex-M processors have MPUs with eight slots. When one gets into creating regions for real partitions, eight just is not enough – 12 might be just right. A few processors have 16 slots; these are strongly recommended for new designs. We support both eight and 16 slot MPUs, but our emphasis is on the former since they are, by far, the most common.

Although the Cortex‑v7M MPU has significant limitations, which make it difficult to use, it is the only means of hardware memory protection available for Cortex‑v7M processors, and hardware protection is the only protection that is effective against hacking. Therefore, it is important to find methods to use it effectively in order to achieve the reliability, security, and safety that modern embedded systems require.


The Cortex-v8M architecture was announced a few years ago. To date, there are very few processors using it. The v8 MPU has greatly reduced the region size and alignment requirements to multiples of 32 bytes. This is helpful for limited-memory embedded systems. However, the number of MPU slots is unchanged in the v8 architecture. Hopefully, semiconductor vendors will opt for 16 rather than eight slots, at least in the non-secure state. We will be supporting the v8 MPU soon.

The Cortex-v8M architecture also offers a new feature called TrustZone, which permits secure and non-secure states. TrustZone secure state is good for storage of keys and other private data. However, running code in secure state may not be worth the extra code complexity and debug difficulty. Arm says otherwise, so we will have to see. In addition, we expect that most device manufactures will want a security solution that works with both v7 and v8 processors.

The next part of this series will discuss partitioning, secure boot, MPU control, and system calls. For more information see

Ralph Moore is President of Micro Digital. A graduate of Caltech, he has served many roles at Micro Digital since founding it in 1975. Currently, he is lead architect and programmer for MPU-Plus, eheap, and smx.

I am no longer running the daily business at Micro Digital. Instead, I have been involved for the past four years in improving the smx RTOS kernel. smx is a hard-real-time multitasking kernel, which is intended for embedded systems that require high efficiency and high performance.

More from Ralph