Using virtualization to maximize multicore SoC performance
October 01, 2011
Using virtualization techniques to leverage the potential of multicore SoCs.
Whether software developers like it or not, and whether they’re prepared for it or not, virtually every semiconductor maker worth their salt is producing multicore Systems-on-Chip (SoCs). These SoCs typically pair two or more CPU cores with additional application-specific hardware accelerators to provide a complete system. For example, Cavium Networks, NetLogic Microsystems, and Freescale Semiconductor produce SoCs for network processing, while Texas Instruments and Broadcom make SoCs for digital media devices.
For software folks, this presents the interesting challenge of enabling applications to obtain all the processing power available from these multicore SoC environments. How can developers make sure their applications scale linearly with the available cores, as well as fully utilize the other SoC hardware components such as media accelerators and packet engines? To be clear, the scalability question is still a real science project for many applications; however, there are systems to build and products to ship, so developers can’t wait for the theoretically perfect solution.
In the past year, MontaVista Software examined numerous customer use cases in a wide range of applications including network processing, digital TV, in-vehicle infotainment, super low-power server Web hosting, and more. The goal was to understand how a Linux-based software solution could make full use of the underlying SoC hardware across a wide range of application requirements. The study identified the following high-level requirements that any solution must meet.
The demands of modern embedded systems are hastening the adoption of multicore SoCs. These demands are further accentuated by the requirements to run multiple systems simultaneously; thus, the solution must provide an efficient way of using and managing multicore environments.
Anything downloaded to a device is insecure by definition. The solution must effectively isolate anything downloaded from the core device functions, and the downloaded applications must not be allowed to contaminate other applications.
Downloaded applications must be prevented from hogging system resources. The goal is to effectively share resources such as memory, CPU time, and I/O. This sharing must allow more important system functions to have priority over less important downloaded applications.
Foreign system integration
Many environments run on top of a Linux kernel. However, these environments might require different userland libraries, as well as different kernel patches. For example, the Android system has its own device drivers and kernel patches. Ideally, the system could run any userland that runs on a Linux kernel. The kernel patches and userlands associated with these environments must be integrated with security and resource sharing in mind.
This analysis led to the development of a Linux-based architecture that maximizes the underlying power of today’s powerful multicore SoCs.
To understand the overall architecture of this software, it is necessary to know a bit about modern Operating System (OS) environments, most notably virtualization technology. But be careful; there’s a lot of hype around virtualization (or, as we like to say, a lot of hype around hypervisors).
Virtualization is a method for dividing a computer’s resources into multiple execution environments. There are three major categories of virtualization in use today, with the key difference among them being the layer where virtualization occurs:
- Full virtualization and paravirtualization: These types of virtualization are used to host multiple guest OSs that are isolated from one another. While highly functional, the performance (without a great deal of optimization) is very low due to the overhead of the hypervisor and multiple OSs. Examples include QEMU, Kernel-based Virtual Machine (KVM), Xen, and VMware.
- OS resource virtualization: This type of virtualization is used to isolate and scale applications using a single OS. The advantage here is a single OS and lower overhead, typically less than 1 percent. Because there is so little overhead, the ability to scale and/or optimize performance is a huge benefit. Examples include Linux Containers and BSD jails.
- Hardware segmentation (Asymmetric Multi-Processing or AMP): This high-performance configuration dedicates hardware to specific applications running in user mode for maximum performance. This can be achieved using a simple runtime executive or leveraging OS resource virtualization and processor core affinity capability to dedicate cores and I/O to processes with almost no overhead.
These types of virtualization offer different performance characteristics, require different setup and maintenance overhead, introduce unique levels of complexity into the runtime environment, and address different problems.
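As a rough illustration of the processor core affinity capability mentioned under hardware segmentation, the following minimal sketch pins a process to a single core using the standard Linux affinity interface exposed by Python's `os` module. It assumes a Linux host; the helper name `pin_to_core` is hypothetical and is not part of any MontaVista product.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict a process (pid 0 means the caller) to a single CPU core.

    Hypothetical helper for illustration; relies on the Linux-only
    os.sched_getaffinity / os.sched_setaffinity calls.
    """
    allowed = os.sched_getaffinity(pid)
    if core not in allowed:
        raise ValueError(
            f"core {core} is not available to this process: {sorted(allowed)}"
        )
    os.sched_setaffinity(pid, {core})
    # Read the mask back so the caller can confirm what the kernel applied.
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    first = min(os.sched_getaffinity(0))
    print("now restricted to cores:", sorted(pin_to_core(first)))
```

Pinning only controls where this process may run. Keeping other work off the same core, which a dedicated-core design would also need, requires additional configuration that this sketch does not attempt.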
While the industry is currently focused on pushing fully virtualized hypervisors as the one-size-fits-all solution to multicore optimization, the reality is that embedded developers need a range of options that can be tailored to specific application needs. Developers will require some combination of one or more of these virtualization technologies to deliver products that fit within hardware constraints and meet design performance characteristics. In short, the trick is to match the application with the right OS services to meet the overall system requirements, which can include performance, reliability, and security.
MontaVista provides three methods of virtualization based on nonproprietary, open-source Linux technology and supported across multiple processor architectures. Because all three share a single Linux runtime, there is one compiler and one set of tools that can be used for any use case or combination of use cases. Figure 1 shows an overall picture of this approach. These three methods are:
- KVM Hypervisor (full virtualization)
- Linux Containers (OS resource virtualization)
- MontaVista Bare Metal Engine (OS resource virtualization and SoC hardware segmentation)
Microserver use case
The idea behind microservers is to utilize smaller, more energy-efficient processors to lower the physical footprint and energy consumption of a class of Web-centric IT applications. For certain workloads, several low-power processors can be more efficient than fewer, more powerful processors. Cavium Octeon processors and those from other semiconductor suppliers are well suited to the density and power-efficiency requirements of the microserver concept because the cores themselves are power efficient. These SoCs also include dedicated hardware to handle the front-end security and encryption/decryption processing that Web-based applications require.
From a software perspective, MontaVista Linux Containers and Bare Metal Engine technology help complete the picture. Containers provide OS-level virtualization, allowing very efficient virtualization of the workload requirements. For example, Containers can be used to host thousands of independent websites, each securely isolated from the others. Containers allow precise control of the runtime resources allocated to each container, so each website can be limited to the performance levels the customer has purchased. More importantly, the same mechanisms can stop a rogue website from overconsuming resources, thus thwarting a denial-of-service type of attack.
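To make the idea of per-container resource limits concrete, here is a minimal sketch that renders CPU and memory caps in the format used by the Linux cgroup v2 control files `cpu.max` and `memory.max`. This is an assumption for illustration: the article does not say which control interface MontaVista's Containers use, and the function names and the example directory are hypothetical.

```python
from pathlib import Path

# Scheduling period for cpu.max, in microseconds (the cgroup v2 default).
CPU_PERIOD_US = 100_000


def container_limits(cpu_percent, memory_mib):
    """Render cgroup v2 control-file contents for one hosted website.

    cpu_percent is a share of one core (200 would mean two full cores).
    """
    if cpu_percent <= 0 or memory_mib <= 0:
        raise ValueError("limits must be positive")
    quota_us = int(CPU_PERIOD_US * cpu_percent / 100)
    return {
        "cpu.max": f"{quota_us} {CPU_PERIOD_US}",      # "<quota> <period>"
        "memory.max": str(memory_mib * 1024 * 1024),   # bytes
    }


def apply_limits(cgroup_dir, limits):
    """Write each limit into the matching control file under cgroup_dir."""
    for name, value in limits.items():
        (Path(cgroup_dir) / name).write_text(value + "\n")


if __name__ == "__main__":
    # A site that has purchased a quarter of one core and 256 MiB of memory.
    print(container_limits(cpu_percent=25, memory_mib=256))
```

On a real system, `apply_limits` would be pointed at a per-site cgroup directory (for example a hypothetical `/sys/fs/cgroup/site-example`), which requires sufficient privileges and the `cpu` and `memory` controllers enabled for that group. Because the limits are enforced by the kernel, a misbehaving site cannot exceed them from inside its container.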
Bare Metal Engine provides the runtime environment for the security and encryption/decryption operations each of these hosted websites requires. For example, a 32-core SoC can utilize most of the cores for application processing with a few dedicated to packet processing, all controlled by one Linux instance.
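The 32-core split described above can be sketched as a simple partition of core IDs. The choice of four packet-processing cores is an assumption (the article says only "a few"), and both function names are hypothetical. The second helper formats a core set as a Linux CPU list string such as `28-31`, the notation accepted by tools like `taskset -c` and by cpuset control files.

```python
def partition_cores(total_cores, dataplane_cores):
    """Split core IDs into an application set and a packet-processing set.

    The highest-numbered cores are reserved for packet processing; that
    ordering is an arbitrary choice made for this sketch.
    """
    if not 0 < dataplane_cores < total_cores:
        raise ValueError("need at least one core in each partition")
    split = total_cores - dataplane_cores
    return list(range(split)), list(range(split, total_cores))


def cpu_list(cores):
    """Format core IDs as a CPU list string, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    cores = sorted(set(cores))
    if not cores:
        return ""
    runs = []
    start = prev = cores[0]
    for core in cores[1:]:
        if core != prev + 1:          # gap found: close the current run
            runs.append((start, prev))
            start = core
        prev = core
    runs.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in runs)


if __name__ == "__main__":
    app, dataplane = partition_cores(total_cores=32, dataplane_cores=4)
    print("application cores:", cpu_list(app))
    print("packet-processing cores:", cpu_list(dataplane))
```

The resulting strings only describe the intended layout. Actually dedicating the cores would still be done through the affinity and container mechanisms of the single Linux instance.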
Linux offers a simple solution
It is a widely held misconception that a combination of Linux and either a Real-Time Operating System (RTOS) or simple runtime environment must be utilized to fully realize the high performance available with multicore processors. Fueling this misconception is the thought that Linux itself is incapable of meeting the requirements because it is too big, too slow, and not real-time. This fallacy also drives the requirement that hypervisors and/or virtualization must mediate and isolate the different runtime environments and facilitate intercommunication among them. Often it is the RTOS vendors themselves who perpetuate this erroneous belief.
In the end, these misconceptions about Linux drive added complexity and costs into the development process. Complexity increases due to multiple runtime and development environments (one each for Linux, the RTOS, and possibly the hypervisor). Costs increase because of royalties for the proprietary RTOS and hypervisor, not to mention the added costs created by the development complexity itself, with more developers needed for a longer period of time.
The approach of using Linux everywhere, and fixing it where it might not meet some requirements, results in a single OS environment, a single tool chain, and common development and debugging tools for all aspects of the application. As a saying often attributed to Einstein goes, “Make everything as simple as possible, but not simpler.”