Nonintrusive visibility into multicore SoCs

November 01, 2009

To maximize computing performance per unit area (MIPS/square millimeter) and per unit of power consumed (MIPS/mW), many processor designs are moving to multicore technology, distributing the processing load across cores that run at lower clock frequencies.

As multicore designs proliferate, one of the biggest challenges for developers is gaining nonintrusive visibility into the processor cores and their interactions. Furthermore, access to intelligent peripherals is important to successfully debug system-level issues for these highly integrated chips.

Adding multicore-specific On-Chip Instrumentation (OCI®) to the silicon can help developers meet System-on-Chip (SoC) debugging challenges. Whether developers are bringing up hardware, porting an operating system, writing middleware, debugging application code, or optimizing system-level performance, multicore OCI can provide a “hardware assist” that shortens the design cycle.

Multicore processors like the MIPS32® 1004K Coherent Processing System (CPS) place a whole new set of demands on development tools. No longer can a processor be debugged as an isolated entity. The 1004K CPS combines one to four processor cores, two Virtual Processing Elements (VPEs) per core, and a hardware-based Coherency Manager. It and other multicore processors require specific debug IP to support everything from hardware debugging to system performance optimization.

Real-time trace, a fundamental feature for serious SoC debugging, illustrates how things have changed. To reconstruct execution with 100 percent accuracy, multicore trace algorithms need more system trace “tags” than single-core trace does. In addition to the traditional address, data, and status information, the 1004K core must also collect trace details identifying the processor, VPE, and Thread Context (TC) for each trace cycle. A trace block added to the Coherency Manager captures synchronization messages as well as request addresses and stall information. A trace funnel combines the trace information from the multiple cores and the Coherency Manager and adds a time stamp so that the combined real-time trace remains 100 percent accurate. Without this level of integrated debug IP, visibility into a processor this sophisticated would be impossible.
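As a rough software analogy for the trace funnel, the Python sketch below merges several time-ordered trace streams into one stream ordered by time stamp. It is an illustration only, not the 1004K trace format: the record fields, the source names ("core0", "cm"), and the funnel() helper are all hypothetical, and in real silicon the funnel itself applies the time stamp rather than receiving one.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class TraceRecord:
    """One trace cycle. All field names here are hypothetical."""
    source: str            # e.g. "core0", "core1", or "cm" (Coherency Manager)
    vpe: Optional[int]     # VPE tag; None for Coherency Manager records
    tc: Optional[int]      # Thread Context tag; None for Coherency Manager records
    address: int           # traced address or request address


Stamped = Tuple[int, TraceRecord]  # (time stamp in cycles, record)


def funnel(streams: Iterable[Iterable[Stamped]]) -> Iterator[Stamped]:
    """Merge per-source streams, each already in time order, into one stream.

    heapq.merge compares only the time stamp (via key), so records that
    share a time stamp keep the order of the streams they came from.
    """
    return heapq.merge(*streams, key=lambda item: item[0])
```

Because every merged record still carries its source, VPE, and TC tags, a trace display can later separate the combined stream back into per-core or per-thread views.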

Once code algorithms are debugged and the system is up and running, profiling system code becomes critical to optimizing performance. One technique is Zero Overhead PC Sampling, which samples an executed address when it is retired from the Program Counter (PC). With samples taken more than 50,000 times per second, the data can be sorted and displayed at the module, function, or line-number level, providing fast and accurate results. System-level profiling information collected without the “hardware assist” of OCI can take orders of magnitude more time to collect and sort.

PC sampling can also be applied to Linux kernel performance profiling, enabling developers to quickly identify performance “hot spots” in kernel code. Because the tools can handle large numbers of module and function symbols, users can accumulate sample counts for each address range the symbols represent. This information allows users to quickly identify the program bottlenecks restricting system performance and understand the best methods for making kernel system calls.
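The accumulation step described above can be sketched in a few lines of Python. This is a minimal host-side illustration, assuming the PC samples have already been uploaded from the target and that each symbol is known as a (start address, size, name) tuple. The profile() function, the "<unknown>" bucket, and any symbol names passed in are hypothetical, not part of any particular tool.

```python
import bisect
from collections import Counter


def profile(samples, symbols):
    """Count PC samples per symbol and return (name, count) pairs, hottest first.

    samples: iterable of sampled PC values
    symbols: iterable of (start_address, size, name) tuples
    """
    table = sorted(symbols)                   # order symbols by start address
    starts = [start for start, _, _ in table]
    counts = Counter()
    for pc in samples:
        # Find the last symbol whose start address is <= pc.
        i = bisect.bisect_right(starts, pc) - 1
        if i >= 0 and pc < table[i][0] + table[i][1]:
            counts[table[i][2]] += 1
        else:
            counts["<unknown>"] += 1          # pc falls outside every symbol
    return counts.most_common()
```

The binary search keeps each lookup at logarithmic cost in the number of symbols, which matters when a kernel symbol table holds many thousands of entries and samples arrive tens of thousands of times per second.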

The kernel profiling capability can be extended to Linux device drivers, which usually take the form of loadable modules. Using symbolic information from loadable modules, developers can profile the Linux kernel driver code and determine what demands the device drivers might place on the kernel.
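Loadable modules add one wrinkle: their symbols are typically known as offsets within the module, so they must be rebased to the address at which the kernel loaded the module before samples can be attributed to them. The Python sketch below shows that step under simple assumptions (a single contiguous load region per module). The function names, the module name, and the addresses in any usage are hypothetical.

```python
def rebase_module_symbols(module_name, load_address, symbols):
    """Convert module-relative symbols into absolute address ranges.

    symbols: iterable of (offset, size, name) relative to the module's start.
    Returns (absolute_start, size, "module:name") tuples.
    """
    return [
        (load_address + offset, size, f"{module_name}:{name}")
        for offset, size, name in symbols
    ]


def count_module_samples(samples, rebased):
    """Count how many PC samples land inside any of the module's symbols."""
    return sum(
        1
        for pc in samples
        if any(start <= pc < start + size for start, size, _ in rebased)
    )
```

Comparing the count for a driver against the total number of samples gives a first estimate of how much processor time that driver consumes.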

Complementing Linux profiling tools are Linux event analyzers that capture user-selected Linux events occurring on the target and then graphically display the events over time. Captures can collect up to 20 seconds of Linux system activity, enabling real-world application debugging.
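To make the idea of a time-bounded event capture concrete, the Python sketch below filters a time-ordered event stream down to user-selected event names and groups them into per-event timelines that a display could plot. It is an illustration under stated assumptions, not a description of any specific analyzer: the capture() function and event names are hypothetical, and starting the window at the first selected event is a choice made here for simplicity.

```python
from collections import defaultdict

CAPTURE_WINDOW_S = 20.0  # upper bound on capture length cited in the article


def capture(events, selected, window_s=CAPTURE_WINDOW_S):
    """Group selected events into timelines relative to the capture start.

    events: iterable of (timestamp_seconds, event_name), in time order
    selected: set of event names the user chose to capture
    Returns {event_name: [seconds since first captured event, ...]}.
    """
    timeline = defaultdict(list)
    start = None
    for timestamp, name in events:
        if name not in selected:
            continue                  # ignore events the user did not select
        if start is None:
            start = timestamp         # window opens at the first selected event
        if timestamp - start > window_s:
            break                     # capture window is full
        timeline[name].append(timestamp - start)
    return dict(timeline)
```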

Today, multicore system developers must ensure that the processor architecture they choose comes with a seamless development environment that includes compilers, debuggers, and performance and profiling tools. Tools of this nature are vital for meeting time-to-market requirements and achieving maximum performance with ever-more integrated SoCs and sophisticated embedded devices. By choosing a core that implements multicore OCI, developers give themselves the highest possible visibility into their systems and the best chance of getting to market on time and on budget.

Rick Leatherman was the founder of First Silicon Solutions, which was acquired by MIPS in 2005. He has more than 20 years of experience in development tools at companies including Intel, Microtek, and Microcosm. Since 1999, Rick has been a champion of On-Chip Instrumentation (OCI®), a concept that has become widely adopted in the embedded tools industry. He has an EE degree from Virginia Tech and an MBA from the Tuck School of Business at Dartmouth College. He has written numerous articles on development tools and is listed as coinventor on both issued and pending patents.