CacheQ's New Compiler Supports Multi-Threading CPU Acceleration

By Tiera Oliver

Associate Editor

Embedded Computing Design

July 09, 2021



CacheQ Systems announced support for multi-threading acceleration for CPUs with multiple physical cores through its new compiler, offering software developers the ability to "write once, accelerate anywhere."

The compiler, part of the CacheQ Compiler Collection, eliminates the need for manual code rewriting and for threading libraries or complex parallel-execution APIs such as OpenMP or MPI. It takes single-threaded C code and generates executables that run on CPUs, using physical x86 cores with or without hyperthreading, as well as Arm and RISC-V cores.
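To illustrate the kind of manual work the company says its compiler removes, the sketch below contrasts a plain single-threaded C loop with a hand-parallelized OpenMP version of the same loop. This is not CacheQ output or CacheQ syntax; the function names `scale_add` and `scale_add_omp` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Single-threaded version: the kind of plain C loop the article describes
   as compiler input. Each iteration is independent of the others. */
void scale_add(const double *x, const double *y, double *out,
               size_t n, double a)
{
    assert(x != NULL && y != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}

/* Hand-parallelized version using OpenMP, shown only for contrast.
   Build with -fopenmp (GCC/Clang) to enable it; without that flag the
   pragma is ignored and the loop simply runs serially. */
void scale_add_omp(const double *x, const double *y, double *out,
                   size_t n, double a)
{
    assert(x != NULL && y != NULL && out != NULL);
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        out[i] = a * x[i] + y[i];
    }
}
```

Both functions produce the same results; the difference is that the second one requires the developer to annotate the loop and build with an OpenMP-capable toolchain.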

Using CacheQ's compiler, users can generate code for multicore processors on the same or different architectures and benchmark usage with runtime variables. They can add cores for higher performance at the cost of higher power usage, or reduce the number of cores, freeing them for other processes, to improve performance per watt.

Per the company, the result is a speed-up of more than 486% over single-thread execution on x86 processors with 12 logical cores, based on benchmarks of the Black-Scholes algorithm, a financial model used to price options. An Apple M1 processor with eight Arm cores ran 400% faster than single-threaded code compiled with the GNU Compiler Collection (GCC).

CacheQ enables software developers to develop and deploy custom hardware accelerators for heterogeneous compute systems including FPGAs, CPUs, and GPUs. Its CacheQ Compiler Collection is modelled after the GCC tool suite, including a user interface similar to common open-source compilers. Because it requires limited code modification, development time can be shortened and system quality improved.

The tool suite enables compilation, linting and error detection, performance prediction, profiling, debugging, and visualization of the generated virtual engine. It supports target hardware including single and multicore processors, as well as heterogeneous compute systems with FPGA accelerators connected to x86 and Arm processors.

The CacheQ Compiler Collection supports C code, as well as C++ through "hybrid" access of an exported function call.
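The article does not define "hybrid" access. One common way for C++ code to call into C is through a function exported with C linkage, and the sketch below assumes that is roughly what is meant. The function name `sum_squares` is hypothetical, and nothing here is specific to CacheQ's toolchain.

```c
#include <assert.h>
#include <stddef.h>

/* Declaration as it would appear in a header shared by C and C++ code.
   The guard gives the function C linkage when included from C++, so the
   C++ side can call the C implementation directly. */
#ifdef __cplusplus
extern "C" {
#endif

double sum_squares(const double *v, size_t n);

#ifdef __cplusplus
}
#endif

/* Plain C implementation: the part that would be compiled as C. */
double sum_squares(const double *v, size_t n)
{
    assert(v != NULL || n == 0);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += v[i] * v[i];
    }
    return acc;
}
```

Under this assumption, the C++ application keeps its own structure and only the exported C function is handed to the accelerating compiler.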

Additional benchmarks of the CacheQ Compiler Collection highlight its ability to span high-end servers and consumer electronic devices. Execution on the M1 processor with two cores outperformed the x86 chip with 11 cores, demonstrating a cost-per-watt advantage. According to the company, the Apple M1 processor with four cores performed 210% faster than the x86 with 12 cores. Overall, using the CacheQ Compiler Collection, it performed approximately 1,476% faster than single-threaded GCC running on x86.

All simulations were performed on the same code compiled for different targets. Benchmarks were performed on an Intel i7-8700K x86 CPU running at 3.7 GHz, with six physical cores and hyperthreading for 12 logical cores, running Ubuntu 18.04. Apple M1 benchmarks were captured with a Parallels VM running a native Arm Ubuntu 20.04 image.
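The percentages quoted by the company can be read in more than one way. As a rough aid, the sketch below assumes "N% faster" means the runtime ratio minus one, times 100, so that 486% faster corresponds to a 5.86x speed-up. That reading is an assumption, not something the company states, and the helper names are hypothetical.

```c
#include <assert.h>

/* Convert "N% faster" into a runtime ratio, reading the figure as
   (t_baseline / t_new - 1) * 100. This interpretation is an assumption. */
double speedup_factor(double percent_faster)
{
    assert(percent_faster > -100.0);
    return 1.0 + percent_faster / 100.0;
}

/* Parallel efficiency: speed-up divided by the number of cores used.
   A value of 1.0 would mean perfect linear scaling. */
double parallel_efficiency(double factor, int cores)
{
    assert(cores > 0);
    return factor / (double)cores;
}
```

Under that reading, `speedup_factor(486.0)` gives 5.86. Dividing by the six physical cores of the i7-8700K gives an efficiency of about 0.98, while dividing by the 12 logical cores gives about 0.49.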

The CacheQ compiler tools are shipping now through a limited access program. Pricing is available on request.

For more information, visit:

Tiera Oliver, Associate Editor for Embedded Computing Design, is responsible for web content edits, product news, and constructing stories. She also assists with newsletter updates as well as contributing and editing content for ECD podcasts and the ECD YouTube channel. Before working at ECD, Tiera graduated from Northern Arizona University, where she received her B.S. in journalism and political science and worked as a news reporter for the university’s student-led newspaper, The Lumberjack.
