Compute Express Link (CXL) Interconnects Memory and GPUs for Yielding High Performance

By Saumitra Jagdale

Freelance Technology Writer

August 31, 2021



With advances in cloud computing, the memory architecture of data centers has evolved significantly. To increase compute capacity for data-intensive processing, data centers need to integrate accelerators that excel at specific workloads. Although these devices already connect over PCI Express, Compute Express Link (CXL) can improve on this because it enables heterogeneous processing across system components.

Image Credit: CXL Consortium

Today, PCI Express (PCIe) is the most common technology for connecting host processors to accelerator devices. It is a high-performance, industry-standard serial I/O interface designed for enterprise, desktop, and embedded applications.

However, PCIe has limitations in systems with large shared memory pools and many devices that require high bandwidth. Each PCIe hierarchy uses a single 64-bit address space, and PCIe provides no mechanism to support coherency, so it cannot adequately manage separate memory pools. Furthermore, PCIe links have too much latency to manage shared memory efficiently across several devices in a system.

Hence, compute-intensive workloads need an interconnect with faster data flow to scale heterogeneous computing in the data center effectively. Compute Express Link (CXL) addresses some of these limitations while reusing the physical and electrical interface of PCI Express 5.0. The new technology improves memory capacity and bandwidth and lowers latency. It provides a high-speed interface as accelerators are increasingly used to complement CPUs in emerging applications.
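To put the shared PCIe 5.0 physical layer into numbers, the short sketch below estimates the raw bandwidth of a link in one direction. The figures of 32 GT/s per lane and 128b/130b line encoding come from the PCIe 5.0 specification, not from this article, and the function name is purely illustrative. Packet and protocol overhead are ignored, so achievable throughput is lower than the result.

```python
def pcie5_raw_bandwidth_gb_per_s(lanes: int) -> float:
    """Approximate one-direction raw bandwidth in GB/s for a PCIe 5.0 link.

    Assumptions (from the PCIe 5.0 specification, not from this article):
    32 GT/s per lane and 128b/130b line encoding. Packet headers and flow
    control overhead are not modeled.
    """
    transfers_per_second_g = 32.0      # giga-transfers per second, per lane
    encoding_efficiency = 128 / 130    # payload bits per transmitted bit
    bits_per_byte = 8
    return lanes * transfers_per_second_g * encoding_efficiency / bits_per_byte


for lanes in (4, 8, 16):
    print(f"x{lanes}: {pcie5_raw_bandwidth_gb_per_s(lanes):.1f} GB/s per direction")
```

For a x16 link this works out to roughly 63 GB/s in each direction, which is the budget that CXL traffic shares on the same physical lanes.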

What is CXL?

Compute Express Link (CXL) is an open industry-standard memory interconnect. It aims to provide high-performance connections between memory and the many types of processors used in today's data centers, such as CPUs, TPUs, and GPUs.

Image Credit: CXL Consortium

CXL leverages PCIe 5.0's physical layer infrastructure to create a common memory space across the host and all devices. Because the standard is cache-coherent, the host processor and CXL devices see the same data when they access it. The host CPU is primarily responsible for coherency management, which lets the CPU and devices share resources for higher performance and reduces software stack complexity. This in turn lowers total device cost.
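The idea of host-managed coherency can be illustrated with a toy model. This is a minimal sketch, not the CXL protocol itself: the class names, the write-through behavior, and the invalidate-on-write policy are all assumptions chosen for brevity. The host owns memory, records which devices hold a cached copy of each address, and invalidates stale copies whenever someone writes.

```python
class Host:
    """Toy host that owns memory and tracks which devices cache each address."""

    def __init__(self):
        self.memory = {}
        self.sharers = {}  # address -> set of devices holding a cached copy

    def read(self, device, addr):
        # Record the reader as a sharer so it can be invalidated later.
        self.sharers.setdefault(addr, set()).add(device)
        return self.memory.get(addr, 0)

    def write(self, writer, addr, value):
        # Invalidate every cached copy except the writer's own.
        for dev in self.sharers.get(addr, set()) - {writer}:
            dev.invalidate(addr)
        self.sharers[addr] = {writer} if writer is not None else set()
        self.memory[addr] = value


class Device:
    """Toy accelerator with a private write-through cache (hypothetical)."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.cache = {}

    def load(self, addr):
        if addr not in self.cache:  # cache miss: fetch from the host
            self.cache[addr] = self.host.read(self, addr)
        return self.cache[addr]

    def store(self, addr, value):
        self.host.write(self, addr, value)  # host invalidates other sharers
        self.cache[addr] = value

    def invalidate(self, addr):
        self.cache.pop(addr, None)


host = Host()
gpu = Device("gpu", host)
nic = Device("nic", host)

host.write(None, 0x1000, 1)  # the CPU itself writes (writer=None)
gpu.load(0x1000)             # both devices now cache the value 1
nic.load(0x1000)
gpu.store(0x1000, 2)         # the host invalidates the NIC's stale copy
print(nic.load(0x1000))      # the NIC misses, refetches, and prints 2
```

Keeping the bookkeeping in one place, the host, is what lets the devices stay simple: they only need to answer invalidation requests rather than track each other.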

Aside from raw performance, CXL has other advantages. It is an open standard, unlike 3D XPoint, the proprietary memory technology from Intel and Micron that was the closest approach to a high-performance memory interconnect until now. Micron and Intel are now betting on CXL rather than on proprietary technology. Hardware systems from NVIDIA and AMD also use CXL to interconnect memory and external processing units for optimized performance, and NVIDIA's GPUs have been adapted to the CXL design for such compute-heavy workloads, with some changes to their memory design.

The most significant outcome of this transition will be that CXL eliminates proprietary memory interconnects. This matters because there are several types of enterprise storage, and each processor accesses them through a proprietary connection. That makes enterprise storage infrastructure complex and its performance hard to predict.

With CXL, every processor will be able to access all types of memory, which allows separate processors to share the same memory pools. This is vital not only for data centers but also for emerging technologies such as neural networks and deep learning systems, which generally combine CPUs, GPUs, and TPUs.

CXL Transaction Layer Protocols

With bandwidth similar to PCIe 5.0, CXL gains an edge over PCIe by using three dynamically multiplexed transaction layer protocols: CXL.io, CXL.cache, and CXL.memory.

  • CXL.io is almost identical to PCIe 5.0. It is used for device discovery, configuration, register access, interrupts, virtualization, and bulk DMA.

  • CXL.cache is an optional protocol that defines interactions between a host and a device. It allows attached CXL devices to cache host memory efficiently and with extremely low latency, using a request-and-response approach.

  • CXL.memory is an optional protocol that gives a host processor direct access to accelerator-attached memory using load and store commands. The CPU, GPU, or TPU sees the accelerator-attached memory as an additional address space, which improves efficiency and lowers latency.
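The division of labor among the three protocols can be summarized as a small lookup table. The sketch below is only a reading aid built from the descriptions above: the operation names are hypothetical labels, not identifiers from the CXL specification, and the first protocol is referred to as CXL.io, the name the specification uses for the PCIe-like protocol.

```python
from enum import Enum


class Protocol(Enum):
    IO = "CXL.io"
    CACHE = "CXL.cache"
    MEMORY = "CXL.memory"


# Hypothetical operation labels mapped to the protocol that carries them,
# following the descriptions in the list above.
OPERATION_TO_PROTOCOL = {
    "device_discovery": Protocol.IO,
    "configuration": Protocol.IO,
    "register_access": Protocol.IO,
    "interrupt": Protocol.IO,
    "bulk_dma": Protocol.IO,
    "device_caches_host_memory": Protocol.CACHE,
    "host_load_store_to_device_memory": Protocol.MEMORY,
}


def protocols_needed(operations):
    """Return the set of protocols a device must support for the given operations."""
    return {OPERATION_TO_PROTOCOL[op] for op in operations}


# Example: an accelerator that caches host data and exposes its own memory.
accelerator_ops = [
    "device_discovery",
    "device_caches_host_memory",
    "host_load_store_to_device_memory",
]
print(sorted(p.value for p in protocols_needed(accelerator_ops)))
```

A device that only needs discovery and DMA maps to the first protocol alone, while the two optional protocols come into play once caching or host-visible device memory is involved.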

Image Credit: CXL Consortium

Impact of CXL on Storage

CXL is relevant to storage and will have an impact on it. It will allow storage systems to use substantially larger memory pools for caching, which benefits storage applications and systems that use memory as a cache. This is important because, with many ISPs regularly restricting bandwidth, temporary data storage is becoming critical to data center performance.

The largest DRAM cache currently available in a commercial storage system is 3 TB. With software from MemVerge or Formulus Black, some software-defined storage can extend that cache to 4.5 TB using Intel Optane PMem, but this still imposes a hard limit on cached memory for all but the wealthiest organizations. CXL addresses this issue by allowing storage software to cache data across multiple storage media.
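Caching across more than one medium can be pictured as a two-tier cache. The following is a minimal sketch under stated assumptions: the class name, the LRU policy, and the promote-on-hit behavior are illustrative choices, not how any particular storage product works. A small fast tier stands in for local DRAM, and a larger second tier stands in for memory reached over an interconnect such as CXL.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier LRU cache: entries evicted from the fast tier are demoted."""

    def __init__(self, fast_capacity, slow_capacity):
        self.fast = OrderedDict()  # stands in for local DRAM
        self.slow = OrderedDict()  # stands in for a larger attached memory pool
        self.fast_capacity = fast_capacity
        self.slow_capacity = slow_capacity

    def get(self, key):
        if key in self.fast:
            self.fast.move_to_end(key)  # mark as most recently used
            return self.fast[key]
        if key in self.slow:
            value = self.slow.pop(key)
            self.put(key, value)        # promote back into the fast tier
            return value
        return None                     # miss in both tiers

    def put(self, key, value):
        self.slow.pop(key, None)        # avoid holding the key in both tiers
        self.fast[key] = value
        self.fast.move_to_end(key)
        if len(self.fast) > self.fast_capacity:
            old_key, old_value = self.fast.popitem(last=False)
            self.slow[old_key] = old_value      # demote the LRU entry
            if len(self.slow) > self.slow_capacity:
                self.slow.popitem(last=False)   # drop the oldest demoted entry


cache = TieredCache(fast_capacity=2, slow_capacity=2)
for block, data in [("a", 1), ("b", 2), ("c", 3)]:
    cache.put(block, data)
print(list(cache.fast), list(cache.slow))  # "a" has been demoted to the slow tier
```

The larger the second tier, the fewer entries fall off the end entirely, which is the effect a bigger memory pool has on a storage cache.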

For more information, visit Compute Express Link.

Saumitra Jagdale is a Backend Developer, Freelance Technical Author, Global AI Ambassador (SwissCognitive), Open-source Contributor in Python projects, Leader of TensorFlow Community India, and Passionate AI/ML Enthusiast.
