white paper

Purging CXL cache coherency dilemmas

High-bandwidth, low-latency connectivity


The massive growth in the production and consumption of data, particularly unstructured data such as images, digitized speech, and video, has driven an enormous increase in the use of accelerators. With the growing trend toward heterogeneous computing in the data center, different processors and co-processors increasingly must work together efficiently, sharing memory and using caches to share data. However, sharing memory among caches brings a formidable technical challenge known as coherency, which Compute Express Link (CXL) is designed to address.

CXL was developed as an open industry-standard interconnect to accelerate next-generation data center performance. The CXL specification’s founding promoter members included Alibaba Group, Cisco Systems, Dell EMC, Facebook, Google, Hewlett Packard Enterprise, Huawei, Intel, and Microsoft.

Both CXL and the Cache Coherent Interconnect for Accelerators (CCIX) target the same problem. The major difference between them is that CXL is a master-slave architecture, in which the CPU is in charge and the other devices are subservient, whereas CCIX allows peer-to-peer connections without a CPU.

Some shakeout or convergence among these standards may be needed to move things forward. The Compute Express Link and Gen-Z consortiums have already announced that they executed a memorandum of understanding (MoU) describing a plan for mutual collaboration between the two organizations.

Why is cache coherency required?

For higher performance in a multiprocessor system, each processor usually has its own cache. Cache coherence refers to keeping the data in these caches consistent.

Since each core has its own cache, the copy of the data in that cache may not always be the most up-to-date version. For example, imagine a dual-core processor in which each core has brought the same block of memory into its private cache, and then one core writes a new value to a location in that block. When the second core later reads that location from its own cache, it will not get the most recent value unless its cache entry has been invalidated. There is therefore a need for a coherence policy that invalidates or updates the entry in the second core’s cache; otherwise, the stale copy leads to incorrect data and invalid results.
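To make the dual-core example concrete, here is a minimal Python sketch of a toy model. It assumes main memory is a dictionary and each core’s private cache is a dictionary of address-to-value entries with write-through behavior, which is a simplification (real caches are often write-back). The `Core` class and `run_scenario` function are hypothetical illustrations, not part of any CXL or processor API. Without invalidation, the second core keeps returning its stale copy; with invalidation, it refetches the updated value.

```python
# Toy model (assumption): memory is a dict, each private cache is a dict,
# and writes are write-through. Core and run_scenario are hypothetical names.


class Core:
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory  # shared main memory
        self.cache = {}       # private cache: address -> value

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, return the cached copy,
        # which may be stale if another core has written the location.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
        return self.cache[addr]

    def write(self, addr, value):
        # Write-through: update this core's cache and main memory.
        self.cache[addr] = value
        self.memory[addr] = value

    def invalidate(self, addr):
        # Drop the local copy so the next read refetches it from memory.
        self.cache.pop(addr, None)


def run_scenario(use_invalidation):
    memory = {0x40: 1}
    core0, core1 = Core("core0", memory), Core("core1", memory)
    core0.read(0x40)      # both cores bring the same block into their caches
    core1.read(0x40)
    core0.write(0x40, 2)  # core0 writes a new value
    if use_invalidation:
        core1.invalidate(0x40)  # a coherence policy invalidates core1's copy
    return core1.read(0x40)


print(run_scenario(use_invalidation=False))  # -> 1 (stale copy)
print(run_scenario(use_invalidation=True))   # -> 2 (up-to-date value)
```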

There are various cache coherence protocols for multiprocessor systems. One of the most common is MESI, an invalidation-based protocol named after the four states a cache block can be in:

  • Modified: The cache block is dirty with respect to the shared levels of the memory hierarchy (it differs from main memory), and this cache holds the only valid copy. The owning core can make further changes at will.
  • Exclusive: The cache block is clean with respect to the shared levels of the memory hierarchy and is present only in this cache. If the owning core wants to write to the data, it can change the state to Modified without consulting any other cores.
  • Shared: The cache block is clean with respect to the shared levels of the memory hierarchy and may also be held by other caches, so it is treated as read-only. A core may read a Shared block freely; however, to write it, the core must first gain exclusive ownership by invalidating the other copies, after which the block becomes Modified.
  • Invalid: The cache block does not hold valid data, so any access to it is treated as a miss.
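The four states and their transitions can be sketched as two Python functions: one for the owning core’s own reads and writes, and one for requests it snoops from other cores. This is a simplified model under stated assumptions: the bus transaction names `BusRd`, `BusRdX`, and `BusUpgr` follow common textbook convention and do not come from this paper, and details such as write-back timing and cache-to-cache transfers are omitted. Note that a write to a Shared block broadcasts an invalidation before moving to Modified, while a write to an Exclusive block upgrades silently.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


def on_local_access(state, op, others_have_copy):
    """Next state after this cache's own core reads or writes the block.

    op is "read" or "write"; others_have_copy says whether another cache
    reported a copy when the request went out on the bus.
    Returns (next_state, bus_action); bus_action is None if no bus traffic.
    """
    if op == "read":
        if state is State.INVALID:
            # Read miss: fetch the block; Exclusive if no other cache has it.
            return (State.SHARED if others_have_copy else State.EXCLUSIVE), "BusRd"
        return state, None  # M, E and S all satisfy reads locally
    if op == "write":
        if state is State.MODIFIED:
            return state, None
        if state is State.EXCLUSIVE:
            return State.MODIFIED, None  # silent upgrade, no other cores consulted
        if state is State.SHARED:
            return State.MODIFIED, "BusUpgr"  # invalidate other copies first
        return State.MODIFIED, "BusRdX"  # write miss: read for ownership
    raise ValueError(f"unknown op: {op}")


def on_snoop(state, bus_action):
    """Next state when another core's request for this block is snooped.

    Returns (next_state, must_flush); must_flush means dirty data has to be
    written back before the other core proceeds.
    """
    if bus_action == "BusRd":
        if state in (State.MODIFIED, State.EXCLUSIVE):
            # Another core is reading: demote to Shared; a Modified copy flushes.
            return State.SHARED, state is State.MODIFIED
        return state, False
    if bus_action in ("BusRdX", "BusUpgr"):
        # Another core wants to write: drop our copy, flushing if dirty.
        return State.INVALID, state is State.MODIFIED
    raise ValueError(f"unknown bus action: {bus_action}")


# Example: a core writes a block it holds in Shared state; a peer holding the
# same block in Shared state sees the BusUpgr and invalidates its copy.
writer_state, action = on_local_access(State.SHARED, "write", True)
peer_state, flush = on_snoop(State.SHARED, action)
print(writer_state, action, peer_state, flush)
```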

State transitions are driven by the local core’s memory accesses and by bus snooping activity. When several caches share a block and one processor modifies its value, the change must be made visible to all the other caches that hold a copy, and this notification can be done by bus snooping: each cache controller monitors the shared bus. When a transaction that modifies a shared cache block appears on the bus, every snooper checks whether its cache holds a copy of that block. If it does, the copy is invalidated, and a cache holding the block in the Modified state first flushes (writes back) its dirty data, which preserves cache coherency.
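The snooping behavior can be illustrated with a small Python simulation of several caches attached to one shared bus. It is a toy model under stated assumptions: each address holds a single value, the hypothetical `Bus` class broadcasts every miss or ownership request to all other caches, and the MESI states are encoded as the letters M, E, S, and I. It is not a model of any real interconnect. A write to a block not already held exclusively triggers an invalidating snoop, and a snooper holding Modified data flushes it to memory before sharing or giving up the block.

```python
# Toy write-invalidate snooping model (assumption). Cache and Bus are
# hypothetical names; states are "M", "E", "S", "I".


class Cache:
    def __init__(self, name):
        self.name = name
        self.lines = {}  # address -> (state, value)


class Bus:
    def __init__(self, memory):
        self.memory = memory
        self.caches = []

    def attach(self, cache):
        self.caches.append(cache)

    def snoop(self, requester, addr, exclusive):
        """Broadcast a request; every other cache checks for a copy.

        Returns True if any other cache held a valid copy.
        """
        found = False
        for other in self.caches:
            if other is requester or addr not in other.lines:
                continue
            state, value = other.lines[addr]
            if state == "I":
                continue
            found = True
            if state == "M":
                self.memory[addr] = value  # flush dirty data to memory
            # A write request invalidates the copy; a read demotes it to Shared.
            other.lines[addr] = ("I", value) if exclusive else ("S", value)
        return found

    def read(self, cache, addr):
        line = cache.lines.get(addr)
        if line and line[0] != "I":
            return line[1]  # hit in M, E or S
        shared = self.snoop(cache, addr, exclusive=False)
        value = self.memory[addr]
        cache.lines[addr] = ("S" if shared else "E", value)
        return value

    def write(self, cache, addr, value):
        line = cache.lines.get(addr)
        if not line or line[0] in ("S", "I"):
            # Not held exclusively: invalidate every other copy first.
            self.snoop(cache, addr, exclusive=True)
        cache.lines[addr] = ("M", value)


memory = {0x40: 1}
bus = Bus(memory)
a, b = Cache("a"), Cache("b")
bus.attach(a)
bus.attach(b)

bus.read(a, 0x40)         # a: E (no other copy)
bus.read(b, 0x40)         # a snoops the read and drops to S; b: S
bus.write(a, 0x40, 2)     # b snoops the write and invalidates; a: M
print(b.lines[0x40][0])   # -> I
print(bus.read(b, 0x40))  # -> 2 (a flushes its dirty data, b rereads it)
```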
