white paper

Mitigating long tail latency effects in AI chips with functional monitoring


As cloud services increasingly rely on complex, heterogeneous computing to handle the demands of artificial intelligence (AI) models, unpredictable and difficult-to-detect performance issues can arise, eroding the benefits of adopting custom or special-purpose silicon.

Datacenter and edge server developers can ensure end-user applications are as responsive as possible by utilizing functional monitoring infrastructure in their SoCs. The Tessent Embedded Analytics platform makes performance optimization of complex silicon systems possible with smart functional data collection and powerful analytics functionality.

Long tail latency challenges

As cloud services acquire greater functionality, a question looms over them: can they satisfy demand quickly and reliably under highly variable load conditions? The services involved are complex and increasingly require distributed, heterogeneous computing to handle the computational demands of AI models. The architectural complexity inherent in this processing model can lead to baffling performance issues that cause a small but significant minority of users to see unacceptably long delays, prompting them to complain or cancel their service.

These long tail latency events can be difficult to diagnose. They may stem from inherent issues in the application design or resource allocation strategy, yet only surface intermittently when other transient conditions align.

System developers or operators can look for early warnings of problems and react by culling problematic jobs quickly, or they can investigate the causes of intermittent but recurring tail latency peaks. Conventional techniques such as software-based sampling can make diagnosis frustrating: only rarely will a sample of execution traces capture the data needed to identify the root cause of tail latency and other performance issues.
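The sampling limitation is easy to see with a short, self-contained simulation (all workload numbers here are hypothetical, chosen only for illustration): if roughly 2% of requests land in the latency tail, a 0.1% trace-sampling policy captures only a handful of them.

```python
import random

random.seed(42)

# Hypothetical workload: fast requests of 5-10 ms, with ~2% of requests
# hitting a slow tail of 500-1000 ms.
latencies = [
    500 + random.random() * 500 if random.random() < 0.02  # slow tail event
    else 5 + random.random() * 5                           # fast path
    for _ in range(100_000)
]

# p99 threshold: the latency below which 99% of requests fall.
p99 = sorted(latencies)[int(0.99 * len(latencies))]

# A 0.1% trace-sampling policy: how many tail events does it capture?
sample = random.sample(latencies, k=len(latencies) // 1000)
tail_in_sample = sum(1 for x in sample if x >= p99)
tail_total = sum(1 for x in latencies if x >= p99)

print(f"tail events overall: {tail_total}, captured in 0.1% sample: {tail_in_sample}")
```

With these numbers, the full trace contains on the order of a thousand tail events, while the sample typically holds one or two, far too few to correlate against the transient conditions that triggered them.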

The Embedded Analytics approach

An alternative approach is to use the information that the hardware in the server nodes can readily provide, if it is instrumented to support effective analytics. In many cases, a gross tail latency problem is the result of one or more micro-latency issues in a single node that cascade into a more significant problem. Detecting those micro-latency events, such as temporary memory-bus overloads, queues reaching capacity, or thrashing caused by cache contention between two tasks, can reveal where and when these issues arise.
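One way such a monitor can work is to flag a queue that stays near capacity for several consecutive cycles. The sketch below is a software model of that idea only, not the Tessent Embedded Analytics API; the class name, watermark, and window size are all hypothetical.

```python
from collections import deque

class MicroLatencyMonitor:
    """Illustrative model of a hardware monitor that flags sustained
    queue-occupancy spikes, a common micro-latency precursor.
    The 90% watermark and 4-cycle window are hypothetical values."""

    def __init__(self, capacity, high_watermark=0.9, window=4):
        self.high = high_watermark * capacity
        self.window = window
        self.recent = deque(maxlen=window)  # sliding window of occupancies

    def observe(self, occupancy):
        """Record one cycle's queue occupancy; return True when the queue
        has stayed above the high watermark for the whole window."""
        self.recent.append(occupancy)
        return (len(self.recent) == self.window
                and all(o >= self.high for o in self.recent))

# Occupancy of a 64-entry queue over seven cycles: a sustained spike
# near capacity is flagged only once it persists for the full window.
mon = MicroLatencyMonitor(capacity=64)
events = [mon.observe(o) for o in [10, 20, 60, 62, 63, 61, 30]]
print(events)  # flags the cycle where occupancy has sat near capacity for 4 cycles
```

Requiring the condition to persist for a window, rather than firing on a single sample, is what keeps the event stream sparse enough to filter in hardware and forward in real time.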

Embedded instrumentation can support both real-time reactions and post-event analytics. For example, threads that miss deadlines can automatically be flagged to the management system, providing early warning of resource issues that can be addressed immediately by moving low-priority tasks to a different node. If the events are the result of structural or architectural issues, the instrumentation can provide vital information for restructuring the communications and synchronization logic of tasks to reduce the probability of their causing excessive delays.
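The real-time reaction path can be reduced to a simple check: compare each thread's completion time against its deadline and report the misses to the management layer. The function and thread names below are hypothetical, for illustration only.

```python
def check_deadlines(completions_ms, deadlines_ms):
    """Return the IDs of threads whose completion time exceeded their
    deadline, in a form a management system could act on immediately.
    (Hypothetical helper; not a vendor API.)"""
    return [tid for tid, t in completions_ms.items() if t > deadlines_ms[tid]]

# Hypothetical per-thread measurements from one scheduling interval (ms).
completions = {"render": 12.0, "decode": 48.5, "prefetch": 7.2}
deadlines   = {"render": 16.0, "decode": 33.0, "prefetch": 10.0}

print(check_deadlines(completions, deadlines))  # → ['decode']
```

In a deployed system the same comparison would run in the monitoring hardware, so the miss is reported as it happens rather than discovered later in a log.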

The key to this smarter infrastructure lies in hardware monitoring implemented within the server SoCs that can filter and process execution information in real time, overcoming the limitations of sampling- and log-based diagnosis. Using real-time and offline analytics supported by hardware monitors, datacenter and edge server operators can ensure their services are as responsive as possible, which is a clear incentive to obtain processor and accelerator silicon that provides the instrumentation and insights they need.
