white paper

Implementing an efficient RTL clock gating analysis flow at AMD

Success story

[Figure: Jaguar, AMD's next-generation low-power X86 core]

This paper provides an overview of how AMD used PowerPro to improve clock-gating efficiency and shares the results and advantages of doing power analysis at the RTL stage rather than waiting until post-gate synthesis.

PowerPro® Power Analysis Solution

Lowering the power consumption of consumer products and networking centers is an important design consideration. The same goes for many of the processor cores that go into these devices. AMD is building a reputation for designing power-efficient processor cores, which helps its end customers deliver lower-power products.

For Jaguar, its new low-power X86 core, AMD set out to improve on the previous generation in three ways: higher performance within a given power envelope, higher frequency at a given voltage, and better power efficiency through clock gating and unit redesign. The AMD low-power core design team used the PowerPro® power analysis solution to analyze RTL clock-gating quality, find opportunities for improvement, and generate reports that the engineering team could use to reduce the operating power of the design.

Because PowerPro analyzes pre-synthesis RTL, it can run more often and cover more simulation cycles, in less time and with fewer machine resources, than tools that rely on synthesized gates for power analysis. AMD selected a suite of 39 tests, which included a maximum-power condition (as much of the core active at once as possible), a halt case (no instruction or interrupt activity), and several snippets of real application code. The focus on clock gating and the quick turnaround of RTL analysis allowed the team to achieve measurable power reductions for typical applications on the AMD Jaguar core.

Efficient RTL clock-gating analysis flow created

The flow and tool had these key features:

  • RTL analysis could run over the weekend and analyze key power benchmark tests.
  • Output format was easy to parse and summarize for designer use.
  • Recommended improvements were useful as suggestions and pointed designers to possible optimizations.
  • Correlation between active clock count and total power used was good.
  • Ultimately, even given IPC and frequency improvements, PowerPro helped achieve an approximately 20% reduction in typical dynamic application power compared to an already-tuned low-power X86 CPU.
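
The kind of per-test summary described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not PowerPro's actual report format or AMD's scripts: the record fields (`active_clocks`, `total_clocks`, `power_units`), the test names, and every number are hypothetical placeholders. The definition of gating efficiency as the fraction of clock cycles gated off is one common convention and is assumed here.

```python
from math import sqrt


def gating_efficiency(active_clocks, total_clocks):
    """Return the fraction of clock cycles that were gated off (0.0 to 1.0)."""
    if total_clocks <= 0:
        raise ValueError("total_clocks must be positive")
    return 1.0 - active_clocks / total_clocks


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder records, NOT measured data: one entry per benchmark
# test, with made-up clock counts and power in arbitrary units.
tests = [
    {"name": "max_power", "active_clocks": 900, "total_clocks": 1000, "power_units": 950.0},
    {"name": "halt", "active_clocks": 50, "total_clocks": 1000, "power_units": 100.0},
    {"name": "app_snippet", "active_clocks": 400, "total_clocks": 1000, "power_units": 450.0},
]

# Per-test gating efficiency, keyed by test name.
efficiency = {
    t["name"]: gating_efficiency(t["active_clocks"], t["total_clocks"]) for t in tests
}

# How closely active clock count tracks total power across the suite.
r = pearson(
    [t["active_clocks"] for t in tests],
    [t["power_units"] for t in tests],
)

# List the least-gated tests first, since they offer the most room to improve.
for name, eff in sorted(efficiency.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} gated {eff:.0%} of clock cycles")
print(f"active-clock vs. power correlation: r = {r:.3f}")
```

Sorting by efficiency puts the tests with the most ungated clock activity at the top. A correlation coefficient near 1 would support using active clock count as a quick proxy for total power between full analysis runs.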
