Power and performance analysis
Two imperatives drive the design of state-of-the-art chips in leading-edge market segments: achieve the highest possible performance, and do so at the lowest possible power consumption.
Today’s hardware emulators can help a verification team meet both objectives, which makes them unique in this endeavor. By tracking the internal activity of the DUT, regardless of the design's size or the size of the workload, the emulator can generate a database that maps this activity in space and time onto graphical charts that are easy to browse and fast to analyze.
Following a hierarchical approach, the verification team can zero in on the design sections and time windows with high energy consumption and quickly identify the causes.
It is generally acknowledged in the industry that power analysis at the register transfer level (RTL) produces results accurate to within approximately 15 percent of real silicon, whereas analysis at the gate level is accurate to within approximately 5 percent. Unfortunately, gate-level analysis happens too late in the design cycle, leaving no room for effective design changes.
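The hierarchical drill-down described above can be illustrated with a minimal Python sketch. It assumes the activity database can be exported as (block path, time window, toggle count) records; that record format, the block names, and the functions `rollup` and `hotspots` are hypothetical and do not reflect any vendor's actual schema or API.

```python
from collections import defaultdict

def rollup(samples, depth):
    """Collapse block paths to their first `depth` hierarchy levels."""
    collapsed = []
    for block, window, toggles in samples:
        parent = ".".join(block.split(".")[:depth])
        collapsed.append((parent, window, toggles))
    return collapsed

def hotspots(samples, top_n=3):
    """Rank (block, window) cells by total toggle count, highest first."""
    grid = defaultdict(int)
    for block, window, toggles in samples:
        grid[(block, window)] += toggles
    return sorted(grid.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical activity records: (block path, time window index, toggles).
samples = [
    ("soc.cpu.alu", 0, 120),
    ("soc.cpu.lsu", 0, 80),
    ("soc.gpu.shader", 1, 900),
    ("soc.gpu.tex", 1, 300),
    ("soc.cpu.alu", 1, 100),
]

# Step 1: find the busiest subsystem and time window at a coarse level.
coarse = hotspots(rollup(samples, depth=2), top_n=1)
print(coarse)

# Step 2: drill into that subsystem and window at full depth.
(block, window), _ = coarse[0]
detail = [s for s in samples if s[0].startswith(block + ".") and s[1] == window]
print(hotspots(detail, top_n=1))
```

Toggle counts serve here only as a stand-in for energy; a real flow would weight activity with power models for each block.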
As an example, let’s consider performance and latency validation of a solid-state storage device (SSD).
Performance/latency of SSD
While emulation is not timing accurate, it is cycle accurate, a requirement to establish how many cycles are needed to complete an operation or how many cycles are consumed between an input request and the corresponding output response. That’s what latency is all about.
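As a rough illustration of latency measured in cycles, the sketch below pairs each request with its response and subtracts the cycle counts. The event format (cycle, kind, tag), the tag names, and the 500 MHz silicon clock used for the time conversion are assumptions made for the example, not values from an actual SSD design.

```python
def latencies(events):
    """Return {tag: cycles between request and response}.

    events: iterable of (cycle, kind, tag), where kind is "req" or "rsp".
    This is a hypothetical trace format chosen for illustration.
    """
    pending = {}
    result = {}
    for cycle, kind, tag in sorted(events, key=lambda e: e[0]):
        if kind == "req":
            pending[tag] = cycle
        elif kind == "rsp" and tag in pending:
            result[tag] = cycle - pending.pop(tag)
    return result

def cycles_to_seconds(cycles, silicon_clock_hz):
    """Convert a cycle count to time at an assumed silicon clock rate."""
    return cycles / silicon_clock_hz

# Hypothetical trace: two reads issued back to back, completing out of order.
trace = [
    (100, "req", "read0"),
    (110, "req", "read1"),
    (300, "rsp", "read1"),
    (350, "rsp", "read0"),
]

lat = latencies(trace)
print(lat)
print(cycles_to_seconds(lat["read0"], 500e6))
```

Because the emulator is cycle accurate, the cycle counts carry over to silicon; the conversion to seconds depends entirely on the clock frequency assumed for the final device.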
As discussed previously, ICE mode is not suitable for the task. Inserting speed adapters between the relatively slow DUT (~1–2 MHz) and the fast target system (100 MHz–1 GHz) changes the speed ratio between the slow design clocks and the fast target system clocks. Under these conditions, performance and latency cannot be evaluated (figure 1).
To validate performance and latency, it is necessary to virtualize the host interface. Because both the DUT and the target system are then models, their operating frequencies can be set to any value without resorting to speed adapters. This setup preserves the clock ratio between the DUT and the target system, which is necessary to achieve accuracy within a few percentage points of actual silicon.
Virtual mode is the only solution that can validate SSD designs for hyperscale data centers with a high degree of accuracy compared to real silicon.
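The effect on clock ratios can be checked with simple arithmetic. In the sketch below, the silicon frequencies (an 800 MHz DUT talking to a 1 GHz host interface) are hypothetical numbers chosen for illustration; only the ~1 MHz emulation speed and the 1 GHz target speed fall within the ranges quoted above.

```python
from fractions import Fraction

def clock_ratio(dut_hz, target_hz):
    """Exact ratio of DUT clock to target-system clock."""
    return Fraction(dut_hz, target_hz)

# Hypothetical silicon operating point (assumed values).
silicon_dut_hz = 800_000_000
silicon_target_hz = 1_000_000_000
silicon = clock_ratio(silicon_dut_hz, silicon_target_hz)

# ICE mode: the DUT is emulated at ~1 MHz while the physical target
# keeps running at its real speed, so the ratio is distorted.
ice = clock_ratio(1_000_000, silicon_target_hz)

# Virtual mode: both sides are models, so both clocks can be slowed
# by the same factor and the ratio is preserved.
slowdown = 800
virtual = clock_ratio(silicon_dut_hz // slowdown, silicon_target_hz // slowdown)

print("silicon:", silicon)
print("ICE:    ", ice)
print("virtual:", virtual)
print("virtual matches silicon:", virtual == silicon)
```

With the ratio preserved, the number of target clock cycles that elapse per DUT cycle matches silicon, which is what makes cycle-based latency figures meaningful.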
The evolution of the FPGA prototype
FPGA prototype technology has also evolved over time. Throughout its history, the FPGA prototype has traded off debug capabilities and compilation automation in the quest for higher execution speed than emulators.
From a board with one to four FPGAs, the prototype morphed into two classes of platforms: desktop and enterprise.
Desktop FPGA prototype platforms maintain the same basic characteristics as the early implementations, albeit now often enclosed in small boxes. While the compilation process still requires manual tuning, performance may reach and even exceed 100 MHz on a single FPGA.
Enterprise FPGA prototype platforms resemble an emulation system. Hosted in big boxes, they consist of multiple boards populated with arrays of interconnected FPGAs sharing a backplane. The level of granularity, that is, the smallest resource available to a single user, is one FPGA.
In the past few years, FPGA prototyping vendors have announced that they share the front end of the compilation flow with that of the emulator to speed up the process. Similar announcements addressed DUT debug.
By combining prototype and emulation in virtual mode, verification engineers can quickly detect a bug via the prototype, switch to emulation running the same DUT, and trace the bug via the emulator.
Instead of competing for the same verification job, emulation and prototyping co-exist and complement each other to accelerate the verification cycle and increase the quality of the DUT (figure 2).