Hold Derating Kicks Hold Margins’ Butt

October 19th, 2009

Holdtime is an emotive issue amongst SOC designers because, unlike setup timing problems, holdtime problems kill chips. Designers generally want a high “hold margin” to ensure their design cannot have hold problems – justifiably, no one wants their chip to come back a brick, only to find out it was a hold problem.

However, overconservatism can easily backfire. We worked on one design where the backend team, having had issues with hold in the past, was requiring a very high holdtime margin in the supplied constraints. We did some simple math and determined that

required holdtime margin > flop clock->Q delay + 1x buffer delay + D/SI holdtime requirement

What this meant was that every back-to-back flop, including almost the entire scan chain, would be an automatic hold violator once implemented. Although we pointed this out, the issue was not resolved until implementation was actually run and block area estimates blew up – the number of hold buffers added exceeded the number of flops in the design, for exactly this reason.
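To make the math concrete, here’s a tiny Tcl sketch with purely hypothetical numbers (not the actual design’s values):

    # Hypothetical mincase numbers, in ps -- for illustration only
    set clk_to_q    150  ;# flop clock->Q delay
    set buf_delay    50  ;# one buffer delay
    set hold_req     30  ;# D/SI holdtime requirement of the receiving flop
    set hold_margin 300  ;# the "required" hold margin from the constraints

    # Back-to-back flops on the same clock net: data arrives clock->Q plus
    # one buffer after the edge, and must stay stable for requirement + margin
    set slack [expr {($clk_to_q + $buf_delay) - ($hold_req + $hold_margin)}]
    puts "hold slack = $slack ps"  ;# -130 ps: violates before a single wire is routed

Any time the margin term on the right exceeds the intrinsic path delay on the left, every such flop pair is a guaranteed violator, no matter how clean the clock tree is.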

To help understand this issue, it helps to understand how flops are characterized for holdtime.

HoldTime Characterization

Generally, characterization tools are limited to observing only the “external” nodes of a cell during analysis runs. So a common metric is to move the “data” input transition toward the clock edge until the CLOCK->Q delay degrades by some percentage (usually 10%). What this means is that the internal flop storage node _is_ being disrupted, just not enough to cause an output node flip. To ensure this doesn’t happen, you add enough margin to your holdtime to account for two things:

  1. the extent to which you’re uncomfortable with a 10% output timing degradation being “assumed”
  2. the extent to which you don’t trust 100% the results of the characterization process.

Usually we recommend 2-3 buffer delays, depending on technology. This covers pretty much everything. BUT WAIT, we hear you say, this is not nearly enough to ensure functionality! What about derating! Well, we’re coming to that.

(The other approach is to custom-characterize all the flops in your design so that you observe the internal storage node & make sure IT doesn’t glitch by more than X%. You can do this if you’re a big company or a genius. Otherwise, use the supplied models & add a little margin.)

Now, our controversial assertion here is that, no matter how bad your clock tree or variation is, when two flops are driven by the same clock net (with transition times in the characterized range), if holdtime is “met” in STA between those two flops, then it’s impossible to have a true holdtime violation between them in the lab.

This is because it’s _always_ the same edge that arrives at the two back-to-back flops. Here’s an example.

[Figure: HoldTime-NoViolationPossible – both flops driven by the same clock net]

Where true hold violations occur, it’s due to _variation_ in cell delays and parasitic delays (noise or not). This occurs when there is divergence in the driving CTS tree.

[Figure: HoldTime-ViolationPossible – divergence in the driving CTS tree]

A few short years ago, only Synopsys PrimeTime supported this properly (with pessimism removal and clock/data derating). Back then, we did ECOs to fix these violations, but since we were in 130nm or the early stages of 90nm, the number was relatively small, so it didn’t matter much. Nowadays, every major tool supports derating for both setup & hold natively, so unless you have severe correlation issues between implementation & signoff tools, it’s not such a huge issue.

Now the critical question: how do you choose a good derating factor? Interestingly, we often see incredible conservatism in hold “margin”/uncertainty, but much less conservatism in the derating factors applied during hold fixing. The range we’ve seen runs anywhere from 5% (low) to 18% (high, but that was a high-performance design with custom clock trees). In general, set it as high as you can while keeping an appropriate lid on the number of buffers required to fix hold.
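In PrimeTime-style syntax, this boils down to a couple of commands. A minimal sketch, with illustrative values only (choose yours per the guidance above):

    # Derate early (min) paths down and late (max) paths up by 8% --
    # an illustrative number, not a recommendation
    set_timing_derate -early 0.92
    set_timing_derate -late  1.08

    # Don't penalize the shared portion of the launch/capture clock path
    # twice (clock reconvergence pessimism removal)
    set timing_remove_clock_reconvergence_pessimism true

The pessimism removal is what makes the derating consistent with the assertion above: only the divergent portion of the clock tree contributes variation between launch and capture.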

Where things really get interesting is multimode and multicorner hold analysis and fixing. This article is too short for that, but some interesting things to consider – especially if you’re doing complicated SoCs with 10s-100s of clocks, or even if you’ve turned on “useful skew optimization” in your implementation tool – are:

  1. don’t assume that BC model/BC parasitics cover every possible worst-case behavior for holdtime. We’ve found cases where BC models with either WC or TT parasitics cause additional holdtime problems (although the right initial “optimization” corner is BC/BC).
  2. often, implementation is done with a functional SDC, and hold closure is done with a different constraint set. It’s possible to create a multimode constraint file – which is in general a tough problem. Probably the “easiest” way to get 95% of the problem solved is to remove any case analysis that sets the value of the flop SE (ScanEnable) pins and replace it with a set_false_path (see the sketch after this list). That forces the CP->Q->SI paths to be sensitized, while ignoring the timing on the scan control signals, which are typically don’t-cares. Watch out when you do this, though – all of a sudden a very large number of interclock paths will become sensitized (paths that were previously masked by the SI pins in the scan chain being disabled). Having a tool that automatically finds & fixes missing interclock false paths can help!
  3. back in the days of 90nm/130nm, we could generally assume that any hold violator had plenty of setup slack, and you could just drop buffers all over the place to fix hold violations. This is not the case anymore with the really small mincase delay values in sub-65nm technologies. We’re finding more and more cases where there are 100s to 1000s of pins that are BOTH setup and hold violators – which means you need far more sophistication to fix the hold violations – something we’ll discuss in the future.
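The scan-enable swap from item 2, as a hedged SDC sketch (the port name scan_enable, and it being a top-level port, are assumptions for illustration):

    # Before: case analysis pins scan-enable to its functional value,
    # masking every SI path in the scan chain, e.g.
    #   set_case_analysis 0 [get_ports scan_enable]
    # After: false-path the scan-control signal instead. The CP->Q->SI
    # hold paths become sensitized, while the (typically don't-care)
    # timing of the scan-enable signal itself is ignored.
    set_false_path -from [get_ports scan_enable]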

Saving Money and Schedule with TimeVision

September 18th, 2009

Would you like to cut your schedule and cost by over 50%? YES – I would rather not spend days poring over 5 different tools’ check_timing reports!

Recently we worked on a chip under a fixed-cost, fixed-schedule model. The numbers are compelling – and we wanted to share them here. Obviously this is just one piece of the problem (it’s a big design), but we were able to offer amazing cost & schedule advantages compared to our old way of doing things – as consultants with a few tens of thousands of lines of scriptware that came with us everywhere.

What we could offer using TimeVision Consulting on a large, complex System-On-Chip block at 65nm was:

  • Block – 30+ clocks, 500K+ instances, 121K registers
  • Contents – purchased soft & hard IP, internally-designed subsystems, ARM AMBA interfaces
  • DFT – scan & BIST logic inserted
  • IOs – several complex & speed-sensitive IO interfaces at the full-chip level
  • Modes – generate all mode constraints including functional, scan shift/capture, and BIST
  • Requirements – IP documentation for hard/soft IP, several hours of RTL designer(s) time to discuss clocks & other design issues
  • Deliverables – (1) qualified & debugged constraints for synthesis and place/route, and (2) qualified, debugged & validated signoff constraints for all modes
Task                                         "Typical" Consulting        TimeVision Consulting
                                             Time      Cost @ $100/hr    Time      FIXED cost
Timing Constraint Development, Debug,
  Integration                                4 weeks   $16,000           4 days    $8,000
Backend Collateral (clock diagrams,
  balancing requirements, IO timing, etc.)   1 week    $4,000            1 day     $2,000
Major Iterative Respin (RTL change,
  new IP drop)                               2 weeks   $8,000            2 days    $4,000
Prorated EDA tool cost                       –         $3,000            –         –
TOTAL TIME/COST                              7 weeks   $31,000           7 days    $14,000

Compared to what we would have done previously, we were able to deliver

  • 55% cost reduction
  • 5x faster turnaround time (4 weeks -> 4 days)
  • 5x faster turnaround on iterations of the same block (2 weeks -> 2 days)

Our 4 days of work included reviewing and integrating the point-to-point, register-level false/MCP paths and case_analysis provided by the RTL designers and IP vendors.

Wow! Who wouldn’t want that!

The only constant is change…

September 9th, 2009

“The only constant is change, continuing change, inevitable change, that is the dominant factor” during the implementation of complex chips today.

Before I go further, I would first like to acknowledge that the highlighted quote above is from one of the most famous writers of our time, Isaac Asimov. And it is a translated version of a quote by Heraclitus, the Greek philosopher from around 500 BC. I am sure these great writers and philosophers certainly did not contemplate the use of the above phrase in the context of implementing chips! But can you relate to it in your everyday timing constraint/verification work? I would go out on a limb and say that each of you would answer with an emphatic “YES”.

Here is a typical example of what I have recently experienced:

“I was working on a highly complex block, with 30+ clocks and soft IP components from several vendors. After carefully and diligently working on it for several weeks, all timing constraints (IO, inter-clock false paths, etc.) were cleaned up, the post-synthesis netlist was meeting pre-layout timing, and the backend was making decent progress towards timing closure. Out of nowhere, one of the soft IP vendors delivered a new drop with some “minor” bug fixes. And guess what happened? The block netlist was a disaster after synthesis. Why? It so happened that the new soft IP drop introduced some half-cycle paths, and some additional inter-clock paths from some config registers (on clk1) to all other clock domains (clk2 … clkn). It turned out that the half-cycle paths were legitimate, and I had to tweak synthesis variables/flows to meet timing. After carefully analyzing/reviewing each new inter-clock path using the soft IP documentation, some vendor help, and PrimeTime analysis, it was concluded that all of these were false paths. I updated the constraints/SDC to add those set_false_path commands, and synthesis was back to normal again. The entire process took about a week to finish.”
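For concreteness, the cleanup in that story boils down to SDC along these lines – a hedged reconstruction, not the actual deliverable (the clock names are the placeholders from the anecdote):

    # The config registers launch from clk1 and are quasi-static; after
    # review, their paths into every other clock domain were declared false
    set_false_path -from [get_clocks clk1] \
                   -to   [remove_from_collection [all_clocks] [get_clocks clk1]]

One line of SDC – but, as the story shows, a week of analysis to justify it.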

Can you relate to this experience when working on large SOCs? Have you ever experienced something similar before, and had to spend several days to identify and fully understand what changed in the design RTL, carefully analyze and review those changes, update the timing constraints appropriately, iterate a few times in synthesis, and make sure that everything is back to normal? And just when you finished, guess what happens – there is yet another change that again causes havoc during implementation, and you repeat everything – and this repetitive cycle goes on several times during the entire duration of the project.

Now, in this example, I am not ranting and raving about the soft IP vendor creating such a major bottleneck for implementation. They are just doing their job by providing a more robust IP to their customers, and are not fully aware of what issues can come up during implementation. The key issue here is how to make meaningful progress and converge on the implementation phase of a complex design in a chaotic, constantly changing environment – and in many cases without a direct line of communication to the designer of the RTL code in question.

To do so, we must first acknowledge, as the Asimov/Heraclitus quote says, that “the only constant is change” – and it is going to stay that way. Schedule and cost planning for chip implementation puts a heavy focus on when the RTL netlist will “freeze”. Now, “freezing” the RTL should certainly be a very high priority, and everyone in the design team should strive towards that goal. However, the reality is that freezing an RTL netlist is beyond the control of implementation teams, and it is an unreliable, unpredictable metric to be totally dependent upon. Think about it – if a major bug is found in the RTL, or some IP vendor drops a new version with some critical fixes just days before tapeout, you have no choice but to accommodate that change, right? Or the marketing team comes along at the 11th hour and says that a major customer wants some minor changes in the feature set of the chip, which obviously results in RTL changes – and you have to suck it up and accept that change too. This is just a fact of life in this business – chips only succeed when they sell (hopefully in large volumes!).

How we respond to these unpredictable, random changes is something we do control – and having a solution that enables us to deal with these day-to-day, unpredictable changes in an efficient, targeted way will be a big factor in ensuring on-time delivery of projects while maintaining a high degree of flexibility. For example,

  • Automatically identify (in minutes) any new clocks, clock logic, or inter-clock paths (false or not) introduced in a new netlist compared to a previous one
  • Quickly compare any two SDCs and get insight into the meaningful changes
  • Identify whether constraints have been modified, or need to be modified, on any critical IO ports
  • Find out if any new half-cycle paths were introduced in the latest netlist compared to the previous one
  • Determine if register clock propagation has changed from one version to the next
  • Quickly detect if there is a significant change in the number of registers driven by any clock in the design, which might impact CTS implementation
  • In general, get quick, upfront, automated visibility into what changed in the design netlist/constraints that could cause implementation problems downstream, before those problems bubble back up in a big, messy way

In the absence of such robust capabilities, design changes today are dealt with using painstaking, laborious methods: manually going through timing reports, browsing through IP documentation that runs to hundreds of pages, debugging check_timing report files, interactively analyzing issues in native EDA tools using ad-hoc scripts, and sending numerous back-and-forth emails and holding meetings with the RTL / IP vendor / physical design teams. This process takes a tremendous amount of time, which kills schedules. Moreover, it requires a lot of bandwidth & engineering resources (especially for complex chips with numerous P&R blocks), and that kills budgets. Furthermore, this pseudo-manual process is error-prone, causing engineers to miss real issues that are then identified much later in the design cycle, causing even more pain for everyone. Lastly, it involves a great amount of mundane, repetitive grunt work.

How can we drastically minimize all of the above and handle design changes better? Couple automation with good methodologies, and a large amount of pain and uncertainty will go away!

Introduction to Ausdia’s “Timing Blog”

September 8th, 2009

We started writing this blog because there is a lack of industry detail, practical knowledge, and debate about timing constraint and closure issues. Even though EDA implementation and timing signoff tools keep improving in features, capacity, and runtime, engineering teams and project managers fully realize that “timing” issues continue to be a huge bottleneck in the implementation and closure of complex SOC/ASIC chips. There is a huge gap between tool “capabilities” and outcomes – one that is often bridged only by the experience of the engineers and managers involved. There is no training program for constraint development that we know of – it’s learned hands-on, and can often be viewed as a “black art”.

At least twice a month, we will describe critical timing constraint or closure issues we face today or have faced in the past, and how they were addressed. This information comes straight from the front lines of timing verification on some of Silicon Valley’s most demanding and complicated ASIC/COT designs.

Sometimes we will take a step back and look at the more global picture of timing verification. And sometimes, we will relate these problems to features of our software, TimeVision, which is intended to solve a lot of the issues timing engineers struggle with today.

We hope to make this blog informative, insightful and sometimes contentious. We hope you enjoy reading it and welcome your comments and discussions, even if you don’t agree with us!

Upcoming articles

  • Handling design and constraint changes without working 100 hours a week
  • How to analyze false paths between clocks & domains faster
  • Five critical issues in developing block IO timing constraints
  • Virtual clocks – is it love, or is it hate?
  • How to interpret “it’s a false path, but …” without needing a lifetime supply of Rogaine
  • Why I’d take OCV derated hold over hold margin any day

Welcome to Ausdia Inc.

August 24th, 2009

TIMEVISION   
TimeVision is an analysis and optimization tool for accelerating timing constraint development, constraint validation and timing closure.