Author: Craig Maiman, Principal RTL Design Consultant
Welcome to the RTL Design Success series — Part 1 of 9
1. Introduction
A successful design, one that meets a typically tight schedule and often difficult requirements, depends on a combination of well-developed designer skills. These skills include writing clear, concise, correct, and unambiguous specifications, RTL, and often unit-level testbenches.
In this paper I will summarize some of the best practices I have developed in about 40 years of design experience. Note that code examples in this paper use SystemVerilog. There are many advantages of SystemVerilog over Verilog, with the two big ones (for me) being simplified syntax and greater abstraction capabilities.
2. Micro-architecture
Whether you are involved with the top-level architecture or not, the micro-architecture or implementation is where the rubber meets the road in terms of RTL implementation. It’s where you define how you are going to partition and implement the design to meet the goals of the architecture: functionality, performance, timing, area, power, and cost.
It’s important that the architecture is clearly defined before you start exploring approaches. One thing I’ve often found missing in architecture specifications is performance goals: the functionality may be clearly defined while throughput, latency, instruction/function processing rate, and so on are left open. That might be OK if, for example, other specifications (e.g., interface specifications) define performance, but it’s important to clarify with the architect (if it’s not you) what the design expectations are, particularly if the micro-architecture is not expected to meet the maximum performance of the interfaces (yes, that sometimes happens, depending on the higher-level goals). An example would be a design that has to connect to a 100Gb Ethernet interface. Meeting full line rate with minimum-sized frames is exceedingly difficult. But if the goal of the architecture is line rate with larger frames, then jumping through design hoops to get line rate with minimum-sized frames is unnecessary (and usually costly).
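To put rough numbers on the 100Gb Ethernet example, here is a back-of-the-envelope calculation. It assumes the standard 64-byte minimum frame, a 1518-byte maximum untagged frame for comparison, and 20 bytes of per-frame overhead on the wire (8 bytes of preamble and start-of-frame delimiter plus the 12-byte minimum inter-packet gap). The frame sizes chosen for comparison are my own illustration, not figures from a particular project.

```latex
\begin{align*}
R_{64}   &= \frac{100 \times 10^{9}\ \text{bit/s}}{(64 + 20) \times 8\ \text{bit}}
          \approx 148.8 \times 10^{6}\ \text{frames/s}
          \quad (\approx 6.72\ \text{ns per frame}) \\
R_{1518} &= \frac{100 \times 10^{9}\ \text{bit/s}}{(1518 + 20) \times 8\ \text{bit}}
          \approx 8.13 \times 10^{6}\ \text{frames/s}
          \quad (\approx 123\ \text{ns per frame})
\end{align*}
```

At a hypothetical 400 MHz core clock (an assumed figure, purely for illustration), that works out to fewer than 3 clock cycles per minimum-sized frame versus roughly 49 cycles per 1518-byte frame, which is why the two goals can lead to very different micro-architectures.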
Once the goals are clear, I explore several different micro-architecture partitions and consider the trade-offs before deciding on the best one to move forward with. Some partitions simply won’t meet one or more of the goals: an approach might achieve the functionality goals, for example, yet miss another goal such as performance.
It’s worth spending a good amount of time considering different partitions before moving forward, because changing the partitioning later, once you find an issue, can be very difficult and time-consuming.
An example might be sending requests from one logic block to another. Perhaps it’s the case that some requests will get invalidated, so you may have to decide where the request gets invalidated, in the sending block or the receiving block. In this case you will have to consider where it makes sense to do so given possible effects on efficiency/performance and perhaps other information that might only be available in the source or destination logic block.
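As a minimal sketch of one side of that trade-off, the module below invalidates a pending request in the sending block, so a killed request never reaches the receiver. Everything here is hypothetical: the module name, the signal names, the ID-based matching, the single-entry holding register, and the valid/ready handshake are assumptions made for illustration only.

```systemverilog
// Hypothetical sketch: invalidate a pending request in the SENDING block.
// All names, widths, and the valid/ready handshake are assumptions.
module req_sender #(
  parameter int ID_W = 4
) (
  input  logic            clk,
  input  logic            rst_n,
  // New request from upstream logic
  input  logic            new_req_valid,
  input  logic [ID_W-1:0] new_req_id,
  output logic            new_req_ready,
  // Invalidation targeting the request currently held in this block
  input  logic            inv_valid,
  input  logic [ID_W-1:0] inv_id,
  // Request interface to the receiving block
  output logic            req_valid,
  output logic [ID_W-1:0] req_id,
  input  logic            req_ready
);

  logic            pend_valid_q;  // a request is waiting to be sent
  logic [ID_W-1:0] pend_id_q;     // ID of the waiting request
  logic            kill;

  // The pending request is killed when an invalidation matches its ID.
  assign kill = pend_valid_q && inv_valid && (inv_id == pend_id_q);

  // A killed request is never presented to the receiver.
  // Note: req_valid can drop before being accepted, which some
  // handshake protocols forbid.
  assign req_valid = pend_valid_q && !kill;
  assign req_id    = pend_id_q;

  // Accept a new request when the slot is empty, killed, or being sent.
  assign new_req_ready = !pend_valid_q || kill || req_ready;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      pend_valid_q <= 1'b0;
      pend_id_q    <= '0;
    end else if (new_req_ready) begin
      // Load the next request, or clear the slot if there is none.
      pend_valid_q <= new_req_valid;
      pend_id_q    <= new_req_id;
    end
  end

endmodule
```

Killing the request at the source saves the receiver from spending bandwidth on work that will be discarded, but it only works if the invalidation information is available in the sender in time. In this sketch an invalidation that arrives in the same cycle a request is being loaded is missed. If the information only exists at the destination, the equivalent compare-and-drop logic would have to live in the receiving block instead.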