For a long time, I thought Verification Engineers were smarter than Logic Designers. After all, we Verification Engineers could see their mistakes, each one nourishing the image of a group of people that we probably shouldn't be looking up to. And what happens when we debug the design under verification down to the offending line of HDL code? We report the issue along with the source file name and line number. The Logic Designer then types in the correction. It's a lot like dictation.

[Image: secretary catching up on dictation]

But seriously, I've been doing a lot of logic design lately and I can tell you from firsthand experience that it's not easy. In my opinion, Logic Designers are worthy of our respect and admiration after all.

I'll get back to the logic design in a moment. But first, let me pick up where I left off with the development of FSS.

I was working on a "high-level implementation reference" of the integer multiplication unit. As expected, I quickly got it to the point where it was ready to be tested, but the reference wasn't multiplying correctly either. The Java debugging tool, jdb, wasn't up to the task because it cannot easily be used with programs that read from standard input.

NetBeans IDE 5.5 failed to show the result of System.out.print( "anything" ).

"... a user was experiencing a bug in NetBeans 5.5.x which doesn't work well with System.out.print statements. NetBeans 6.0 does not have this same issue."

So I tried NetBeans IDE 6.1. Finally, I was debugging the high-level implementation reference.

A decision to shift at a certain instant was correct, but a decision should have been made at that same instant, based on the value of the multiplier bit, about the next state. Delaying the next-state decision until after the shift was the fault because the shift changed the multiplier bit. By 2008-09-25, the high-level implementation reference was multiplying correctly.
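The ordering problem described above can be illustrated with a minimal Java sketch. This is not the actual reference code: the class name, the 32-bit operand width and the split into upper and lower halves are assumptions made for illustration. The point is only that the multiplier bit is sampled once, before the shift destroys it, and that every decision made at that instant uses the same snapshot.

```java
// Hypothetical sketch: unsigned 32 x 32 -> 64-bit shift-and-add multiplication.
// Names and widths are assumptions, not taken from the FSS reference model.
final class ShiftAddSketch {
    static long multiply(long multiplicand, long multiplier) {
        multiplicand &= 0xFFFFFFFFL;
        long upper = 0L;                        // upper half of the product register
        long lower = multiplier & 0xFFFFFFFFL;  // lower half initially holds the multiplier
        for (int step = 0; step < 32; step++) {
            // Sample the multiplier bit once, before anything moves.
            boolean multiplierBit = (lower & 1L) != 0;

            // Every decision made at this instant (add or not, and in a real
            // state machine the next state) must use this same snapshot.
            if (multiplierBit) {
                upper += multiplicand;
            }

            // The shift changes the least significant bit, so testing the
            // register again after this point would see a different bit.
            lower = (lower >>> 1) | ((upper & 1L) << 31);
            upper >>>= 1;
        }
        return (upper << 32) | lower;
    }
}
```

Moving the bit test below the shift would reproduce the kind of fault described above, because the decision would then be based on the wrong multiplier bit.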

By this time, I couldn't stand to look at the atomic wires of a bus out of order in the waveform view any longer. The performance advantage of not sorting just wasn't worth it, the software engineering effort to fix it was minimal, and most importantly, there was no way that you would tolerate such a "feature".

It was time to compare the Sputnik microprocessor's behavior to that of the known good high-level implementation reference.

The reference predicted that the product would be valid at Cycle 119 Phase 1. By that time, Sputnik had already turned off its productValid signal (I guessed that it might have started another multiplication) and the upper bits were now zero, clearly not a match with the expected value at that time.

Digging deeper, I found that Sputnik had the same problem that I had just fixed in the reference. But Sputnik was harder to correct. I reasoned that I needed to stimulate the microprocessor with three-phase control and datapath clocks rather than two-phase.

Two-phase:
control 10 10 ...
datapath 01 01 ...

Three-phase:
control 100 100 ...
datapath 001 001 ...

The reason was that I wanted the next-state decision to be made after the current state becomes stable (current state is changing in Phase 0) and before a possible disruptive shift (in Phase 2). Phase 1 is the perfect time for such a decision, so in goes a register (to capture the next-state decision) sensitive to the falling edge of the control clock. On 2008-10-03, Sputnik was multiplying correctly for the first time!
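As a rough illustration of the clock patterns, here is a small Java sketch that builds one period of a clock that is high in exactly one phase and reports the phase at which it falls. The class and method names are hypothetical and are not part of FSS.

```java
// Hypothetical helper for reasoning about multi-phase clock patterns.
final class PhaseClocks {
    // One period of a clock that is high only during highPhase.
    static String period(int phases, int highPhase) {
        StringBuilder pattern = new StringBuilder();
        for (int phase = 0; phase < phases; phase++) {
            pattern.append(phase == highPhase ? '1' : '0');
        }
        return pattern.toString();
    }

    // The clock rises at the start of highPhase and falls at the start
    // of the following phase (wrapping around at the end of the period).
    static int fallingEdgePhase(int phases, int highPhase) {
        return (highPhase + 1) % phases;
    }
}
```

With three phases, PhaseClocks.period(3, 0) gives the control pattern "100" and PhaseClocks.period(3, 2) gives the datapath pattern "001". PhaseClocks.fallingEdgePhase(3, 0) is 1, which is why a register sensitive to the falling edge of the control clock captures its input at the beginning of Phase 1.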

But that was a short-lived triumph. The bottom of the test program looks like:

mulx %r1,%r2,%r3
setx product,%r5,%r4
stx %r3,[%r4]

The next step was to verify that MULX did its primary job (i.e. to deliver the product to %r3). I really thought that, now that Sputnik was multiplying correctly, getting the data into %r3 was going to be easy. It wasn't. Like I said, logic design is hard.

As an aside, let me discuss my thoughts on verification hierarchy and observation points. FSS is really a chip-level verification environment. And ideally, we want black box verification because FSS is designed to verify any SPARC-V9 implementation. At the chip level and without looking inside the black box, FSS told us that Sputnik was failing, because the product never got stored into memory. But in order to debug that failure, additional observation points were needed and these are provided by the FSS waveform view. This is an interactive process, with the Verification Engineer applying his/her knowledge of the SPARC-V9 implementation. At this point, by violating the black box, I know that the multiplication unit is outputting the correct product.

Note that the SRS requires FSS to provide a view of the registers. That would be useful now, but Version_0-007 is so late that I resist implementing features unplanned for Version_0-007. It will be necessary to think this through. My intention is to provide only a view of architectural registers. Nevertheless, how will a general SPARC-V9 implementation interface with FSS to provide this view?

Anyway, on 2008-10-06 the waveform view revealed that data was momentarily reaching the register file, but at the first opportunity to write that data to %r3 (the next datapath clock rising edge), the multiplication unit was already changing the product.

I couldn't write the register any earlier, so I had to hold the product stable longer. But there was a bigger problem that I wanted to fix first. The productValid signal remained asserted as the multiplication unit was changing the product!

The problem was that the product register was still shifting in the final state of multiplication. It needed to be quiescent in that state. To fix this, I added a hold port to the product register.

In accordance with Murphy's Law, the product changed even while the hold signal was asserted. I suspected that someone was telling the product register to initialize.

In Sputnik, there is a handoff of control from the integer unit to the multiplication unit, and back again at the end of multiplication. The integer unit is responsible for initializing the multiplication unit. This presents a challenge for the productValid signal, generated by the multiplication unit's control, because the product can be made invalid (e.g. initialization) under external control and without the knowledge of the multiplication unit's control.
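The interaction between the hold port and externally driven initialization can be sketched in Java as follows. This is a behavioral illustration only: the class name, the port names and especially the priority of initialize over hold are assumptions, chosen to match the symptom that the product changed while hold was asserted.

```java
// Hypothetical behavioral model of a product register with a hold port.
final class ProductRegisterSketch {
    private long value;

    // Called once per datapath clock rising edge.
    void clock(boolean initialize, long initValue, boolean hold) {
        if (initialize) {
            // Assumed priority: external initialization wins, even over hold.
            value = initValue;
        } else if (hold) {
            // Quiescent: keep the product stable.
        } else {
            // Default behavior during multiplication: shift right by one.
            value >>>= 1;
        }
    }

    long value() {
        return value;
    }
}
```

In this model, asserting hold protects the product from shifting but not from an initialize request, so logic that only watches its own state cannot know that the product has become invalid.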

I also noticed that the multiplication start signal (from the integer unit) was a source of trouble: it changed in the control clock domain, at the same time that the multiplication unit's control would read it. To fix this, I inserted a datapath clock domain register so that, from the multiplication unit control's point of view, the start signal can only change with the datapath signals. This guarantees that the start signal will be stable when the multiplication unit's control reads it.

Returning to the productValid signal, I decided that a state machine was needed, which I call ProductValidAssistant. It controls a multiplexer that allows the productValid signal from the multiplication unit's control to either be transparently passed out of the multiplication unit, or suppressed.
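The pass-or-suppress idea can be sketched in Java. The post does not list the states or transitions of ProductValidAssistant, so the two modes below and the rule "suppress after initialization, pass after start" are purely assumptions for illustration.

```java
// Hypothetical sketch of a pass/suppress gate for productValid.
// The modes and transitions are assumed, not taken from Sputnik.
final class ProductValidAssistantSketch {
    enum Mode { PASS, SUPPRESS }

    private Mode mode = Mode.SUPPRESS;  // nothing valid before the first start

    // Advance the state machine by one clock.
    void clock(boolean initialize, boolean start) {
        if (initialize) {
            mode = Mode.SUPPRESS;   // product was made invalid externally
        } else if (start) {
            mode = Mode.PASS;       // a multiplication is now under way
        }
    }

    // The multiplexer: pass the control's productValid through, or force it low.
    boolean productValidOut(boolean productValidFromControl) {
        return mode == Mode.PASS && productValidFromControl;
    }
}
```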

By 2008-10-15, I had improved the productValid signal that emerges from the multiplication unit. The timing of a first pulse looked perfect, but there was a glitch in the form of a second pulse. More importantly, the product was no longer correct!

So I went back to the high-level implementation reference. It needed to be modified to support three-phase clocking and to better model initialization.

While working on the reference, I realized the counter in the multiplication unit's control no longer needed an enable signal. I could use the hold signal instead. I also wanted to hold in the initial state, with no shifting or adding, to support any delay between initialization of the product register and arrival of the start signal.

Something was still bothering me about the multiplication unit's control, which is a Mealy machine. I considered one particular state. We're sitting in that state, combinationally deciding whether to present a stable add or shift to the product register so that there will be no edge on that control at the next datapath clock positive edge. But the datapath clock positive edge can change the multiplier bit, and that affects the add/shift decision at that very instant!

To guarantee stability by design, for both the next-state decision and the product register's decision, I thoroughly considered the timing of the three signals: multiplier bit, start signal and the shift counter completion signal.

As for the multiplier bit, the product register is in the datapath clock domain, so the multiplier bit would change at the beginning of Phase 2. The multiplier bit can affect the next state, so it needs to be stable at the beginning of Phase 0 (positive edge of control clock). The multiplier bit can also affect the Mealy output logic as described above. I considered presenting changes to the multiplier bit to the multiplication unit's control only on the falling edge of the control clock (i.e. the beginning of Phase 1). That wouldn't work because the signal's source changes at the beginning of Phase 2. The multiplication unit's control wouldn't see that change yet at the beginning of Phase 0. This would cause an incorrect state change.

I was forced to consider four-phase clocking. The beginning of Phase 3 would be the falling edge of the datapath clock. Presenting the change in the multiplier bit to the multiplication unit's control only at the falling edge of the datapath clock would solve the problem. The multiplication unit's control would see that change at the beginning of Phase 0. This would also solve the problem of changing the multiplier bit and affecting the add/shift decision. Because the multiplication unit's control doesn't see the multiplier bit change right away, it holds the add/shift decision stable for the rising edge of datapath clock.
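The effect of presenting the multiplier bit only at the falling edge of the datapath clock can be sketched in Java. The class and field names are hypothetical. The model only tracks which value the control logic sees in each phase of the four-phase scheme.

```java
// Hypothetical four-phase model of a multiplier bit that is presented to the
// control logic one phase after its source changes.
final class DelayedBitSketch {
    private boolean sourceBit;      // changes at the start of Phase 2 (datapath rising edge)
    private boolean seenByControl;  // updated at the start of Phase 3 (datapath falling edge)

    // Advance to the start of the given phase (0 to 3).
    void phase(int phase, boolean newSourceBit) {
        if (phase == 2) {
            sourceBit = newSourceBit;   // the shift changes the multiplier bit
        }
        if (phase == 3) {
            seenByControl = sourceBit;  // only now does control see the change
        }
    }

    boolean sourceBit() {
        return sourceBit;
    }

    boolean seenByControl() {
        return seenByControl;
    }
}
```

During Phase 2 the control logic still sees the old bit, so its add/shift decision cannot move at the datapath rising edge. By Phase 0 of the next cycle the new bit has been visible since Phase 3, so the next-state decision uses a stable, current value.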

As for the start signal, as presented to the multiplication unit's control, it was now changing with the rising edge of datapath clock (Phase 2). The start signal was then latched by the multiplication unit's control at the beginning of Phase 0 causing a state change. The problem was that the add/shift decision in the initial state needed to be stable at the preceding edge of datapath clock. That decision had been inhibited before the start signal to allow for a possible delay after initialization. For reliable state change, the start signal needed to be stable on the rising edge of control clock (Phase 0). Therefore, Phase 1 (falling edge of control clock) is the correct time for the Mealy output logic to see the start signal activate. That makes the add/shift decision (i.e. unhold) stable by the time of the datapath clock positive edge (Phase 2). I also thought it would be acceptable to allow the start signal to be presented to the Mealy next-state logic at the falling edge of control clock.

As for the shift counter completion signal, I reasoned that no design action was required.

I also wanted to review my use of hold and enable signals.

Four-phase:
control 1000 1000 ...
datapath 0010 0010 ...

Simulation on 2008-10-27 revealed that the product was not correct. I quickly pinpointed the time at which Sputnik deviated from the reference: it was a failure to shift. Furthermore, it seemed that we were in the wrong state.

I investigated using the NetBeans IDE with the high-level implementation reference. Yes, we were in the wrong state. I then enhanced the reference to display state changes and looked into why Sputnik's state deviated from the golden model.

The nextState signal needed to be stable across the falling edge of control clock but wasn't because of the version of the start signal I was using. I fixed this by feeding the undelayed start signal into the Mealy next-state logic.

On 2008-10-29, multiplication was correct once again.

By 2008-10-30, all indications were that %r3 was being written with the product. But there was no activity on the instruction bus after multiplication completion.

Around this time, I was sick in bed so progress was slowed, but by 2008-11-12 I had determined that integer unit control was failing to send a write signal to the program counter. I made that happen during MULX Completion. Then I used another high-level model to predict exactly when the product would be stored to memory. Finally, I verified that with the FSS memory view.

On 2008-11-17, I released Sputnik005 to the verification team and turned my attention to automated verification.

I measured the speed of one simulation run on a Sun Fire V210 Server, estimated the size of an overnight workload and began to work on populating the test cases database.

I'm developing the in-house Random Test Program Generator (RTPG) in a NetBeans environment. On 2008-11-19, I upgraded to the newly released NetBeans IDE 6.5.

Comprehensive Functional Verification is teaching me some things. The book doesn't say much about a test case generator, but it does call it an expert system controlled by test case templates. RTPG already existed for the load-add-store sequence. It is now being enhanced to obey a test case template that controls whether load-add-store or load-multiply-store sequences are generated. As the level of expertise of the RTPG expert system increases, I will be tempted to productize the software.
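A template-controlled generator of this kind might look like the following Java sketch. It is not the actual RTPG code: the class name, the single "multiply percentage" template parameter and the register choices are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a template-driven test sequence generator.
final class RtpgTemplateSketch {
    // multiplyPercent is an assumed template parameter: the chance (0 to 100)
    // that a sequence is load-multiply-store rather than load-add-store.
    static List<String> generate(int multiplyPercent, int sequences, long seed) {
        Random random = new Random(seed);  // seeded so that a run can be reproduced
        List<String> program = new ArrayList<>();
        for (int i = 0; i < sequences; i++) {
            boolean multiply = random.nextInt(100) < multiplyPercent;
            program.add("ldx [%r6],%r1");  // load first operand
            program.add("ldx [%r7],%r2");  // load second operand
            program.add(multiply ? "mulx %r1,%r2,%r3" : "add %r1,%r2,%r3");
            program.add("stx %r3,[%r4]");  // store the result
        }
        return program;
    }
}
```

Setting multiplyPercent to 0 or 100 gives a purely load-add-store or purely load-multiply-store program, and values in between mix the two.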

I have also encountered very simple SPARC-V9 reference code in RTPG. I don't want it there. Such code belongs in the SPARC-V9 Standard Reference Model that will be part of FSS.

I'll be looking into refactoring FSS as I identify its checkers. The last thing I want to mention is that checking multiplication will be a challenge because the latency of the operation varies from one implementation to another. In Sputnik, multiplication latency varies with the operands. FSS must remain general enough to handle any SPARC-V9 implementation.
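One way to keep a checker independent of latency is to wait for the valid indication instead of a fixed cycle, with a timeout as a safety net. The Java sketch below illustrates that idea only. It is not an FSS checker, and the names and the array-based trace format are assumptions.

```java
// Hypothetical latency-tolerant check: compare the product when it is
// flagged valid, whenever that happens, rather than at a fixed cycle.
final class LatencyTolerantCheckSketch {
    enum Result { PASS, MISMATCH, TIMEOUT }

    // productValid[i] and product[i] are the values sampled in cycle i.
    static Result check(boolean[] productValid, long[] product,
                        long expected, int maxCycles) {
        int limit = Math.min(maxCycles, Math.min(productValid.length, product.length));
        for (int cycle = 0; cycle < limit; cycle++) {
            if (productValid[cycle]) {
                // Judge the first valid product, regardless of when it arrives.
                return product[cycle] == expected ? Result.PASS : Result.MISMATCH;
            }
        }
        return Result.TIMEOUT;  // the implementation never produced a result
    }
}
```

A true black box check would observe an architectural effect, such as the stored product in memory, instead of an internal signal. The same wait-then-compare structure would still apply.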