I said last time that "things will be picking up dramatically". I'm pretty satisfied with recent progress, which I detail in this blog item, along with my thoughts on what comes next after release of FSS Version_0-007.

The return to Austin began with such activities as those of 2008-07-15, when I was updating the CVS sandbox on the Sun workstation where I do most of my development. There were some lingering SSH issues, which I resolved. That same day, I was working on implementing the constrained random idea I had thought of nearly two months earlier ("I got the idea to generate each operand bit randomly."). The simplest such probability curve is linear from bit 31 through bit 63.

Any random variable whose only possible values are 0 and 1 is called a "Bernoulli random variable". And for each bit, what I have is a Bernoulli distribution.

I looked at some existing Java packages, and even spoke with Dr. Kevin J. Healy about com.threadtec.silk.random.Bernoulli, but realized that the algorithm is trivial and implemented the required functionality myself.
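
A minimal Java sketch of the per-bit idea follows. It is not the code I wrote for the in-house RTPG. The class name, the method names and the exact shape of the curve are assumptions made for illustration: each bit from 0 through 31 is 1 with probability 0.5, and from bit 31 the probability falls in a straight line to 0 at bit 63, so the operand is never negative.

```java
import java.util.Random;

/** Sketch: build a 64-bit operand from 64 independent Bernoulli trials. */
final class BernoulliOperand {

    /**
     * Probability that the given bit is 1.
     * ASSUMPTION: 0.5 for bits 0..31, then linear down to 0.0 at bit 63.
     * The actual curve used is not spelled out in this post.
     */
    static double probabilityOfOne(int bit) {
        if (bit <= 31) {
            return 0.5;
        }
        return 0.5 * (63 - bit) / 32.0;
    }

    /** One Bernoulli trial: returns true with probability p. */
    static boolean bernoulli(Random rng, double p) {
        return rng.nextDouble() < p;
    }

    /** Draws every bit independently according to probabilityOfOne. */
    static long nextOperand(Random rng) {
        long value = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (bernoulli(rng, probabilityOfOne(bit))) {
                value |= 1L << bit;
            }
        }
        return value;
    }
}
```

Calling `BernoulliOperand.nextOperand(new Random())` twice gives a candidate operand pair. Whether the product of that pair overflows still has to be checked separately.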

Keep in mind that this constrained random solution isn't intended to become part of any Cosmic Horizon product, although it is likely to be integrated with the in-house RTPG.

The linear probability curve was successful, randomly generating a pair of positive operands whose product would not overflow a 64-bit-output multiplication functional unit, in about two hours on a Sun Fire V210 Server. That performance is good enough for now.

My test case is based on:
0000000b63b77a76 multiplicand
0000000003f93193 multiplier
2d415885df91e7c2 expected product
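
The operands above can be checked independently of any simulator. This is a small Java sketch using java.math.BigInteger; the class and method names are hypothetical, and "does not overflow" is assumed here to mean that the exact product is positive and fits in a signed 64-bit value.

```java
import java.math.BigInteger;

/** Sketch: confirm the test case with arbitrary-precision arithmetic. */
final class MulxTestCase {
    static final long MULTIPLICAND = 0x0000000b63b77a76L;
    static final long MULTIPLIER   = 0x0000000003f93193L;
    static final long EXPECTED     = 0x2d415885df91e7c2L;

    /** Exact product of two non-negative 64-bit operands. */
    static BigInteger exactProduct(long a, long b) {
        return BigInteger.valueOf(a).multiply(BigInteger.valueOf(b));
    }

    /** ASSUMPTION: no overflow means the product needs at most 63 bits. */
    static boolean fitsIn64Bits(long a, long b) {
        return exactProduct(a, b).bitLength() <= 63;
    }
}
```

For these operands, `exactProduct(MULTIPLICAND, MULTIPLIER)` equals `BigInteger.valueOf(EXPECTED)` and `fitsIn64Bits` returns true.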

I was ready to simulate, but some DOS line endings had been committed to CVS while I was in the Boston area and that was causing me some grief. As I said, "Windows is a Difficult Platform on which to Develop Software".

The most troubling result on 2008-07-20 was that the product_valid output from the multiplication functional unit was still moving around in time from one simulation run to the next. The time of product_valid is supposed to be deterministic for fixed operands.

With the FSS waveform view, I observed some unexpected behavior. I had been clocking the product register with a signal generated by the multiplier control. The idea (dating back to 1998) has been to deliberately separate those clock edges from transitions in traditional control signals to the product register, to avoid hold-time violations. Elsewhere in Sputnik's integer unit I am employing two clocks, 180 degrees out of phase with respect to each other, one for the datapath and one for the control. I had reasoned (with some discomfort, apparently anticipating this problem) that the multiplication functional unit was part of the datapath. Therefore, it should receive only the datapath clock. Guess where the multiplier's embedded controller was getting its clock from.

I never liked that and it was time to fix it. MultiplierControl needed to be out of phase with Multiplier's datapath. Otherwise, MultiplierControl could make decisions at the same clock edge when the signals it reads to make its decisions are changing. Likewise, output from MultiplierControl to Multiplier's datapath (e.g. to a mux select) must not be changing at the same clock edge when data is latched.

I decided to make MultiplierControl part of the control clock domain. I just needed to ensure that, at the Multiplier interface, all output was triggered by the datapath clock domain.

As for output from MultiplierControl to Multiplier's datapath, I had endeavored to guarantee by design of the add-shift control state machine that such output is not changing at the same clock edge when data is latched, even though all was orchestrated in the control clock domain. This idea originated in the 1998 design and I reasoned that it could be greatly simplified and accelerated (roughly doubling performance) by using dual clocks inside the Multiplier.

I think you get the idea.

On 2008-07-23, I still could not find the correct product.

Again, I went in with the FSS waveform view to debug. I found something that wasn't right and decided that I needed to look at the output of the shift counter. Just then I remembered that I had to deal with output from MultiplierControl to IntegerUnit's datapath and ensure that it only changes in the datapath clock domain. I found a signal that needed that type of adjustment and fixed the Sputnik model accordingly.

On 2008-07-25, the tape drive arrived. (Yes, I had decided that it was absolutely necessary.)

A large number of Solaris 10 patches were required to support the tape drive.

The tape drive lacked even a basic user guide. Sun really dropped the ball on this one.

I decided that backup software would make the task clearer and systematically evaluated each product listed on this page. I quickly got the list of finalists down to BakBone NetVault, CA BrightStor ArcServe and HP Data Protector. I then eliminated CA BrightStor ArcServe when I couldn't find a Solaris trial version.

HP Data Protector looked promising and I burned a set of CD-RWs. Its Installation Server readme revealed, however, that "Solaris 8/9/10 (SPARC)" were among the supported clients, but not Solaris 9 IA-32. One such computer was particularly important as it hosted my CVS repository. So I migrated the repository to a Solaris 10 SPARC computer and proceeded to evaluate HP Data Protector. Then I read in "hp OpenView storage data protector 6.0, Platform & Integration Support Matrices" that "Sun Solaris 10 (32-bit & x64)" were among the supported clients after all. The original host of my CVS repository could have become a supported client after an operating system upgrade. Oh well, I'm not going back. The problem that kills HP Data Protector for me is the list of platforms that support its GUI:

Windows XP HE
Windows XP PRO (32-bit)
Windows XP 64-bit (Itanium and x64)
Windows 2000
Windows 2003 (32-bit)
Windows 2003 (64-bit) (Itanium and x64)
HP-UX (PA-RISC) 11.0, 11.11, 11.23, 11.31
Solaris 8 (Sparc)

"For cell managers running on ... Solaris 9 [and Solaris 10], the GUI needs to be installed on a remote system." I don't have any of the above platforms. I certainly don't want to downgrade a SPARC workstation from Solaris 9 to Solaris 8. The only platform from the above list that I would even consider purchasing is Windows XP 64-bit (Itanium).

And then there was BakBone NetVault. "It is important to ensure that the backup device is connected and functional and that backups can be performed to them (e.g., through the use of any native O/S tools used for minimal backup operations on the device. If these native backup tools can't see an attached backup device, then neither will NetVault)." I was wrong. Backup software is not making the task clearer. I still had to figure out how to perform Step 1 on my own. I did (and I'll be happy to share those details with anyone who asks me).

My goal was to reach the milestone of having my CVS repository backed up to a tape that is located off-site, before putting tape backup activities on the back burner. And I achieved that milestone "through the use of ... native O/S tools". I still had to evaluate BakBone NetVault before the evaluation period expired, but I was anxious to get back to the fun stuff.

On 2008-08-12, I was trying to check out a new CVS sandbox. (Recall that the repository had moved.) I had to upgrade the CVS software on the Solaris 10 repository host to match the version the repository had last seen (1.12.7). Upgrading to the latest stable version was not successful (unrecognized keyword 'UseNewInfoFmtStrings'). Ultimately, I suppose I'm going to have to solve that problem, lest I be stuck on 1.12.7 forever. Why don't you leave me a blog comment with the solution?

Checking out my main project was insufficient for building FSS. I needed to first check out and build Cosmic Horizon JHDL. (You can get it the easy way.) I also had to check out a couple of local minor projects that I depend on (and that are bundled with FSS from your point of view).

By 2008-08-18, I was building and launching FSS again. I was not finding the correct product.

On 2008-08-19, I discovered that the shift counter was not zero at the beginning of multiplication.

The counter is cleared only by the reset wire. If the counter only sees shift pulses during multiplication, and all multiplications complete, then clearing the counter by the reset wire is sufficient. This is because that last shift pulse will roll over the counter, effectively initializing it.

The problem was that the datapath clock always reaches the counter, and on each clock we either add or shift. I decided to add a counterEnable signal from MultiplierAddShiftControl.

On 2008-08-20, I discovered that product_valid was still moving around in time from one simulation run to the next. The counter was successfully holding until multiplication begins, but failed to clear in response to power-on reset.

The counter expects a synchronous reset. If all conditions for seeing the clock inside the counter are not met, then the counter cannot be reset. The solution was to let the counter have shift and enable inputs separate from the clock. Actually, shift and enable can still be combined with an external AND gate, but they must be separate from the clock.
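
The difference between the two arrangements can be shown with a small behavioral model in Java. This is only a sketch, not the Sputnik JHDL source. The class name, the modulus of 64 and the arbitrary power-on value are assumptions made for illustration.

```java
/** Behavioral sketch of a shift counter with a synchronous reset. */
final class ShiftCounterSketch {
    private final int modulus;   // ASSUMPTION: 64 shifts per multiplication
    private int count;

    ShiftCounterSketch(int modulus, int powerOnValue) {
        this.modulus = modulus;
        this.count = powerOnValue % modulus; // arbitrary state before reset
    }

    /** Gated clock: with the gate low, no edge arrives and reset is lost. */
    void gatedClockEdge(boolean gate, boolean reset) {
        if (!gate) {
            return;
        }
        count = reset ? 0 : (count + 1) % modulus;
    }

    /** Free-running clock: reset is always seen; enable controls counting. */
    void clockEdge(boolean reset, boolean enable) {
        if (reset) {
            count = 0;
        } else if (enable) {
            count = (count + 1) % modulus;
        }
    }

    int value() {
        return count;
    }
}
```

With the gate held low, `gatedClockEdge(false, true)` leaves the power-on value untouched, which is the kind of run-to-run variation described above. `clockEdge(true, false)` clears the counter regardless of the enable, and a full run of enabled edges rolls it over to zero again.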

On 2008-08-22, it was looking like the product might be right, but it was impossible to be certain without a waveform zoom feature. The time had come for me to take a break from the joy of logic design and return for a while to the land of GUI programming.

My first inclination was to disable the zoom feature while the simulation is running. But a distinguishing feature of FSS is that the waveforms are drawn during simulation. Typical simulation runs are expected to be too long to yield readable waveforms when the entire simulation run is shown. Therefore, it would defeat the purpose of simultaneous drawing to disallow zooming during simulation. The feature is expected to be problematic, but not impossible. I decided to leave it enabled, but to avoid using it during simulation, and to regard it as a low-priority task to fix the expected simultaneous zoom issues.

Using a top-down approach, I started with the design of the user control. I envisioned a dual slider (i.e. two markers that move in one track but cannot pass each other) for inputting the interesting cycle range subset of the entire simulation run. I had never heard of such a thing, but went looking for one anyway. I found the component in the InfoVis project of Jean-Daniel Fekete. It gives me a pair of integers and I just had to figure out how to use those numbers to enact the zoom itself.
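
One plausible way to turn that pair of integers into a zoom is a linear mapping between cycle numbers and pixel columns. The Java sketch below is not the FSS implementation. The class name, method names and parameters are hypothetical, and it assumes that the first visible cycle lands on pixel 0 and the last on the rightmost pixel.

```java
/** Sketch: linear mapping between simulation cycles and pixel columns. */
final class WaveformZoom {

    /** Pixel column for a cycle when [firstCycle, lastCycle] fills the view. */
    static int cycleToX(long cycle, long firstCycle, long lastCycle, int widthPixels) {
        if (lastCycle <= firstCycle || widthPixels < 2) {
            throw new IllegalArgumentException("empty range or view");
        }
        double fraction = (double) (cycle - firstCycle) / (lastCycle - firstCycle);
        return (int) Math.round(fraction * (widthPixels - 1));
    }

    /** Inverse mapping, e.g. for placing a cursor from a mouse position. */
    static long xToCycle(int x, long firstCycle, long lastCycle, int widthPixels) {
        if (lastCycle <= firstCycle || widthPixels < 2) {
            throw new IllegalArgumentException("empty range or view");
        }
        double fraction = (double) x / (widthPixels - 1);
        return firstCycle + Math.round(fraction * (lastCycle - firstCycle));
    }
}
```

With a view 801 pixels wide showing cycles 100 through 200, cycle 150 maps to pixel 400, and pixel 400 maps back to cycle 150.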

I don't think most of the zoom implementation details will be very interesting to my intended audience. I will say, however, that there were two problems to tackle: (1) zooming the waveforms and (2) adjusting the cursor on zoom.

By the way, I don't think you've seen the waveform cursor yet. The FSS waveform view is much more powerful than it was in Version_0-006. It is one of many reasons that I am anxious to release Version_0-007 to you.

In the midst of implementing the zoom feature, I was continuing with my BakBone NetVault evaluation. I successfully completed some backups and was learning how to perform a restore, when I noticed a problem where the GUI was failing to identify the file and directory icons that I was browsing. I sent an email (with an attached screenshot) to support@bakbone.com on 2008-09-04. There has been no reply and I have put my evaluation on hold until there is.

By 2008-09-10, waveforms were zoomed. By the next day, I wrote in my notes, "Issues like asymmetry and difficulty in determining the exact position of an edge seem to be solved with the zoom feature, now that the cursor seems to be fixed as well."

On 2008-09-12, I determined that the product was not correct. Using the tools at hand (i.e. FSS in its present state of development), I would need to walk through the multiplication, looking for the point of failure. I did that for a few cycles, and all was perfect. I also know that, around the time of product_valid (73 cycles later), the correct product is not found. It is not reasonable to expect you to walk through the multiplication cycle-by-cycle, phase-by-phase, without any kind of assistance. Doing this manually will not endear FSS to its users.

I started brainstorming in my notes. I also wondered what my colleagues at IBM would have done, people like Bruce Wile, John Goss and Wolfgang Roesner. I bought Comprehensive Functional Verification: The Complete Industry Cycle (Systems on Silicon) and I'm waiting for it to arrive.

In the meantime, it seems to me that the fundamental issue is that FSS is about SPARC-V9 architectural verification. It can't be expected to know about implementation details. It has correctly indicated that Sputnik fails to multiply correctly. I'm asking myself what else FSS can do to help. I have a few ideas.

But what's really needed is something beyond FSS. I call that new thing a "high-level implementation reference". In fact, it's almost done now and hasn't taken very long to develop. This reference will tell me what the internal state of Sputnik's multiplication functional unit should be on each cycle (and on each phase within a cycle). Once I have proven that the reference multiplies correctly, I will use it to compare with data I'm seeing about Sputnik in the FSS waveform view.
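
To give a feel for what such a reference might look like, here is a Java sketch of a textbook add-shift multiplier that records its state after every step. It is not my actual reference and it does not model Sputnik's real datapath, its two clock phases or its cycle count. The class, the State fields and the fixed 64 steps are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: step-by-step add-shift multiplication with a state trace. */
final class AddShiftReference {

    /** Hypothetical snapshot of the internal registers after one step. */
    static final class State {
        final int step;
        final long accumulator;
        final long multiplicand;
        final long multiplier;

        State(int step, long accumulator, long multiplicand, long multiplier) {
            this.step = step;
            this.accumulator = accumulator;
            this.multiplicand = multiplicand;
            this.multiplier = multiplier;
        }
    }

    /** Runs 64 add-then-shift steps and returns the state after each one. */
    static List<State> trace(long multiplicand, long multiplier) {
        List<State> states = new ArrayList<>();
        long acc = 0L;
        long m = multiplicand;
        long q = multiplier;
        for (int step = 0; step < 64; step++) {
            if ((q & 1L) != 0L) {
                acc += m;          // add: low multiplier bit selects the addend
            }
            m <<= 1;               // shift: next partial product weight
            q >>>= 1;              // shift: expose the next multiplier bit
            states.add(new State(step, acc, m, q));
        }
        return states;
    }

    /** Low 64 bits of the product, taken from the final traced state. */
    static long product(long multiplicand, long multiplier) {
        List<State> states = trace(multiplicand, multiplier);
        return states.get(states.size() - 1).accumulator;
    }
}
```

Each State in the list can be compared against the values visible in a waveform at the corresponding step. The first mismatch is the point of failure.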

I want to go light on "wish list" features because it has been about 16 months since the last FSS release and I'm just trying to get to the planned milestone. For Version_0-007, unless pressured by you for an earlier release, I want to not only have the correct MULX result stored into memory in interactive mode (positive operands whose products don't overflow 64 bits), but also have some automated verification in place for load-multiply-store.

As for finding the SPEC CPU2000 integer benchmark program that uses MULX the most (it will be SPEC CPU2006 now), that does nothing to FSS, so the Version_0-007 release can precede that activity. But I want to do that shortly after. On 2007-07-19, I wrote in my notes, "I want to fully implement SPARC-V9 in Sputnik while minimizing re-implementation (with integer multiplication re-implementation as a likely exception)." Was I referring to the re-implementation that I just did (i.e. halving execution time) or something more interesting? I'll have to decide how long I want to remain on multiplication before moving on with Sputnik. In other words, Version_0-008 has yet to be defined.

It's really a question of business vs. pleasure. The product is FSS, not Sputnik. And FSS must be capable of verifying the entire SPARC-V9 ISA in a design under verification. From a business point of view, therefore, I should get on with any implementation of all instructions in Sputnik. But you're not paying me, and I really enjoy computer arithmetic. After Version_0-007 release, I'm leaning towards doing a little of both. I might first find the SPEC CPU2006 integer benchmark program that uses MULX the most, and then use that benchmark to motivate me to implement more instructions (releasing Version_0-008 along the way, sooner rather than later). Once that benchmark is running on Sputnik, I could then return to integer multiplication and try other implementations, measuring the performance of each implementation on the benchmark. After some fun with that, I would return to finishing Sputnik (and FSS support for its architectural verification).