Archives

You are currently viewing the archive for August 2007
Category: General
Posted by: afeldstein
I hope all of you are enjoying FSS Version_0-006, which I released on 2007-05-13.

Right after release, I went to work on the installation instructions and instructions for using the postsimulation application. I used a computer that had never seen the FSS application suite before to discover the dependencies. It took me until 2007-06-03 to complete those instructions.

I then made several software improvements before getting back to what I wanted to play with: integer multiplication. You will see these improvements when I release Version_0-007.

I made the postsimulation application more robust and gave it the capability to transparently perform some database cleanup.

Attempting to Load 1 Program and then canceling caused a NullPointerException. I fixed that.
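
To illustrate the general pattern (this is a sketch with hypothetical names, not the actual FSS code), the classic cause of a cancel-path NullPointerException in a Swing application is using the file chooser's selection without first checking whether the dialog was canceled:

import java.awt.Component;
import java.io.File;
import javax.swing.JFileChooser;

// Illustrative sketch only (hypothetical names, not the FSS source): guarding
// the cancel path so a canceled "Load 1 Program" never hands a null File to
// the loader.
public class LoadProgramSketch {
    static void loadProgram(Component parent) {
        JFileChooser chooser = new JFileChooser();
        if (chooser.showOpenDialog(parent) != JFileChooser.APPROVE_OPTION) {
            return;  // canceled: getSelectedFile() would be null here
        }
        File program = chooser.getSelectedFile();
        System.out.println("Loading " + program.getAbsolutePath());
    }
}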

I observed an address sorting problem in the memory view with the high half of the address space (i.e. addresses whose most significant bit is 1) and fixed that. The solution had to do with using org.apache.axis2.databinding.types.UnsignedLong in more places. By the way, my accepted (but unacknowledged) contributions to that class make it usable, and Sun (if you are reading this), I advise you to consider acquiring that and related classes if you ever decide to add support for unsigned integers to Java.
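
The underlying issue is a familiar one: Java's long is signed, so an address with the most significant bit set compares as negative and sorts before lower addresses. Here is a minimal sketch of the idea in modern Java (not the FSS source; the actual fix uses the axis2 UnsignedLong class mentioned above):

import java.util.Arrays;
import java.util.Comparator;

// Minimal sketch (modern Java, not the FSS source) of why signed comparison
// mis-sorts the high half of the address space, and one way to compare
// 64-bit addresses as unsigned values instead.
public class AddressSortSketch {
    public static void main(String[] args) {
        Long[] addresses = { 0xFFFFFFFF80000000L, 0x0000000000400000L };

        // Signed order: the high-half address is negative as a long, so it
        // incorrectly sorts before the lower address.
        Arrays.sort(addresses);
        System.out.printf("signed:   0x%016X 0x%016X%n", addresses[0], addresses[1]);

        // Unsigned order: flipping the sign bit maps unsigned order onto
        // signed order, restoring the expected memory-map order.
        Arrays.sort(addresses, Comparator.comparingLong(a -> a ^ Long.MIN_VALUE));
        System.out.printf("unsigned: 0x%016X 0x%016X%n", addresses[0], addresses[1]);
    }
}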

I made the simulator's ELF loader more robust, explicitly checking EI_CLASS and EI_DATA.
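
For reference, the check amounts to reading the ELF identification bytes and rejecting anything that is not a 64-bit, big-endian object, which is what a SPARC-V9 program must be. A self-contained sketch (not the FSS loader itself):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Minimal sketch (not the FSS loader): validating the ELF identification bytes
// before loading a SPARC-V9 program. SPARC-V9 objects are 64-bit (ELFCLASS64)
// and big-endian (ELFDATA2MSB).
public class ElfIdentCheck {
    private static final int EI_CLASS = 4;      // offset of the class byte in e_ident
    private static final int EI_DATA = 5;       // offset of the data-encoding byte
    private static final int ELFCLASS64 = 2;    // 64-bit objects
    private static final int ELFDATA2MSB = 2;   // big-endian, two's complement

    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            byte[] ident = new byte[16];        // e_ident is always 16 bytes
            in.readFully(ident);

            if (ident[0] != 0x7F || ident[1] != 'E' || ident[2] != 'L' || ident[3] != 'F') {
                throw new IOException("not an ELF file");
            }
            if (ident[EI_CLASS] != ELFCLASS64) {
                throw new IOException("expected ELFCLASS64 for SPARC-V9");
            }
            if (ident[EI_DATA] != ELFDATA2MSB) {
                throw new IOException("expected ELFDATA2MSB (big-endian) for SPARC-V9");
            }
            System.out.println(args[0] + ": 64-bit big-endian ELF");
        }
    }
}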

The simulator application has one dependency that I really don't like: you will experience a fatal exception if you don't have Apache Derby installed. The reason I don't like that dependency is that I have given you the option to limit yourself to interactive use of FSS (i.e. without automated verification). If you don't intend to use automated verification, then you shouldn't need Derby at all; yet in Version_0-006 you do. So I fixed this false Derby dependency.
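
The shape of the fix is simple: nothing in the simulator should touch Derby until automated verification is actually selected. A minimal sketch of that idea (hypothetical class name and database URL, not the FSS source):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Minimal sketch (not the FSS source) of deferring the Derby dependency:
// nothing references Derby until the user selects automated verification,
// so interactive-only use never triggers a missing-Derby failure.
public class VerificationDatabase {
    private Connection connection;

    // Called only when automated verification is selected.
    public void open() throws SQLException {
        try {
            // Loading the embedded Derby driver reflectively keeps the rest of
            // the simulator free of any startup dependency on Derby.
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
        } catch (ClassNotFoundException e) {
            throw new SQLException("Automated verification requires Apache Derby", e);
        }
        // "fssResults" is a hypothetical database name for this sketch.
        connection = DriverManager.getConnection("jdbc:derby:fssResults;create=true");
    }
}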

I suspected that there would be a problem selecting Automated Verification, deselecting it, then selecting it again. There was. That's fixed now.

I could no longer tolerate the fact that the postsimulation application always shows all results. That made it difficult to find the failing results by scrolling through tens of thousands of rows after an overnight automated verification run. So I added a feature that lets the user choose between displaying all results and displaying only failing results.
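
One way to express that kind of switch in a Swing table (purely illustrative; the class name and column layout here are hypothetical, not the FSS postsimulation code) is a row filter applied to the table's sorter:

import javax.swing.JTable;
import javax.swing.RowFilter;
import javax.swing.table.TableModel;
import javax.swing.table.TableRowSorter;

// Illustrative sketch only (hypothetical names and column layout, not the FSS
// postsimulation code): a row filter that hides everything except failing
// results when the user asks for failing results only.
public class ResultFilterSketch {
    private static final int STATUS_COLUMN = 3;  // assumed column holding PASS/FAIL

    private final TableRowSorter<TableModel> sorter;

    ResultFilterSketch(JTable resultsTable) {
        sorter = new TableRowSorter<TableModel>(resultsTable.getModel());
        resultsTable.setRowSorter(sorter);
    }

    void showFailingOnly(boolean failingOnly) {
        // A null filter restores the "all results" view.
        sorter.setRowFilter(failingOnly
                ? RowFilter.regexFilter("FAIL", STATUS_COLUMN)
                : null);
    }
}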

Then I made a software architecture improvement by limiting which parts of the simulator application depend on the JHDL class library.
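
To give a flavor of what such a boundary can look like (purely illustrative, with hypothetical names; this is not the FSS design itself), the view and controller can depend on a small interface while only one implementation package is allowed to import JHDL classes:

import java.util.List;

// Illustrative sketch only (hypothetical names, not the FSS design): a small
// boundary interface that the view and controller can depend on, so that only
// a single implementation package needs to import the JHDL class library.
public interface SignalSource {

    /** Names of the signals (internal and external) available for monitoring. */
    List<String> signalNames();

    /** Sampled value of the named signal at the given simulation time. */
    long valueAt(String signalName, long simulationTime);
}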

In the midst of that software architecture improvement, I prepared (in StarOffice Impress) and rehearsed a presentation on this Java project for the Java Austin Special Interest Group. Due to poor attendance and a poor venue, however, the presentation was never delivered.

Finally on 2007-07-21, I got back into the Sputnik microprocessor and its implementation of the SPARC-V9 MULX instruction. I had been looking forward to this for quite some time.

Sputnik is currently part of the FSS application suite. To make FSS useful for its intended purpose (i.e. verification of your SPARC-V9 implementation), Sputnik will eventually have to be replaced by a virtual socket in the testbench. And the SPARC-V9 Standard Reference Model described in Requirement 1.7.2 will have to be developed. For now, all of the SPARC-V9 instruction execution capability of FSS is coming from Sputnik.

An implementation of an integer multiplier has been in place in Sputnik (as of FSS Version_0-006), and it is being clocked. However, Sputnik's control unit has not been babysitting the multiplier. In fact, Sputnik does not recognize MULX at all, regarding it as an illegal instruction. Implementing MULX support is the next step.
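
For context, recognizing MULX is just a matter of decoding the format-3 fields of the instruction word (op = 2 in bits 31:30, op3 = 0x09 in bits 24:19). A small sketch of that decode (not Sputnik's actual control unit code):

// Minimal sketch (not Sputnik's decoder) of recognizing MULX in a SPARC-V9
// instruction word. MULX is a format-3 arithmetic instruction: op = 2, op3 = 0x09.
public class MulxDecodeSketch {
    static boolean isMulx(int instruction) {
        int op  = (instruction >>> 30) & 0x3;    // bits 31:30
        int op3 = (instruction >>> 19) & 0x3F;   // bits 24:19
        return op == 2 && op3 == 0x09;
    }

    public static void main(String[] args) {
        // mulx %g1, %g2, %g3  ->  op=2, rd=3, op3=0x09, rs1=1, i=0, rs2=2
        int word = (2 << 30) | (3 << 25) | (0x09 << 19) | (1 << 14) | 2;
        System.out.println(isMulx(word));  // true
    }
}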

On 2007-08-03, I observed the product_valid signal out of the multiplier asserted for the first time, so Sputnik is multiplying!

But it's not multiplying correctly at this point. It may be simple to debug this Sputnik problem using jdb, but I have to think about how you will debug your design under verification (DUV). When debugging Sputnik I will limit myself to the FSS user environment. You will (eventually) have your own DUV, described in JHDL. That DUV won't meet FSS until the testbench has a virtual socket. Even then, I have not thought through whether or not debugging with jdb will be possible.

More importantly, look at what the SRS promises.

Requirement 1.9.3: These are interactive debugging features of the controller. They will be very useful and very time-consuming to implement.

Technically, "stop on product_valid" is not part of Requirement 1.9.3, although Requirement 1.9.3.3 must be satisfied even after "stop on product_valid" is implemented. It was foolish of me to omit "stop on product_valid" from the SRS. Requirement 1.9.3 provides features to stop on a MULX instruction in assembly language, either by stepping or by breakpoint, but it does not force me to provide users with the ability to step within the MULX instruction!

In the meantime, we have Requirement 1.8.2: These are waveform features of the view. I have already started work on this. The requirement doesn't say much about this, just that the user can monitor both internal and external signals. (He/she already can.) Postsimulation tools are outside the scope of the SRS, but as long as I am within the simulator application (and specifically within the view) and making its waveform features more useful, then it is fair to say that I am working on Requirement 1.8.2.

So I've put the hardware design on hold for the moment while I go back into the software to make the waveform viewer more powerful.
Category: General
Posted by: afeldstein
Also at the press conference in the Bay Area on 2007-08-07, Jonathan Schwartz said, "I'm actually very proud to announce today we're repeating [what we did with UltraSPARC T1] with the UltraSPARC T2. Niagara2's design files and test suites, the specifications for creating the microelectronics will now be available under the GPL license as well." David Yen, Ph.D. added, "... following what we have done with UltraSPARC T1, we are open-sourcing the UltraSPARC T2 technology. We are making immediately available the OpenSPARC T2 Technology Programmer Reference Manual and Microarchitecture Specification to jump-start the developer community. We are also rolling out a [NDA] Developer Beta Program in which we'll provide access to the Verilog RTL design files and the corresponding test suites."

Cosmic Horizon will be there to support not only the OpenSPARC T2 developer community, but also any other SPARC-V9 microprocessor development community. Please refer to "OpenSPARC progress toward FSS" to understand how Cosmic Horizon fits in with OpenSPARC. With the new announcement from Sun, Cosmic Horizon intends to join their OpenSPARC T2 Beta Program and to continue the work described in my blog, switching to OpenSPARC T2 as the guinea pig.
Category: General
Posted by: afeldstein
Sun Microsystems invited me, as a representative of Cosmic Horizon, to a press conference in the Bay Area on 2007-08-07 introducing the UltraSPARC T2 microprocessor. Sun is billing UltraSPARC T2 as "The Fastest Processor". Whenever a processor vendor makes such a claim, it makes a nice headline, but you need more information to put that claim in context.

David Yen, Ph.D. said, "Based on the estimated results, the UltraSPARC T2 is setting new world records in two key industry-standard performance benchmarks. It achieves the highest single-chip SPECint_rate2006 world record number and it achieves single-chip SPECfp_rate2006 world record number." On SPECint_rate2006, IBM's POWER6 with a baseline result of 53.2 in a single-chip system is the microprocessor to beat. POWER6 achieves this with 2 cores (UltraSPARC T2 has 8). On SPECfp_rate2006, POWER6 is again the microprocessor to beat, with a baseline result of 51.5. The POWER6 needs 4.7 GHz for these achievements; UltraSPARC T2 needs only 1.4 GHz for the estimated achievements. As Dr. Yen commented, "By maintaining pretty much the same CPU core frequency as its predecessor, the UltraSPARC T1, we more than doubled the performance in computing and networking performance without pushing the limit of power consumption. We didn't go to the extreme of CPU frequency, say 4.7 GHz, trying to squeeze that painful last drop of performance and yet at a huge cost of [power] consumption."

Once Sun publishes these results with SPEC, I will say that the "fastest processor" claim is justified. But as I suggested, such claims always require a footnote.

I had the opportunity to ask the following question:
[Photo: Feldstein at UltraSPARC T2 launch]

In "Computer Architecture" Fourth Edition, Hennessy and Patterson explain that the SPEC CPU benchmarks can be used "to construct a simple throughput benchmark where the processing rate of a multiprocessor can be measured by running multiple copies (usually as many as there are processors) of each SPEC CPU benchmark and converting the CPU time into a rate." SPECfp_rate2006 is an example of such a metric. On a single UltraSPARC T2, how many copies of a benchmark actually produce the best result? Would it be 16, because there are 2 simultaneous threads per core, or is it higher?

Rick Hetherington replied, "It's all 64 threads that are involved in the SPECfp_rate ... It was done with 64 threads internally. But ... eight threads will share one single floating-point unit."

Dave Patterson, Ph.D. replied, "It is clear. The more threads you've got the better, so think of it as 64 even though there's only eight dual cores. If you pretend that it's 64, that got back much better performance than if it was eight."

David Yen replied, "Actually, for UltraSPARC T2, you really should think about in terms of threads. The fact that it's 8 cores, it's a hardware choice for integration. And the whole virtualization concept is also based on thread as the resolution. That's why you can go all the way down to a single thread logical domain that's a 64 system total."

The main point of Jonathan Schwartz's reply was, "... the single biggest impediment to the adoption of all of the things we're talking about up here. It's not going to be the innovation, the performance and the value that can be delivered. It's going to be the accessibility of that value to the average developer ... So we have worked very very hard to make sure that all the innovation we just discussed is available to the broadest market possible ..."

Does that mean that each measurement was made with 64 copies of a benchmark and that no other number of copies would have produced a better SPECfp_rate2006 result? I don't know. I gave up the microphone too soon.
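
For readers who want the arithmetic behind the question, here is a simplified illustration of the throughput ("rate") idea Hennessy and Patterson describe. This is a rough sketch with made-up numbers, not SPEC's official run rules or reporting tools:

// Rough illustration (made-up numbers, not SPEC's official tooling) of the
// throughput metric: run N copies of a benchmark, time the run, and convert
// the elapsed time into a rate relative to the benchmark's reference time.
public class RateSketch {
    // Per-benchmark rate ratio, roughly: copies * referenceSeconds / elapsedSeconds.
    static double rateRatio(int copies, double referenceSeconds, double elapsedSeconds) {
        return copies * referenceSeconds / elapsedSeconds;
    }

    public static void main(String[] args) {
        // Hypothetical: 64 copies of a benchmark with a 9000 s reference time,
        // all finishing in 12000 s on the measured machine.
        System.out.println(rateRatio(64, 9000.0, 12000.0));  // 48.0
    }
}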