Sun Microsystems invited me, as a representative of Cosmic Horizon, to a press conference in the Bay Area on 2007-08-07 introducing the UltraSPARC T2 microprocessor. Sun is billing UltraSPARC T2 as "The Fastest Processor". Whenever a processor vendor makes such a claim, it makes a nice headline, but you need more information to put that claim in context.

David Yen, Ph.D., said, "Based on the estimated results, the UltraSPARC T2 is setting new world records in two key industry-standard performance benchmarks. It achieves the highest single-chip SPECint_rate2006 world record number and it achieves single-chip SPECfp_rate2006 world record number." On SPECint_rate2006, IBM's POWER6, with a baseline result of 53.2 in a single-chip system, is the microprocessor to beat. POWER6 achieves this with 2 cores (UltraSPARC T2 has 8). On SPECfp_rate2006, POWER6 is again the microprocessor to beat, with a baseline result of 51.5. POWER6 needs a 4.7 GHz clock for these results; UltraSPARC T2 needs only 1.4 GHz for its estimated results. As Dr. Yen commented, "By maintaining pretty much the same CPU core frequency as its predecessor, the UltraSPARC T1, we more than doubled the performance in computing and networking performance without pushing the limit of power consumption. We didn't go to the extreme of CPU frequency, say 4.7 GHz, trying to squeeze that painful last drop of performance and yet at a huge cost of [power] consumption."

Once Sun publishes these results with SPEC, I will say that the "fastest processor" claim is justified. But as I suggested, such claims always require a footnote.

I had the opportunity to ask the following question:
[Photo: Feldstein at UltraSPARC T2 launch]

In "Computer Architecture" Fourth Edition, Hennessy and Patterson explain that the SPEC CPU benchmarks can be used "to construct a simple throughput benchmark where the processing rate of a multiprocessor can be measured by running multiple copies (usually as many as there are processors) of each SPEC CPU benchmark and converting the CPU time into a rate." SPECfp_rate2006 is an example of such a metric. On a single UltraSPARC T2, how many copies of a benchmark actually produce the best result? Would it be 16, because there are 2 simultaneous threads per core, or is it higher?

Rick Hetherington replied, "It's all 64 threads that are involved in the SPECfp_rate ... It was done with 64 threads internally. But ... eight threads will share one single floating-point unit."

Dave Patterson, Ph.D. replied, "It is clear. The more threads you've got the better, so think of it as 64 even though there's only eight dual cores. If you pretend that it's 64, that got back much better performance than if it was eight."

David Yen replied, "Actually, for UltraSPARC T2, you really should think about in terms of threads. The fact that it's 8 cores, it's a hardware choice for integration. And the whole virtualization concept is also based on thread as the resolution. That's why you can go all the way down to a single thread logical domain that's a 64 system total."

The main point of Jonathan Schwartz's reply was, "... the single biggest impediment to the adoption of all of the things we're talking about up here. It's not going to be the innovation, the performance and the value that can be delivered. It's going to be the accessibility of that value to the average developer ... So we have worked very very hard to make sure that all the innovation we just discussed is available to the broadest market possible ..."

Does that mean that each measurement was made with 64 copies of a benchmark and that no other number of copies would have produced a better SPECfp_rate2006 result? I don't know. I handed away the microphone too soon.
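
To make the open question concrete, here is a minimal Python sketch of how a rate-style score could be compared across copy counts. It assumes the usual SPEC CPU2006 rate convention (each benchmark's rate is the number of copies times the reference time divided by the elapsed time, and the overall score is the geometric mean of those rates). The benchmark names, reference times, and elapsed times are invented for illustration; they are not UltraSPARC T2 measurements, and the function names are my own.

```python
from math import prod

def benchmark_rate(copies, ref_seconds, elapsed_seconds):
    """Rate for one benchmark: copies * (reference time / elapsed time)."""
    return copies * ref_seconds / elapsed_seconds

def overall_rate(copies, ref_times, elapsed_times):
    """Geometric mean of the per-benchmark rates for one copy count."""
    rates = [
        benchmark_rate(copies, ref_times[name], elapsed_times[name])
        for name in ref_times
    ]
    return prod(rates) ** (1.0 / len(rates))

def best_copy_count(ref_times, runs):
    """Score every measured copy count and return (best count, all scores)."""
    scores = {
        copies: overall_rate(copies, ref_times, elapsed)
        for copies, elapsed in runs.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical reference times in seconds (assumption, not SPEC's values).
HYPOTHETICAL_REF_TIMES = {"bench_a": 1000.0, "bench_b": 2000.0}

# Hypothetical elapsed times in seconds, keyed by number of copies.
# Invented so that the score peaks below the maximum copy count.
HYPOTHETICAL_RUNS = {
    8: {"bench_a": 500.0, "bench_b": 1000.0},
    16: {"bench_a": 800.0, "bench_b": 1600.0},
    64: {"bench_a": 4000.0, "bench_b": 8000.0},
}

if __name__ == "__main__":
    best, scores = best_copy_count(HYPOTHETICAL_REF_TIMES, HYPOTHETICAL_RUNS)
    for copies, score in sorted(scores.items()):
        print(f"{copies:3d} copies: rate {score:.1f}")
    print(f"best copy count in this made-up data: {best}")
```

In this made-up data the score peaks at 16 copies, which is exactly the kind of outcome my question was probing: when several threads share one floating-point unit, running the maximum number of copies is not automatically the best-scoring configuration. Only a sweep over real measurements would settle it for UltraSPARC T2.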