21264 information
Robert G. Brown
rgb@phy.duke.edu
Sat, 26 Sep 1998 23:56:27 -0400
On Sat, 26 Sep 1998, Greg Lindahl wrote:
> > While some I've talked to said FP
> > performance is inherently poor in RISCs,
>
> I would avoid generalizations like this as much as possible. The usual
> generalization is the reverse, but Intel continually surprises its
> critics by throwing lots of manpower at CPU design.
The generalization that FP performance is GOOD in RISC is really
pretty safe, since ALL CPUs are basically RISC these days. The alpha,
SPARC, MIPS and the P6-PII are all RISC core processors, with somewhat
different design philosophies and middleware layers. The alpha, for
example, is the RISC-iest RISC. It relies almost completely on
compiler intelligence for its performance. Having very little "upper
level" microcode on-chip, it can devote more space to logic and still
end up with fewer transistors. Fewer transistors = lower energy
consumption, which, along with smaller size, means that DEC has
consistently been able to run at the highest clocks from the day they
introduced the alpha. IF you can get well-optimized code and have
roughly equivalent arithmetic/logic on-chip parallelism and resources
(pipelined FPUs and the like), clock speed dominates overall
performance, and DEC's compilers have been quite good.
SPARC is (or was, I haven't looked at the architecture in detail in a
couple or three years) a very organized RISC. Lots of distinct
processing units on chip, lots of parallelism, a fair amount of
on-chip logic and (one of my favorite features of the architecture)
hardware context switches. The higher degree of organization has
meant that they cannot run at the highest clock speeds. They have
usually not significantly beaten Intel, for example, in raw clock, but
they are a "real" RISC and have maintained a small edge over Intel at
equivalent clock. Back when they had a "real" operating system (I'm
sorry, I still have a hard time thinking of Solaris as anything more
than Sun's attempt to emulate Microsoft and write a "Windows NT") the
on-chip context switch meant that Sun would frequently blow away
nominally much faster systems under a real multitasking load, making
Suns running SunOS one of the best choices for a server by far (in
their day). I've never really run Linux on SPARC, but it might well
still yield the excellent multitasking performance of Suns of yore.
I don't really know much about current MIPS or RS6000-descended chips
-- the last time I looked at their specs was probably five years ago,
which is too long. But both are RISC, of course. So is
the Power PC. But I couldn't tell you just how they are organized,
although I'm sure there are list participants who can.
The P6/PII is actually a very interesting chip. It is a RISC core
with a CISC microcode translation front end (an oversimplified
description, but hey). It therefore achieves near-RISC performance
while still allowing one to run native code that ran on the IBM-PC. No
kidding, old software never dies. Again, having all the extra
overhead of the instruction translation unit on chip means that they
make more heat and are bigger, so they cannot run their clock as fast
or put in quite as many parallel floating point units (and they need
honker heat sinks and fans to run at the clocks they DO run at). A
PPro CPU draws what, 36 watts? Without the heat sink you could bake
muffins in between the CPUs. I think that Intel has been gaining
relative ground as VLSI scale improvements have cut back the marginal
cost of the interface layer, allowing them to concentrate more on the
RISC core and crank up the clock, but they won't really get into the
game on a fair basis until maybe Merced, IF they make Merced, as
promised, a 64 bit chip that can no longer run DOS software written in
1982 except in a REAL emulator.
So no, RISC isn't inferior, it is the only game in town -- there are
no real CISC CPUs (that I can think of offhand) being made anymore
except for things like smart appliances and device controllers. The
reason is fairly clear from the above -- if the speed of raw logic and
arithmetic is directly proportional to clock, and clock is limited by
size/VLSI scale and heat dissipation, then one always wins in
achievable clock if one limits what one actually puts on silicon to
the minimum required to support the necessary logic in an efficient
way. This means fewer CPU instructions, and also means that a lot of
what used to be put ON chip in the way of organization and logic has
to be handled in software -- RISC.
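To put rough numbers on the clock argument above, here is a minimal
sketch of the simple model it assumes: peak rate = clock x work done
per cycle. The function name and the figures are hypothetical, chosen
only for illustration, and are not measurements of any real CPU.

```python
def peak_mflops(clock_mhz: float, flops_per_cycle: float) -> float:
    """Peak floating-point rate in MFLOPS, assuming rate = clock * flops per cycle."""
    return clock_mhz * flops_per_cycle


# Hypothetical chips with the same on-chip FP parallelism (2 flops/cycle).
# The leaner design is assumed to reach a higher clock; numbers are made up.
lean_core = peak_mflops(clock_mhz=600.0, flops_per_cycle=2.0)
heavier_core = peak_mflops(clock_mhz=400.0, flops_per_cycle=2.0)

# With equal work per cycle, the speedup is just the ratio of the clocks.
speedup = lean_core / heavier_core
print(f"lean: {lean_core:.0f} MFLOPS, heavier: {heavier_core:.0f} MFLOPS, "
      f"ratio: {speedup:.2f}")
```

This is only the peak figure; real code falls short of it to the extent
that the compiler and the memory system cannot keep the pipelines full.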
I confess that I used to be somewhat distrustful of RISC -- given my
experience with software and bugs and Murphy's Law, I expected that
RISC kernels and compilers would have more than their share of serious
bugs and inefficiencies that would cost a lot of their theoretical
performance. I think that my fears have proven to be groundless --
even the very RISC-y alphas tend to have very good compilers and very
good operating systems and, in practice, have achieved the highest raw
clock and (I believe) the best performance at equivalent clock.
I think there is still room for fun at the interface between the
actual ALU and the code/data streams -- I'm not sure that issues such
as whether hardware context switches and the like are worth the
space in silicon are completely resolved, and I'm pretty SURE that we
have not heard the last word in cache organization, but the CISC/RISC
"war" has been over for three or four years now, and RISC is the
undisputed winner.
rgb
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu