[Beowulf] Teraflop chip hints at the future
James.P.Lux at jpl.nasa.gov
Tue Feb 13 09:36:36 PST 2007
At 07:03 AM 2/13/2007, Richard Walsh wrote:
>Mark Hahn wrote:
>>>It looked like it did IEEE754 doubles. Any Intel types out there
>>IMO, the chip is mainly interesting for exploring how much we can abandon
>>the von Neumann architecture as a whole, rather than stupidly putting
>>more and more of them onto a chip. After all, the nearest-neighbor
>>latency (125 ps!) is comparable to cache or even register-file latency.
>Yes, but how much does it really abandon von Neumann? It is just a lot
>of little von Neumann machines unless the mesh is fully programmable
>and the DRAM stacks can source data for any operation on any cpu as
>the application's data flows through the application kernel(s) however it
>is laid out across the chip. And in that case it is a multi-core
>FPGA ... why not just use an FPGA ... ;-) ... and avoid wasting all those
>hard-wired functional units that won't be needed for this or that particular
In fact, modern high-density FPGAs (e.g., the Xilinx Virtex-II 6000
series) have partitioned their innards into little cells: some with
an ALU, combinatorial logic, and a little memory; some with lots of
memory and not so much logic.
And you can program them in Verilog, which is a fairly high-level
language. There are huge libraries of useful functions out there
that you can "call". It's still a bit (a lot?) clunky compared to
zapping out C code on a general-purpose machine, but it can be done.
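To make that concrete, here is a minimal Verilog sketch of my own
(illustrative only, not from any vendor library): a multiply-accumulate
of the sort that lands in the logic/ALU-type cells, a small RAM that
synthesis tools infer as block memory, and a top-level module that
"calls" both by instantiating them.

    // Illustrative sketch only -- not a vendor library core.

    // A multiply-accumulate: logic/ALU-cell territory.
    module mac #(parameter W = 16) (
        input                clk,
        input                rst,
        input      [W-1:0]   a,
        input      [W-1:0]   b,
        output reg [2*W+3:0] acc
    );
        always @(posedge clk)
            if (rst) acc <= 0;
            else     acc <= acc + a * b;  // one product accumulated per clock
    endmodule

    // A 1K x 16 synchronous-read RAM: synthesis tools infer block RAM,
    // i.e. the memory-rich cells.
    module ram1k (
        input             clk,
        input             we,
        input      [9:0]  addr,
        input      [15:0] din,
        output reg [15:0] dout
    );
        reg [15:0] mem [0:1023];
        always @(posedge clk) begin
            if (we) mem[addr] <= din;
            dout <= mem[addr];            // registered read port
        end
    endmodule

    // "Calling" a library function is module instantiation: each
    // instance creates another copy of the hardware; nothing "executes".
    module top (
        input         clk,
        input         rst,
        input         we,
        input  [9:0]  addr,
        input  [15:0] din,
        output [35:0] acc,
        output [15:0] dout
    );
        ram1k         u_ram (.clk(clk), .we(we), .addr(addr),
                             .din(din), .dout(dout));
        mac #(.W(16)) u_mac (.clk(clk), .rst(rst), .a(din),
                             .b(dout), .acc(acc));
    endmodule

The conceptual gap from C is that instantiating a module twice gives
you two copies of the circuit running concurrently, rather than one
routine called twice.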
>application. Think of an array of FPGA cores on a chip (super-FPGA
>model). Less wasted
>hardware. In some sense, these super, multi-mini-core designs are another
>ASIC hammer looking for a nail. Fixed-instruction architectures ultimately
>waste hardware. Why not program the processor itself instead of writing
>instructions for a predefined, one-size-fits-all ASIC?
I think that as a general rule, the special purpose cores (ASICs) are
going to be smaller, lower power, and faster (for a given technology)
than the programmable cores (FPGAs). Back in the late 90s, I was
doing tradeoffs between general-purpose CPUs (PowerPCs), DSPs
(the ADSP-21020), and FPGAs for some signal processing applications. At
that time, the DSP could do the FFTs, etc., for the least joules and
the least time. Since then, however, the FPGAs have pulled ahead, at
least for spaceflight applications. But that's not because of
architectural superiority in a given process; it's that the FPGAs
are benefiting from improvements in process (higher density) and
nobody is designing space-qualified DSPs using those processes (so
they are stuck with the old ones).
Heck, the latest SPARC V8 core from ESA (LEON 3) is often implemented
in an FPGA, although there are a couple of space-qualified ASIC
implementations (from Atmel and Aeroflex).
In a high volume consumer application, where cost is everything, the
ASIC is always going to win over the FPGA. For more specialized
scientific computing, the trade is a bit more even... But even so,
the Beowulf concept of combining large numbers of commodity computers
leverages the consumer volume for the specialized application, giving
up some theoretical performance in exchange for dollars.
James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109