[Beowulf] 3.79 TFlops sp, 0.95 TFlops dp, 264 GByte/s, 3 GByte, 198 W @ 500 EUR

Prentice Bisbal prentice at ias.edu
Thu Dec 22 11:49:15 PST 2011


If you or anyone else on this list is interested in learning more about the
Anton architecture, there are a bunch of links here:


A couple of them give good descriptions of the Anton architecture.
I read most of the computer-related ones over the summer. Yes, that's
my idea of light summer reading!
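
For anyone unsure what the fixed-point math mentioned further down the
thread looks like in practice, here is a minimal Python sketch. It is not
Anton's actual number format: the 16 fractional bits and the helper names
(to_fixed, from_fixed, fixed_mul, fixed_sum) are assumptions picked purely
for illustration. The idea is that values are stored as scaled integers,
so addition is exact and gives the same answer in any order, which is not
true of floating point in general.

```python
FRAC_BITS = 16           # assumed number of fractional bits (illustrative only)
SCALE = 1 << FRAC_BITS   # 2**16: one unit in fixed-point representation


def to_fixed(x: float) -> int:
    """Convert a float to a scaled integer, rounding to the nearest step."""
    return round(x * SCALE)


def from_fixed(q: int) -> float:
    """Convert a scaled integer back to a float."""
    return q / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values and shift back to the common scale."""
    return (a * b) >> FRAC_BITS


def fixed_sum(values) -> int:
    """Sum fixed-point values; integer addition is exact and order-independent."""
    return sum(values)


if __name__ == "__main__":
    terms = [0.1, 0.2, 0.3]
    # Floating point: the two summation orders need not agree bit for bit.
    print((0.1 + 0.2) + 0.3 == (0.3 + 0.2) + 0.1)
    # Fixed point: both orders give the same integer.
    q = [to_fixed(t) for t in terms]
    print(fixed_sum(q) == fixed_sum(reversed(q)))
    print(from_fixed(fixed_mul(to_fixed(1.5), to_fixed(2.0))))
```

The trade-off is range and resolution: with 16 fractional bits nothing
smaller than 1/65536 can be represented, so the scale has to be chosen
for the problem at hand.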


On 12/22/2011 12:33 PM, Lux, Jim (337C) wrote:
> That's an interesting approach of combining ASICs with FPGAs. ASICs will
> blow the doors off anything else in a FLOP/Joule contest or a FLOPS/kg or
> FLOPS/dollar, for the tasks the ASIC is designed for.  FPGAs to handle
> the routing/sequencing/variable parts of the problem and ASICs to do the
> crunching is a great idea.  Sort of the same idea as including DSP or
> PowerPC cores on a Xilinx FPGA, at a more macro scale.
> (and of interest in the HPC world, since early 2nd generation Hypercubes
> from Intel used Xilinx FPGAs as their routing fabric)
> The challenge with this kind of hardware design is PWB design. Sure, you
> have 1100+ pins coming out of that FPGA. Now you have to route them
> somewhere. And do it in a manufacturable board: I've worked recently with
> a board that had 22 layers, and we were at the ragged edge of tolerances
> with the close pitch column grid array parts we had to use.
> I would expect the clever folks at DE Shaw did an integrated design with
> their ASIC: make the ASIC pinouts such that they line up with the FPGAs,
> and make the routing problem simpler.
> On 12/22/11 8:53 AM, "Prentice Bisbal" <prentice at ias.edu> wrote:
>> Just for the record - I'm only the messenger. I noticed a
>> not-insignificant number of booths touting FPGAs at SC11 this year, so I
>> reported on it. I also mentioned other forms of accelerators, like GPUs
>> and Intel's MIC architecture.
>> The Anton computer architecture isn't just an FPGA - it also has
>> custom-designed processors (ASICs). The ASICs handle the parts of the
>> molecular dynamics (MD) algorithms that are well-understood, and
>> unlikely to change, and the FPGAs handle the parts of the algorithms
>> that may change or might have room for further optimization.
>> As far as I know, only 8 or 9 Antons have been built. One is at the
>> Pittsburgh Supercomputing Center (PSC), the rest are for internal use at
>> DE Shaw. A single Anton consists of 512 cores, and takes up 6 or 8
>> racks. Despite its small size, it's orders of magnitude faster at
>> doing MD calculations than even supercomputers like Jaguar and
>> Roadrunner with hundreds of thousands of processors. So overall, Anton
>> is several orders of magnitude faster than a supercomputer based on
>> general-purpose processors. And I'm sure it uses a LOT less power. I don't
>> think the Antons are clustered together, so I'm pretty sure the
>> published performance on MD simulations is for a single Anton with 512
>> cores.
>> Keep in mind that Anton was designed to do only 1 thing: MD, so it
>> probably can't even run LINPACK, and if it did, I'm sure its score
>> would be awful. Also, the designers cut corners where they knew they
>> safely could, like using fixed-precision (or is it fixed-point?) math,
>> so the hardware design is only half the story in this example.
>> Prentice
>> On 12/22/2011 11:27 AM, Lux, Jim (337C) wrote:
>>> The problem with FPGAs (and I use a fair number of them) is that you're
>>> never going to get the same picojoules/bit transition kind of power
>>> consumption that you do with a purpose-designed processor.  The extra
>>> logic needed to get it "reconfigurable", and the physical junction sizes
>>> as well, make it so.
>>> What you will find is that on certain kinds of problems, you can
>>> implement a more efficient algorithm in FPGA than you can in a
>>> conventional processor or GPU.  So, for that class of problem, the FPGA
>>> is a winner (things lending themselves to fixed-point systolic-array
>>> type processes are good candidates).
>>> Bear in mind also that while an FPGA may have, say, a 10-million gate
>>> equivalent, any given practical design is going to use a small fraction
>>> of those gates.  Fortunately, most of those unused gates aren't
>>> toggling, so they don't consume clock-related power, but they do draw
>>> leakage current, so the whole clock rate vs core voltage trade winds up
>>> a bit different for FPGAs.
>>> The biggest problem with FPGAs is that they are difficult to write high
>>> performance software for.  With FORTRAN on conventional and vectorized
>>> and pipelined processors, we've got 50 years of compiler writing
>>> expertise, and real high performance libraries.  And literally millions
>>> of people who know how to code in FORTRAN or C or something, so if
>>> you're looking for the highest performance coders, even at the 4 sigma
>>> level, you've got a fair number to choose from.  For numerical
>>> computation in FPGAs, not so many. I'd guess that a large fraction of
>>> FPGA developers are doing one of two things: 1) digital signal
>>> processing, flow through kinds of stuff (error correcting codes,
>>> compression/decompression, crypto); 2) bus interface and data handling
>>> (PCI bus, disk drive controls, etc.).
>>> Interestingly, even with the relative scarcity of FPGA developers versus
>>> conventional CPU software, the average salaries aren't that far apart.
>>> The distribution on "generic coders" is wider (particularly on the low
>>> end: barriers to entry are lower for C, Java, whathaveyou code monkeys),
>>> but there are very, very few people making more than, say, 150-200k/yr
>>> doing either.  (except in a few anomalous industries, where compensation
>>> is higher than normal in general).  (also leaving out "equity
>>> participation" type deals)
>>> On 12/22/11 7:42 AM, "Prentice Bisbal" <prentice at ias.edu> wrote:
>>>> On 12/22/2011 09:57 AM, Eugen Leitl wrote:
>>>>> On Thu, Dec 22, 2011 at 09:43:55AM -0500, Prentice Bisbal wrote:
>>>>>> Or if your German is rusty:
>>>>>> http://www.zdnet.com/blog/computers/amd-radeon-hd-7970-graphics-card-launched-benchmarked-fastest-single-gpu-board-available/7204
>>>>> Wonder what kind of response will be forthcoming from nVidia,
>>>>> given developments like
>>>>> http://www.theregister.co.uk/2011/11/14/arm_gpu_nvidia_supercomputer/
>>>>> It does seem that x86 is dead, despite good Bulldozer performance
>>>>> in Interlagos
>>>>> http://www.heise.de/newsticker/meldung/AMDs-Serverprozessoren-mit-Bulldozer-Architektur-legen-los-1378230.html
>>>>> (engage dekrautizer of your choice).
>>>> At SC11, it was clear that everyone was looking for ways around the
>>>> power wall. I saw 5 or 6 different booths touting the use of FPGAs for
>>>> improved performance/efficiency. I don't remember there being a single
>>>> FPGA booth in the past. Whether the accelerator is GPU, FPGA, GRAPE,
>>>> Intel MIC, or something else, I think it's clear that HPC
>>>> architecture is going to change radically in the next couple of years,
>>>> unless some major breakthrough occurs for commodity processors.
>>>> I think DE Shaw Research's Anton computer, which uses FPGAs and custom
>>>> processors, is an excellent example of what the future of HPC might
>>>> look like.
>>>> --
>>>> Prentice
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>> Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
