[Beowulf] Vector coprocessors
Daniel Pfenniger daniel.pfenniger at obs.unige.ch
Thu Mar 16 00:04:32 PST 2006
- Previous message: [Beowulf] Vector coprocessors
- Next message: [Beowulf] Vector coprocessors
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
The shipment of this accelerator card has been delayed many times; the last time I asked was in October 2005. Apparently the first shipment was made this month, for a Japanese supercomputer with 10^4 Opterons.

The cost is not indicated, but a price of something like $8000 or more per card would put it outside commodity hardware. I would not be surprised if more performance could be obtained in most applications with commodity clustering. If Clearspeed considered mass production at a cost of around $100 to $500 per card, the market would be huge, because the card would then compete with multi-core processors like the IBM-Sony Cell.

The most interesting niche for the Clearspeed cards, it seems to me, is accelerating proprietary applications like Matlab, Mathematica and particularly Excel, which run on a single PC and can hardly be reprogrammed by their users to run on a distributed cluster.

Dan

Bill Broadley wrote:
> I noticed a few news reports on Intel/AMD considering the Clearspeed
> co-processor.
>
> Looks like a fairly interesting widget, here's an Intel/Clearspeed paper
> that describes it:
> http://www.clearspeed.com/downloads/Intel%20Math%20Kernel%20whitepaper.pdf
>
> Some interesting snippets on the Clearspeed advance board:
> * 192 pipelines, 2 flops per clock (not fused), 250 MHz, peak 96GFlops
>   (I believe this is for 2 chips)
> * 50 GFlops sustained with the DGEMM kernel
> * 1 GB of ram per board.
> * 128 registers per PE, register file allows 3 reads 2 writes per clock
> * 1.44 MB of SRAM that can deliver one word per FP op per clock.
> * 800MB/sec over pci-x, enough for 50 GFlops on DGEMM.
> * Less than 10 watts while sustaining 25 GFlops
> * 1-D complex FFTs of 1024 elements @ 400k per second (20 GFlops with 32-bit),
>   but only 1/4th of that streaming because of pci-x bottlenecks.
> * 12 GFlops when running 2-d FFTs (512x512 single precision) that are
>   resident on board (in the 1GB)
>
> In any case it looks like an interesting development.
>
> Speaking of which, what is the double precision peak rate of today's p4
> and opteron? One 128 bit SSE operation every other cycle (so 1 64 bit
> flop per cycle)? I believe Intel mentioned doubling this rate at IDF
> (shipping sometime in the 2nd half of this year).
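
A few of the quoted figures can be cross-checked with simple arithmetic. The sketch below is only a rough consistency check, not anything taken from the Clearspeed paper. The function names are made up for illustration. The FFT flop count uses the common 5 N log2(N) convention for a complex transform, and the streaming bound assumes 8 bytes per single-precision complex element and counts the transfer in one direction only. All three are assumptions.

```python
import math


def peak_gflops(pipelines, flops_per_clock, clock_hz):
    """Peak rate in GFlops: pipelines x flops per clock x clock frequency."""
    return pipelines * flops_per_clock * clock_hz / 1e9


def fft_gflops(n, transforms_per_sec):
    """GFlops for complex 1-D FFTs, assuming 5 * n * log2(n) flops each."""
    return 5 * n * math.log2(n) * transforms_per_sec / 1e9


def streamed_ffts_per_sec(n, bus_bytes_per_sec, bytes_per_element=8):
    """Upper bound on FFTs/s when each input must cross the bus once.

    Assumes 8 bytes per single-precision complex element and ignores the
    return transfer of the result.
    """
    return bus_bytes_per_sec / (n * bytes_per_element)


# Figures quoted in the message above.
print(peak_gflops(192, 2, 250e6))          # quoted peak: 96 GFlops
print(fft_gflops(1024, 400e3))             # quoted: about 20 GFlops
print(streamed_ffts_per_sec(1024, 800e6))  # compare with 400k/s on board
```

Under these assumptions the peak figure comes out at exactly the quoted 96 GFlops, the FFT rate at about 20 GFlops, and the bus-limited FFT rate at roughly a quarter of the on-board 400k per second, which is in line with the "1/4th" remark.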
