[Beowulf] Intel unveils 1 teraflop chip with 50-plus cores

Vincent Diepeveen diep at xs4all.nl
Wed Nov 16 09:44:43 PST 2011


Well look, everyone here is looking at the part of the machine that
delivers the 'big punch', which is either the Teslas or the AMD 6990.

However, we shouldn't forget that at its base each node is a 2-socket
Intel Xeon machine with 2 Intel Xeon CPUs, and it requires a very fast
network.
How well the network performs depends not only on the network itself
but especially on the quality of those 2 CPUs, as they have to feed
the GPU and, more importantly, the network as well.

Furthermore, part of the software is incapable of running on GPUs
and has to run on the CPU.

That said, with the big punch being a Tesla, it's obvious that it
can't be clocked as high as the gamer cards, as it focuses more on
reliability.

Recently the bandwidth one can achieve from CPU to GPU has improved
tremendously: from 2 GB/s to many gigabytes per second now,
approaching a big part of the total bandwidth the RAM CAN deliver.

Suppose we do a big multiplication of some giant prime number using a
safe form of FFT (of course there are specialized forms here that are
faster, but for readability I call it FFT and not DWT).
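
To make the kind of multiplication meant here concrete, below is a minimal
sketch, assuming a plain zero-padded complex FFT (not a DWT) over base-100
digits. The names fft, to_digits and fft_multiply, the base, and the 0.25
round-off threshold are illustrative assumptions, not taken from any real
prime-hunting code. The round-off check is one simple way a floating-point
FFT can be made "safe".

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def to_digits(v, base):
    """Little-endian digits of a non-negative integer."""
    d = []
    while v:
        v, r = divmod(v, base)
        d.append(r)
    return d or [0]

def fft_multiply(x, y, base=100):
    """Multiply two non-negative ints via a floating-point FFT convolution."""
    dx, dy = to_digits(x, base), to_digits(y, base)
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = fft(dx + [0] * (n - len(dx)))
    fy = fft(dy + [0] * (n - len(dy)))
    conv = fft([p * q for p, q in zip(fx, fy)], invert=True)
    result = 0
    for i, c in enumerate(conv):
        exact = c.real / n          # the inverse transform needs the 1/n scaling
        coeff = round(exact)
        # Assumed safety margin: refuse results that are not close to an integer
        if abs(exact - coeff) > 0.25:
            raise ArithmeticError("FFT round-off too large; use a smaller base")
        result += coeff * base ** i  # carries resolve themselves in the big-int sum
    return result
```

For example, fft_multiply(2**127 - 1, 2**89 - 1) should agree with the
built-in product of the same two numbers.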

Now we can see the FFT as something that does its work in O(log n)
steps. Only in the last few of those O(log n) phases do we actually
need communication between all the nodes.
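
A rough way to see why only the last phases need the network, as a sketch
under assumptions: an iterative radix-2 FFT whose input is already in
bit-reversed order, power-of-two sizes, and each node holding one
contiguous block of n / nodes points. The function name and this layout
are hypothetical; a real distributed FFT may arrange its data differently.

```python
def stages_needing_communication(n, nodes):
    """Split the log2(n) butterfly stages into node-local and remote ones.

    Assumes n and nodes are powers of two and that node j holds the
    contiguous points [j * n // nodes, (j + 1) * n // nodes).
    """
    block = n // nodes
    local, remote = [], []
    stride = 1
    stage = 0
    while stride < n:
        # In this stage point i is combined with point i ^ stride.
        # Both sit on the same node only while the stride is below the block size.
        if stride < block:
            local.append(stage)
        else:
            remote.append(stage)
        stride *= 2
        stage += 1
    return local, remote
```

With n = 2**20 points on 16 nodes this puts the first 16 of the 20 stages
on-node and leaves only the last log2(16) = 4 stages to cross the network.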

Basically nothing prevents us from double checking the results on a
different GPU while we are busy with the finalizing steps, as the
majority of the GPUs are basically idle anyway at that point.

So if we calculate just a single number, we can rather efficiently
double check our GPU calculations, as the crunching power of those
things is so far above anything else that they're always ahead of any
other step.

Only if you already run other independent calculations at the same
time can you keep those GPUs busy.

However, if you'd run independent calculations, what do you need that
massive, hugely expensive cluster for? You could also give each
machine its own number and just sit and wait until they have all
finished. That's an embarrassingly parallel approach where basically
it's a JBOM, "just a bunch of machines".

In other words, if we take advantage of the cluster as a whole to
speed up the calculation, then the part of the calculation where
reliability is crucial gets done by the CPUs, not by the GPU. It would
be easy to give the GPU time to double check previously calculated
results, with a simple comparison happening while we are already some
steps further.
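
The deferred double check described above could look roughly like this.
It is only a sketch: the step function (a Lucas-Lehmer style squaring),
the thread pool standing in for a second, otherwise idle GPU, and all the
names are assumptions for illustration, not how any particular cluster
code does it.

```python
from concurrent.futures import ThreadPoolExecutor

def step(s, m):
    """One hypothetical unit of work: a Lucas-Lehmer style squaring mod m."""
    return (s * s - 2) % m

def run_with_deferred_check(s0, m, steps, primary=step, checker=step):
    """Advance `steps` iterations and verify each one a step behind.

    `primary` stands in for the first GPU, `checker` for a second one.
    Raises RuntimeError on the first mismatch between the two.
    """
    def verify(pending):
        idx, claimed, fut = pending
        if fut.result() != claimed:
            raise RuntimeError(f"mismatch at step {idx}")

    s = s0
    pending = None  # (step index, claimed result, future holding the recomputation)
    with ThreadPoolExecutor(max_workers=1) as pool:
        for k in range(steps):
            nxt = primary(s, m)
            # Compare the previous step only now, when we are already one step ahead
            if pending is not None:
                verify(pending)
            pending = (k, nxt, pool.submit(checker, s, m))
            s = nxt
        if pending is not None:
            verify(pending)
    return s
```

Run with s0 = 4, m = 2**13 - 1 and 11 steps, this is the Lucas-Lehmer test
for the Mersenne prime 8191, so the final residue is 0. A checker that
disagrees with the primary makes the run stop with an error instead.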

With GPUs you simply do have the system time to double check, and you
MUST double check; there is no reason to not buy GPUs with millions
of sustainability demands, as the reason why they're so fast is also
the reason why they're cheap, and that's also the reason why you need
to double check.

So cheap kick-butt GPUs are the way to go for now.


On Nov 16, 2011, at 1:49 PM, Micha wrote:

> They are just busting the one teraflop but they are going with it  
> into the GPU market, only without a GPU, i.e. they're competing  
> with the Tesla GPU here. The Tesla admittedly is also about 1  
> TFlops but the consumer market has already gone past the 2 TFlop  
> mark about a year ago and the next generation is just around the  
> corner (will be operational before the MIC). And the funny part is  
> that it's a discrete (over PCI) card that is running a software  
> micro-kernel and scheduler that you can ssh into.
> I'm not sure how much I buy into the hype they're selling that it's  
> the next best thing because it's x86 so you run the same code,  
> although apparently it's not binary compatible, so you do need to  
> recompile. And I think we all know that real world codes need a  
> rework to transfer well to different vector sizes and communication/ 
> synchronization/etc. So why is it so much better than picking up an  
> AMD or NVIDIA?
>
> Eugen Leitl <eugen at leitl.org> wrote:
> http://seattletimes.nwsource.com/html/technologybrierdudleysblog/2016775145_wow_intel_unveils_1_teraflop_c.html
>
> Wow: Intel unveils 1 teraflop chip with 50-plus cores
>
> Posted by Brier Dudley
>
> I thought the prospect of quad-core tablet computers was exciting.
>
> Then I saw Intel's latest -- a 1 teraflop chip, with more than 50  
> cores, that
> Intel unveiled today, running it on a test machine at the SC11  
> supercomputing
> conference in Seattle.
>
> That means my kids may take a teraflop laptop to college -- if  
> their grades
> don't suffer too much having access to 50-core video game consoles.
>
> It wasn't that long ago that Intel was boasting about the first  
> supercomputer
> with sustained 1 teraflop performance. That was in 1997, on a  
> system with
> 9,298 Pentium II chips that filled 72 computing cabinets.
>
> Now Intel has squeezed that much performance onto a matchbook-sized  
> chip,
> dubbed "Knights Ferry," based on its new "Many Integrated Core"  
> architecture,
> or MIC.
>
> It was designed largely in the Portland area and has just started
> manufacturing.
>
> "In 15 years that's what we've been able to do. That is stupendous.  
> You're
> witnessing the 1 teraflop barrier busting," Rajeeb Hazra, general  
> manager of
> Intel's technical computing group, said at an unveiling ceremony.  
> (He holds
> up the chip here)
>
> A single teraflop is capable of a trillion floating point  
> operations per
> second.
>
> On hand for the event -- in the cellar of the Ruth's Chris Steak  
> House in
> Seattle -- were the directors of the National Center for Computational
> Sciences at Oak Ridge Laboratory and the Application Acceleration  
> Center of
> Excellence.
>
> Also speaking was the chief science officer of the GENCI  
> supercomputing
> organization in France, which has used its Intel-based system for  
> molecular
> simulations of Alzheimer's, looking at issues such as plaque  
> formation that's
> a hallmark of the disease.
>
> "The hardware is hardly exciting. ... The exciting part is doing the
> science," said Jeff Nichols, acting director of the computational  
> center at
> Oak Ridge.
>
> The hardware was pretty cool, though.
>
> George Chrysos, the chief architect of Knights Ferry, came up from the
> Portland area with a test system running the new chip, which was  
> connected to
> a speed meter on a laptop to show that it was running around 1  
> teraflop.
>
> Intel had the test system set up behind closed doors -- on a coffee  
> table in
> a hotel suite at the Grand Hyatt, and wouldn't allow reporters to take
> pictures of the setup.
>
> Nor would the company specify how many cores the chip has -- just  
> more than
> 50 -- or its power requirement.
>
> If you're building a new system and want to future-proof it, the  
> Knights
> Ferry chip uses a double PCI Express slot. Chrysos said the systems  
> are also
> likely to run alongside a few Xeon processors.
>
> This means that Intel could be producing teraflop chips for personal
> computers within a few years, although there's lots of work to be  
> done on the
> software side before you'd want one.
>
> Another question is whether you'd want a processor that powerful on  
> a laptop,
> for instance, where you may prefer to have a system optimized for  
> longer
> battery life, Hazra said.
>
> More important, Knights Ferry chips may help engineers build the next
> generation of supercomputing systems, which Intel and its partners  
> hope to
> deliver by 2018.
>
> Power efficiency was a highlight of another big announcement this  
> week at
> SC11. On Monday night, IBM announced its "next generation  
> supercomputing
> project," the Blue Gene/Q system that's heading to Lawrence Livermore
> National Laboratory next year.
>
> Dubbed Sequoia, the system should run at 20 petaflops peak  
> performance. IBM
> expects it to be the world's most power-efficient computer,  
> processing 2
> gigaflops per watt.
>
> The first 96 racks of the system could be delivered in December. The
> Department of Energy's National Nuclear Security Administration  
> uses the
> systems to work on nuclear weapons, energy research and climate  
> change, among
> other things.
>
> Sequoia complements another Blue Gene/Q system, a 10-petaflop setup  
> called
> "Mira," which was previously announced by Argonne National Laboratory.
>
> A few images from the conference, which runs through Friday at the  
> Washington
> State Convention & Trade Center, starting with perusal of Intel  
> boards:
>
>
> Take home a Cray today!
> IBM was sporting Blue Genes, and it wasn't even casual Friday:
>
> A 94 teraflop rack:
>
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf


