[Beowulf] cpu's versus gpu's - was Intel buys QLogic InfiniBand business

Thu Jan 26 04:35:40 PST 2012

On Jan 26, 2012, at 1:28 PM, Vincent Diepeveen wrote:

> Mike, you replied to me, not to the mailing list.
>
> Note that Itanium2 was released too late, and it was $100k a box
> initially and $7500 a CPU (1.5 GHz) if you ordered 1000.
> It also had the same integer IPC as Opteron at the time (later on,
> compilers got PGO for Opteron as well, and then Opteron was faster
> in IPC, at least for Diep).
>
> Larrabee indeed resembles Itanium to some extent, but not quite.
> Intel's expertise is producing high-clocked CPUs. Itanium was a
> low-clocked CPU and therefore failed.
> No one pays big bucks for a low-clocked CPU. Look on eBay: the
> cheapest CPUs are always the low-clocked ones.
>
> Larrabee is something in between a CPU and a GPU, so it's a totally
> different ballgame: Intel is moving into a market where they actually
> have competition and are not the ones owning the patents.
>
> So that's not gonna be easy for Intel some years from now if they
> show up with a 100% vectorized design and not some dreadnought
> in between CPU and GPU that is low-clocked.
>
> As for your InfiniBand remark, realize that it took 25 years or so
> to bugfix Ethernet everywhere. Forget 'setting a new standard'
> there for the average Joe.
> Not gonna work.
>
> InfiniBand is meant for HPC, where MPI is typically used to
> communicate over it. This is very powerful for clusters and the way
> to go when scaling supercomputers,
> yet it's not gonna conquer the average Joe's machine, as there is a
> price to pay which is too high for now.
>
> However, realize that some of the sales of the HPC manufacturers go
> to low-latency Ethernet. My guess is that Intel will use QLogic's
> know-how there to improve their cheapo CPUs and upgrade them with
> better Ethernet. That seems a plausible goal and a very useful one.
> The rest, such as rivalling Mellanox at ethernet,
> that's not gonna happen.
>

Oops, small typo from writing in a hurry: "Mellanox at ethernet" should
of course be "Mellanox at HPC".

The question is whether typical low-latency Ethernet products are
gonna suffer from Intel's move. I doubt Solarflare will.
They already deliver this stuff only to those who really battle for
every picosecond, so price is just not the issue there.

Vincent

> On Jan 26, 2012, at 7:23 AM, MDG wrote:
>
>> Technically the Itanium chip was a failure. It was not 100% x86
>> compatible and was actually meant for servers, but it often
>> underperformed the traditional x86 chips. Intel let it quietly
>> vanish, as it came nowhere near the first advertised performance.
>> It varied too far from the x86 architecture design, requiring
>> special programming code, much like the GPUs, though they are
>> actually able to run some parallel processes, both under Windows
>> and Linux.
>>
>> There is a difference: the M-series NVIDIA cards are more for
>> servers and the C series, such as the C2070 or C2075, for
>> workstations. The M series uses the same numbering sequence,
>> and I think they are up to the 2090 or 2095 series, but you do
>> need high-speed PCIe slots for both sets of cards. As for resale
>> cards, I have talked to a few sellers, and be careful: there are
>> some knockoffs from mainland China. I verified this with NVIDIA.
>>
>> These GPUs are designed so that they are not seen as cores or CPUs.
>> Also, most resale cards are pulled from existing machines, in one
>> case a pool of HP workstations and servers, yet the seller had no
>> idea of the difference between the C2070 and the M2070. And as I
>> said, none of them had the required software; most did not even
>> know it was needed! Without it the GPUs do not function. So resale
>> cards are a pretty expensive gamble, as they are untested, with no
>> software to even try them with!
>>
>> The GPUs can be used if you write your own parallel code, usually
>> in C++ per NVIDIA, but you still need the software to offload the
>> work to them. If you are into heavy number crunching, and assuming
>> the problem allows parallel processing rather than the traditional
>> linear method where A must always come before B and B before C, you
>> will see far better results than with a typical program. In other
>> things you will see little improvement. My talk with an NVIDIA
>> technician confirmed this: you can get great results for creating,
>> say, graphics, but very little improvement for displaying an already
>> designed piece. The same goes for statistics, weather forecasting,
>> and geology. Intel has even used their network as a massive HPC
>> system to help design chips, so add engineering, and beyond that
>> most physics and nuclear explosion simulations, etc.
>>
>> Also, with fiber optics now coming down in price, the idea of
>> multiple super-workstations and even super-servers, in a
>> client-server relationship where the server does most of the
>> processing, will most likely grow into stable and usable systems
>> before the average workstation does.
>>
>> It will help some with a statistics-driven database but not that
>> much with a pure relational database. It also works well with
>> MATLAB and SPSS.
>>
>>
>>
>> Overall I would expect that the GPUs will soon have more code
>> written for them as they become more plentiful in real-world
>> applications. There is also open source code that is available and
>> being further developed under Linux, which with Wine and WineX can
>> run Windows software to some degree, not 100%. As for Windows 7, I
>> have not a clue whether it will run under Wine or WineX, though
>> Macintoshes now run Windows very well as a second operating
>> system. Then again, I would like to have four 12-core Xeons in my
>> workstation, but that bill is far higher than a few 448-core GPU
>> cards. Like any new technology, it starts at the high end and then,
>> as it is developed, works its way down the price chain. Then again,
>> I was shocked to see a twin 6-core Xeon in a game machine! So
>> things are moving faster than I anticipated.
>>
>>
>>
>> I know I am watching the GPU idea and the cards carefully, as so
>> far, compared with just throwing more cores into the x86
>> architecture, it seems to be moving far faster than when Intel
>> started moving upwards. Maybe you remember the hardware flaw in the
>> first Pentiums, where simple math was processed incorrectly? As
>> with all things, when you introduce new variables into a system, be
>> it hardware or software, there are a lot of things that will not
>> always work, or not work to the potential of the system.
>>
>>
>>
>> As I said, I am watching the GPUs closely, as so far they seem the
>> most likely next breakthrough as software is written that can take
>> advantage of their unique abilities. Also, from what I have read,
>> they draw far less power than even the new generation of multi-core
>> x86 chips. I am not an expert with these GPU systems, but they do
>> hold great promise as more of a leap forward than just adding
>> x86 cores.
>>
>> The buying of the InfiniBand business shows that Intel is looking
>> to move past the copper Ethernet systems, which surpassed ARCNET
>> systems. The only constant is change. While technically not an
>> Intel chip, this still shows Moore's law is being leveraged on
>> other platforms, including GPUs.
>>
>> Mike.
>>
>> --- On Wed, 1/25/12, Vincent Diepeveen <diep at xs4all.nl> wrote:
>>
>> From: Vincent Diepeveen <diep at xs4all.nl>
>> Subject: Re: [Beowulf] cpu's versus gpu's - was Intel buys QLogic  
>> InfiniBand business
>> To: "Prentice Bisbal" <prentice at ias.edu>
>> Cc: "Beowulf Mailing List" <beowulf at beowulf.org>
>> Date: Wednesday, January 25, 2012, 2:46 PM
>>
>> The supercomputing codes I saw running on processors were, to put it
>> politely, losing it everywhere.
>>
>> Also NASA, when porting from Origin3800 to Itanium2 1.5 GHz, publicly
>> reported a speedup of a factor of 2 in the forums.
>>
>> However, my own chess program, not exactly optimized for Itanium2,
>> got a boost of a factor of 4 moving from a 500 MHz R14000
>> (Origin3800) to a 1.3 GHz Itanium2. That was just a single compile,
>> and it's an integer program, whereas the Itanium2 is a floating
>> point processor.
>>
>> The Itanium2 1.5 GHz has 6 Gflops on paper, whereas the R14k 500 MHz
>> has 1 Gflop on paper.
>>
>> Now a Chinese reporter already posted on THIS mailing list, the
>> Beowulf mailing list, an efficiency of 25% on NVIDIA and 50% on AMD,
>> on GPU hardware from some generations ago.
>>
>> On the same GPUs back then, most student projects got around 25% on
>> NVIDIA; Volkov then went ahead, understood the GPUs better,
>> and scored 70% efficiency, again on very old GPUs. Since then they
>> have really improved.
>>
>> See: http://www.cs.berkeley.edu/~volkov/
>>
>> So you want to build a supercomputer that is now 10x more
>> expensive, and lose more efficiency on newer hardware with each
>> generation, whereas some who make the effort to write good new code
>> get very high efficiency?
>>
>> Just learn how to program and ignore the disinformation. If you have
>> a box that fast, you really can get a lot of speed out of it.
>>
>> You shouldn't ask for a 1 billion dollar box that can run your
>> old-school Fortran codes as well as a 5 million GPU box;
>> look at what you can do to write good codes for that manycore
>> hardware. OpenCL works on all of them, CUDA just on NVIDIA.
>>
>> Vincent
>>
>> On Jan 25, 2012, at 11:01 PM, Prentice Bisbal wrote:
>>
>> > On 01/24/2012 12:02 AM, Steve Crusan wrote:
>> >>
>> >>
>> >> On Jan 23, 2012, at 8:44 PM, Vincent Diepeveen wrote:
>> >>
>> >>
>> >>> It's 500 euro for a 1 teraflop double precision Radeon HD7970...
>> >>
>> >>
>> >> Great, and nothing runs on it. GPUs are insanely useful for certain
>> >> tasks, but they aren't going to be able to handle most normal
>> >> workloads (similar to the BG class of course). Any center that buys
>> >> BGP (or Q at this point) gear is going to pay for a scientific
>> >> programmer to adapt their code to take advantage of the BG's
>> >> strength: parallelism.
>> >>
>> >> But it's nice that supercomputing centers use GPUs to boost their
>> >> flops numbers. Any word on that Chinese system's efficiency? If you
>> >> look at the architecture of the new K computer in Japan, it's
>> >> similar to the BlueGene line.
>> >
>> > I attended a presentation at Princeton U. on Monday about the state
>> > of HPC in China. The talk was given by someone who has been to China
>> > and spoken with the leaders of their HPC efforts. While the Chinese
>> > systems get great scores on LINPACK, even the Chinese concede that
>> > on their "real" applications, they are getting well below the
>> > theoretical max flops, because their codes aren't getting the most
>> > out of their systems.
>> > In other words, on real programs, they aren't all that efficient
>> > (yet).
>> >
>> > --
>> > Prentice
>> >
>> >
>> >
>> > _______________________________________________
>> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>> > Computing
>> > To change your subscription (digest mode or unsubscribe) visit
>> > http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
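
A quick back-of-the-envelope check of the numbers quoted in this thread,
as a minimal Python sketch. The peak figures (6 Gflops for the 1.5 GHz
Itanium2, 1 Gflop for the 500 MHz R14000) and the speedups (2x reported
by NASA, 4x for the chess program at 1.3 GHz) come from the messages
above. The function names are made up for illustration, and dividing a
speedup by the clock ratio is just one assumed way to compare the two
chips per cycle.

```python
# Figures as quoted in the thread ("on paper" peaks and reported speedups).
R14K_CLOCK_GHZ = 0.5
R14K_PEAK_GFLOPS = 1.0
ITANIUM2_CLOCK_GHZ = 1.5
ITANIUM2_PEAK_GFLOPS = 6.0


def flops_per_cycle(peak_gflops, clock_ghz):
    """Peak floating point operations per clock cycle implied by the paper numbers."""
    return peak_gflops / clock_ghz


def per_clock_speedup(speedup, old_clock_ghz, new_clock_ghz):
    """Speedup left over after dividing out the clock frequency ratio (assumed metric)."""
    return speedup / (new_clock_ghz / old_clock_ghz)


def fraction_of_peak_ratio(speedup, old_peak_gflops, new_peak_gflops):
    """How much of the on-paper peak improvement a measured speedup captured."""
    return speedup / (new_peak_gflops / old_peak_gflops)


if __name__ == "__main__":
    print("R14k flops/cycle:    ", flops_per_cycle(R14K_PEAK_GFLOPS, R14K_CLOCK_GHZ))
    print("Itanium2 flops/cycle:", flops_per_cycle(ITANIUM2_PEAK_GFLOPS, ITANIUM2_CLOCK_GHZ))

    # NASA port: factor 2 going from the 500 MHz R14000 to the 1.5 GHz Itanium2.
    nasa = fraction_of_peak_ratio(2.0, R14K_PEAK_GFLOPS, ITANIUM2_PEAK_GFLOPS)
    print(f"NASA speedup as a fraction of the peak ratio: {nasa:.2f}")

    # Chess program: factor 4 going to a 1.3 GHz Itanium2 (integer code).
    chess = per_clock_speedup(4.0, R14K_CLOCK_GHZ, 1.3)
    print(f"Chess program speedup per clock: {chess:.2f}")
```

Read this way, the factor 2 captures only a third of the factor 6 that
the paper numbers suggest, while the integer chess program gained more
than the clock increase alone would explain.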