[Beowulf] How Can Microsoft's HPC Server Succeed?

Jan Heichler jan.heichler at gmx.net
Sat Apr 19 00:42:49 PDT 2008


Hello Bogdan,

On Friday, 18 April 2008, you wrote:


BC> Sorry to divert a bit the thread towards its initial subject and away 
BC> from the security issues currently discussed...

BC> I've just seen a presentation from a University (which shall remain 
BC> unnamed) which partnered with Microsoft for... well, HPC. The reasons 
BC> for using Windows were more or less the same that have been mentioned 
BC> in this thread, so I won't repeat them. To note is that they weren't 
BC> using Windows exclusively, but only on a part of the cluster, the rest
BC> running Linux.

Since I think I heard the same presentation, I have to add some thoughts here...

BC> Towards the beginning of the presentation there was a mention of a MPI
BC> latency benchmark showing 2.something microseconds over their IB 
BC> (unknown make and speed) in mainboards using latest generation Intel 
BC> CPUs with Microsoft's MPI libs, which seemed like a decent performance
BC> and got me pretty excited. 

But as far as I saw, they didn't state the type of IB. 2.7 microseconds is great for InfiniHost III (normal SDR/DDR cards), but pretty bad for QLogic InfiniPath or ConnectX. Since the cluster is pretty new, one should ask which interconnect they are using.

When I saw the first performance numbers for CCS 2003 I was a little shocked, and the recent numbers seem to be a huge improvement.

BC> But then I changed my mind when I started 
BC> to hear what a great feature it is to have several nodes booting and 
BC> installing the OS in the same 50 minutes (yes, minutes!) that a single
BC> node takes, due to a wonderful feature called multicast. 

50 minutes for a single node is of course unacceptable. 50 minutes for 256 nodes is okay, I think. But I doubt that it scales that well: even multicast packets get lost and need retransmission, etc.
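
A rough sketch of why loss matters more as the node count grows, assuming each node independently misses a given multicast packet with some small probability. The loss rate of 0.1% used here is an assumed figure for illustration, not a measurement from any cluster, and the function name is made up for this example.

```python
# Back-of-the-envelope model of multicast packet loss.
# Assumption: every receiver drops a given packet independently with
# probability p. Real networks are burstier, so treat this as a sketch.

def p_any_loss(nodes, p):
    """Probability that at least one of `nodes` receivers misses a given packet."""
    return 1.0 - (1.0 - p) ** nodes


if __name__ == "__main__":
    assumed_loss = 0.001  # hypothetical per-node, per-packet loss rate
    for nodes in (1, 16, 256):
        chance = p_any_loss(nodes, assumed_loss)
        print(f"{nodes:4d} nodes: {chance:.1%} of packets need at least one resend")
```

Under these assumptions the share of packets that need a resend to at least one node grows quickly with the node count, so install time would not stay flat unless retransmission is cheap.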

Does Rocks (for example) use multicast to install nodes in its latest release?


BC> And then 
BC> things turned really strange after a statement saying that in Linux it
BC> takes several minutes to start a parallel job while in Windows only 
BC> about 10 seconds. Then I started wondering: were those 2.something 
BC> microseconds a measure of the same latency that I know of ?

This was really strange. I have never timed a startup, so could somebody please tell me how long it takes to start 2048 processes on a Linux cluster? It would be interesting to know.
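
One simple way to get such a number is to time the launcher from the outside. The sketch below measures the wall-clock time of an arbitrary command; the helper name time_launch is made up for this example, and the default command is only a stand-in (a Python process that does nothing). On a real cluster one would pass the actual launcher line instead, for instance something like "mpirun -np 2048 ./hello", where ./hello is a hypothetical trivial MPI program.

```python
import subprocess
import sys
import time


def time_launch(command, repeats=3):
    """Return the best wall-clock time in seconds for running `command` to completion."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command fails, so a broken launch is not timed silently.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Placeholder default: a no-op Python process. Pass the real launcher
    # command and its arguments on the command line to time a parallel job.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    print(f"best of 3 runs: {time_launch(command):.3f} s")
```

This includes the run time of the program itself, so the program being launched should do as little as possible for the figure to reflect startup cost.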


BC> I can't say for sure that this was part of some Microsoft strategy and
BC> not a PR effort gone bad, but I'm strongly inclined towards the first 
BC> which leads me to believe that the answer to the question in the 
BC> subject is: by disinforming people.

All the presentations I have seen about CCS or the new 2008 product were pretty honest that they don't want to beat Linux but want to make things easier for people who are not Linux/Unix experts. Yesterday's presentation was pretty interesting because it did a real comparison with Linux, and one has to see whether the results are reproducible.

BC> Yes, there are probably many CEOs 
BC> of SMBs, who don't know/care much about technical details and don't 
BC> have a clue about Linux HPC, who are going to be impressed by such 
BC> statements. And when you can run HPL from Excel by modifying in a cell
BC> one of the parameters and getting the results back in that table, 
BC> results from which you can quickly generate a graphic and say "whew, 
BC> I should be in Top500", who can say that clustering is hard and 
BC> user-unfriendly ?

Hey... the Excel sheet was pretty neat! ;-)


BC> I'm all for healthy competition in this area, especially as I think 
BC> that HPC didn't evolve significantly in the past few years. But such 
BC> approaches are far from healthy... well, at least for my definition of 
BC> healthy competition. ;-)

My opinion is that many people in the HPC world just don't like Windows (including me). And a strange thing (that I experienced myself): even after years of using and deploying Linux clusters, you are a newbie in the Microsoft CCS world. That feels really strange because you're no longer an expert. Maybe that keeps people from liking MS CCS (or HPS 2008).

Cheers,
Jan

