[Beowulf] 512 nodes Myrinet cluster Challenges

Wed May 3 08:08:05 PDT 2006

Let me react to the following short quote:

> You are somehow convinced that institutions buying clusters are brain
> dead and always get ripped off. Some are, but most are not. You don't
> have all of the information used in their decision process, so you draw
> invalid conclusions.

Ok a few points:

First a simple fact:
  0) the vast majority of supercomputers get bought by (semi-)government
     organisations

That in itself should already raise question marks. Even though some
companies need massive computing power, they somehow seem to be smarter and
manage to get the job done in a different way.

However, I can only speak about how government works, as all those machines
belong to (semi-)government.

There is a big distinction between the university level and the government
level. SOME universities actually do pretty well, and the points below do
not apply to them.

  a) Let me give a simple example. The supercomputer europe 2003 report
      still does not seem to know that Opteron processors support SSE/SSE2,
      and I'm guessing this report was used to order the 2004/2005 machines.

      As a result, the Intel processor had an advantage over AMD, whereas in
      my own DFT tests the AMD processor with SSE2 was already faster than
      any Intel processor. Even though I reported this to the people in
      question and to the organisation, I never got any feedback.

      When I read that 2003 report, which became available at the start of
      2004, I already knew what kind of machine they were going to order for
      2005.

  b) Hardware moves fast. What is fast this year is slow next year. Taking
      a simple inventory of what was available in the *previous* year and
      using only *that* to order a machine for *next* year is too slow.
      Companies in general are faster there.

  c) If you learn what price effectively gets paid for the hardware, you
      will fall off your chair. The so-called public costs that governments
      sometimes announce simply don't match what they *effectively-all-in*
      pay.

A few good guys excepted, in general it's simply not their money, so they
don't care. Sometimes there are good reasons for this, so it's not as if the
people doing the above are bad guys. It's sometimes easier to get 20 million
euro from government than to get 1 million.

  d) I definitely do not see 'bad' people in those positions; it's just
      that the average government official knows nothing about contracts.
      They have doctor and professor titles because one day they were good
      in some domain, and perhaps still are today; they aren't in that
      position because they are good at contracts. Government jobs pay based
      on the number of titles you've got, not on how good you are at closing
      deals and closing good contracts. They sit there because they are good
      at working in committees, which are good for your health; such
      committees usually have pleasant meetings, unlike companies, where
      meetings are usually much harder because tough decisions get forced
      there.

      Find me 1 bureaucrat on the entire planet who is good at contracts and
      sits in the right position to decide what happens.

I saw this in politics even more than in HPC. Other sectors involve a lot
more money than several tens of millions. High-end supercomputing really is
about little money if you compare it with other infrastructure projects. In
telecommunication, I remember a meeting with a manager of a
telecommunication company who nearly wanted to beat me up when I remarked in
passing that I would never want such a wireless station of around 60 watt in
my garden, as I do not consider 60 watt straight through my head a healthy
way of life. Not even if they offered me 20k euro a year for it, like they do
to farmers for example.

Compared to that, HPC is nearly holy. It is not bad for your health.
In fact it's good for your health, as it avoids for example atomic test
blasts, and at a very cheap price too.

So compared to telecommunication, high-voltage power and gas pipelines,
where you sometimes see complete idiocy that is close to or equal to
corruption and conflicts of interest, I'm 100% sure this isn't the case in
HPC.

The talk I had with the following director makes him look worse than his
actions in fact are. He's a good guy and really did his best, but the
following conversation is typical for government.

We were standing next to a certain big supercluster, 1 meter away from it.

Over the huge noise it produced he said: "We bought this giant machine for a
lot of money, and I was promised at the time that we could simply upgrade it
a few years later by putting in a new processor. Now we already have to take
it out of production by 2006, with by then completely outdated processors.
Just upgrading those cpu's would still make it a powerful machine. New dual
core 700 MHz processors are expected to arrive by 2005 or even 2006 and will
not be socket compatible. So we got tricked and in fact lost a lot of money
by buying this machine, as it can't get upgraded."

Me: "But you signed a contract that makes sure you can sue them now?"

[deafening silence]

Best regards,
Vincent

Director DiepSoft
www.diep3d.com

p.s. certain people in HPC only seem to react when they feel attacked or 
insulted

>> They either give the job to a friend of theirs (because of some weird
>> requirement that just 1 manufacturer can meet), or they have an open bid,
>> and if you can bid with a network that's $800 a port, then that bid is
>> going to get picked over a bid that's $1500 a port.
>
> The key is to set the right requirements in your RFP. Naive RFPs would
> use broken benchmarks like HPCC. Smart RFPs would require benchmarking
> real application cores under reliability and performance constraints.
>
> It's not that "you get what you pay for", it's "you get what you ask for
> at the best price".
>
>> This is where the network is one of the important choices to make for a
>> supercomputer. I'd argue that nowadays, because the cpu's have become so
>> fast compared to latencies over networks, it's THE most important choice.
>
> In the vast majority of applications in production today, I would argue
> that it's not. Why ? Because only a subset of codes have enough
> communications to justify a 10x increase in network cost compared to
> basic Gigabit Ethernet. Your application is very fine grain, because it 
> does not compute much, but chess is not representative of HPC workloads...
>
>> My government bought 600+ node network with infiniband and and and.... 
>> dual P4 Xeons.
>> Incredible.
>
> again, you don't know the whole story: you don't know the deal they got
> on the chips, you don't know if their applications run fast enough on
> Xeons, you don't know if they could not have the same support service on 
> Opteron (Dell does not sell AMD for example).
>
> By the way, your government is also buying Myrinet ;-)
> http://www.hpcwire.com/hpc/644562.html
>
>> I personally believe in measuring at full system load.
>
> Ok, you want to buy a 1024-node cluster. How do you measure at full
> system load ? You ask to benchmark another 1024-node cluster ? You
> can't, no vendor has such a cluster ready for evaluation. Even if they
> had one, things change so quickly in HPC, it would be obsolete very
> quickly from a sales point of view.
>
> The only way is to benchmark something smaller (256 nodes) and define
> performance requirements at 1024 nodes. If the winning bid does not
> match the acceptance criteria, you refuse the machine or you negotiate a
> "punitive package".
>
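A minimal sketch of how such an acceptance criterion could be written down,
assuming a fixed-size problem (strong scaling) and a parallel efficiency
target agreed in the RFP. The function names, the 0.8 efficiency and the
timings are hypothetical illustrations, not numbers from any real bid:

```python
def required_time(t_small, n_small, n_large, efficiency):
    """Run time the full machine must reach for a fixed-size problem.

    t_small    : seconds measured on the smaller benchmark system
    n_small    : node count of the benchmark system (e.g. 256)
    n_large    : node count of the delivered system (e.g. 1024)
    efficiency : agreed parallel efficiency when scaling from n_small
                 to n_large (1.0 would be perfect scaling)
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    ideal = t_small * n_small / n_large  # perfect strong scaling
    return ideal / efficiency            # allow for scaling losses


def accept(measured_time, limit):
    """Accept the machine only if the on-site run meets the limit."""
    return measured_time <= limit


# Hypothetical numbers: 1000 s on 256 nodes, 80% efficiency demanded.
limit = required_time(1000.0, 256, 1024, efficiency=0.8)
print(f"required at 1024 nodes: {limit:.1f} s")
print("accept 300 s run:", accept(300.0, limit))
print("accept 320 s run:", accept(320.0, limit))
```

With these made-up inputs the limit is 312.5 seconds, so a 300 second
on-site run passes and a 320 second run would lead to refusal or to the
"punitive package" negotiation.
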
>> The myri networks I ran on were not so good. When I asked the same big
>> blue guy the answer was:
>>   "yes on paper it is good nah? However that's without the overhead that
>>    you have in practice from the network and other users".
>
> Which machine, which NICs, which software ? We have 4 generations of
> products with 2 different software interfaces, and it's all called
> "Myrinet".
>
> On *all* switched networks, there is a time when you share links with
> other communications, unless you are on the same crossbar. Some sites do
> care about process mapping (maximize job on same crossbar or same
> switch), some don't. From the IBM guy's comment, I guess he doesn't know 
> better.
>
>> A network is just as good as its weakest link. With many users there is 
>> always a user that hits that weak link.
>
> There is no "weak" link in modern network fabrics, but there is
> contention. Contention is hard to manage, but there is no real way
> around except having a full crossbar like the Earth Simulator. Clos
> topologies (Myrinet, IB) have contention, Torus topologies (Red Storm,
> Blue Gene) have contention, that's life. If you don't understand it, you
> will say the network is no good.
>
>> That said, i'm sure some highend Myri component will work fine too.
>>
>> This is the problem with *several* manufacturers basically.
>> They usually have 1 superior switch that's nearly unaffordable, or just 
>> used for testers,
>> and in reality they deliver a different switch/router which sucks ass, to
>> put it politely.
>> This said without accusing any manufacturer of it.
>>
>> But they do it all.
>
> Not often in HPC. The HPC market is so small and so low-volume, you
> cannot take the risk to alienate customers like that, they won't come
> back. If they don't come back, you go out of business.
>
> Furthermore, the customer accepts delivery of a machine on-site, testing
> the real thing. If it does not match the RFP requirements, they can
> refuse it and someone will lose a lot of money. It has happened many
> times. It's not like the IT business, where you buy something based on
> third-party reviews and/or on spec sheets. Some do that in HPC, and they
> get what they deserve, but believe me, most don't.
>
> Patrick
>
>