[Beowulf] 512 nodes Myrinet cluster Challenges


Mark Hahn hahn at physics.mcmaster.ca
Tue May 2 14:20:26 PDT 2006


> > moving it, stripped them out as I didn't need them.  (I _do_ always require
> > net-IPMI on anything newly purchased.)  I've added more nodes to the cluster
> 
> Net-IPMI on all hardware?  Why? Running a second (or 3rd) network isn't
> a trivial amount of additional complexity, cables, or cost.  What do

I really like being able to reset remotely, as well as power up/down,
fetch temperatures and fan speeds, etc.
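
As a rough sketch of what that out-of-band control looks like in practice, the snippet below builds ipmitool command lines for the operations mentioned above (reset, power up/down, temperatures, fan speeds). The host names, the "admin" user and the choice of the lanplus interface are assumptions for illustration, not details from this thread; the password is expected in the IPMI_PASSWORD environment variable (ipmitool's -E option). It only constructs and prints the commands, it does not run them.

```python
# Sketch only: builds ipmitool argument lists for out-of-band node control.
# Host names, user name and interface are hypothetical placeholders.

# Map a short action name to the ipmitool subcommand that performs it.
ACTIONS = {
    "status": ["chassis", "power", "status"],
    "on":     ["chassis", "power", "on"],
    "off":    ["chassis", "power", "off"],
    "reset":  ["chassis", "power", "reset"],   # reset without cutting power
    "temps":  ["sdr", "type", "Temperature"],  # temperature sensor readings
    "fans":   ["sdr", "type", "Fan"],          # fan speed sensor readings
}


def ipmi_command(host, action, user="admin", interface="lanplus"):
    """Return the argv list for one IPMI action against one node's BMC."""
    if action not in ACTIONS:
        raise ValueError("unknown action: %r" % (action,))
    # -E tells ipmitool to read the password from IPMI_PASSWORD.
    base = ["ipmitool", "-I", interface, "-H", host, "-U", user, "-E"]
    return base + ACTIONS[action]


if __name__ == "__main__":
    # Hypothetical BMC host names for two nodes in one rack.
    for node in ("node001-ipmi", "node002-ipmi"):
        print(" ".join(ipmi_command(node, "temps")))
```

Passing the returned list to subprocess.run would execute it; keeping construction separate makes it easy to loop over a rack's worth of nodes or to log what would be sent.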

> you figure you pay extra on the nodes (many vendors charge to add IPMI,
> sun, tyan, supermicro, etc), cables, switches, etc.  As a data point on
> a x2100 I bought recently the IPMI card was $150.

the IPMI add-in for many Tyan boards is a lot less than that ($50?),
but quite a few servers already have it.  (such as the HP DL145 G2).

and it's not a "real" additional network, since each rack's worth of IPMI
net ports can just go to an in-rack switch.  if you have 32-40 nodes/rack
with a better-than-ethernet interconnect, then you've probably already
got another switch (gigabit) in the rack so all the extra stuff is in-rack.

> Seems like collecting fan speeds and temperatures in-band seems reasonable,
> after all much of the data you want to collect isn't available via IPMI
> anyways (cpu utilization, memory, disk I/O, etc.).

true.  though it's not clear to me how important those extras are to 
the kind of HPC cluster I run.  a job gets complete ownership of its 
CPUs (and usually multiple whole nodes), so it's quite unlike a 
load-balancing cluster, where you actually want realtime info on 
cpu or memory utilization.  doing load-balanced clusters is not 
unreasonable for more cores-per-node, or perhaps for strictly 
serial workloads.  for anything that's nontrivially parallel, the job
_must_ completely own all its resources, so there's really no reason 
to worry about unused memory on an already occupied node...

> Upgrading a 208 3phase PDU to a switched PDU seems like it costs on the
> order of $30 per node list.  As a side benefit you get easy to query
> load per phase.

that's nice.  but it only lets you power up/down.  you can't do a 
warm reset, only hard power cycles, which shorten the hardware's life.
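
For scale, the per-node figures quoted in this exchange can be multiplied out. The sketch below assumes the 512-node count from the subject line and uses only numbers that appear in the thread: roughly $50 (a guess) or $150 per node for an IPMI add-in, about $30 per node list for a switched PDU, and 32-40 nodes per rack. Switch and cabling costs are not given here, so they are left out.

```python
import math

# Assumption: node count taken from the thread's subject line.
NODES = 512

# Per-node list prices (USD) as quoted in the thread.
PER_NODE_COST = {
    "IPMI add-in (Tyan, guessed)": 50,
    "IPMI card (x2100)": 150,
    "switched PDU upgrade": 30,
}


def total_cost(per_node, nodes=NODES):
    """Total cost of a per-node option across the whole cluster."""
    return per_node * nodes


def racks_needed(nodes, nodes_per_rack):
    """Number of racks, and hence in-rack IPMI switches, rounded up."""
    return math.ceil(nodes / nodes_per_rack)


if __name__ == "__main__":
    for option, price in PER_NODE_COST.items():
        print("%-30s $%d" % (option, total_cost(price)))
    for density in (32, 40):
        print("%d nodes/rack -> %d racks" % (density, racks_needed(NODES, density)))
```

With these inputs the IPMI options come to $25,600 and $76,800 against $15,360 for switched PDUs, across 13 to 16 racks.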

> After dealing with a few clusters with PDUs in the airflow blocking
> airflow and physical access to parts of the node I now specify the
> zero-u variety that are outside the airflow.

that's nice.  HP's PDUs have a breaker section which consumes about 
1u each, and a set of outlet bars which mount zero-u (but which 
have far too many, or too low-power, outlets).

interestingly, our racks are bayed together, which means that there's
enough space for some airflow between racks.  unfortunately, Quadrics
switches are fairly narrow, so there's enough room for a noticeable 
counter-circulation.



