[Beowulf] What is the right lubricant for computer rack sliding rails?

Mon Feb 9 03:14:42 PST 2009

On Sat, 7 Feb 2009, Mark Hahn wrote:

>> The tricky thing with the "rack rail" approach (which we use a LOT at JPL)
>> you give is that if the units have cases that are also full height, you
>> can't stack one unit directly above another, so you have to leave a 1U gap.
>
> OK, this may gross you all out, but why mount 1U's individually at all?
> I'm a lazy slob, but I have several 1U's stacked on top of each other - after 
> all, how often will you really need to get the non-top one(s) out?
> depending on how fancy your servers are, saving the price of a rail kit
> might even make $ense.  admittedly, I did this on an older,
> designed-for-cheapness cluster because I couldn't be bothered to order and 
> assemble the rails...
>
> wait, how about this: consider stacking 1U servers on their side.  you'll
> still be able to pull them out if you want.  would the hardware care about 
> the orientation?  muffin fans wouldn't, and disks ought to be able to handle 
> it.  come to think of it, there's no reason to think of this wrt conventional 
> 19" racks.  you could just weld together a frame of nice
> tough angle-iron supports (of whatever length you can manage to support the 
> load without sagging embarrassingly ;)

It's really the difference between an HPC cluster and most business
clusters.  HPC nodes go in, and they don't come out.  Unless they break,
and if you get high quality nodes, breaking is unlikely.

> think of it this way: by omitting a proper rack and rails, you could probably
> save something like $60/node.  can you think of a better way to spend $2400
> (per rack)?  admittedly, that's probably only a few percent of total cost.

And don't forget the ten or twenty minutes of human labor required to
mount the nodes on the rails, one at a time.  Take rails out of box, out
of bag.  Pull out bag of screws.  Ditto the node.  Set node on side.
Line up rail with holes.  Not these holes, those holes.  Oops, make sure
the front is facing the right way and that the rail marked L is on the
left side.  If not, do over.  Place screws in little holes. Oops,
dropped one.  Fish it out with magnetic driver.  Use electric
screwdriver to tighten them down.  Mount matching part of rail in rack.
Hope the screws all work; don't cross thread them and worry about burrs
in the holes (seems like at least one screw won't work in any given
rack).  Hold node up to rails, line it up, snap it in.  Make sure it
rolls in and out (or slides, if you have truly cheap rails).  Slide it
all the way in, move on to next one.  Even at FIVE minutes per node,
this is 200 minutes for a 40+U rack, well over three hours of work.  And
it could easily be 2-3 times that.

Add in labor costs at a measly $40/hour and you're up to $4000 per rack
in costs.
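
As a rough sketch of the arithmetic above (not a definitive costing), the
per-rack figures can be worked out as below.  The function name
rack_rail_cost and its parameter names are made up for illustration; the
defaults are the figures quoted in this thread (40 nodes per rack, $60 per
node for rails, five minutes per node, $40/hour).

```python
def rack_rail_cost(nodes=40, rail_cost_per_node=60.0,
                   minutes_per_node=5.0, labor_rate_per_hour=40.0):
    """Return (hardware, labor_hours, labor, total) for one rack.

    All names and defaults are illustrative assumptions taken from the
    numbers quoted in the thread, not measured data.
    """
    # Rail kits: one per node.
    hardware = nodes * rail_cost_per_node
    # Mounting time, converted from minutes to hours.
    labor_hours = nodes * minutes_per_node / 60.0
    # Labor cost at the assumed hourly rate.
    labor = labor_hours * labor_rate_per_hour
    return hardware, labor_hours, labor, hardware + labor


if __name__ == "__main__":
    # Try the five-minute estimate and slower per-node times.
    for minutes in (5, 10, 15, 20):
        hw, hours, labor, total = rack_rail_cost(minutes_per_node=minutes)
        print(f"{minutes:2d} min/node: {hours:5.1f} h labor, "
              f"${labor:7.2f} labor + ${hw:7.2f} rails = ${total:8.2f}")
```

The total is sensitive to minutes_per_node, so how close it comes to any
particular per-rack figure depends mostly on how long mounting really takes.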

That said, I have to say that I think rails are usually worth it in a
professional-grade cluster (one that other people's money is paying for,
where downtime is an issue) just like I think that 3-4 year service
contracts are worth it and that racks are worth it compared to heavy
duty steel shelving.  Ultimately, these all reduce downtime later and
reduce the time and personal effort required to get a cluster back
online when something breaks.  And it looks a lot better, which
actually does (sometimes) matter when your grant officer comes to call,
or when you need to show pictures of your cluster when applying for money
to augment or replace it.

A tough call -- $2400 to $4000 buys one more node, basically, per rack,
perhaps a 2.5% increase in capacity.  But you still have to spend
SOMETHING to stack up the nodes one way or another, so you don't really
save all of this (unless you stack them right up on the floor with no
rack at all, the bottom node supporting the weight of the entire stack).
And if you do this and a low node dies, you may well have to shut down
and disassemble the entire stack to get at the node.  Even if it IS a
CBA win, this is a PITA at an unscheduled time, whereas building the
cluster in the first place is scheduled and unavoidable.  Who knows what
your time (or cluster runtime) will be "worth" the minute the node dies?
It could be right before a conference and you might need every node to
finish that last data point.  Not a good time to shut down a stack just
to get at a node.  Doubly ungood if you're running the cluster for a
bunch of users, as they are OFTEN going to be in this sort of
time-pressured situation -- it is human nature.
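
To put rough numbers on the tradeoff above, here is a minimal sketch.  The
two-hour repair time is purely hypothetical, as are the function names; the
40-node rack and the one extra node come from the figures in this thread.

```python
def capacity_gain(extra_nodes=1, nodes_per_rack=40):
    """Fractional capacity gained by spending the rail money on nodes."""
    return extra_nodes / nodes_per_rack


def node_hours_lost(nodes_taken_down, hours_down):
    """Node-hours of compute lost while a repair is under way."""
    return nodes_taken_down * hours_down


if __name__ == "__main__":
    repair_hours = 2.0  # hypothetical: time to swap one dead node

    print(f"capacity gain: {capacity_gain():.1%}")
    # Bare stack: every node in the stack is shut down to reach a low one.
    print("stacked:", node_hours_lost(40, repair_hours), "node-hours lost")
    # On rails: only the dead node comes out.
    print("railed: ", node_hours_lost(1, repair_hours), "node-hours lost")
```

The raw node-hours may still favor the extra node over a year; the point in
the text is that the loss lands at a time you do not get to choose.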

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu