high physical density cluster design - power/heat/rf questions

Velocet mathboy at velocet.ca
Tue Mar 6 09:01:50 PST 2001

On Mon, Mar 05, 2001 at 11:29:19PM -0800, szii at sziisoft.com's all...
> We were pondering the exact same questions ourselves about a month
> ago and while our project is on hold, here's what we came up with...
> Mounting:  Plexiglass/plastic was our choice as well.  Strong, cheap,
>                    and can be metal-reinforced if needed.
> We were going to orient the boards on their sides, stacked 2 deep.  At
> a height of 3" per board, you can get about 5 comfortably (6 if you try)
> into a stock 19" rack.  They can also slide out this way.  Theoretically
> you can get 10-12 boards in 5-6U (not counting powersupplies or hard drives)
> and depending on board orientation. We were looking at ABIT VP6 boards.
> They're cheap, they're DUAL CPU boards, and they're FC-PGA so they're thin.
> 20-24 CPUs in 5-6U.  *drool*  If AMD ever gets around to their dual boards,
> those will rock as well.

The price/performance of Intel is just so much worse than the Athlons'.
We've run a number of tests and the Athlons are at least 90% the speed
of a similarly clocked Intel chip for Gaussian and MPQC calculations. I
say "at least" because it may be more. Considering the price is usually
50-70% of an Intel chip's, depending on the speed, there's just a huge win.

Prices for me, in Toronto, are around $240 CDN for the VP6, and $300
per 800MHz P3 (granted it's a 133MHz FSB, where the Duron's is 100).
That's $840 for the board and CPUs, not including RAM. I have no savings
with RAM either; I basically have to double it on a dual board. Compare
this to $260 * 2 = $520 for two M810 boards + Duron CPUs. The VP6 may
well take up a bit less space, being standard ATX form factor, whereas
the two M810s together are 2x9.5" wide. Even so, I think the $520 vs
$840 difference will more than make up for the 100 vs 133MHz bus.
(Actually, the M810 runs the FSB at up to 2x133MHz for TBirds, and
TBirds are about $50 CDN more expensive. I am running some tests to see
the performance difference between Duron and TBird with Gaussian - anyone
got any stats?)
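For what it's worth, the bang-for-buck works out roughly like this (a sketch using the CDN$ figures above; the 0.9 factor is our "at least 90%" Gaussian/MPQC estimate, so this is conservative):

```python
# Bang-for-buck from the CDN$ figures above: VP6 ($240) + two P3-800s
# ($300 each) vs. two M810 board+Duron combos ($260 each). The 0.9
# relative-performance factor is the "at least 90%" estimate.

def perf_per_dollar(cpus, rel_perf, cost):
    """Relative throughput units per CDN$."""
    return cpus * rel_perf / cost

dual_p3  = perf_per_dollar(cpus=2, rel_perf=1.0, cost=240 + 2 * 300)
duron_x2 = perf_per_dollar(cpus=2, rel_perf=0.9, cost=2 * 260)

print(f"dual P3  : {dual_p3:.5f} units/$")
print(f"2x Duron : {duron_x2:.5f} units/$")
print(f"Duron advantage: {duron_x2 / dual_p3:.2f}x")  # ~1.45x
```

And that's before counting the doubled RAM the dual board needs.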

The problem with dual-CPU boards and Gaussian is that it thrashes the cache.
I don't know much about the math behind Gaussian-style calculations (quantum
computational chemistry) yet, but I gather there are large matrices involved.
These often do not fit in L1 or L2 cache, and the memory bus gets a nice hard
workout. On top of that, main memory often isn't big enough, depending on the
calculation, and disk can get thrashed heavily as well by the scratch files.
You get two CPUs that wanna hammer both disk and RAM, and things start
slowing down.
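To put a rough number on the cache pressure, here's a back-of-envelope sketch (my own illustration - the 256 KB L2 is an assumed figure for a Coppermine-class P3, and real Gaussian working sets are far larger and less regular than a single dense matrix):

```python
# Rough working-set check: an NxN double-precision matrix vs. an assumed
# 256 KB on-die L2 cache (Coppermine-class figure, purely for scale).

L2_BYTES = 256 * 1024

def matrix_bytes(n):
    return n * n * 8  # 8 bytes per double-precision element

for n in (100, 181, 500, 2000):
    kb = matrix_bytes(n) / 1024
    where = "fits in L2" if matrix_bytes(n) <= L2_BYTES else "spills to main memory"
    print(f"{n:5d} x {n:<5d} matrix: {kb:8.0f} KB  ({where})")
```

Anything much past ~180x180 is out of cache and onto the memory bus, and two CPUs share that one bus.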

With the current state of both FreeBSD and Linux resource locking, we're
finding that for some jobs there are huge bottlenecks at the memory and PCI
(disk) buses. Certain jobs run SLOWER on two CPUs on one board than on a single
CPU (this was from my tests of scan jobs with Gaussian on a BP6 with 2xC400s
as well as 2xC366s O/C'd to 550MHz). Usually performance isn't that
pathologically bad, but most jobs run at a loss of efficiency (i.e. they're not
twice as fast). Seeing as we need this cluster now and can't wait for dual
Athlon boards to do some real tests, I'm relatively confident that the M810
single Duron/TBird boards sporting a Duron 800 are going to do us quite well
on price/performance. (Considering the cost of DDR boards and RAM, DDR isn't
a possibility either.)
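The loss of efficiency above can be quantified as parallel efficiency, E = T1/(n*Tn). A quick sketch with made-up placeholder timings (not measurements from our BP6 runs):

```python
# Parallel efficiency: E = T1 / (n * Tn). Under 1.0 means the second
# CPU isn't pulling its weight; under 0.5 means the dual-CPU run is
# literally slower than a single CPU. Timings below are placeholders.

def efficiency(t_one_cpu, t_n_cpus, n=2):
    return t_one_cpu / (n * t_n_cpus)

print(f"{efficiency(100.0, 55.0):.2f}")   # well-behaved job: 0.91
print(f"{efficiency(100.0, 110.0):.2f}")  # bus-thrashing job: 0.45 (slower than 1 CPU)
```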

> For powersupplies and HA, we were going to use "lab" power supplies
> and run a diode array to keep them from fighting too much.

Saw someone post that most diodes will drop too much voltage to be
able to maintain a steady ~+3V supply to the finicky CPUs unless you
are very careful. I do have an electrical engineering friend though... ;)
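For a rough sense of the diode problem: a sketch with textbook forward-drop figures and an assumed +/-5% rail tolerance (illustrative numbers only - check your friend's math, not mine):

```python
# What a series diode leaves of a 3.3 V rail. Forward drops (~0.7 V
# silicon, ~0.35 V Schottky) and the 5% tolerance are textbook figures
# used for illustration, not measurements.

RAIL_V = 3.3
SPEC_MIN_V = RAIL_V * 0.95  # assumed -5% lower tolerance: ~3.135 V

for name, v_drop in [("silicon rectifier", 0.7), ("Schottky", 0.35)]:
    v_out = RAIL_V - v_drop
    verdict = "within spec" if v_out >= SPEC_MIN_V else "OUT of spec"
    print(f"{name:17s}: {v_out:.2f} V  ({verdict})")
```

Even the low-drop Schottky leaves the rail well under tolerance, which is presumably why "very careful" means active sharing/OR-ing circuitry rather than bare diodes.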

> Instead of x-smaller supplies, you can use 4-5 larger supplies and run them
> into a common harness to supply power.  You'll need 3.3v, 5v, 12v supplies,
> but it beats running 24 separate supplies (IMHO) and if one dies, you don't
> lose the board, you just take a drop in supply until you replace it.
> For heat dissipation, we're in a CoLo facility.  Since getting to/from the
> individual video/network/mouse/keyboard/etc stuff is very rare (hopefully)
> once it's up, we were going to put a pair of box-fans (wind tunnel style)
> in front and behind the box.  =)  In a CoLo, noise is not an issue.
> Depending
> on exact design, you might even get away with dropping the fans off of the
> individual boards and letting the windtunnel do that part, but that's got
> problems if the tunnel dies and affects every processor in the box.

True. Without the fans on the boards we can actually get the boards closer -
which means more heat, though :) It's too bad we can't get the fins of the
heatsinks turned 90 degrees from their traditional orientation so that the air
will flow through them (since we're mounting the boards on their sides, it's
hard to flow air in from the side).

Like I said, we may just take some accordion air-duct hose from the
ceiling and latch it onto the whole array. The Liebert is far more
reliable than any box fan and is alarmed up the yingyang. It's possible
it could stop, however, so we'll have power-off protection for when things
get rather warm. Better to lose 6 hours of calculations than fry the boards.
I suppose we're lucky in that way: we're mainly running jobs that take
no longer than 1-2 days of calculation in most cases. Gives us a lot
of flexibility.
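A sketch of the sort of power-off protection I mean - poll a temperature, halt the node past a threshold. Everything here is an assumption (the 60C threshold, and you'd wire the two callables to your actual sensor and to `shutdown -h now`):

```python
import time

MAX_C = 60  # assumed shutdown threshold in degrees C -- pick your own

def watchdog(read_temp_c, power_off, max_c=MAX_C, poll_secs=30, max_polls=None):
    """Poll read_temp_c() and call power_off() once it exceeds max_c.

    read_temp_c: callable returning the current temperature in C
    power_off:   callable that halts the node (e.g. runs `shutdown -h now`)
    max_polls:   stop after this many polls (None = run forever)
    Returns True if power_off was triggered, False otherwise.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        try:
            if read_temp_c() > max_c:
                power_off()
                return True
        except (OSError, ValueError):
            pass  # sensor hiccup: keep running rather than halt blind
        polls += 1
        time.sleep(poll_secs)
    return False
```

Point read_temp_c at whatever your board's monitoring chip exposes (parsing lm_sensors output, say) and power_off at an orderly shutdown.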


> I'm not an EE guy, so the power-supply issue is being handled by someone
> else.  I'll field whatever questions I can, and pass on what I cannot.
> If you ever wander down an aisle and see a semi-transparent blue piece
> of plexiglass with a bunch of surfboards on it, you'll know what
> it is - the Surfmetro "Box O' Boards."
> Does anyone have a better way to do it?  Always room for improvement...
> -Mike
> ----- Original Message -----
> From: Velocet <mathboy at velocet.ca>
> To: <beowulf at beowulf.org>
> Sent: Monday, March 05, 2001 10:13 PM
> Subject: high physical density cluster design - power/heat/rf questions
> > I have some questions about a cluster we're designing. We really need
> > a relatively high density configuration here, in terms of floor space.
> >
> > To be able to do this I have found out pricing on some Socket A boards with
> > onboard NICs and video (don't need the video though). We aren't doing
> > anything massively parallel right now (just running Gaussian/Jaguar/MPQC
> > calculations) so we don't need major bandwidth.* We're booting with root
> > filesystem over NFS on these boards. Haven't decided on FreeBSD or Linux
> > yet. (This email isn't about software config, but feel free to ask questions.)
> >
> > (* even with NFS disk we're looking at using MFS on FreeBSD (or possibly
> > the new md system) or the new nbd on Linux or equivalent for Gaussian's
> > scratch files - oodles faster than disk, and in our case, with no
> > disk, it writes across the network only when required. Various tricks
> > we can do here.)
> >
> > The boards we're using are PC Chips M810 boards (www.pcchips.com). Linux
> > seems fine with the NIC on board (SiS chip of some kind - Ben LaHaise of
> > Red Hat is working with me on some of the design and has been testing it
> > for Linux; I have yet to play with FreeBSD on it).
> >
> > The configuration we're looking at to achieve high physical density is
> > something like this:
> >
> >                NIC and Video connectors
> >               /
> >  ------------=-------------- board upside down
> >     | cpu |  =  |   RAM   |
> >     |-----|     |_________|
> >     |hsync|
> >     |     |      --fan--
> >     --fan--      |     |
> >    _________     |hsync|
> >   |         |    |-----|
> >   |  RAM    | =  | cpu |
> >  -------------=------------- board right side up
> >
> > as you can see the boards kind of mesh together to take up less space. At
> > micro-ATX form factor (9.25" I think per side) and about 2.5 or 3" high for
> > the CPU+sink+fan (tallest) and 1" tall or less for the RAM, I can stack two
> > of these into 7" (4U). At 9.25" per side, 2 wide inside a cabinet gives me 4
> > boards per 4U in a standard 24" rack footprint. If I go 2 deep as well (i.e.
> > a 2x2 config), then for every 4U I can get 16 boards in.
> >
> > The cost for this is amazing, some $405 CDN right now for Duron 800s with
> > 128Mb of RAM each without the power supply (see below; standard ATX power
> > is $30 CDN/machine). For $30000 you can get a large ass-load of machines ;)
> >
> > Obviously this is pretty ambitious. I heard talk on the list of some people
> > doing something like this, with the same physical configuration and cabinet
> > construction. Wondering what your experiences have been.
> >
> >
> > Problem 1
> > """""""""
> > The problem is in the diagram above: the upside-down board has another
> > board .5" above it - are these two boards going to leak RF like mad and
> > interfere with each other's operations? I assume there's not much to do
> > there but to put a layer of grounded (to the cabinet) metal in between.
> > This will drive up the cabinet construction costs. I'd rather avoid this
> > if possible.
> >
> > Our original construction was going to be copper pipe and plexiglass
> > sheeting, but we're not sure that this will be viable for something that
> > could be rather tall in future revisions of our model. Then again, copper
> > pipe can be bolted to our (cement) ceiling and floor for support.
> >
> > For a small model that Ben LaHaise built, check the pix at
> > http://trooper.velocet.ca/~mathboy/giocomms/images
> >
> > It's quite a hack, try not to laugh. It does engender the 'do it damn
> > cheap' mentality we're operating with here.
> >
> > The boards are designed to slide out the front once the power and network
> > are disconnected.
> >
> > An alternate construction we're considering is sheet metal cutting and
> > folding, but at much higher cost.
> >
> >
> > Problem 2 - Heat Dissipation
> > """"""""""""""""""""""""""""
> > The other problem we're going to have is heat. We're going to need to build
> > our cabinet such that it's relatively sealed, except at the front, so we
> > can get some coherent airflow in between boards. I am thinking we're going
> > to need to mount extra fans on the back (this is going to make the 2x2
> > design a bit more tricky, but at only 64-odd machines we can go with a 2x1
> > config instead: 2 stacks of 32, just 16U high). I don't know what you can
> > suggest here; it's all going to depend on physical configuration. The
> > machine is housed in a proper environment (Datavaults.com's facilities,
> > where I work :) that's climate controlled, but the inside of the cabinet
> > will still need massive airflow, even with the room at 68F.
> >
> >
> > Problem 3 - Power
> > """""""""""""""""
> > The power density here is going to be high. I need to mount 64 power
> > supplies in close proximity to the boards, another reason I might need to
> > maintain the 2x1 instead of the 2x2 design. (2x1 allows easier access too.)
> >
> > We don't really wanna pull that many power outlets into the room - I don't
> > know what a diskless Duron 800 board with 256Mb or 512Mb of RAM will draw,
> > though I guess around .75 to 1 A. I'm gonna need 3 or 4 full circuits in
> > the room (not too bad actually). However, that's a lot of weight on the
> > cabinet to hold 60-odd power supplies, not to mention the weight of the
> > cables themselves weighing down on it, and a huge mess of them to boot.
> >
> > I am wondering if someone has a reliable way of wiring together multiple
> > boards per power supply? What's the max density per supply? Can we
> > go with redundant power supplies, like N+1? We don't need that much
> > reliability (jobs are short, run on one machine and can be restarted
> > elsewhere), but I am really looking for something that's going to
> > reduce the cabling.
> >
> > As well, I am hoping there is some economy of power conversion here -
> > a big supply will hopefully convert power for multiple boards more
> > efficiently than a single supply per board. However, as always, the
> > main concern is cost.
> >
> > Any help or ideas are appreciated.
> >
> > /kc
> > --
> > Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto,
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf

Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 
