[Beowulf] advice for newbie - component list

Vincent Diepeveen diep at xs4all.nl
Wed Aug 22 12:42:40 PDT 2012


>
> I might have lost this point, so I will try to wrap it up:
>
>  * 16 nodes: pci-e motherboard, 2x 4 core xeons L5420, infiniband  
> DDR card, no hard drives, about 4-8GB RAM

I'd go with 8 GB a node if I were you. 1 GB/core has been pretty much
the standard in HPC for 10+ years now.

>  * file server: rackmount with 8 hard SATA drives, raid card, 8GB  
> RAM, pci-e motherboard, 2x 4 core xeons L5420
>  * infiniband switch
>

All your nodes need PCIe for the DDR InfiniBand cards. By default most of those
motherboards have a PCI-X slot as well.
Note that PCI-X is not PCIe; PCI-X is older and therefore dirt cheap.

A new PCIe RAID card is a lot more expensive, hundreds of dollars.
And it won't deliver you more bandwidth at 8 drives anyway; only with more than
8 drives would it matter, and I doubt any node could handle that much bandwidth.

The reality of bandwidth in HPC is that you never want to stress things
to the limit; everything usually chokes when you do.

My advice is always to limit total sustained traffic to 10-20% of the
maximum the network is in theory capable of handling.
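
To put rough numbers on that rule of thumb, here is a small back-of-envelope sketch for a 4x DDR InfiniBand link; the 20 Gbit/s signalling rate and 8b/10b encoding are the usual DDR figures, and the 10-20% fraction is just the planning rule above:

signal_gbit = 20.0                   # 4x DDR InfiniBand signalling rate
payload_gbit = signal_gbit * 8 / 10  # ~16 Gbit/s of payload after 8b/10b encoding
peak_mb_s = payload_gbit / 8 * 1000  # ~2000 MB/s per direction

for fraction in (0.10, 0.20):
    print("%d%% of peak = %.0f MB/s sustained per link"
          % (fraction * 100, peak_mb_s * fraction))

So plan for something like 200-400 MB/s of sustained traffic per DDR link, not the 2 GB/s on the spec sheet.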

> We will try to see if we can afford new hard wares first (prices  
> from ebay):
>
> pci-e motherboard: ~ 17x 100 = 1700

Do you get a certificate in gold with that?

$100 for a motherboard is way, way too expensive; they go for $50-$60
there on eBay.
Better yet, buy a barebone or a ready rackmount.

> 2x 4core L5420 ~ 17x 200 = 3400

http://www.ebay.com/itm/Intel-Xeon-L5420-2-5GHz-12M-1333MHz-LGA-771- 
SLARP-OEM-/140817053377?pt=CPUs&hash=item20c959b6c1

They're $19.90 a piece on eBay.

> 8G DDR3 ~ 17x 100 = 1700

They don't take DDR3 RAM; they take FB-DIMMs. About $6 a gigabyte if you
put in the larger modules, say $50 for 8 GB in total.
Offers like that don't show up every day.

> infiniband card ~ 17x 100 = 1700

Yeah, around $60-$70 depending on which card you get. The cards that can
do PCIe 2.0 are useless anyway unless you get motherboards
with a Seaburg chipset.

> infiniband cables ~ 17x 20 = 140
> 8 SATA ~ 1000



> RAID card ~ 50
> file server ~ 500?

At least $500.


> infiniband switch ~ 500?

I saw them for $300 on eBay last time I checked, but that was a while ago. It
always depends on how much of such specialized hardware is on eBay at the
moment: if there are several listed it's suddenly dirt cheap, $100 or so,
otherwise it's $1k or $2k or more...

> a server rack (or PC shelf) ~ ?

A case for each node and 2 passive heatsinks per machine might
come in handy.

The ready rackmounts come with a PSU.

> 5-6kW PSU ~ ?

Each machine has its own PSU.

A 16-node network with everything except the air conditioner to cool the room
is going to eat you roughly 3 kilowatts.
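
A minimal sketch of where that estimate comes from; the per-node wattage is from this thread, the switch and file-server figures are my own guesses, not measured numbers:

nodes = 16
watt_per_node = 180       # ~170-180 W per node under full load, as above
switch_watt = 200         # DDR IB switch, guess (roughly 100 W idle to 300 W max)
file_server_watt = 250    # assumption, not from this thread

total_watt = nodes * watt_per_node + switch_watt + file_server_watt
print("roughly %.1f kW under full load, excluding cooling" % (total_watt / 1000.0))

That lands a bit above 3 kW, so size the room's power and cooling for that.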

You really want to order ready rackmounts.

Here is an example of a barebone if you want to do the assembly yourself:

http://www.ebay.com/itm/1U-Server-Intel-S5000PSL-Barebone-2x-LGA771- 
CPU-DDR2-RAM-2-5-SATA-Hard-Dirves-/140831053892? 
pt=COMP_EN_Servers&hash=item20ca2f5844

And it comes with heatsinks and a PSU. Note that if you order there, you probably
want ready rackmounts anyway, as the transport costs alone are going to be
at least $500.

If you want to save shelf space, maybe get this:

http://www.ebay.com/itm/Supermicro-SC809T-980B-1U-2-Node-Twin-Server- 
X7DWT-INF-Barebone-LGA771-Socket-/140826089474? 
pt=COMP_EN_Servers&hash=item20c9e39802

That's 2 motherboards in one chassis. It needs 4 CPUs and 2 x 8 GB of RAM
(so 8 DIMMs, I'd guess), and then you have 2 machines in 1 rackmount.

I didn't check what comes with it. You want 2 riser cards of course,
for the network cards.

Ask the seller to include the CPUs, the RAM, a small hard drive per node as
local scratch, and the 2 riser cards, and you're probably done for the
compute nodes. Then you just need the network gear.

Note that E53** ready rackmounts are a LOT cheaper on eBay. They don't eat
that much more power under full load by the way, whatever big novels Intel
always writes about that. If you have a dedicated machine room, burning a
tad more energy is usually not a major problem...

Note there are much cheaper offers if you want things dirt cheap. I just
picked the first link I saw on eBay; that's true for everything here,
so none of the links so far were chosen on purpose.

This X7DWT from Supermicro possibly has a Seaburg chipset, which means
it can do PCIe 2.0.
PCIe 2.0 at x8 does about 8 GB/s counting both directions (I'm sure someone
will correct me if I write it wrong), versus PCIe 1.0 at x8 doing half that speed.

Now I hope you don't need that full bandwidth of course...
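
A quick sketch of that arithmetic, assuming the usual per-lane rates (250 MB/s per lane per direction for PCIe 1.x, 500 MB/s for PCIe 2.0, both after 8b/10b encoding) and an x8 slot, compared against the roughly 2 GB/s per-direction payload rate of a 4x DDR InfiniBand card:

lane_mb_s = {"PCIe 1.x": 250, "PCIe 2.0": 500}  # per lane, per direction
lanes = 8                                       # typical slot width for these HCAs
ddr_ib_gb_s = 2.0                               # 4x DDR IB payload rate per direction

for gen in sorted(lane_mb_s):
    slot_gb_s = lane_mb_s[gen] * lanes / 1000.0
    verdict = "enough" if slot_gb_s >= ddr_ib_gb_s else "too slow"
    print("%s x%d: %.1f GB/s per direction, %s for DDR IB at %.1f GB/s"
          % (gen, lanes, slot_gb_s, verdict, ddr_ib_gb_s))

PCIe 1.0 x8 just covers a DDR link; PCIe 2.0 x8 gives you headroom, which mostly matters for QDR or dual-port cards.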

>
> So that will be around $11k (not including the shelf/rack and the  
> PSU). It looks like we can afford this system. Am I missing  
> anything else? Are there those above components for box PC? I did  
> some quick search on ebay and they are all seem to be for rack- 
> mount servers.
>
>>>
>>> Also it's pretty low power compared to alternatives. It'll eat a
>>> 180 watt a node or so under full load. It's 170 watt a node
>>> here under full load (but that's with a much better psu).
>>>
>>> As for software to install in case you decide for infiniband, your
>>> choices are limited as OpenFED doesn't give
>>> much alternatives.
>>>
>>> Fedora Core or Scientific Linux are for free and probably your only
>>> 2 options if you want to use free software
>>> that are easy to get things done as you want to.
>>>
>>> Then install OpenFED that has the openmpi and other infiniband
>>> stuff for free.
>>>
>>> Probably Debian works as well, provided you use the exact kernel
>>> that OpenFED recommends.
>>> Any other kernel won't work. So you have to download some older
>>> debian then, get the exact kernel recommended
>>> and then i guess OpenFED will install as well.
>
> Thanks for this suggestion about softwares. I think we will go with  
> SL6. I actually tried a 3-node cluster (2 clients and 1 master)  
> with SL6 and OpenMPI, and it works fine. For the infiniband cards,  
> I have zero experience but I assume it is not too hard to install/ 
> configure?

They didn't make it really easy, I feel, as the manufacturers only have stuff
on their websites that works brilliantly for the expensive SLES and RHEL;
yet with OFED you can get there and do things manually by typing a few commands.

There are manuals online.

OFED installed fine on default SL6.1 and SL6.2; what didn't work was the
latest GCC (I need at least 4.7.0).
I had to modify the OFED installation script for that.

For my programs, GCC 4.7.0 on Intel CPUs is a lot faster than the 4.5 series
and anything before it (I didn't check 4.6.3, it might be fast as well).
For 64-bit programs, 4.7.0 really is a lot faster than 4.5 and before.

Are you sure you have software that can use MPI, by the way?
I suppose so, as you referred to OpenMPI...

One should never order hardware without knowing you have software
that can actually use that specific hardware.
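
If you want a quick sanity check that MPI really runs across the nodes before you commit to the hardware, something like this minimal sketch will do; it assumes Open MPI plus the mpi4py Python bindings are installed, and the script name and hostfile below are just placeholders:

# Minimal MPI sanity check; run with something like:
#   mpirun -np 16 --hostfile hosts python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()           # this process's rank
size = comm.Get_size()           # total number of MPI processes
name = MPI.Get_processor_name()  # hostname, to confirm it ran on the other nodes

print("rank %d of %d on %s" % (rank, size, name))

# A tiny collective, so the interconnect actually gets exercised:
total = comm.allreduce(rank, op=MPI.SUM)
if rank == 0:
    print("sum of ranks = %d (expected %d)" % (total, size * (size - 1) // 2))

If the hostnames printed span all your nodes and the sum comes out right, the MPI side of things is basically working.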

Realize how compatible all this InfiniBand gear is; it all works criss-cross,
Mellanox with the others. No wonder InfiniBand dominates HPC right now.

Interestingly, I didn't see a cheap InfiniBand switch on eBay right now.
That's probably because I was only looking for a Mellanox switch;
they're probably in high demand.

As for the network gear:

http://www.ebay.com/itm/Mellanox-Voltaire-Grid-Director-2036-36- 
DDR-20Gbps-ports-1U-Switch-/170889969337? 
pt=LH_DefaultDomain_0&hash=item27c9d5feb9

Ouch, $1150; switches sometimes are expensive...

My guess is it'll eat 200 watts or so, 300 watts max (a blindfolded guess
based on the switch that's standing here).
When mine runs at default settings and idles, it eats just over 100 watts,
by the way.

I'd go for that switch though, and collect the network cards off eBay one by
one, or sit and wait until after summer when, as usual, some clusters get
dismantled and their equipment gets thrown onto eBay.

No really cheap ConnectX DDR cards on eBay this time, but you might get lucky;
sometimes they're cheap there, sometimes they're not.
There are also cheap Topspin switches, but I'm not sure about those.

Note this Mellanox switch has 36 ports, in case you want to order more nodes
later, or do that right away.

Clusters tend to become useful only if you really build bigger ones, as most
software is not all that efficient and loses a lot to overhead.
Just a few nodes is the same speed as a single box with 48 cores. My cluster
is on the brink of losing out in that sense (it currently has 8 nodes,
planned to grow to 16 once everything works well).

OK, it seems the cheap network cards on eBay aren't ConnectX and that many
of the cheap offers are gone.
Maybe you want to collect them from eBay one network card at a time.

http://www.ebay.com/itm/HP-INFINIBAND-4X-DDR-PCIe-DUAL-PORT- 
HCA-409778-001-409376-B21-/130744094587? 
pt=COMP_EN_Servers&hash=item1e70f48f7b

This one doesn't say it's ConnectX.
It is PCIe though.

Many of the other InfiniBand cards on offer seem to be PCI-X; don't get
those unless bandwidth is not a big issue for you.

For example, to me latency is important, more important than bandwidth
(except for the EGTB crunching), so ConnectX is what gets recommended
here on this list.

If bandwidth is not a big issue, things change: a cheaper switch, a cheaper
network, cheaper cards, or maybe just a gigabit switch.

Tons of cheap offers that were on eBay a year ago seem to be all gone!

Maybe it's just bad luck; sit and wait a little until some of the big
supercomputers get dismantled, unleashing a flood of QDR network cards
onto eBay. Then the DDR ones will become dirt cheap, I bet, as everyone
wants to get rid of them.

Usually big new supercomputers take some years to prepare, and then the big
install happens, especially in July when all the scientists are on holiday
and no one runs on those machines. So some weeks from now the tide
might change again...

So the question to you is: what sort of network do you need, and what
do you want?

>
> Thanks,
>
> D.
>
>>>
>>> Good Luck,
>>> Vincent
>>>
>>> On Aug 20, 2012, at 6:55 AM, Duke Nguyen wrote:
>>>
>>>> Hi folks,
>>>>
>>>> First let me start that I am total novice with cluster and/or
>>>> beowulf. I
>>>> am familiar with unix/linux and have a few years working in a  
>>>> cluster
>>>> (HPC) environment, but I never have a chance to design and admin a
>>>> cluster.
>>>>
>>>> Now my new institute decides to build a (small) cluster for our  
>>>> next
>>>> research focus area: genome research. The requirements are simple:
>>>> expandable and capable of doing genome research. The budget is low,
>>>> about $15,000, and we have decided:
>>>>
>>>>    * cluster is a box cluster, not rack (well, mainly because our
>>>> funding
>>>> is low)
>>>>    * cluster OS is scientific linux with OpenMPI
>>>>    * cluster is about 16-node with a master node, expandable is  
>>>> a must
>>>>
>>>> Now next step for us is to decide hardwares and other aspects:
>>>>
>>>>    * any recommendation for a reliable 24-port gigabit switch  
>>>> for the
>>>> cluster? I heard of HP ProCurve 2824 but it is a little bit hard
>>>> to find
>>>> it in my country
>>>>    * should our boxes be diskless or should they have a hard disk
>>>> inside?
>>>> I am still not very clear the advantages if the clients has about
>>>> 80GB
>>>> hard disk internally except that their OS are independent and does
>>>> not
>>>> depend on the master node, and maybe faster data processing
>>>> (temporay),
>>>> but 80GB each for genome research is too small
>>>>    * hard drives/data storage: we want to have storage of about
>>>> 10TB but
>>>> I am not sure how to design this. Should all the hard disk be in  
>>>> the
>>>> master node, or they can be on each of the node, or should it be a
>>>> NAS?
>>>>    * any recommendation for a mainboard (gigabit network, at least
>>>> 4 RAM
>>>> slots) about $200-$300 good for cluster?
>>>>
>>>> I would love to hear any advice/suggestion from you, especially if
>>>> you
>>>> had built a similar cluster with similar purpose.
>>>>
>>>> Thank you in advance,
>>>>
>>>> Duke.
>>>>
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
>>>> Computing
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
>> Computing
>> To change your subscription (digest mode or unsubscribe) visit  
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>



