[Beowulf] GPU Beowulf Clusters

Micha michf at post.tau.ac.il
Sun Jan 31 16:31:30 PST 2010


On 01/02/2010 00:06, Mark Hahn wrote:
>>> Be very very sure that consumer geforces can go in 1u boxes. It's not
>>> so much
>>> the space as much as I'm skeptical with their ability of handling the
>>> thermal
>>> issues. They are just not designed for this kind of work.
>>
>> I've had to go to 2u and eventually to larger boxes because of power
>> supply and air-flow requirements. This is a big issue.
>
> I'm a bit puzzled here. supermicro sells servers that take either two
> M1060's, or two C1060's, or two of any pcie 2 x16 gpus. their airflow
> design seems at least thought-about, and their PSU is 1400W.
>

The PSU rating is sufficient.

> C1060 specs merely say "200W max, 160W typical" - which is probably
> about the same as gtx275 according to wikipedia. so something like 600W
> expected from 1U - not really that hard, especially if you don't
> have a wall of 40u racks full of them...
>
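
As a rough sanity check of the figures quoted above, here is a minimal sketch
of the power budget for a 1U box with two cards. The 1400 W PSU rating and the
200 W max / 160 W typical per card come from this thread; HOST_WATTS (CPUs,
memory, disks, fans) is purely my assumption, so adjust it for real hardware.

```python
# Rough 1U power budget using the figures quoted in this thread.
PSU_WATTS = 1400         # PSU rating quoted for the 1U server
GPU_MAX_WATTS = 200      # C1060 spec: "200W max"
GPU_TYPICAL_WATTS = 160  # C1060 spec: "160W typical"
NUM_GPUS = 2             # the server takes two cards
HOST_WATTS = 250         # ASSUMPTION: CPUs, RAM, disks, fans (not from the thread)


def node_power(gpu_watts, num_gpus=NUM_GPUS, host_watts=HOST_WATTS):
    """Total draw of one node: all GPUs plus the assumed host load."""
    return gpu_watts * num_gpus + host_watts


def headroom_fraction(load_watts, psu_watts=PSU_WATTS):
    """Fraction of the PSU rating left unused at the given load."""
    return 1.0 - load_watts / psu_watts


worst_case = node_power(GPU_MAX_WATTS)
typical = node_power(GPU_TYPICAL_WATTS)

print(f"typical:    {typical} W, headroom {headroom_fraction(typical):.0%}")
print(f"worst case: {worst_case} W, headroom {headroom_fraction(worst_case):.0%}")
```

With these assumptions the node stays well below the PSU rating even at the
cards' maximum draw, which is in line with the roughly 600 W estimate above.
All of that power still has to leave the chassis as heat.
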
>>> Note that geforces are overclocked (my gtx 285 by 30% compared to a
>>> tesla with
>>> the same chip)
>
> well, they're tuned differently: gf cards have substantially higher memory
> clocks and lower shader clocks. tesla has higher shader and substantially
> slower memory clocks (presumably because there are more loads on the bus.)
>

Yes, they're tuned differently, but that is because they are meant for
different markets. GeForce cards are for the gamer market and are assumed to
run for several hours at a time, without too many of them in the machine
fighting for airflow (a typical gamer setup). Throttling, if needed, is no big
issue there.

Tesla is a server product that needs to run 24/7 for days or months without
throttling (consistent output). Usually there are several of them in one
machine (or a shared Quadro + Tesla setup).

Another issue is tolerance to memory errors. Higher temperatures and clocks can
cause more memory errors. In game graphics these may show up as small,
unnoticeable glitches, but they will ruin HPC results.

The two main factors taken into account in the tuning are expected running
time and tolerance for throttling.

>>> and are actively cooled, which means that you need to get air
>>> flowing into the side fan. That's exactly why they put the tesla m
>>> and not the
>>> c into those boxes.
>
> why is this a problem with 1U? or do you really mean "double-wide cards
> don't provide enough clearance in 1U to get air to the card's intake"?
>

All the cards we are talking about are double-wide. The C1060 is actively
cooled and is designed for a desktop PC. The M1060 is passively cooled and
designed for a 1U server.

The C1060 assumes side air intake and rear exhaust. The M1060 expects
through-flow provided by the chassis fans and no external access to the
exhaust.

Different design based on different airflow paradigms.
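
To get a feel for how much through-flow a passively cooled card depends on,
here is a back-of-the-envelope sketch based on the heat balance
P = rho * cp * Q * dT. The air properties are standard textbook values; the
15 K allowed temperature rise of the air is my assumption, not a figure from
any card spec. It is an estimate, not a thermal design.

```python
# Airflow needed to carry away a given heat load with a given air temperature rise.
AIR_DENSITY = 1.2           # kg/m^3, approximate value near room temperature
AIR_HEAT_CAPACITY = 1005.0  # J/(kg*K), approximate specific heat of air
M3_PER_S_TO_CFM = 2118.88   # cubic metres per second -> cubic feet per minute


def required_airflow_cfm(heat_watts, delta_t_kelvin):
    """Volume flow (CFM) that removes heat_watts with a delta_t_kelvin air rise."""
    m3_per_s = heat_watts / (AIR_DENSITY * AIR_HEAT_CAPACITY * delta_t_kelvin)
    return m3_per_s * M3_PER_S_TO_CFM


ASSUMED_DELTA_T = 15.0  # ASSUMPTION: allowed inlet-to-outlet air temperature rise, in K

one_card = required_airflow_cfm(200, ASSUMED_DELTA_T)   # one card at "200W max"
two_cards = required_airflow_cfm(400, ASSUMED_DELTA_T)  # two cards in one box

print(f"one card:  {one_card:.1f} CFM")
print(f"two cards: {two_cards:.1f} CFM")
```

The point is only that this air has to actually pass through the card's
heatsink. A side-intake card pressed against the lid of a 1U case may not see
that flow even if the chassis fans move enough air overall.
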

I have never built such systems, so I'm speaking from assumption rather than
experience (we are using C1060s in desktops and S1070s in servers). I'm not
sure that a double-wide card with a side air intake in a 1U box allows any
airflow to reach the intake, and thus the GPU. Maybe you can mod the card by
taking the plastic shroud off to improve airflow, though.

It looks from their site that they support double wide cards in their boxes so I 
guess that they tested the cooling. They definitely have more experience than me 
with such setups.

I didn't say that it doesn't work. I just advised that you make sure, as it
sounded borderline to me and, as someone else noted previously, it has caused
problems for people.

> -mark hahn
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf



