[Beowulf] GPU Beowulf Clusters

Sun Jan 31 12:31:40 PST 2010

I use a GTX 285 in a dedicated remote-access graphics box for 
data-local visualization and run into some of these issues, too.  More 
inline, but Micha has it right.

Micha Feigin wrote:
> On Sat, 30 Jan 2010 10:24:09 -0800
> Jon Forrest <jlforrest at berkeley.edu> wrote:
> 
>> On 1/30/2010 4:31 AM, Micha Feigin wrote:
>>
>>> It is recommended BTW, that you have at least the same amount of system memory
>>> as GPU memory, so with tesla it is 4GB per GPU.
>> I'm not going to get Teslas, for several reasons:
>>
>> 1) This is a proof of concept cluster. Spending $1200
>> per graphics card means that the GPUs alone, assuming
>> 2 GPUs, would cost as much as a whole node with
>> 2 consumer-grade cards. (See below)
>>
> 
> Be very, very sure that consumer geforces can go in 1u boxes. It's not so much
> the space; I'm skeptical of their ability to handle the thermal
> issues. They are just not designed for this kind of work.

I've had to go to 2u and eventually to larger boxes because of power 
supply and air-flow requirements. This is a big issue.

> Note that geforces are overclocked (my gtx 285 by 30% compared to a tesla with
> the same chip) and are actively cooled, which means that you need to get air
> flowing into the side fan. That's exactly why they put the tesla m and not the
> c into those boxes.

Depends on where you get your GTX from. I've got one that claims not to 
be overclocked but also claims to be as fast as one that says it IS 
overclocked. Since I'm not yet interested enough to actually look at the 
onboard chip speeds, I don't know.  However, the one I've now got is in 
a 4u with additional forced air in the case to deal with an overtemp 
problem we had that was primarily flow-related (we tried extra fans in 
2u and 3u cases first). We've not wandered too far into GPGPU 
processing... our user community has not shown an interest in it, but 
for graphics, it's useful.

> The geforce driver also throttles the card under load to solve thermal issues.

I believe this depends on onboard temp monitoring. Again, sufficient 
airflow is your friend.
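
If you want to keep an eye on this yourself, here is a minimal sketch 
of a temperature check. It assumes a current nvidia-smi that supports 
the --query-gpu option (the tool we had at the time did not), and the 
85 C limit and the function names are purely illustrative, not a 
vendor spec.

```python
import subprocess


def parse_gpu_temps(csv_text):
    """Parse 'index, temperature' CSV lines into {gpu_index: temp_c}."""
    temps = {}
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue  # skip anything that is not an 'index, temp' pair
        try:
            temps[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # e.g. a card that reports no temperature
    return temps


def read_gpu_temps():
    """Ask nvidia-smi for per-GPU temperatures (needs the NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_temps(result.stdout)


def overheated(temps, limit_c=85):
    """Return the GPU indices at or above limit_c (an assumed threshold)."""
    return [idx for idx, temp in sorted(temps.items()) if temp >= limit_c]
```

Calling overheated(read_gpu_temps()) from cron or a node health check 
would flag cards that are running hot before the driver starts 
throttling them.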

> You will probably want to underclock the cards to the tesla spec and be sure
> to monitor the thermal state.
> 
> I know someone who works with 3 gtx295 in a desktop box and he initially had
> some thermal shutdown issues with older drivers. I'm guessing that the newer
> drivers just throttle the cards more aggressively under load.
> 
>> 2) We know that the Fermi cards are coming out
>> soon. If we were going to spend big bucks
>> on GPUs, we'd wait for them. But, our funding
>> runs out before the Fermis will be available.
>> This is too bad but there's nothing I can do
>> about it.
>>
> 
> Check out the mad scientist program, it's supposed to end today, but maybe if
> you talk to NVidia they can still get you into it (they are rather flexible,
> especially with universities, and they also offer it for companies)
> http://www.nvidia.com/object/mad_science_promo.html
> You can buy a current tesla (t10 core) and upgrade it to a fermi (t20 core)
> when it comes out for the cost difference. May be more cost effective if you do
> plan to build a fermi cluster later on. It is designed to upgrade to the same
> line though (c, m or s) so you may want to consider now which one to go with.
> 
>> See below for comments regarding CPUs and cores.
>>
>>> You use dedicated systems. Either one 1u pizza box for the CPU and a matched 1u
>>> tesla s1070 pizza box which has 4 tesla GPUs
>> Since my first post I've learned about the Supermicro boxes
>> that have space for two GPUs
>> (http://www.supermicro.com/products/system/1U/6016/SYS-6016GT-TF.cfm?GPU=) .
>> This looks like a good way to go for a proof-of-concept cluster. Plus,
>> since we have to pay $10/U/month at the Data Center, it's a good
>> way to use space.
>>
> 
> See my previous comment
> 
>> The GPU that looks the most promising is the GeForce GTX275.
>> (http://www.evga.com/products/moreInfo.asp?pn=017-P3-1175-AR)
>> It has 1792MB of RAM and is only ~$300. I realize that there
>> are better cards but for this proof-of-concept cluster we
>> want to get the best bang for the buck. Later, after we've
>> ported our programs, and have some experience optimizing them,
>> then we'll consider something better, probably using whatever
>> the best Fermi-based card is.
>>
>> The research group that will be purchasing this cluster does
>> molecular dynamics simulations that usually take 24 hours or more
>> to complete using quad-core Xeons. We hope to bring down this
>> time substantially.
>>
>>> It doesn't have a swap in/swap out mechanism, so the way it may time share is
>>> by alternating kernels as long as there is enough memory. Shouldn't be done for
>>> HPC (same with CPU by the way due to numa/l2 cache and context switching
>>> issues).
>> Right. So this means 4 cores should be good enough for 2 GPUs.
>> I wish somebody made a motherboard that would allow 6-core
>> AMD Istanbuls, but they don't. Putting two 4-core CPUs on the
>> motherboard might not be worth the cost. I'm not sure.
>>
>>> The processes will be sharing the pci bus though for communications so you may
>>> prefer to set up the system as 1 job per machine or at least a round robin
>>> scheduler.
>> This is another reason not to go crazy with lots of cores.
>> They'll be sitting idle most of the time, unless I also
>> create queues for normal non-GPU jobs.
>>
>>> Take note that the s1070 is ~6k$ so you are talking at most two to three
>>> machines here with your budget.
>> Ha, ha!! ~$6K should get me two compute nodes, complete
>> with graphics cards.
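
Since the box size feeds straight into your hosting bill, a rough 
sketch of the arithmetic may help. The prices are the ones quoted in 
this thread ($1200 per Tesla, ~$300 per GTX275, $10/U/month); the 
function names and the 12-month period are my own assumptions for 
illustration.

```python
def gpu_cost(price_per_card, cards_per_node=2):
    """Total GPU spend for one node, in dollars."""
    return price_per_card * cards_per_node


def rack_cost(units, months, rate_per_u_month=10):
    """Hosting charge for a chassis of `units` U over `months` months."""
    return units * months * rate_per_u_month


# Figures quoted in the thread.
tesla_node = gpu_cost(1200)      # two Teslas
geforce_node = gpu_cost(300)     # two GTX275s

# Assumed 12-month comparison of a 1u box against a 4u box.
hosting_1u = rack_cost(1, 12)
hosting_4u = rack_cost(4, 12)

print("GPU cost per node: tesla $%d, geforce $%d" % (tesla_node, geforce_node))
print("Hosting per year:  1u $%d, 4u $%d" % (hosting_1u, hosting_4u))
```

On those numbers the consumer cards save $1800 per node, while going 
from 1u to 4u adds $360 per node per year in rack charges.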

gerry