[Beowulf] GPU Beowulf Clusters
tvsingh at ucla.edu
Thu Jan 28 12:57:05 PST 2010
This is not a problem in your setup, since you are assigning a whole node
to each job. In general, though, how can one deal with the problem of
binding a particular GPU device to a job through the scheduler?
Sorry if I am asking something that is already well known and there are
existing ways to bind the devices within the scheduler.
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]
On Behalf Of Michael Di Domenico
Sent: Thursday, January 28, 2010 9:54 AM
To: Beowulf Mailing List
Subject: Re: [Beowulf] GPU Beowulf Clusters
Here's the way I do it, but your mileage may vary...
We allocate two CPUs per GPU and use the NVIDIA Tesla S1070 1U,
so a standard quad-core, dual-socket server has four GPUs attached.
We've found that even though you expect the GPU to do most of the
work, it really takes a CPU to drive the GPU and keep it busy.
Having a second CPU to pre-stage/post-stage the memory has worked
pretty well also.
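For what it's worth, a rough sketch of what that staging looks like on the
CUDA side (chunk size, data source and kernel are placeholders, not our
actual code): pinned host buffers plus cudaMemcpyAsync() on alternating
streams let the host queue the next chunk while the current one computes:

    #include <string.h>
    #include <cuda_runtime.h>

    __global__ void process(float *d, int n)
    {
        /* ... real work on one chunk ... */
    }

    void run_chunks(const float *src, int nchunks, int chunk)
    {
        float *h[2], *d[2];
        cudaStream_t s[2];
        for (int i = 0; i < 2; ++i) {
            cudaMallocHost((void **)&h[i], chunk * sizeof(float)); /* pinned => async copies */
            cudaMalloc((void **)&d[i], chunk * sizeof(float));
            cudaStreamCreate(&s[i]);
        }

        for (int c = 0; c < nchunks; ++c) {
            int b = c & 1;               /* alternate between the two buffers */
            cudaStreamSynchronize(s[b]); /* wait until this buffer is free again */
            memcpy(h[b], src + (size_t)c * chunk, chunk * sizeof(float));
            cudaMemcpyAsync(d[b], h[b], chunk * sizeof(float),
                            cudaMemcpyHostToDevice, s[b]);
            process<<<(chunk + 255) / 256, 256, 0, s[b]>>>(d[b], chunk);
            /* while chunk c runs, the host loops around and stages chunk c+1 */
        }
        cudaDeviceSynchronize();

        for (int i = 0; i < 2; ++i) {
            cudaFreeHost(h[i]);
            cudaFree(d[i]);
            cudaStreamDestroy(s[i]);
        }
    }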
For scheduling, we use SLURM and allocate one entire node per job, no sharing.
On Thu, Jan 28, 2010 at 12:38 PM, Jon Forrest <jlforrest at berkeley.edu> wrote:
> I'm about to spend ~$20K on a new cluster
> that will be a proof-of-concept for doing
> GPU-based computing in one of the research
> groups here.
> A GPU cluster is different from a traditional
> HPC cluster in several ways:
> 1) The CPU speed and number of cores are not
> that important because most of the computing will
> be done inside the GPU.
> 2) Serious GPU boards are large enough that
> they don't easily fit into standard 1U pizza
> boxes. Plus, they require more power than the
> standard power supplies in such boxes can
> provide. I'm not familiar with the boxes
> that should therefore be used in a GPU cluster.
> 3) Ideally, I'd like to put more than one GPU
> card in each computer node, but then I hit the
> issues in #2 even harder.
> 4) Assuming that a GPU can't be "time shared",
> this means that I'll have to set up my batch
> engine to treat the GPU as a non-sharable resource.
> This means that I'll only be able to run as many
> jobs on a compute node as I have GPUs. This also means
> that it would be wasteful to put CPUs in a compute
> node with more cores than the number of GPUs in the
> node. (This is assuming that the jobs don't do
> anything parallel on the CPUs - only on the GPUs).
> Even if GPUs can be time shared, given the expense
> of copying between main memory and GPU memory,
> sharing GPUs among several processes will degrade performance.
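Worth noting that whether a GPU can be time shared at all is partly a
driver setting: the administrator can put each card in exclusive compute
mode (via nvidia-smi) so that only one process can hold a context, which
makes the "non-sharable resource" assumption safe at the batch-engine
level. A small sketch, just as an illustration, to check how a node's
cards are configured:

    #include <stdio.h>
    #include <cuda_runtime.h>

    int main(void)
    {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            const char *mode =
                (p.computeMode == cudaComputeModeExclusive)  ? "exclusive (one process at a time)" :
                (p.computeMode == cudaComputeModeProhibited) ? "prohibited" :
                                                               "default (processes may share)";
            printf("GPU %d (%s): compute mode %s\n", i, p.name, mode);
        }
        return 0;
    }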
> Are there any other issues I'm leaving out?
> Jon Forrest
> Research Computing Support
> College of Chemistry
> 173 Tan Hall
> University of California Berkeley
> Berkeley, CA
> jlforrest at berkeley.edu