[Beowulf] [jak at uiuc.edu: Re: [APPL:Xgrid] [Xgrid] megaFlops per Dollar?]
eugen at leitl.org
Tue May 17 02:50:20 PDT 2005
----- Forwarded message from "Jay A. Kreibich" <jak at uiuc.edu> -----
From: "Jay A. Kreibich" <jak at uiuc.edu>
Date: Mon, 16 May 2005 01:22:25 -0500
To: "Holder, Darryl" <Darryl.Holder at us.army.mil>
Cc: Xgrid-users at lists.apple.com
Subject: Re: [APPL:Xgrid] [Xgrid] megaFlops per Dollar?
Reply-To: jak at uiuc.edu
On Wed, May 11, 2005 at 02:44:38PM -0500, Holder, Darryl scratched on the wall:
> Greetings all;
> I will this summer be assembling a "nano-Xgrid" system to perform a
> large antenna modeling calculation. I will be buying all of the hardware
> new of maximum clock speed, number of CPUs, (number of cores per CPU?)
> for this project. My question is:
> What is the best Mac platform for the maximum "megaFlops per Dollar"?
This is an exceptionally difficult question to answer. There are many
factors involved, and the "right" choice can be heavily influenced by
political, management, business, and technical decisions. It is a
topic that you could write half a book about (actually a whole book,
if the publisher doesn't kill the project half-way through; but that's
While I could easily go off on a multi-hundred line discussion about
I realize that nobody would really care to read it. So let me just
say that the real question is one of "USABLE megaFlops for TOTAL cost."
First, USABLE. Clusters are a compromise between the monolithic
architecture of 1980s-style Crays and cost. Larger singular systems
are easier to program, easier to debug, and easier to utilize (not
universally, but nearly so). Distributed computing (i.e. clusters)
takes on additional "costs" (often non-monetary) in these areas to
compensate for the fact that a cluster often costs a tiny fraction of
a more traditional monolithic supercomputer. This is just good
business and engineering sense, and there are good reasons why the
majority of the systems on the Top-500 list are clusters.
But, it is important to recognize that (for most programs) there are
"distribution" costs, and they are a compromise. For most (but not
all!) applications, it is easier to get better performance on a lower
number of more powerful nodes. If your application efficiency is
reduced by 20% each time you double the node count, saving a few
bucks by buying a much larger number of less powerful systems is
going to be a losing proposition.
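To make the 20%-per-doubling point concrete, here is a minimal sketch in Python. It assumes the efficiency loss compounds, so usable throughput is multiplied by 0.8 at each doubling of node count. The per-node GFLOPS figures and node counts are hypothetical, chosen only to show the shape of the trade-off, and the function name is illustrative.

```python
import math

def usable_throughput(nodes, per_node_gflops, loss_per_doubling=0.20):
    """Usable GFLOPS, assuming efficiency is multiplied by
    (1 - loss_per_doubling) each time the node count doubles."""
    doublings = math.log2(nodes)
    efficiency = (1.0 - loss_per_doubling) ** doublings
    return nodes * per_node_gflops * efficiency

# Hypothetical budget choice: 8 faster nodes or 16 slower, cheaper ones.
fast_raw = 8 * 10.0    # 80 raw GFLOPS
slow_raw = 16 * 6.0    # 96 raw GFLOPS, 20% more on paper

fast_usable = usable_throughput(8, 10.0)
slow_usable = usable_throughput(16, 6.0)

print(f"8 fast nodes:  raw {fast_raw:.1f}, usable {fast_usable:.2f} GFLOPS")
print(f"16 slow nodes: raw {slow_raw:.1f}, usable {slow_usable:.2f} GFLOPS")
```

Under these assumed numbers the 16-node option has more raw GFLOPS but slightly fewer usable GFLOPS, which is the losing proposition described above. An application with a smaller loss per doubling would shift the balance the other way.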
Next, TOTAL. Only looking at the price of the machines in a cluster
is like only looking at the price of a machine in a desktop
workstation. You don't get a very accurate assessment of costs unless
you include the monitor, keyboard, possible printer, and all of the
software (for example, Office, Photoshop, whatever) to make the
computer useful. There's also the whole "cost of ownership" issue that
I'll just gloss over. If you're looking at performance optimizing your
dollar, you need to look at your total dollars, not just what the
machines cost. Many of the costs involved in building a cluster are
"per node" costs, and quickly dilute cost savings derived from
choosing a large number of *very* inexpensive systems. For example,
if you need a high-speed network, you need to pay for one (or more)
ports per node, not "per GFLOP" or "per $5K of computing hardware."
The same is true of items like large memory upgrades. If your
application requires every node to have a full copy of the data set,
and that means 4GB of RAM per node, that can get extremely expensive
if you have a large number of nodes. The biggest killer of all is
often software costs. If you are running commercial software (which,
at the very least, includes the OS, although that argument carries
less weight now that Tiger is on shipping machines) you need to pay
per-node. There are also costs that scale with the node count,
although not directly, such as space, power, and cooling.
While there is some "economy of scale" for admin costs, system
administration time becomes a much bigger issue with a larger number
In short, most of the time there is a pressure to move towards a
smaller number of more powerful systems. Obviously this isn't
absolute-- if carried to its logical extreme we'd be back to large
single monolithic systems-- but given that cost/performance is
generally linear (or close to it), most clusters will benefit from
the most powerful systems you can (practically) purchase.
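The effect of per-node costs on TOTAL cost can be sketched with a little arithmetic. The Python below is only an illustration: every price, GFLOPS figure, and cost category is a hypothetical placeholder, not a quote for any real Mac or network hardware.

```python
from dataclasses import dataclass

@dataclass
class NodeOption:
    name: str
    machine_price: float  # hypothetical price per machine, in dollars
    gflops: float         # hypothetical raw GFLOPS per machine
    count: int            # number of nodes purchased

# Hypothetical per-node extras: one network port, a RAM upgrade,
# and a per-node software license.
PER_NODE_EXTRAS = {"network_port": 400.0, "ram_upgrade": 800.0, "software": 300.0}

def hardware_cost(opt):
    """Cost of the machines alone."""
    return opt.count * opt.machine_price

def total_cost(opt, extras=PER_NODE_EXTRAS):
    """Machines plus every cost that is paid once per node."""
    return opt.count * (opt.machine_price + sum(extras.values()))

def dollars_per_gflop(opt, extras=PER_NODE_EXTRAS):
    """Total dollars per raw GFLOP (ignores scaling losses)."""
    return total_cost(opt, extras) / (opt.count * opt.gflops)

few_fast = NodeOption("few fast nodes", 3000.0, 10.0, 8)
many_cheap = NodeOption("many cheap nodes", 1250.0, 5.0, 16)

for opt in (few_fast, many_cheap):
    print(f"{opt.name}: machines ${hardware_cost(opt):,.0f}, "
          f"total ${total_cost(opt):,.0f}, "
          f"${dollars_per_gflop(opt):,.0f} per GFLOP")
```

With these made-up figures both options deliver the same raw GFLOPS, and the cheap machines look $4,000 cheaper on hardware alone, yet they end up $8,000 more expensive once the per-node extras are counted.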
> By this, I mean Mac desktop versus Mac 1U server chassis.
If the decision is purely an engineering one, and you don't have
other business questions (like what to do with the machines when the
cluster has completed its useful lifetime) I'd go with the Xserve systems
in a second. Yes, you can get slightly more powerful desktop systems
(which, I realize, I just got done saying is a good idea), but the
space, power, and heat costs are considerable for a large(r) number
> I have no real space/cooling/power constraints,
Careful. Underestimating cooling, and then power, are the two biggest mistakes
made when building clusters. Xserve systems put out about 1000+ BTUs
when running all out. A G5 PowerMac can put out twice that. You can
heat a small house (and consume its whole electrical capacity) with
a 32 node PowerMac cluster.
PowerMacs also take up about 5x the space. Again, not a big deal for
four or five, but a huge deal for 20 or 30.
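As a rough back-of-the-envelope check on the heat figures, the Python sketch below converts them to watts and tons of cooling. It assumes the "BTUs" quoted above are BTU per hour, takes 1000 BTU/hr per Xserve and twice that per PowerMac as stated, and uses the standard conversions of about 3.412 BTU/hr per watt and 12,000 BTU/hr per ton of cooling. The function and constant names are illustrative.

```python
BTU_PER_HR_PER_WATT = 3.412142      # 1 watt is about 3.412 BTU/hr
BTU_PER_HR_PER_TON = 12000.0        # 1 ton of cooling = 12,000 BTU/hr

XSERVE_BTU_PER_HR = 1000.0          # "about 1000+ BTUs" (assumed per hour)
POWERMAC_BTU_PER_HR = 2000.0        # "twice that"

def cluster_heat(nodes, btu_per_hr_per_node):
    """Return (BTU/hr, watts, tons of cooling) for a cluster at full load."""
    btu = nodes * btu_per_hr_per_node
    watts = btu / BTU_PER_HR_PER_WATT
    tons = btu / BTU_PER_HR_PER_TON
    return btu, watts, tons

for label, per_node in (("Xserve", XSERVE_BTU_PER_HR),
                        ("PowerMac", POWERMAC_BTU_PER_HR)):
    btu, watts, tons = cluster_heat(32, per_node)
    print(f"32 x {label}: {btu:,.0f} BTU/hr, "
          f"{watts / 1000:.1f} kW, {tons:.1f} tons of cooling")
```

Under those assumptions a 32-node PowerMac cluster dissipates roughly 19 kW, which needs a little over 5 tons of cooling, while the same number of Xserves needs about half that.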
Other issues aside, one other consideration that I didn't see
mentioned is that the Xserve systems come with OS X Server, while the
PowerMacs do not. You don't really need Server on the compute nodes,
but if you are running a cluster of four or more systems without Server
on the head-node, you're insane (and wasting money and time). The
ability to image compute nodes via NetBoot/NetInstall and provide a
centralized Directory for all account and preference information is
extremely valuable for a cluster of any size.
Humm... I'm still over 100 lines, but not by much.
Jay A. Kreibich | CommTech, Emrg Net Tech Svcs
jak at uiuc.edu | Campus IT & Edu Svcs
<http://www.uiuc.edu/~jak> | University of Illinois at U/C
----- End forwarded message -----