[Beowulf] [jak at uiuc.edu: Re: [APPL:Xgrid] [Xgrid] megaFlops per Dollar?]

Eugen Leitl eugen at leitl.org
Tue May 17 02:50:20 PDT 2005

----- Forwarded message from "Jay A. Kreibich" <jak at uiuc.edu> -----

From: "Jay A. Kreibich" <jak at uiuc.edu>
Date: Mon, 16 May 2005 01:22:25 -0500
To: "Holder, Darryl" <Darryl.Holder at us.army.mil>
Cc: Xgrid-users at lists.apple.com
Subject: Re: [APPL:Xgrid] [Xgrid] megaFlops per Dollar?
User-Agent: Mutt/
Reply-To: jak at uiuc.edu

On Wed, May 11, 2005 at 02:44:38PM -0500, Holder, Darryl scratched on the wall:
> Greetings all;
> I will this summer be assembling a "nano-Xgrid" system to perform a
> large antenna modeling calculation. I will be buying all of the hardware
> new of maximum clock speed, number of CPUs, (number of cores per CPU?)
> for this project. My question is:
> What is the best Mac platform for the maximum "megaFlops per Dollar"? 

  This is an exceptionally difficult problem to answer.  There are many
  factors involved, and the "right" choice can be heavily influenced by
  political, management, business, and technical decisions.  It is a
  topic that you could write half a book about (actually a whole book,
  if the publisher doesn't kill the project half-way through; but that's
  a different story).

  While I could easily go off on a multi-hundred line discussion about
  this, I realize that nobody would really care to read it.  So let me just
  say that the real question is one of "USABLE megaFlops for TOTAL cost."

  First, USABLE.  Clusters are a compromise between the monolithic
  architecture of 1980s-style Crays and cost.  Larger singular systems
  are easier to program, easier to debug, and easier to utilize (not
  universally, but nearly so).  Distributed computing (i.e. clusters)
  takes on additional "costs" (often non-monetary) in these areas to
  compensate for the fact that a cluster often costs a tiny fraction of
  a more traditional monolithic supercomputer.  This is just good
  business and engineering sense, and there are good reasons why the
  majority of the systems on the Top-500 list are clusters.

  But, it is important to recognize that (for most programs) there are
  "distribution" costs, and they are a compromise.  For most (but not
  all!) applications, it is easier to get better performance on a lower
  number of more powerful nodes.  If your application efficiency is
  reduced by 20% each time you double the node count, saving a few
  bucks by buying a much larger number of less powerful systems is
  going to be a losing proposition.
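  To put some toy numbers on that scaling argument: the 20% figure comes
  from the paragraph above, but the per-node speeds below are made-up
  examples, not measurements of any real machine.

```python
import math

def usable_gflops(per_node_gflops, nodes, loss_per_doubling=0.20):
    """Usable throughput if parallel efficiency shrinks by
    loss_per_doubling at every doubling of the node count.
    (Illustrative model only.)"""
    doublings = math.log2(nodes)
    efficiency = (1.0 - loss_per_doubling) ** doublings
    return per_node_gflops * nodes * efficiency

few_big    = usable_gflops(10.0, 8)   # 8 * 10 * 0.8**3, about 41 usable GFLOPS
many_small = usable_gflops(3.0, 32)   # 32 * 3 * 0.8**5, about 31 usable GFLOPS
```

  The 32 cheap nodes have more raw horsepower (96 vs. 80 GFLOPS peak),
  but after the distribution penalty the 8 powerful nodes deliver more
  usable work.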

  Next, TOTAL.  Only looking at the price of the machines in a cluster
  is like only looking at the price of the machine when buying a desktop
  workstation.  You don't get a very accurate assessment of costs unless
  you include the monitor, keyboard, possible printer, and all of the
  software (for example, Office, Photoshop, whatever) to make the
  computer useful.  There's also the whole "cost of ownership" issue that
  I'll just gloss over.  If you're looking at performance optimizing your 
  dollar, you need to look at your total dollars, not just what the
  machines cost.  Many of the costs involved in building a cluster are
  "per node" costs, and quickly dilute cost savings derived from
  choosing a large number of *very* inexpensive systems.  For example,
  if you need a high-speed network, you need to pay for one (or more)
  ports per node, not "per GFLOP" or "per $5K of computing hardware."
  Same is true of items like large memory upgrades.  If your
  application requires every node to have a full copy of the data set,
  and that means 4GB of RAM per node, that can get extremely expensive
  if you have a large number of nodes.  The biggest killer of all is
  often software costs.  If you are running commercial software (which,
  at the very least, includes the OS, although that argument carries
  less weight now that Tiger ships on new machines), you need to pay
  per node.  There are also costs that scale per node, though not
  directly, such as space, power, and cooling.
  While there is some "economy of scale" for admin costs, system
  administration time becomes a much bigger issue with a larger number
  of systems.
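  A crude cost model makes the "per node" point concrete.  All the
  dollar figures below are hypothetical placeholders, not real quotes
  for any hardware mentioned above; the point is only that fixed
  per-node overheads multiply with the node count.

```python
def total_cost(nodes, base_price, per_node_extras):
    """Total cluster cost: the machines themselves plus the
    per-node overheads that come with each box (toy model)."""
    return nodes * (base_price + sum(per_node_extras.values()))

# Made-up per-node overheads, in dollars:
extras = {
    "network_port": 500,   # one high-speed interconnect port per node
    "ram_upgrade": 1200,   # e.g. giving every node a full copy of the data set
    "os_and_sw":   300,    # per-node software licensing
    "space_power": 200,    # amortized rack space, power, and cooling
}

cheap  = total_cost(32, 1500, extras)  # many inexpensive nodes
strong = total_cost(8, 6000, extras)   # fewer, more powerful nodes
```

  Both configurations spend $48K on raw machines, but the per-node
  extras ($2,200 each here) add $70,400 to the 32-node cluster and
  only $17,600 to the 8-node one.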

  In short, most of the time there is a pressure to move towards a
  smaller number of more powerful systems.  Obviously this isn't
  absolute-- if carried to its logical extreme we'd be back to large
  single monolithic systems-- but given that cost/performance is
  generally linear (or close to it), most clusters will benefit from
  the most powerful systems you can (practically) purchase.

> By this, I mean Mac desktop versus Mac 1U server chassis.

  If the decision is purely an engineering one, and you don't have
  other business questions (like what to do with the machines when the
  cluster reaches the end of its useful life), I'd go with the Xserve systems
  in a second.  Yes, you can get slightly more powerful desktop systems
  (which, I realize, I just got done saying is a good idea), but the
  space, power, and heat costs are considerable for a large(r) number
  of nodes.

> I have no real space/cooling/power constraints,

  Careful.  Cooling, followed by power, are the two biggest mistakes
  made when building clusters.  Xserve systems put out 1000+ BTU/hr
  when running all out.  A G5 PowerMac can put out twice that.  You can
  heat a small house (and consume its whole electrical capacity) with
  a 32 node PowerMac cluster.
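  The arithmetic behind that claim, using the standard conversion of
  roughly 3.412 BTU/hr per watt (the per-node BTU figures are the
  rough estimates from the text):

```python
BTU_PER_HR_PER_WATT = 3.412  # standard watt-to-BTU/hr conversion

def cluster_heat_watts(nodes, btu_per_hr_each):
    """Total heat output (in electrical watts) for a cluster where
    each node dissipates btu_per_hr_each BTU/hr."""
    return nodes * btu_per_hr_each / BTU_PER_HR_PER_WATT

xserve_rack   = cluster_heat_watts(32, 1000)  # roughly 9.4 kW
powermac_pile = cluster_heat_watts(32, 2000)  # roughly 18.8 kW
```

  Nearly 19 kW of continuous draw is in the neighborhood of an entire
  100-amp residential electrical service, which is why the "heat a
  small house" quip above is barely a joke.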

  PowerMacs also take up about 5x the space.  Again, not a big deal for
  four or five, but a huge deal for 20 or 30.

  Other issues aside, one other consideration that I didn't see
  mentioned is that the Xserve systems come with OS X Server, while the
  PowerMacs do not.  You don't really need Server on the compute nodes,
  but if you are running a cluster of four or more systems without Server
  on the head-node, you're insane (and wasting money and time).  The
  ability to image compute nodes via NetBoot/NetInstall and provide a
  centralized Directory for all account and preference information is
  extremely valuable for a cluster of any size.

  Humm... I'm still over 100 lines, but not by much.


                     Jay A. Kreibich | CommTech, Emrg Net Tech Svcs
                        jak at uiuc.edu | Campus IT & Edu Svcs
          <http://www.uiuc.edu/~jak> | University of Illinois at U/C

----- End forwarded message -----