about fast interconnects and SCI in particular

Robert G. Brown rgb@phy.duke.edu
Mon, 14 Jun 1999 13:41:05 -0400


On Mon, 14 Jun 1999, Florent Calvayrac wrote:

> 
> I have  been involved since October 1998 in the definition, fund raising
> and purchase  of a cluster for computational physics purposes,
> and  we are about to take a final decision on the nature of the cluster
> (processors and communications hardware).
> 
> I had already asked the following question on comp.parallel last year,
> and got various and interesting answers, but am still in trouble :
> 
> -----------------------------------------
> Considering a given total budget (around $100,000) is it better to spend nearly
> all  of it into  ultrafast communications hardware (say Myrinet or  SCI)
> and then to buy 16 CPUs, or to only buy a Fast Ethernet switch and 32 or 64
> (with SMP) faster processors ?

There is a standard answer to this FAQ: "It depends on your problem".
If your problem mix is fine grained on fast switched ethernet, you will
want the faster network (probably mated with faster CPUs) to make it
coarser grained (reduce the time spent communicating relative to the
time spent computing).  If it is already coarse grained on fast ethernet
and scales well to many processors (lots of parallel work, only a little
serial work, high Amdahl's Law limit), then you should get as much total
CPU as possible.  In between you have to balance your budget, probably
focusing on the finer grain problems, as coarse grain problems will
always run just fine on an even faster network.
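
One rough way to put numbers on "grain size" is to add a communication
term to Amdahl's Law.  The sketch below is only an illustration: the
function name, the assumption that communication cost grows linearly
with the number of processors, and every number in it are hypothetical,
not measurements of any real code or network.

```python
def speedup(n_procs, serial_frac, comm_frac_per_proc=0.0):
    """Estimated speedup on n_procs processors (toy model).

    serial_frac: fraction of the single-CPU run time that cannot be
        parallelized.
    comm_frac_per_proc: assumed communication cost, as a fraction of the
        single-CPU run time, paid for each processor beyond the first.
    """
    parallel_frac = 1.0 - serial_frac
    run_time = (serial_frac
                + parallel_frac / n_procs
                + comm_frac_per_proc * (n_procs - 1))
    return 1.0 / run_time


if __name__ == "__main__":
    # Hypothetical problem with 1% serial work, on a "slow" and a "fast" network.
    for label, comm in (("slow net", 0.01), ("fast net", 0.0001)):
        for n in (4, 16, 64):
            print(label, n, round(speedup(n, 0.01, comm), 1))
```

With the communication term set to zero this is plain Amdahl's Law,
whose limit for many processors is 1/serial_frac.  With a large
communication term the speedup peaks at a small number of CPUs and then
falls, which is the case where money spent on the network buys more
than money spent on extra nodes.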

As several of your previous responders noted, latency can be as
important as raw bandwidth for certain problems.  The more expensive
networks often have both higher bandwidth and lower latency, with
gigabit ethernet still a possible exception.  Again, this depends on
your problem -- is it sending lots of tiny packets or a few big ones (or
lots of big ones) between barriers?
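
The tiny-packet versus big-packet distinction can be sketched with the
usual first-order model, time per message = latency + size/bandwidth.
The function names and the network figures below are placeholders
chosen for illustration, not measured values for any particular
hardware.

```python
def transfer_time(n_messages, bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Total time to send n_messages, each paying latency plus size/bandwidth."""
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return n_messages * per_message


def latency_share(bytes_per_message, latency_s, bandwidth_bytes_per_s):
    """Fraction of a single message's transfer time that is pure latency."""
    per_message = latency_s + bytes_per_message / bandwidth_bytes_per_s
    return latency_s / per_message


if __name__ == "__main__":
    # Assumed network: 100 microsecond latency, 12.5e6 bytes/s (100 Mbit/s raw).
    latency, bandwidth = 100e-6, 12.5e6
    for size in (64, 10_000, 10_000_000):
        share = latency_share(size, latency, bandwidth)
        print(size, "bytes per message: latency share", round(share, 4))
```

In this model, small messages are dominated by latency, so only a
lower-latency network helps them; large messages are dominated by
bandwidth.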

Until you analyze your problem mix, nobody can answer the question
above because it doesn't >>have<< an answer.  How can I (or anybody)
know if it is better to focus on one resource or bottleneck at the
expense of another without knowing if the bottleneck is relevant to your
problems?  If one of your problems is one of the several out there that
scale terribly on fast ethernet and Intel CPUs, I'd look pretty silly
advising you to go that way.  If your problem only occupies 10 MB of
core, we'd both be silly adding 512 MB of memory to each node at the
expense of CPU speed or network speed (which might be crucial).

What kinds of computational physics problems will be run?  I do Monte
Carlo, which is embarrassingly coarse grained to just coarse grained
parallelizable.  Ergo, I don't care TOO much about network speed, but
love to get as much total CPU as I can for my money.  Somebody doing
gravitational cosmology or weather prediction or hydrodynamics or
anything with a relatively small grain size (or in physics-speak, with
long range interactions) would need to be very concerned about
communications channels -- some of those problems won't scale past a
very few (say, 4-8) Intel CPUs connected by fast ethernet.

What I will say is the following.  You should, in VERY GENERAL TERMS, be
comparing something like the latest Alpha systems with Myrinet (which
Greg Lindahl on this list has shown to scale right up there with the
best big iron on a fairly wide class of fine grain problems) to single
or dual CPU Intel nodes with one or more fast ethernet channels per
node.  The first will run basically anything one can run on any kind of
cluster or beowulf -- if it won't run on this, it isn't likely to run on
anything else, and the Alphas are superb floating point performers even
on single-threaded code.  The second will give you a large, CPU-rich
farm of systems to run coarse grained code (e.g. Monte Carlo
simulations, povray-pvm for rendering, and so on) and leave you with
SOME ability to tweak it up for medium grained problems of moderate
size.

>  Since several users will be using the system, the needs for communications
>  can not be estimated accurately.

This is not good enough.  At the very least, you need an idea of what
the "worst" (finest grained) problem is that needs to run well on the
setup you are designing.  Otherwise you will almost certainly overkill
on the high side or (perhaps worse) underkill and leave some potential
user irritated that the 'wulf won't do their problem.  Go bug people and
get them to describe their problems in some detail.  Do some research
and learn how parallelizable their problem components are.  Otherwise,
you might as well take a dartboard and cover it with segments labelled
with various configurations you can afford and throw a dart at it
blindfolded.

If this really, truly, is impossible (sigh), then your "safe" solution
is likely to be going with an AlphaLinux/Myrinet cluster.  I was
>>very<< impressed by the capability of this solution on the high end,
and its cost/benefit (or price/performance if you prefer) was not at all
far from the best one can manage with Intel.  The cost numbers looked
something like $5-6K/node (including the Myrinet port), so you could
afford maybe 20 nodes.  Each 21264 looked to be 3-4 times the speed of a
high end Intel processor, so this was CPU-equivalent to buying something
like 60-64 high end Intel CPUs, but it scales far better on difficult
problems.
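
The arithmetic behind those figures can be checked with a few lines.
This is a sketch under stated assumptions: it spends the whole $100,000
on nodes (nothing set aside for switches, racks, or spares), and the
pairing of node cost with speed ratio is simply one combination that
reproduces the 60-64 range, not necessarily how the estimate was made.

```python
BUDGET = 100_000  # dollars, from the original question


def nodes_affordable(budget, cost_per_node):
    """Whole nodes that fit in the budget, ignoring all non-node costs."""
    return budget // cost_per_node


def intel_equivalent(n_nodes, speed_ratio):
    """Aggregate CPU expressed in units of one high end Intel processor."""
    return n_nodes * speed_ratio


if __name__ == "__main__":
    # (cost per Alpha/Myrinet node, assumed 21264 speed relative to Intel)
    for cost, ratio in ((5000, 3), (6000, 4)):
        n = nodes_affordable(BUDGET, cost)
        print(cost, "dollars/node:", n, "nodes,",
              intel_equivalent(n, ratio), "Intel-equivalent CPUs")
```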

Sure (as others on the list will undoubtedly tell you) if you content
yourself with ultra-cheap Celeron nodes at maybe $750 each, you can get
MORE raw CPU going the cheap/Intel/FSE route, but you will not be able
to get the scaling on "real" parallel problems that you will from the
Alpha/Myrinet-based solution.  I'd personally still go the Celeron
route, but then I know my problem class very well; if I were designing
for a more general crowd I'd certainly think hard about putting in a
fast network and fewer, faster CPUs.  A final benefit of this is that
for problems that DON'T parallelize, it certainly doesn't hurt to be
able to run them 3 times faster on one of your nodes...;-)
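
For the raw-CPU comparison, a quick sketch of the numbers: how many $750
nodes the budget buys, and how fast each cheap node must be (relative to
a high end Intel CPU) before the cheap farm matches the 60-64
Intel-equivalent CPUs of the Alpha option.  The helper names are
hypothetical, and the calculation assumes the whole budget goes to
nodes, with no allowance for switches or cabling.

```python
BUDGET = 100_000         # dollars, from the original question
CELERON_NODE_COST = 750  # dollars per node, the figure quoted above


def max_nodes(budget, cost_per_node):
    """Whole nodes that fit in the budget, ignoring network hardware."""
    return budget // cost_per_node


def break_even_speed(alpha_equiv_cpus, cheap_nodes):
    """Per-node speed (high end Intel CPU = 1.0) at which the cheap farm
    equals the Alpha cluster in raw aggregate CPU."""
    return alpha_equiv_cpus / cheap_nodes


celeron_nodes = max_nodes(BUDGET, CELERON_NODE_COST)

if __name__ == "__main__":
    print("Celeron nodes:", celeron_nodes)
    for alpha_equiv in (60, 64):
        r = break_even_speed(alpha_equiv, celeron_nodes)
        print("break-even relative speed for", alpha_equiv, "->", round(r, 2))
```

This only compares raw CPU.  It says nothing about how well a given
problem scales across that many nodes on fast ethernet, which is the
point of the caveat above.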

   rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu