about fast interconnects and SCI in particular

PFENNIGER Daniel Daniel.Pfenniger@obs.unige.ch
Mon, 14 Jun 1999 13:03:51 -0400


On 14-Jun-99 at 18:23, Florent Calvayrac (fcalvay@aviion.univ-lemans.fr) wrote:
> 
> I have  been involved since October 1998 in the definition, fund raising
> and purchase  of a cluster for computational physics purposes,
> and  we are about to take a final decision on the nature of the cluster
> (processors and communications hardware).
> 
> I had already asked the following question on comp.parallel last year,
> and got various and interesting answers, but am still in trouble :
> 
> -----------------------------------------
> Considering a given total budget (around $100,000) is it better to spend
> nearly all  of it into  ultrafast communications hardware (say Myrinet or 
> SCI) and then to buy 16 CPUs, or to only buy a Fast Ethernet switch and 32
> or 64 (with SMP) faster processors ?
> 
>  Since several users will be using the system, the needs for communications
>  can not be estimated accurately.
> 
> ------------------------------------------
> I include a summary of the most informative answers at the end of this
> posting.
> 
...

We had a similar decision to make a year ago.  Because the ratio of fine 
grain to coarse grain computations was also not well defined, we spent 
2/3 of the budget on a 66 node PII cluster with switched Fast Ethernet. 
After experimenting for 6 months we can now see which of the following 
we want to enhance (in decreasing order of likelihood):

	1) the number of processors (all the boards are dual)
	2) the node RAM
	3) the network (via channel bonding)
	4) the hard disks
	5) other features

Since component costs have in the meantime decreased at the usual rate
while our practical experience has increased, we can now evaluate much 
better which parameter we want to double with the remaining funding.

I would say that fine grain parallel problems are still best run 
on traditional supercomputers, but a lot of applications that used to be
run on supercomputers can now be run just as well on Beowulfs for a 
fraction of the cost.

Finally, the more simultaneous users are allowed, the 
less one should invest in the network, for the obvious reason that 
the different concurrent applications are independent of each 
other.  An expensive network is justified only if single applications 
must run across all the nodes at once. 
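
The last point can be illustrated with a toy model.  This is only a 
sketch under stated assumptions, not a measurement from any cluster: a 
job on p nodes is assumed to spend t_comp / p on computation plus a 
communication cost that grows linearly with p.  The function names, 
the linear cost model and the default parameter values are all 
hypothetical; only the node count of 66 comes from the text above.

```python
# Toy model (an assumption, not a measurement): a job on p nodes spends
# t_comp / p computing and t_comm_per_node * (p - 1) communicating.

def job_time(p, t_comp=1000.0, t_comm_per_node=1.0):
    """Wall time of one job spread over p nodes, in arbitrary units."""
    return t_comp / p + t_comm_per_node * (p - 1)


def comm_fraction(p, t_comp=1000.0, t_comm_per_node=1.0):
    """Fraction of the wall time of a p-node job spent communicating."""
    comm = t_comm_per_node * (p - 1)
    return comm / job_time(p, t_comp, t_comm_per_node)


NODES = 66  # node count of the cluster described above

# Split the cluster evenly among independent users and see how much of
# each job's time goes into the network.
for users in (1, 2, 6, 11, 33, 66):
    p = NODES // users  # nodes available to each independent job
    print(f"{users:2d} users, {p:2d} nodes each: "
          f"{100 * comm_fraction(p):5.1f}% of time in communication")
```

In this model the communication share shrinks as the cluster is split 
among more independent jobs, and it vanishes when each job runs on a 
single node.  A real application may scale quite differently.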

	Daniel Pfenniger


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Dr Daniel Pfenniger 			  | Daniel.Pfenniger@obs.unige.ch
 Geneva Observatory, University of Geneva | tel: +41 (22) 755 2611 
 CH-1290 Sauverny, Switzerland		  | fax: +41 (22) 755 3983
__________________________________________________________________________