[Beowulf] 512 nodes Myrinet cluster Challenges

Jaime Perea jaime at iaa.es
Fri Apr 28 00:27:19 PDT 2006

On Wednesday, 26 April 2006 12:34, Walid wrote:
> Hi all,
> Does anyone know what types of problems/challenges big clusters have?
> We are considering a 512-node cluster that will use
> Myrinet as its main interconnect, and would like to do our homework.
> The cluster is meant to run an in-house fluid simulation application
> that is I/O intensive and requires large memory models.
> Any hints or pointers will be appreciated.
> Walid.
I'm not sure if this is going to be my first posting.

We have a small cluster (16 dual-Xeon nodes) with Myrinet. It works
quite well, and having mpich-gm is a plus: it gives very low
latency and good bandwidth. We are also doing some work on
MareNostrum, and although I have heard that there may be scaling
problems when you go to a really large number of tasks, that has
not really been my experience. Perhaps Myrinet forces the use of
MPI instead of PVM, although there are alternatives (in principle
you can use an Ethernet emulation, which is quite fast but not the
same at all).

From my point of view, the big problem is the I/O. On our small
cluster we installed PVFS2; it works well, using Myrinet GM as the
message-passing mechanism. PVFS2 is only a solution for parallel
I/O, since MPI can use it. It cannot be used for the normal user
stuff, so you have to take that into account and think carefully
about how to set up a good, powerful NFS server machine, which has
to be on a separate, standard network. On the "other" architecture,
IBM's GPFS is a really nice alternative.

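To make the parallel I/O point concrete, here is a minimal sketch of
the access pattern that a parallel file system is meant to serve:
every task writes its own block of one shared file at an offset
derived from its rank. This is only an illustration under stated
assumptions. It uses a plain local file and a loop over fake ranks
instead of real MPI processes, and the names rank_offset, write_block
and simulate_shared_write are hypothetical. In a real MPI program the
write would be an MPI-IO call such as MPI_File_write_at.

```python
import os
import tempfile


def rank_offset(rank, block_bytes):
    """Byte offset at which a given rank's block starts in the shared file."""
    return rank * block_bytes


def write_block(fd, rank, block_bytes):
    """Write one rank's block at its own offset.

    Stand-in for an MPI-IO positioned write; here it is a plain
    POSIX pwrite on a local file (assumption, not real MPI).
    """
    payload = bytes([rank % 256]) * block_bytes
    return os.pwrite(fd, payload, rank_offset(rank, block_bytes))


def simulate_shared_write(path, n_ranks, block_bytes):
    """Have every fake rank write its block, then return the file size."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Ranks are visited in reverse to show that the result does not
        # depend on ordering: the blocks never overlap.
        for rank in reversed(range(n_ranks)):
            write_block(fd, rank, block_bytes)
    finally:
        os.close(fd)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        shared = os.path.join(tmp, "shared.dat")
        print(simulate_shared_write(shared, n_ranks=4, block_bytes=8))
```

Because the blocks do not overlap, the tasks need no coordination
among themselves, which is why this pattern can spread across many
I/O servers. Ordinary user files do not look like this, hence the
separate NFS server.
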
All the best


           Jaime D. Perea Duarte. <jaime at iaa dot es>
             Linux registered user #10472

           Dep. Astrofisica Extragalactica.
           Instituto de Astrofisica de Andalucia (CSIC)
           Apdo. 3004, 18080 Granada, Spain. 
