[Beowulf] What services do you run on your cluster nodes?

Joe Landman landman at scalableinformatics.com
Tue Sep 23 05:27:44 PDT 2008


John Hearns wrote:

> That's a reason why I'm no great lover of Ganglia too - it just sprays 
> multicast packets all over your network.

Yeah ... try to use a network in the middle of a multicast storm.  As I 
remember, every machine seeing a multicast packet has to at least 
inspect the packet to see if its destination group address is one the 
machine has subscribed to.  If so, it has to deliver the contents to 
the multicast consumer.
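
To make the subscription step concrete, here is a minimal sketch of how a 
consumer joins a multicast group on a typical sockets API.  The group 
address and port are assumptions (they match what gmond is commonly 
configured with, but check your own gmond.conf), and the function names 
are made up for illustration.

```python
import socket

# Assumed values for illustration only; check your own gmond.conf.
GROUP = "239.2.11.71"
PORT = 8649


def is_multicast(addr: str) -> bool:
    """Return True if addr falls in 224.0.0.0/4, the IPv4 multicast range."""
    first_octet = socket.inet_aton(addr)[0]
    return 224 <= first_octet <= 239


def build_mreq(group: str, iface: str = "0.0.0.0") -> bytes:
    """Pack an ip_mreq: the group address followed by the local interface
    address (0.0.0.0 lets the kernel pick the interface)."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_subscriber(group: str = GROUP, port: int = PORT) -> socket.socket:
    """Open a UDP socket and subscribe it to the given multicast group.

    After IP_ADD_MEMBERSHIP succeeds, the host accepts packets addressed to
    the group and hands their contents to this socket.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    build_mreq(group))
    return sock
```

Calling open_subscriber() and then sock.recvfrom(65535) in a loop would 
receive whatever is sent to that group.  A host that never makes the 
IP_ADD_MEMBERSHIP call still sees the traffic on the wire if the switch 
floods it, which is where the inspection cost comes from.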

> Which really should be OK - but if you have switches which don't perform 
> well with multicast you get problems.

... or a crappy driver->TCP stack implementation on your local machine 
or cluster (cough cough ... vendor's name elided to protect those who 
really ought to be exposed)

I think the major mistake made by people racking and stacking boxes in 
an effort to build a cluster is that they just don't grasp how things 
scale, or haven't run into the scaling issue due to lack of experience.

Large scale out machines/codes/runs often have surprising (and sometimes 
banal) failure modes.  We have customers who have run into some rather 
surprising (for them) scale-up problems, in large part due to the 
software they are using not taking into account *big* data sets.  In 
some cases (with source code) we could help fix the app.  In others (no 
source code) we could fix the underlying hardware cause.  Usually you 
don't hear about these things until you get the phone call about "things 
not working", and you have to walk the cat back to where the problem 
originated.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


