Sandia releases Cplant software

Ron Brightwell rbbrigh at valeria.mp.sandia.gov
Sat Jun 9 11:33:58 PDT 2001


> 
>   From the news release:
> 
> "dramatically extends the capability of researchers to modularly 
> assemble large blocks of off-the-shelf computer components"
> 
> "While other cluster software may run faster, none exceed the Cplant 
> system software's ability to help off-the-shelf processors work together 
> in large numbers."
> 
> OK, so what's this supposed to mean?? It doesn't make cluster software 
> run faster, it just helps their ability to work together in large numbers.??
> 
> If it doesn't run the code faster, what does it do to make processors 
> exceed in their ability to work together? Are they trying to say it 
> increases the efficiency of the internetworking clusters? I wonder how 
> this article would read if someone who knew what was going on with the 
> Cplant software actually wrote the news release.

So there are a couple of things going on here.  Yes, the PR department wrote
the news release, so it was written by people with little technical knowledge
on the subject and intended for people with little technical knowledge about
the subject.  You also have to understand that Sandia's PR department works
hardest at keeping Sandia out of the news and not the other way around,
so there's some political maneuvering going on as well.  Nothing that I can
comment on directly, but you can probably figure out why the description is
somewhat muted.

> 
> Can anyone comment on what the Cplant system software actually does 
> that is of benefit to the performance of clusters?
> 

That depends on which performance measurement you're interested in.
If you're interested in how quickly you can distribute system software
or boot a cluster, then we have a hierarchical architecture that allows
system software to be distributed to a thousand nodes in a matter of a few
minutes, and all of those nodes can be booted in about the same time.
If you're interested in how quickly you can launch a parallel application,
our runtime environment software starts 1000+ node jobs in about 15 seconds.
If you're interested in just application performance, then we don't do a
whole lot.  The message passing software that we have done for Myrinet has
worse latency and bandwidth performance than Myricom's GM.
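
To illustrate why a hierarchy helps with distribution time, here is a rough
back-of-the-envelope sketch in Python.  The fan-out of 32 and the simple cost
model (each parent serves its children one after another, parents at the same
level work in parallel) are assumptions made for illustration only; they are
not Cplant's actual parameters or implementation.

```python
def flat_transfers(num_nodes: int) -> int:
    """One server pushing an image to every node in turn: one transfer per node."""
    return num_nodes


def tree_levels(num_nodes: int, fanout: int) -> int:
    """Smallest number of levels such that fanout**levels >= num_nodes."""
    if num_nodes < 1 or fanout < 2:
        raise ValueError("need num_nodes >= 1 and fanout >= 2")
    levels, reach = 0, 1
    while reach < num_nodes:
        reach *= fanout
        levels += 1
    return levels


def critical_path_transfers(num_nodes: int, fanout: int) -> int:
    """Upper bound on back-to-back transfers before the last node is served.

    Assumes each parent serves up to `fanout` children sequentially and all
    parents on the same level work in parallel (hypothetical cost model).
    """
    return tree_levels(num_nodes, fanout) * fanout


if __name__ == "__main__":
    nodes, fanout = 1000, 32  # fan-out is an assumed, illustrative value
    print("flat:", flat_transfers(nodes))
    print("levels:", tree_levels(nodes, fanout))
    print("hierarchical:", critical_path_transfers(nodes, fanout))
```

Under these assumptions a flat push needs 1000 sequential transfers, while a
two-level hierarchy needs at most 64 on its longest path, which is the kind of
gap that turns hours into minutes at this scale.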

I think what the press release was trying to say was that we are addressing
the scalability limitations of the typical cluster software environment.
We've been able to deploy a 1000+ node cluster with this software.  Other
(smaller) clusters may be faster, but the software on these clusters will
not scale to the size that we have been able to.

We have designed the Cplant hardware and software architecture to be scalable,
but we're not done with the implementation.  There are a few things, like
the Myrinet, where our current implementation meets our functional goals,
but not our performance goals.

-Ron 





