Beowulfs can compete with Supercomputers [was Beowulf: A theorical approach]
David S. Greenberg dsg at super.org
Thu Jun 22 08:45:53 PDT 2000
I'm going to use Mr. Ruiz' question as a springboard for a little pontificating and some conference advertising, so caveat lector.

I paraphrase Mr. Ruiz' question: Can commodity clusters compete with supercomputers on a performance basis and not just on a price-performance basis? Some of us who believe that the answer is yes have been promoting the idea through the Extreme Linux mailing list, website (www.extremelinux.org), and workshops. The challenge is to answer many of Mr. Ruiz' questions and more and, when necessary, to work together to fill in missing pieces.

Now a brief digression for a conference advertisement. The Extreme Linux Track will be part of the Atlanta Linux Showcase and Conference, http://www.linuxshowcase.org/, from October 12-14, 2000. The refereed papers portion of the track has been determined, but we are leaving two 90-minute sessions open for participation.

The first of these sessions will be devoted to working cluster updates. Everyone is encouraged to send us one-page formatted descriptions of their cluster (typically a black-and-white picture with text telling how big it has grown and what cool things it has done over the last year, sort of a Christmas/New Year's card from your cluster). We will publish the one-pagers in the proceedings and give as many folks as possible (in order of submission) a few minutes to present to the workshop.

Similarly, there will be a session for one-page descriptions of applications. Here we'd like to hear about how well your application runs on clusters, about what you'd most like to have added to clusters, and about comparisons with runs on classic supercomputers.

Send your one-pagers to me, dsg at super.org, with the subject "EL2000 one-pager". Remember: first come, first served.

Back to the question at hand: how to make a supercomputer from commodity parts.
Many of us have determined that not only should it be possible to "build your own" supercomputer, but it is likely to be the only way to do so, since "supercomputer" companies are quickly disappearing. There are several approaches:

(1) Design it yourself and build it yourself. The example I'm most familiar with is the CPlant project at Sandia (www.cs.sandia.gov/~cplant). Based on the success of ASCI Red, the 9000+ processor Intel machine, the Sandia team set out to duplicate or surpass its performance, usability, and extensibility with high-end but "commodity" parts. They chose Alpha processors and Myrinet interconnect. They have been a regular in the top third of the Top 500 for several years and continue to grow bigger each year.

(2) Customize a stock design and get someone else to build it for you. There are several small to mid-size companies which specialize in this. I've been meaning to update my list (perhaps some readers will help). The list includes at least Altatech, Atlantec, Aspen, DCG, HPTi, Paralogic, TurboLinux, VALinux.

(3) Convince a vendor to make a product out of the idea. The two biggest examples of this are the Compaq SC series, which clusters 4-way Alpha boxes using the Quadrics interconnect, and the IBM move toward Linux clusters; see in particular the Roadrunner cluster at UNM, www.alliance.unm.edu, and the Chiba City cluster at Argonne, http://www-unix.mcs.anl.gov/chiba/.

A big advantage of clusters is that it is possible to customize to your exact needs. Of course, as is often mentioned on these lists, you must first understand your needs, which can take some time. The range of choices can sometimes seem overwhelming, but the nice thing is that there are many solutions which will be in some sense 90% optimal. The real trick is to pick something reasonable and get it up and running your applications while it is still "hot" hardware. You can modify and upgrade later as you learn more about your needs.
One note of interest is that the cost of a supercomputer used to be mostly in the processors. Then the cost moved to the memory. We are currently seeing a move to putting money in the interconnect (both the memory-to-processor bus and the internode network). Each such change in focus is difficult for buyers to make since it seems like "too much is being spent on specialized hardware". My advice is to go with the trend.

Two major software issues for really large machines are system administration/fault tolerance and parallel I/O. Don't miss the panels and papers on these topics at the Extreme Linux Track.

David

Nacho Ruiz wrote:
> Hi,
>
> I'm doing my final year project and I'm writing about the Beowulf project
> and the Beowulf clusters.
> I've been reading several documents about the Beowulf clusters, but I would
> like to ask all of you some questions about them.
>
> As I've seen, the main objective behind any Beowulf cluster is the
> price/performance tag, especially when compared to supercomputers. But as
> network hardware and commodity systems are becoming faster and faster
> (getting closer to GHz and Gigabit speeds), could you think of competing
> directly with supercomputers?
>
> As I see it, the Beowulf cluster idea could be based on distributed
> computing and parallel computing: you put in more CPUs to get more speedup,
> but as you can't have all the CPUs in the same machine you use several. So
> the Beowulf cluster could fit in between distributed computing and the
> supercomputers (vector computers, parallel computers, etc.). You have
> advantages from both sides: parallel programming and high scalability; but
> you also have several drawbacks: mainly interconnection problems. Do you
> think that with 10 Gb connections (OC-192 bandwidth), SMP on a chip (Power 4)
> and massive primary and secondary memory devices at low cost, you could
> have a chance to beat most of the traditional supercomputers? Or is that not
> your "goal"?
>
> And about the evolution of the Beowulf clusters, do you all follow a kind of
> guideline, or has the project divided into several flavors and objectives?
> Are the objectives of the beginning the same as today, or do you now plan to
> have something like a "super SMP computer" in a distributed way (with good
> communication times)? I've seen that a lot of you are focusing on the GPID
> and whole machine idea; do you think that is reachable? What are the main
> objectives vs the MPI/PVM message passing idea?
> And what about shared memory (at the HD level or the RAM level), do you take
> advantage of having this amount of resources?
>
> Is this idea trying to reach the objective of making parallel programs
> "independent" of the programmer? I mean that instead of having to program
> keeping in mind that you are using a parallel machine, you can program in a
> "normal" way and the compiler will divide/distribute the code over the
> cluster. Is this reachable or just a dream? Is somebody working on this?
>
> And what about the administration of a cluster? Having all the machines of
> the cluster under control, so you can know which are available to send some
> work to, is a hazardous task but necessary. It is not as easy as in an SMP
> machine where you know or assume that all the CPUs inside are working; in a
> cluster you can't do that, as the CPU might work but the HD, NIC or memory
> may fail. How much computational time do you spend on this task? Is somebody
> working on a better way to manage this?
>
> I know that some time ago HP had a machine with several faulty processors
> working and achieving high computational speeds without any error. They used
> some kind of "control algorithm" that manages to use only the good CPUs. Do
> you have something like this, or is there no point? Does it make sense?
>
> That's all for now, thanks to all of you.
> If you know of some sources where I can get more information, please let me
> know.
>
> Nacho Ruiz.
