1 GFLOP / Parallel Input-Output Systems / AI
Robert G. Brown rgb at phy.duke.edu
Wed Sep 13 05:35:47 PDT 2000
- Previous message: 1 GFLOP / Parallel Input-Output Systems / AI
- Next message: newbie: rsh and pvm problems
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Tue, 12 Sep 2000, Sahil Rewari wrote:

> Hi,
> Just wanted to know that in order to achieve 1 GFLOP what would the hardware requirements for a Linux cluster be?
> I am about to begin on making my first cluster for research in mainly two fields which are parallel input / output systems and the further development of Artificial Intelligence (AI) systems.
> I stay in Mumbai (INDIA) and am finding it difficult to get likeminded people to work on these projects. Linux in general is used by very few people here. The inadequate availability of service and support professionals and training for Linux probably is the cause to this. My entire research work is done with the availability of resources on the Internet and through other people working on such projects. Are there any other people working on similar projects? Please do write to me at nesol at bol.net.in

As Greg already said, this depends strongly on your application and the design of your cluster. For example, the answer might be as few as two machines (and you might be able to get there with one) if the bulk of your application (the core loop) is tiny and can run entirely in L1 cache on, say, an 800 or 900 MHz PIII CPU. On the other hand, if it is big and highly nonlocal in memory, it might take as many as 25 CPUs (e.g. 500 MHz Celerons), and that assumes the application itself is embarrassingly parallel, so that the parallel design of the cluster is mostly irrelevant.

If the application is coarse to medium grained and spends a significant fraction of its execution time on interprocessor communications (IPCs), then you probably cannot just add up the FLOPS of N CPUs to get N*FLOPS performance. Indeed, with certain beowulf designs you may not be able to reach a GFLOP at all, with any number of CPUs.
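The arithmetic behind the "two machines versus 25" range, and behind the warning that IPC can cap throughput below a GFLOP regardless of node count, can be sketched in Python. This is a minimal back-of-envelope model, not anything from the post: the per-node rates (about 500 MFLOPS for a cache-resident loop, about 40 MFLOPS for a memory-bound one) are hypothetical figures chosen to reproduce the node counts above, and the cost model T(N) = s + (1 - s)/N + c(N - 1), with serial fraction s and a per-extra-node communication cost c, is an assumed Amdahl-style model with a linear communication term.

```python
import math

TARGET_MFLOPS = 1000.0  # 1 GFLOP expressed in MFLOPS


def nodes_for_target_peak(per_node_mflops: float, target: float = TARGET_MFLOPS) -> int:
    """Naive "add up the MFLOPS" node count.

    Assumes perfect (embarrassingly parallel) scaling: N nodes deliver
    exactly N times the single-node rate.
    """
    return math.ceil(target / per_node_mflops)


def effective_mflops(n: int, per_node_mflops: float,
                     serial_frac: float, comm_frac_per_node: float) -> float:
    """Aggregate throughput of n nodes under an assumed Amdahl-plus-IPC model.

    Runtime is normalized so one node takes 1.0:
        T(n) = s + (1 - s) / n + c * (n - 1)
    s = serial_frac (work that cannot be split across nodes)
    c = comm_frac_per_node (hypothetical IPC cost added by each extra node)
    """
    t = serial_frac + (1.0 - serial_frac) / n + comm_frac_per_node * (n - 1)
    return per_node_mflops / t


def best_cluster_size(per_node_mflops: float, serial_frac: float,
                      comm_frac_per_node: float, max_nodes: int = 256):
    """Scan n = 1..max_nodes and return (n, MFLOPS) with the highest throughput."""
    return max(
        ((n, effective_mflops(n, per_node_mflops, serial_frac, comm_frac_per_node))
         for n in range(1, max_nodes + 1)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Hypothetical sustained per-node rates, not measurements:
    print(nodes_for_target_peak(500.0))  # cache-resident loop on a fast PIII: 2
    print(nodes_for_target_peak(40.0))   # memory-bound loop on a Celeron: 25

    # With 2% serial work and 0.2% IPC cost per extra node, 1 GFLOP is unreachable.
    n, mflops = best_cluster_size(40.0, serial_frac=0.02, comm_frac_per_node=0.002)
    print(n, round(mflops))  # 22 375
```

With these made-up parameters, throughput peaks near 22 nodes at roughly 375 MFLOPS and then falls, because each added node costs more in communication than it contributes in computation. Real communication costs depend on the network and the algorithm, so the model only illustrates the shape of the curve, not the numbers for any particular cluster.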
It sounds like you need to do a bit of reading on parallel program design and speedup scaling and then run some benchmarks on candidate systems to come up with a quantitative answer, if this really matters to you. On the other hand, if you are just interested in generating a big number (or a justifiable number:-) for a grant proposal, then by all means, just add up the MFLOPS for N nodes. This number represents a kind of peak: the "best" number your cluster might achieve, approached only by embarrassingly parallel or very coarse grained applications.

To find some help in learning about parallel program design and scaling and all that, look for links under:

  http://www.phy.duke.edu/brahma

where there are a number of papers and presentations that discuss speedup scaling and application profiling.

To get a very nice set of microbenchmarking tools to run on a prospective node, check out lmbench at:

  http://www.bitmover.com

lmbench is used by (among many others) Linus Torvalds and the kernel development folks to tune and optimize the kernel; it is a well-designed and reliable package that I hope to talk about at the ALSC next month.

HTH,

   rgb

> Any help / suggestions regarding the above will be highly appreciated.
> Thanks in Advance,
>
> Regards,
> Sahil Rewari
>

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
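Once benchmarks have been run on candidate systems, the timings can be turned into speedup and efficiency figures and compared with the "add up the MFLOPS" peak. The following Python sketch assumes you have wall-clock times for one fixed-size job at several node counts (including N = 1) and your own estimate of the job's total floating-point work; the `ScalingRow` and `scaling_table` names are hypothetical, and the timings in the example are invented for illustration, not measurements.

```python
from typing import Dict, List, NamedTuple


class ScalingRow(NamedTuple):
    nodes: int
    speedup: float            # T(1) / T(N)
    efficiency: float         # speedup / N (1.0 means perfect scaling)
    sustained_mflops: float   # work actually delivered per second
    naive_peak_mflops: float  # N * single-node rate, the "add it up" number


def scaling_table(timings: Dict[int, float], work_mflop: float) -> List[ScalingRow]:
    """Turn wall-clock times for one fixed-size job into a speedup table.

    timings maps node count N -> seconds and must include N = 1.
    work_mflop is the job's total floating-point work in millions of
    operations (an assumption: you count or estimate it yourself).
    """
    t1 = timings[1]
    single_node_mflops = work_mflop / t1
    rows = []
    for n in sorted(timings):
        tn = timings[n]
        speedup = t1 / tn
        rows.append(ScalingRow(
            nodes=n,
            speedup=speedup,
            efficiency=speedup / n,
            sustained_mflops=work_mflop / tn,
            naive_peak_mflops=n * single_node_mflops,
        ))
    return rows


if __name__ == "__main__":
    # Hypothetical timings for illustration only; replace with your own runs.
    measured = {1: 100.0, 2: 52.0, 4: 28.0, 8: 16.0, 16: 11.0}
    print(f"{'N':>3} {'speedup':>8} {'eff':>6} {'MFLOPS':>8} {'peak':>8}")
    for row in scaling_table(measured, work_mflop=4000.0):
        print(f"{row.nodes:>3} {row.speedup:>8.2f} {row.efficiency:>6.2f} "
              f"{row.sustained_mflops:>8.1f} {row.naive_peak_mflops:>8.1f}")
```

With these invented numbers, 16 nodes sustain about 364 MFLOPS against a naive peak of 640 MFLOPS, and the falling efficiency column is the quantitative signal that simply adding up per-node MFLOPS overstates what the cluster delivers for this job.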
More information about the Beowulf mailing list
