[Beowulf] $2500 cluster. What it's good for?
Jeffrey B. Layton
laytonjb at charter.net
Sun Dec 19 14:30:55 PST 2004
Jim Lux wrote:
>>On Sat, 18 Dec 2004, Jim Lux wrote:
>>>I think it would be interesting to contemplate potential uses of a $2500
>>>cluster. Once you've had the thrill of putting it together and
>>>running something with POVray, what next?
>>That is the $64,000 question. Here is my 2-cent answer.
>>BTW, your ideas are great. I would love to see a discussion like this
>>continue because we all know the hardware is the easy part!
>>There is part of this project which has a "build it and they will come
>>(and write software)" dream. Not being that naive, I believe there are
>>some uses for systems like this. The intended audience is not the
>>uber-cluster-geeks on this list, but rather the education, home, and hacker
>>crowd. With regard to education, I think if cluster technology is readily
>>available, then perhaps students will look to these technologies to solve
>>problems. And who knows, maybe the "Lotus 123 of the cluster" will be built
>>by some person or persons with some low cost hardware and an idea everyone
>>said would not work.
>>If you have followed the magazine, you will see that we highlighted
>>many open projects that are useful today. From an educational standpoint,
>>a small chemistry/biology department that can do quantum chemistry,
>>protein folding, or sequence analysis is pretty interesting to me.
>>There are other areas as well.
>I was thinking of the cluster video wall idea, however the video hardware
>would be kind of pricey (more than the cluster!). Something like using the
>cluster to provide the crunch to provide an immersive environment might be
I think this is coming RSN. Have you seen the prices of home projectors?
They are dropping very fast. So fast that I gave up trying to track them
for my own home. Of course, the projectors and the nodes are only part
of the whole system. There was a very cool article in ClusterWorld from
some people at NCSA that have developed a video wall in a box kind of
>>There are also some other immediate things like running Mosix or Condor
>>on the cluster. A small group that has a need for a computation server
>>could find this useful for single process computational jobs.
>This brings up an interesting optimization question. Just like in many
>things (I'm thinking of RF amplifiers specifically) it's generally cheaper/more
>cost effective to buy one big thing IF it's fast enough to meet the
>requirements. Once you get past what ONE widget can do, then, you're forced
>to some form of parallelism or combining smaller widgets, and to a certain
>extent it matters not how many you need to combine (to an order of
>magnitude). The trade comes from the inevitable increase in system
>management/support/infrastructure to support N things compared to supporting
>just one. (This leaves aside high availability/high reliability kinds of
>So, for clusters, where's the breakpoint? Is it at whatever the fastest
>currently available processor is? This is kind of the question that's been
>raised before. Do I buy N processors now with my grant money, or do I wait
>a year and buy N processors that are 2x as fast and do all the computation
>in the second of two years? If one can predict the speed of future
>processors, this might guide you whether you should wait for that single
>faster processor, or decide that no matter if you wait 3 years, you'll need
>more than the crunch of a single processor to solve your problem, so you
>might as well get cracking on the cluster.
You've hit the nail on the head! I think it's good to start thinking about
parallelizing your codes or your ideas and testing them on a small but
useful cluster. You can learn from them - find where the bottlenecks
are, try different approaches, try different filesystems even - and then
adjust your code/algorithm. You can also learn some interesting things
along the way, such as being able to tune a code for specific cluster
hardware (BTW - only clusters can do this). You can also learn how
clusters are put together, to some degree, which you can use later when
and if you want to buy a large production style cluster. This information
will help you make reasonable trade-offs and judgments about what
you want in the cluster. It will also help you keep the vendor honest :)
Then once the code has been tuned and you are ready for production
runs, get a bigger cluster, either by building one or by buying one from
a vendor, and have at it!
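One rough way to put numbers on the buy-now-or-wait question quoted above
is the sketch below. It assumes, purely for illustration, that processor
speed doubles at a fixed interval (the doubling_years parameter) and that
the job scales perfectly with that speed. The function names total_years
and best_wait are hypothetical, and this is a toy model rather than a
procurement rule.

```python
import math


def total_years(work_years, wait_years, doubling_years=1.0):
    """Years until the job finishes if we wait first, then buy and compute.

    work_years: run time of the job on hardware bought today.
    doubling_years: ASSUMED interval over which processor speed doubles.
    """
    speedup = 2.0 ** (wait_years / doubling_years)
    return wait_years + work_years / speedup


def best_wait(work_years, doubling_years=1.0):
    """Wait that minimizes total_years under the doubling assumption.

    Found by setting the derivative of total_years to zero:
    2 ** (wait / doubling) = work * ln(2) / doubling.
    """
    ratio = work_years * math.log(2.0) / doubling_years
    if ratio <= 1.0:
        return 0.0  # the job is short enough that buying today wins
    return doubling_years * math.log2(ratio)


if __name__ == "__main__":
    work = 2.0  # the two-year job from the quoted example
    for wait in (0.0, 0.5, 1.0, 2.0):
        print(f"wait {wait:.1f} y -> done after {total_years(work, wait):.2f} y")
    print(f"best wait: {best_wait(work):.2f} y")
```

With a two-year job and a one-year doubling time, buying now and waiting
a year both finish after two years, which matches the example in the
quoted text. In this model the short wait in between finishes slightly
sooner, and for jobs shorter than about 1.44 doubling periods waiting
never pays.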
>Several times, I've contemplated a cluster to solve some problem, and then,
>by the time I had it all spec'd out and figured out and costed, it turned
>out that I'd been passed by AMD/Intel, and it was better just to go buy a
>(single) faster processor. There are some interesting power/MIPS trades
>that are non-obvious in this regime, as well as anomalous application
>environments where the development cycle is much slower (not too many "Rad
>Hard" Xeons out there).
>There are also inherently parallel kinds of tasks where you want to use
>commodity hardware to get multiples of some resource, rather than some
>special purpose thing (say, recording multi-track audio or the
>aforementioned video wall). Another thing is some sort of single input
>stream, multiple parallel processes for multiple outputs. High performance
>speech recognition might be an example.
I was working on an article about unique uses for clusters and I
interviewed a guy who was using a small cluster with OpenMOSIX
to rip his massive album/tape/CD collection into MP3s. He built
a nice automated system in his garage with a fairly large but
inexpensive storage system for his MP3s.
>What about some sort of search process with applicability to casual users
>(route finding for robotics or such)?
>>I also have an interest in seeing a cluster version of Octave or SciLab
>>set to work like a server. (As I recall, rgb had some reasons not to use
>>these high-level tools, but we can save this discussion for later.)
>I'd be real interested in this... Mathworks hasn't shown much interest in
>accommodating clusters in the Matlab model, and I spend a fair amount of time
>running Matlab code.
Mathworks has a new toolbox that allows you to do parallel
computations. HOWEVER, the current version only allows
embarrassingly parallel operations, i.e. independent tasks with no
communication between nodes.
I've also followed some discussions on the Octave mailing list
about incorporating MPI. I don't think they've quite made it
there yet. I'd also like to see them incorporate something like
PLAPACK to create a "parallel" version of the computational
capabilities. If only I had some time to work on it.... :)
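For anyone unfamiliar with the term, here is a minimal sketch of what
"embarrassingly parallel" means in practice. It is written in Python with
the standard library only and is not the Mathworks toolbox or any Octave
API. The names count_hits and estimate_pi are made up for illustration.
Each task gets its own seed, runs without talking to any other task, and
the results are combined only at the very end.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def count_hits(task):
    """One independent work unit: count random points inside the quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # private generator, so no state is shared between tasks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks, samples_per_chunk, map_fn=map):
    """Split the work into independent chunks and combine the counts at the end.

    map_fn is the only thing that changes between a serial run (built-in map)
    and a parallel run (for example, the map method of a process pool).
    """
    tasks = [(seed, samples_per_chunk) for seed in range(chunks)]
    hits = sum(map_fn(count_hits, tasks))
    return 4.0 * hits / (chunks * samples_per_chunk)


if __name__ == "__main__":
    # Workers never exchange data; only the final counts come back.
    with ProcessPoolExecutor() as pool:
        print(estimate_pi(8, 100_000, map_fn=pool.map))
```

Because the chunks never exchange data, the same function runs unchanged
with the built-in map or with a pool of workers. A parallel linear
algebra library along the lines of PLAPACK is a different matter, since
there the nodes have to communicate during the computation.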