Frequently Asked Questions
This FAQ is intended to forestall the repetitive questions on the Beowulf mailing list. This section takes five minutes to read; please read it before posting!
- What's a Beowulf?
- What are Beowulf systems being used for these days?
- Where can I get the Beowulf software?
- Can I take my software and run it on a Beowulf and have it go faster?
- PVM? MPI? Huh?
- Is there a compiler that will automatically parallelize my code for a Beowulf, like SGI's compilers?
- Why do people use Beowulfs?
- Do people use keyboard-video-mouse switches?
- Who are the experts?
- Does anyone have a Linux compiler that recognizes bits of code that could be optimized with KNI, 3DNow!, and MMX instructions?
- Do I need to run Red Hat?
- I'm using the Extreme Linux CD . . .
- Does Beowulf need glibc?
- What's the most important: CPU speed, memory speed, memory size, what CPU should I use? . . .
- Can I make a Beowulf out of different kinds of machines?
- Where do I go for more information?
- Where does the name Beowulf originate?
Don Becker, Robert G. Brown, Greg Lindahl, Forrest Hoffman, Putchong Uthayopas, and Kragen Sitaker contributed valuable information to this FAQ.
1. What's a Beowulf?
Beowulf clusters are scalable performance clusters based on commodity hardware, on a private system network, with open source software (Linux) infrastructure.
Each consists of a cluster of PCs or workstations dedicated to running high-performance computing tasks. The nodes in the cluster don't sit on people's desks; they are dedicated to running cluster jobs. The cluster is usually connected to the outside world through only a single node.
Some Linux clusters are built for reliability instead of speed. These are not Beowulfs.
2. What are Beowulf systems being used for these days?
Beowulfs are used for traditional technical applications such as simulations, biotechnology, and petro-clusters; for financial market modeling, data mining, and stream processing; and as Internet servers for audio and games.
3. Where can I get the Beowulf software?
There isn't a software package called "Beowulf". There are, however, several pieces of software many people have found useful for building Beowulfs. None of them are essential. They include MPICH, LAM, PVM, the Linux kernel, the channel-bonding patch to the Linux kernel (which lets you 'bond' multiple Ethernet interfaces into a faster 'virtual' Ethernet interface), the global pid space patch for the Linux kernel (which lets you see all the processes on your Beowulf with ps, and eliminate them), and DIPC (which lets you use SysV shared memory, semaphores, and message queues transparently across a cluster).
4. Can I take my software and run it on a Beowulf and have it go faster?
Don Becker: Perhaps, though it may require modification. You need to split it into parallel tasks that communicate using MPI or PVM or network sockets or SysV IPC. Then you need to recompile it.
Greg Lindahl: If you just want to run the same program a few thousand times with different input files, a shell script will suffice.
Christopher Bohn: Even multi-threaded software won't automatically get a speedup, as multi-threaded software assumes shared memory. There are some distributed shared memory (DSM) packages under development (DIPC, Mosix, ...), but the memory access patterns in software written for an SMP machine could potentially result in a *loss* of performance on a DSM machine.
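Greg Lindahl's case, one program run over many input files, needs no message passing at all, only a way to hand the files out. What follows is a minimal sketch in Python rather than shell. The node names, the program name ./simulate, and the use of ssh for remote execution are all assumptions made for illustration; the code only builds the command lines and does not run them.

```python
import shlex

def assign_round_robin(inputs, nodes):
    """Deal input files out to nodes in turn, like dealing cards."""
    plan = {node: [] for node in nodes}
    for i, path in enumerate(inputs):
        plan[nodes[i % len(nodes)]].append(path)
    return plan

def build_commands(plan, program="./simulate"):
    """Build one ssh command line per (node, input file) pair.

    The program name and the use of ssh are assumptions for illustration.
    """
    commands = []
    for node, paths in plan.items():
        for path in paths:
            remote = f"{program} {shlex.quote(path)}"
            commands.append(f"ssh {node} {shlex.quote(remote)}")
    return commands

if __name__ == "__main__":
    nodes = ["node01", "node02", "node03"]          # hypothetical host names
    inputs = [f"run{i:03d}.in" for i in range(7)]   # hypothetical input files
    for line in build_commands(assign_round_robin(inputs, nodes)):
        print(line)
```

Round-robin assignment is the simplest choice and works well when every run takes about the same time; if run times vary widely, handing out the next file to whichever node finishes first would likely balance better.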
5. What are PVM and MPI?
PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) are software systems that let you write message-passing parallel programs, in Fortran and C, that run on a cluster. PVM used to be the standard until MPI appeared. MPI is now the standard for portable message-passing parallel programs; it is defined by the MPI Forum and available on all massively parallel supercomputers.
More information is available from the MPI Forum.
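The message-passing style itself can be sketched without either library. The example below is not MPI or PVM: it uses Python threads as stand-ins for cluster nodes and queues as stand-ins for the network, purely to show the scatter, compute, and gather pattern that such programs typically follow. The function names are invented for this sketch, and no speedup should be expected from it.

```python
import queue
import threading

def worker(inbox, outbox):
    """Receive one chunk of numbers, send back its partial sum."""
    chunk = inbox.get()          # blocking receive
    outbox.put(sum(chunk))       # send the result to the master

def parallel_sum(data, n_workers=4):
    """Master: scatter chunks to workers, then gather partial sums."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(box, results))
               for box in inboxes]
    for t in threads:
        t.start()
    for rank, box in enumerate(inboxes):
        box.put(data[rank::n_workers])   # every n-th element goes to this worker
    total = sum(results.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return total

if __name__ == "__main__":
    print(parallel_sum(list(range(1, 101))))   # prints 5050
```

The workers share nothing and see only what arrives in their inbox, which is the property that lets the same program structure move from one machine to many.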
6. Is there a compiler that will automatically parallelize my code for a Beowulf, like SGI's compilers?
No. BERT from plogic.com will help you manually parallelize your Fortran code. NAG's and Portland Group's Fortran compilers can also build parallel versions of your Fortran code, given some hints from you (in the form of HPF and OpenMP directives). These versions may not run any faster than the non-parallel versions.
7. Why do people use Beowulfs?
Either because they think they're cool or because they get supercomputer performance on some problems for a third to a tenth the price of a traditional supercomputer.
8. Do people use keyboard-video-mouse switches?
Most people don't, because they don't need them. Since they're running Linux, they can just telnet to any machine unless it's broken. Lots of Beowulfs don't even have video cards in every node. Console access is generally only needed when the box is so broken it won't boot, and some people use serial ports even for this.
9. Who are the experts?
Don Becker, Robert G. Brown, Rob Nelson, Walter B. Ligon, Putchong Uthayopas, Christopher Bohn, Greg Lindahl, Doug Eadline, Eugene Leitl, Gerry Creager, and William Rankin are generally thoughtful and well-informed, as well as frequently willing to help.
10. Does anyone have a Linux compiler that recognizes bits of code that could be optimized with KNI, 3DNow!, and MMX instructions?
No, though PentiumGCC has some support for this.
11. Do I need to run Red Hat?
No. Indeed, the original Beowulf ran Slackware.
12. I'm using the Extreme Linux CD.
Don't -- it's way out of date.
13. Does Beowulf need glibc?
No. But if you want to run a particular application on a libc5-based Beowulf, make sure it compiles and works with libc5. Similarly, if you want to run a particular application on a glibc-based Beowulf, make sure it compiles and works with glibc.
Configuring different nodes with different software is not recommended; it's a headache.
14. What's the most important: CPU speed, memory speed, memory size, cache size, disk speed, disk size, or network bandwidth? Should I use dual-CPU machines? Should I use Alphas, PowerPCs, ARMs, or x86s? Should I use Xeons? Should I use Fast Ethernet, Gigabit Ethernet, Myrinet, SCI, FDDI? Should I use Ethernet switches or hubs?
IT ALL DEPENDS ON YOUR APPLICATION!!!
Benchmark, profile, find the bottleneck, fix it, repeat.
Some people have reported that dual-CPU machines scale better than single-CPU machines because your computation can run uninterrupted on one CPU while the other CPU handles all the network interrupts.
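The "benchmark, profile, find the bottleneck" loop can start very simply. The sketch below times named phases of a program and reports the slowest one. The phase names and the stand-in workloads are invented for illustration; a real application would substitute its own compute, communication, and I/O routines, and a proper profiler gives finer detail.

```python
import time

def time_phases(phases):
    """Run each (name, function) pair once and record its wall-clock time."""
    timings = {}
    for name, func in phases:
        start = time.perf_counter()
        func()
        timings[name] = time.perf_counter() - start
    return timings

def bottleneck(timings):
    """Return the name of the phase that took the longest."""
    return max(timings, key=timings.get)

if __name__ == "__main__":
    # Stand-in phases; a real application would call its own routines here.
    phases = [
        ("compute", lambda: sum(i * i for i in range(200_000))),
        ("communicate", lambda: time.sleep(0.05)),
    ]
    timings = time_phases(phases)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.3f} s")
    print("bottleneck:", bottleneck(timings))
```

Whichever phase dominates suggests where money is best spent: a compute-bound result points at CPU and memory, a communication-bound one at the network.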
15. Can I make a Beowulf out of different kinds of machines -- single-processor, dual-processor, 200MHz, 400MHz, etc.?
Sure. Splitting up your application optimally gets a little harder, but it's not infeasible.
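One simple way to split work across unequal nodes is to give each node a share proportional to its speed. The sketch below does that with largest-remainder rounding so the shares always add up to the total. Using clock rate as the speed figure is a crude assumption made only for illustration; measured throughput on your own application is a better weight.

```python
def split_by_speed(n_items, speeds):
    """Split n_items across nodes in proportion to their relative speeds.

    Uses largest-remainder rounding so the shares always add up to n_items.
    """
    total = sum(speeds)
    exact = [n_items * s / total for s in speeds]
    shares = [int(x) for x in exact]
    leftover = n_items - sum(shares)
    # Hand the remaining items to the nodes with the largest fractional parts.
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: exact[i] - shares[i], reverse=True)
    for i in by_fraction[:leftover]:
        shares[i] += 1
    return shares

if __name__ == "__main__":
    # Clock rates in MHz as a crude stand-in for measured node speed.
    print(split_by_speed(1000, [200, 400, 400]))   # prints [200, 400, 400]
```

A static split like this suits work whose cost per item is predictable; when it is not, handing out small batches on demand tends to balance itself.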
16. Where do I go for more information?
- Cluster Foundry at SourceForge
- The archive and overview sections on beowulf.org
- The email@example.com mailing list
17. Where does the name Beowulf originate?
Beowulf is the earliest surviving epic poem written in English. It is a story about a hero of great strength and courage who defeated a monster called Grendel.
"Famed was this Beowulf: far flew the boast of him, son of Scyld, in the Scandian lands. So becomes it a youth to quit him well with his father's friends, by fee and gift, that to aid him, aged, in after days, come warriors willing, should war draw nigh, liegemen loyal: by lauded deeds shall an earl have honor in every clan."