How to Build a Beowulf

There are quite a few books and online descriptions of how to configure your own Beowulf cluster. Don Becker recommends the work of Robert G. Brown, a professor in the Duke University Physics Department. His online book is in a perpetual state of being rewritten and updated as new technologies and ideas are introduced. Follow the link to read the latest version.

Engineering a Beowulf-style Compute Cluster
Robert G. Brown
Duke University Physics Department
Durham, NC 27708-0305

Building and Maintaining a Beowulf

Selection from Engineering a Beowulf-style Compute Cluster
Copyright Robert G. Brown, 24 May 2004

One question that is asked often enough on the beowulf list is "How hard is it to build or care for a beowulf?"

Mind you, it is quite possible to go into beowulfery with no more than a limited understanding of networking, a handful of machines (or better, a pocketful of money), and a willingness to learn. Over the years I've watched, and sometimes helped, as many groups and individuals (including myself) in many places went from a state of near-total ignorance to a fair degree of expertise on little more than guts and effort.

However, this sort of school is the school of hard (and expensive!) knocks; one ought to be able to do better and not make the same mistakes and reinvent the same wheels over and over again, and this book is an effort to smooth the way so that you can.

One place that this question is often asked is in the context of trying to figure out the human costs of beowulf construction or maintenance, especially if your first cluster will be a big one and has to be right the first time. After all, building a cluster of more than 16 or so nodes is an increasingly serious proposition. It may well be that beowulfs are ten times cheaper than a piece of "big iron" of equivalent power (per unit of aggregate compute power by some measure), but what if it costs ten times as much in human labor to build or run? What if it uses more power or cooling? What if it needs more expensive physical infrastructure of any sort?

These are all valid concerns, especially in a shop with limited human resources, little Linux expertise, or limited space, cooling, or power. Building a cluster with four nodes, eight nodes, perhaps even sixteen nodes can often be done so cheaply that it seems "free" because the opportunity cost of the resources required is so minimal and the benefits so much greater than the costs. Building a cluster of 256 nodes without thinking hard about cost issues, infrastructure, and cost-benefit analysis is very likely to have a very sad outcome, the least of which is that the person responsible will likely lose their job.
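To make the trade-off concrete, here is a minimal sketch of the kind of back-of-the-envelope comparison one might run before committing to a large cluster. Everything in it is an assumption made for illustration: the SystemCost fields, the total_cost function, and every dollar figure, hour count, and power draw are hypothetical placeholders, not measurements. Substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass
class SystemCost:
    """Hypothetical cost inputs for one system (all values are placeholders)."""
    hardware: float              # purchase price, in dollars
    admin_hours_per_year: float  # human labor to build and run it
    power_kw: float              # average draw, including cooling, in kW


def total_cost(system: SystemCost, years: float,
               hourly_rate: float, price_per_kwh: float) -> float:
    """Hardware plus labor plus energy over the given number of years."""
    labor = system.admin_hours_per_year * hourly_rate * years
    energy = system.power_kw * 24 * 365 * years * price_per_kwh
    return system.hardware + labor + energy


if __name__ == "__main__":
    # Illustrative only: a cluster that is ten times cheaper to buy
    # but needs ten times the labor and somewhat more power.
    cluster = SystemCost(hardware=100_000, admin_hours_per_year=2_000, power_kw=40)
    big_iron = SystemCost(hardware=1_000_000, admin_hours_per_year=200, power_kw=25)

    for name, system in (("cluster", cluster), ("big iron", big_iron)):
        cost = total_cost(system, years=3, hourly_rate=60, price_per_kwh=0.10)
        print(f"{name}: ${cost:,.0f} over 3 years")
```

The point of the sketch is only that hardware price is one term among several; whether labor or power ends up dominating depends entirely on the numbers you plug in.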

If that person (who will be responsible) is you, then by all means read on. I cannot guarantee that the following sections will keep you out of the unemployment line, but I'll do my best.
