The Future of Beowulf

By Donald Becker

Last month's column focused on the history and lessons of the Beowulf Project as we celebrated its 10-year anniversary. As you know, the initial goal was to get Beowulf clusters to work at all, demonstrating that applications written for high-end machines could be deployed on low-cost clusters of COTS (commercial off-the-shelf) machines. Our next objective was to make these increasingly large collections of machines easy enough for non-scientists to install and maintain. Our challenge now is to make the software easier to use, scale, and administer.

I envision a future where a site installs a single machine for each operating system type, and additional machines join the cluster rather than requiring independent installation. As more clusters are deployed and the underlying operating system becomes more complex, diagnostic and monitoring tools that identify problems and point to their causes become more critical.

This column explores what Beowulf needs in order to keep thriving technically, commercially, and philosophically.

Technical Challenges

One of the significant constraints on the Beowulf approach is the cooling and thermal dilemma. The concepts behind Moore's Law have made it possible to continually develop more powerful COTS processors, which in turn generate more heat. The question becomes: how do you cool the cluster without requiring refrigerated buildings?

Other constraints to be considered include:
  1. Interprocess communication latency
  2. Contention (i.e., lack of bandwidth)
  3. Starvation (i.e., lack of processing, slow input-output)
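The first two constraints are often reasoned about with a simple cost model in which sending a message costs a fixed latency plus the message size divided by the link bandwidth. The sketch below illustrates that model only; the function names and the link figures (50 microseconds, 100 MB/s) are hypothetical values chosen for illustration, not measurements from any Beowulf cluster.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated seconds to send one message: fixed latency plus size / bandwidth."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def effective_bandwidth(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually achieved once latency is included."""
    return message_bytes / transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)


# Hypothetical link parameters, assumed for illustration only.
LATENCY_S = 50e-6      # 50 microseconds per message
BANDWIDTH_BPS = 100e6  # 100 MB/s peak

if __name__ == "__main__":
    for size in (64, 1024, 65536, 1048576):
        eff = effective_bandwidth(size, LATENCY_S, BANDWIDTH_BPS)
        # Small messages are dominated by latency, large ones by bandwidth.
        print(f"{size:>8} bytes: {eff / 1e6:8.2f} MB/s effective")
```

Under this model, a message of latency times bandwidth bytes (5,000 bytes with the assumed figures) achieves half the peak bandwidth, which is one rough way to judge whether an application's typical message size is limited by latency or by bandwidth.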

These problems must be solved by middleware, which in turn requires commercial opportunities so that someone will invest the necessary time and expertise. Most universities and individual enthusiasts won't have the financial resources to build and test large installations and generic software unrelated to specific research projects.

Commercial Challenges

The reason I started Scyld Software and ultimately teamed up with Penguin Computing was that I recognized another hard fact of life: there needs to be a reasonable business model for Beowulf to evolve and succeed. Continued advancement requires that we solve end-user problems for more applications, where the individuals involved cannot all conceptualize, architect, and code everything themselves in the manner of the Beowulf pioneers.

I believe that expanding the market opportunities for Beowulf "products" (application software, middleware, hardware) to a broader spectrum of HPC applications will increase the economic value of the expertise of those who have invested time and effort in understanding and contributing to the knowledge pool.

Philosophical Challenges

I've noticed discussions about the Beowulf "cool factor" on some message boards. Part of what has given this movement so much energy is the passion and irreverence we've all brought to the party. The challenge when a "subversive" technology matures is how to maintain the enthusiasm. I don't pretend to have all the answers, but I do know that the work by the commercial developers will be stronger if the broader community stays involved: testing, questioning, and trying to take advances to the next level. I invite you all to discuss these topics on the mailing lists. Remaining true to the promise of the open source philosophy while supporting value-added services and products is the toughest challenge we face.

September Column: Ten Years of Beowulf (a look back)
August Column: Beowulf.org Transformed