Gaussian (was: SOFTWARE PRICING FOR CLUSTERS)

Robert G. Brown rgb at phy.duke.edu
Thu Jul 25 16:02:35 PDT 2002


On Thu, 25 Jul 2002, Herbert Fruchtl wrote:

> Dean Johnson wrote:
> > I await the open source version of Gaussian. If it'll only cost like
> > $100k to develop... ;-)
> 
> It's called NWChem. And there are others that provide a subset of the 
> functionality (Gamess-US, for example). Gaussian itself is fairly 
> cheap for academics, even with source code.
> 
> You are right, of course, that the man-years invested into such 
> programs would add up to tens of millions of dollars rather than 100k.

Although my old daddy the economist taught me at a very early age the
difference between opportunity cost labor and real expenses.

He set me on his knee and said "Son, when I pay other folks to paint our
house, that's a real expense.  But when I make you paint our house for
the cost of the food I'm paying you anyway, that's opportunity cost
labor.  Now stop your bitchin' and get out there and paint!"

NWChem, PVM, MPI, Unix in general and Gnu/Linux in particular contain
BILLIONS of dollars worth of labor, but most of it is opportunity cost
labor (paid for by e.g. grant agencies who were funding the research
that relied on the tools anyway, for example, or written by graduate
students paid to do annoying tasks like actually implement a lovely
theory in code).  This is in contrast to most commercial code, which is
almost always paid for with real money.

The interesting question is where you draw the line.  If I write a
scientific application in three months of labor, using the GSL, GCC,
sundry open source toolkits from e.g. netlib, running on a linux box
(derived from Unix, which once upon a time cost some real money to
develop), while being "paid" by the physics department here to teach and
being partly subsidized by a grant to do research derived from the
application, how much does it cost?  Fair answers could range from
"nothing" (opportunity cost) to more money than was spent putting man on
the moon.

As was very wisely observed by Gerry, the answer ultimately comes down
to who's paying for it and good old fashioned cost-benefit analysis.  If
you write a proposal that involves doing quantum chemistry research, you
have to make something of a bet.  You fund salaries, you fund hardware,
you fund various kinds of support for the computations.  You generally
have a pretty accurate idea of how much the granting agency is likely to
fund as an upper bound for your particular scientific project.

In some cases it will come down to things like:  Hmmm, use a commercial
package of one sort or another with terrible cost scaling across the
cluster (e.g. doubling the cost per node including the software) and
live with half the nodes (doubling the time to complete the project,
publish, become famous, get tenure) OR half the graduate students
(doubling the amount of work for the remaining hands to do it, which can
ALSO double the time in addition to requiring a lot of YOUR time instead
of THEIR time) or get the maximum number of nodes and students one can
afford and invest some of the students' OC time in developing the
application from open source tools the best that they can?
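
To put rough numbers on that tradeoff, here is a minimal Python sketch.
Every figure in it (the budget, the per-node hardware and license
prices) is made up for illustration, and it assumes the job scales
ideally, so that time to completion is inversely proportional to the
number of nodes.  Real codes rarely scale that cleanly, so treat the
result as a best case for the bigger cluster.

```python
def nodes_affordable(budget, hardware_per_node, software_per_node=0.0):
    """Whole nodes a fixed budget buys at a given per-node cost."""
    return int(budget // (hardware_per_node + software_per_node))


def relative_runtime(nodes, baseline_nodes):
    """Runtime relative to the baseline cluster, assuming ideal scaling."""
    return baseline_nodes / nodes


# Hypothetical figures, chosen only to illustrate the argument.
BUDGET = 64_000.0    # total hardware + software money in the grant
HARDWARE = 1_000.0   # cost of one node
LICENSE = 1_000.0    # per-node license that doubles the cost per node

open_nodes = nodes_affordable(BUDGET, HARDWARE)
commercial_nodes = nodes_affordable(BUDGET, HARDWARE, LICENSE)
slowdown = relative_runtime(commercial_nodes, open_nodes)

print(f"open source:  {open_nodes} nodes")
print(f"commercial:   {commercial_nodes} nodes")
print(f"time to finish with the commercial package: {slowdown:.1f}x")
```

With these invented numbers the per-node license halves the cluster and
doubles the time to finish.  What the sketch leaves out is the student
OC time on the other side of the ledger, which is the part each group
has to estimate for itself.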

This is by no means an imaginary scenario; it is played out every time a
group builds a beowulf at all.  The group is free to install a Windows
cluster of various flavors, or a Solaris cluster, or an Irix cluster.
In all of these cases, the cost per flop will double to triple BEFORE
getting around to choosing an application layer, but get -- maybe --
some "benefits" in terms of reduced personal time investment from the
shrink wrap and commercial support process.  Or they can use open source
Linux or perhaps FreeBSD, and live with investing more OC time to make
it all work (well, not so much any more, but five or six years ago the
bulk of the list traffic was devoted to DIRECTING the OC time that was
very definitely required:-)

Many of us are (still) here on this list because of the choice we made
then and continue to make now.  For us and our funding scenarios, we can
get money for hardware leveraged in part by our clearly demonstrated
competence in getting it all working efficiently on our problems WITHOUT
another $100K worth of software investment by the granting agency.  Or
we have a fixed budget and the ivory tower ideal of subsidized OC time
-- we teach, have a small grant to pay for hardware, and have the time
to develop the code ourselves but not the money to pay others for the
code, or at least not very much.

The strength of the open source "movement" is founded upon two things:
opportunity cost labor and cost scaling.  Linux development does cost
some real money; even if 80% of the labor is OC, there is some that
isn't.  And as noted, there are a LOT of FTE hours in Gnu/Linux and the
associated toolset! I personally view it as one of the greatest
accomplishments of mankind, quite literally dwarfing the pyramids and
the development of nuclear bombs in terms of sheer intellectual
investment.  I wasn't kidding about it comparing to the moon program --
in terms of sustained investment in time, computer systems from hardware
through firmware and up to software are arguably mankind's greatest and
most time-intensive achievement -- more time invested, and "expensive"
time at that, than nearly anything else we've touched.

But look at the benefits and how they scale!  I could NEVER write linux
from scratch.  Neither could Linus Torvalds.  A cast of thousands, nay,
tens of thousands, has written Linux -- maybe hundreds of thousands if
you consider the energy that went in to multics, unix, and all the other
early OS's and software suites that eventually found its firmly open
source realization in the Gnu/Linux/FreeBSD code base.

Still, at this point that energy runs millions of computers and enables
them to run billions to trillions of chores for hundreds of millions of
people.  We run it (and other OS software) because it is so CHEAP by the
time scaling is taken into account.  So what if one once had to
sometimes contribute a bit of extra time to management?  At this point,
a computer store I know of runs a linux server to install the
preconfigured windows systems they sell because it a) works and b)
scales so well!  

At this point, NOTHING can touch linux for ease of installation and
management -- the only factor that limits the number of systems our
linux sysadmins can handle is the rate of hardware failure!  It takes
less time for us to get a bare-nekked box, stick it on the network, and
kickstart it into our standard desktop configuration than it would to
install a pre-installed (but not configured) WinXX system!  Is it worth
it (in the long run) to contribute bug fixes, new GPL tools, and all the
rest of the OC time hard core linux users generally invest?  Absolutely.
If I contribute the code I write (that I need to write anyway) and
somebody else contributes THEIR code, we both get to use each other's
code.  Pretty soon that builds up to a LOT of code, mostly developed
with OC labor, cleaned up and packaged with a bit of real money, and
managed with a mix of real money and OC labor.

That's why I said -- tough sell for this particular list.  We KNOW about
the cost scaling of clusters and open source software because that is
the foundation of what we do.  Sure, we might buy software if we must,
but most of us will knee-jerk reject $50K packages out of hand even if
it buys a university wide site license unless its CBA is OVERWHELMINGLY
favorable and NO open source tool can even begin to do the task.

That is from experience, of course, each of us with our own.  I'd regale
the list with stories of the evolution of Duke's mail delivery system
(which began with large bodies on campus using such gems as cc-mail, to
give away the punchline) to an open source, open standard basis, but it
would be boringly familiar to most of us. In the short run, a
horrendously expensive institutional site (or cluster) license for a
proprietary solution may LOOK scalable and cheap, but historically, in
the long run open source open standard solutions end up being MUCH more
scalable and MUCH cheaper, even allowing for the OC user/admin
participation in fixing its occasional problems.  With open source
software, at least one CAN try to fix a problem instead of suffering
through very expensive downtime...

  rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





