
[Beowulf] New member, upgrading our existing Beowulf cluster


Chris Samuel csamuel at vpac.org
Thu Dec 3 18:32:12 PST 2009


----- "Greg Lindahl" <lindahl at pbm.com> wrote:

> That kind of policy has a fairly high opportunity
> cost, even before you factor in linked nodes.

Well, we cannot dictate to our users what they do.
We set a maximum walltime of 3 months and tell users
that they should checkpoint (if they have control of
the application and the coding skills to do so).
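
For readers who have not written one before, application-level checkpointing can be sketched roughly as below. This is only an illustration, not anything taken from a VPAC job: the function names (save_checkpoint, load_checkpoint, run), the JSON state layout, and the stop_after parameter used to simulate a node failure are all hypothetical. The one idea that matters is writing the checkpoint to a temporary file and renaming it into place, so that a crash during the write leaves the previous checkpoint intact.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())  # make sure the data is on disk before renaming
    os.replace(tmp, path)     # atomic rename within the same filesystem


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def run(path, n_steps, checkpoint_every=100, stop_after=None):
    """Toy long-running job: sums the step indices, checkpointing as it goes.

    stop_after is a hypothetical knob that simulates the node dying
    at a given step without writing a final checkpoint.
    """
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if stop_after is not None and state["step"] >= stop_after:
            return state  # simulated failure: work since last checkpoint is lost
        state["total"] += state["step"]
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Resubmitting the same job with the same checkpoint path picks up from the last saved step, so a failure costs at most checkpoint_every steps of work rather than the whole run.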

> E.g. you see a system disk going bad, but the user
> will lose all their output unless the job runs for
> 4 more weeks...

We run SMART tests and the like trying to proactively
spot bad disks (and other hardware) prior to failures,
but yes, that's inevitable.

cheers,
Chris
-- 
Christopher Samuel - (03) 9925 4751 - Systems Manager
 The Victorian Partnership for Advanced Computing
 P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency


