[Beowulf] New member, upgrading our existing Beowulf cluster
Prentice Bisbal prentice at ias.edu
Tue Dec 8 07:50:28 PST 2009
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ

Bill Broadley wrote:
> Greg Lindahl wrote:
>> On Fri, Dec 04, 2009 at 12:57:07PM +1100, Chris Samuel wrote:
>>
>>> If you've got a job running on there for a month
>>> or two then there's a fairly high opportunity cost
>>> involved.
>> That kind of policy has a fairly high opportunity cost, even before
>> you factor in linked nodes. E.g. you see a system disk going bad, but
>> the user will lose all their output unless the job runs for 4 more
>> weeks...
>
> Indeed. You'd hope that such long running jobs would checkpoint.

You'd hope that. Most of my current cluster's users are scientific
researchers in academia, not computer scientists. While some are
extremely computer savvy, others have learned just enough about
programming to do their calculations. Expecting the latter to write code
with checkpointing is unrealistic, and working in academia, I can't
force them to.

Which is why taking down 4 nodes instead of just one is less than ideal.

--
Prentice
