Uptime data/studies/anecdotes ... ?
Roger L. Smith roger at ERC.MsState.Edu
Tue Apr 2 08:15:00 PST 2002
We currently run an average of about 75% utilization on our 586 processor
(293 node) cluster. We probably have about one node per week crash or hang
for various reasons. We have occasional problems with memory leaks or PBS
hangups which require large-scale reboots of the cluster. (Actually, PBS
just died as I'm typing this, but our PBS heartbeat script should restart
it automatically in a few minutes.)

I'd say we have to do a full reboot of the cluster about every 3-4 months.
For a bunch of PC hardware running a free OS, this seems like a pretty good
number to me. It's not in the same class as our Sun servers (nor even our
SGIs!), but then, none of those systems are this large, either.

On Tue, 2 Apr 2002, Richard Walsh wrote:

>
> All,
>
> What information is available on typical uptimes
> of large-scale clusters ... say greater than 256
> processors and running a multi-user workload. What
> gains do single-point-of-administration tools like
> SCYLD provide? Clearly, there are a great number
> of things one can do to maximize uptime/utilization
> (not the same thing really). What are the essentials
> from the list's point of view?
>
> If a good figure is, say, 80% utilization over a
> 8760 hour year today, what will this number be in
> three years? Annual utilization for the 1088 processor
> T3E we run here is about 95%. How long until a similarly
> sized cluster typically yields the same value?
>
> Regards,
>
> rbw
>
> #---------------------------------------------------
> #
> # Richard Walsh
> # Project Manager, Cluster Computing, Computational
> # Chemistry and Finance
> # netASPx, Inc.
> # 1200 Washington Ave. So.
> # Minneapolis, MN 55415
> # VOX: 612-337-3467
> # FAX: 612-337-3400
> # EMAIL: rbw at networkcs.com, richard.walsh at netaspx.com
> #
> #---------------------------------------------------
> # "What you can do, or dream you can, begin it;
> # Boldness has genius, power, and magic in it."
> # -Goethe
> #---------------------------------------------------
> # "Without mystery, there can be no authority."
> # -Charles DeGaulle
> #---------------------------------------------------
> # "Why waste time learning when ignorance is
> # instantaneous?" -Thomas Hobbes
> #---------------------------------------------------
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

 _\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_\|/_
| Roger L. Smith                        Phone: 662-325-3625               |
| Systems Administrator                 FAX:   662-325-7692               |
| roger at ERC.MsState.Edu               http://WWW.ERC.MsState.Edu/~roger |
|                      Mississippi State University                       |
|_______________________Engineering Research Center_______________________|
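
The figures quoted in this message can be put side by side with a little arithmetic. The following is only a rough sketch in Python, not anything either poster ran. It assumes that "about one node per week" is a steady rate spread evenly over identical nodes, and that the utilization percentages are averaged over all processors for a full 8760 hour year. The function names are made up for illustration.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours, the figure used in the quoted question


def delivered_cpu_hours(processors, utilization, hours=HOURS_PER_YEAR):
    """Processor-hours actually consumed by jobs over the period."""
    return processors * utilization * hours


def per_node_mtbf_weeks(nodes, crashes_per_week):
    """Mean time between crashes for a single node, in weeks.

    Assumption (not from the thread): crashes are spread evenly
    across identical nodes at a constant rate.
    """
    return nodes / crashes_per_week


# 586 processor cluster at about 75% utilization
cluster = delivered_cpu_hours(586, 0.75)

# 1088 processor T3E at about 95% utilization
t3e = delivered_cpu_hours(1088, 0.95)

# 293 nodes, about one node crash or hang per week
mtbf_weeks = per_node_mtbf_weeks(293, 1)
mtbf_years = mtbf_weeks * 7 * 24 / HOURS_PER_YEAR

print(f"cluster: {cluster:,.0f} processor-hours/year")
print(f"T3E:     {t3e:,.0f} processor-hours/year")
print(f"per-node mean time between crashes: {mtbf_years:.1f} years")
```

Under those assumptions an individual node goes roughly 5.6 years between crashes, so the weekly failure is mostly a consequence of having 293 of them. Note that this says nothing about uptime as such: as the quoted question points out, utilization and uptime are not the same thing.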
