[Beowulf] Cluster Metrics? (Upper management view)
Michael Di Domenico mdidomenico4 at gmail.com
Fri Aug 20 11:26:51 PDT 2010
- Previous message: [Beowulf] Cluster Metrics? (Upper management view)
- Next message: [Beowulf] Cluster Metrics? (Upper management view)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I think the number of jobs run or CPUs used is a poor measure of a cluster's true success. I would be more inclined to judge a cluster's success by speaking with the people who use it and finding out not only whether they can use it effectively, but also what new science the cluster is enabling for them.

The only thing I find most of the metrics below useful for is figuring out whether or not we need a bigger cluster. I guess that is a form of measurable success, but not one in which I would consider the "cluster" itself to be a success. It could just be dopes running thousands of "/bin/hostname" jobs trying to figure out how to use the cluster.

I also think you need to ask the "business" people what measure would make them consider a cluster a worthwhile investment; from your email, it doesn't sound as if you have that.

On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley <stuartb at 4gh.net> wrote:
> What sort of business management level metrics do people measure on
> clusters? Upper management is asking for us to define and provide
> some sort of "numbers" which can be used to gauge the success of our
> cluster project.
>
> We currently have both SGE and Torque/Moab in use and need to measure
> both if possible.
>
> I can think of some simple metrics (well sort-of, actual technical
> definition/measurement may be difficult):
>
> - 90/95th percentile wait time for jobs in various queues. Is smaller
> better meaning the jobs don't wait long and users are happy? Is
> larger better meaning that we have lots of demand and need more
> resources?
>
> - core-hours of user computation (per queue?) both as raw time and
> percentage of available time. Again, which is better (management
> view) higher or lower?
>
> - Availability during scheduled hours (ignoring scheduled maintenance
> times). Common metric, but how do people actually measure/compute
> this? What about down nodes? Some scheduled percentage (5%?) assumed
> down?
>
> - Number of new science projects performed. Vague, but our
> applications support people can just count things occasionally.
> Misses users who just use the system without interaction with us.
> Misses "production" work that just keeps running.
>
> Any comments or ideas are welcome.
>
> Thanks,
> Stuart Barkley
> --
> I've never been lost; I was once bewildered for three days, but never lost!
> -- Daniel Boone
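The three numeric metrics in Stuart's list (percentile wait time per queue, core-hours as raw time and as a share of available time, and availability outside maintenance) come down to simple arithmetic once job records are in hand. The sketch below is a hedged illustration only, not how SGE or Torque/Moab report these figures: the `Job` record and every function name are hypothetical, and it assumes job records have already been parsed out of each scheduler's accounting logs, that every job falls entirely inside the reporting period, and that node up-time is counted only outside maintenance windows. It uses the nearest-rank percentile; other percentile definitions give slightly different numbers.

```python
"""Hypothetical sketch of three cluster metrics from job/node records.

Assumptions (not from any scheduler's API): job records were already
extracted from SGE or Torque/Moab accounting logs into the Job shape below,
every job lies fully inside the reporting period, and node up-hours were
counted only outside scheduled maintenance windows.
"""
import math
from dataclasses import dataclass


@dataclass
class Job:
    queue: str    # queue name
    submit: float # submit time, seconds since epoch
    start: float  # start time, seconds since epoch
    end: float    # end time, seconds since epoch
    cores: int    # cores allocated to the job


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of data <= it."""
    if not values:
        raise ValueError("percentile of empty data")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def wait_percentiles(jobs, pcts=(90, 95)):
    """Per-queue wait-time percentiles (start - submit), in seconds."""
    waits = {}
    for job in jobs:
        waits.setdefault(job.queue, []).append(job.start - job.submit)
    return {q: {p: percentile(w, p) for p in pcts} for q, w in waits.items()}


def utilization(jobs, total_cores, period_hours):
    """Return (core-hours used, fraction of available core-hours used)."""
    used = sum(job.cores * (job.end - job.start) / 3600 for job in jobs)
    available = total_cores * period_hours
    return used, used / available


def availability(node_up_hours, num_nodes, scheduled_hours, maintenance_hours):
    """Fraction of scheduled, non-maintenance node-hours that nodes were up."""
    possible = num_nodes * (scheduled_hours - maintenance_hours)
    return node_up_hours / possible


if __name__ == "__main__":
    # Made-up example records, purely for illustration.
    jobs = [
        Job("short", submit=0, start=60, end=3660, cores=4),
        Job("short", submit=0, start=600, end=4200, cores=8),
        Job("long", submit=0, start=7200, end=14400, cores=16),
    ]
    print(wait_percentiles(jobs))
    print(utilization(jobs, total_cores=64, period_hours=24))
    print(availability(node_up_hours=740, num_nodes=32,
                       scheduled_hours=24, maintenance_hours=0))
```

Whether a higher or lower value is "better" is exactly the management question the quoted message raises; the arithmetic only makes the numbers reproducible across both schedulers.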
