[Beowulf] user stats on clusters
Mark Hahn hahn at mcmaster.ca
Fri Feb 27 14:25:31 PST 2009
> A general question: What're folks using for stats, including queue wait,
> execution times, hours/month? Any suggestions?

we run ~20 clusters, some large, and collect all the stats to a single db, with a custom web interface, etc. users and PIs can see tables and graphs of usage.

we don't by default do anything with per-job pend times, though the data is there. we also don't do anything with hours/month: the closest would be graphs which show ncpus across time (ie, for a graph covering the past 2 weeks, the y-axis would probably be cpu-hours-per-hour, summed over all jobs, but possibly partitioned by user/cluster/queue/etc).

I don't know how much this code/etc would be of interest to anyone else. I, at least, have not talked to other cluster people who have quite the same take on these issues. for instance, each of our jobs is stored with user (we have a single LDAP), command, cluster, queue, flags, pend time, and seconds allocated/utime/stime. users are either sponsor (PI) or sponsored, and there's another level of ID intended to harmonize with a pan-Canadian "people" database.

the current database receives job info from a variety of schedulers: RMS on our original Alphas, LSF, my opensource minimalist scheduler, torque/maui and SGE.

having a comprehensive DB like this has led to some interesting optimizations having to do with shipping batches of job records around (cron, ssh, rsync, etc), or ways of binning usage to make it feasible to generate dynamic graphs of usage.

if you're OK with a per-cluster interface, aren't nagios and similar packages pretty interchangeable?
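To make the "binning usage" idea above concrete, here is a minimal Python sketch of turning job records into cpu-hours-per-hour, optionally partitioned by user, cluster or queue. It is not the code described in the post: the `Job` class, the `start`/`end` epoch-second fields and the `bin_usage` function are all hypothetical names chosen for illustration, and it assumes each job holds a fixed number of CPUs for its whole run.

```python
from collections import defaultdict
from dataclasses import dataclass

SECONDS_PER_HOUR = 3600


@dataclass
class Job:
    # Hypothetical record layout; the post lists user, cluster and queue,
    # while start/end/ncpus are assumptions made for this sketch.
    user: str
    cluster: str
    queue: str
    start: int   # epoch seconds when the job began running
    end: int     # epoch seconds when the job finished
    ncpus: int


def bin_usage(jobs, key=lambda job: job.user, bin_seconds=SECONDS_PER_HOUR):
    """Return {(group, bin_start): cpu_hours} for fixed-width time bins.

    Each job's CPU time is split across every bin it overlaps, so a graph
    can be drawn from the small binned table instead of the raw job list.
    """
    usage = defaultdict(float)
    for job in jobs:
        t = job.start
        while t < job.end:
            bin_start = t - t % bin_seconds      # bin containing time t
            bin_end = bin_start + bin_seconds
            overlap = min(job.end, bin_end) - t  # seconds of the job in this bin
            usage[(key(job), bin_start)] += job.ncpus * overlap / SECONDS_PER_HOUR
            t = bin_end
    return dict(usage)


jobs = [
    Job("alice", "c1", "batch", start=0, end=5400, ncpus=4),
    Job("bob", "c1", "batch", start=1800, end=3600, ncpus=2),
]
per_user = bin_usage(jobs)
per_cluster = bin_usage(jobs, key=lambda job: job.cluster)
```

With one-hour bins the stored value is directly the cpu-hours-per-hour figure for the y-axis; changing `key` gives the per-user, per-cluster or per-queue partitioning without touching the raw records again.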
