From vgregorio at penguincomputing.com Tue May 4 11:26:40 2010 From: vgregorio at penguincomputing.com (Victor Gregorio) Date: Tue, 4 May 2010 11:26:40 -0700 Subject: [Beowulf] Test - Please Ignore Message-ID: <20100504182639.GK19686@olive.penguincomputing.com> Test - Please Ignore. -- Victor Gregorio Penguin Computing From jlforrest at berkeley.edu Thu May 6 17:41:15 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Thu, 06 May 2010 17:41:15 -0700 Subject: [Beowulf] Best Way to Use 48-cores For Undergrad Cluster? Message-ID: <4BE361AB.5090706@berkeley.edu> Let's say you were going to set up a cluster for undergraduates to learn how to use SGE and run typical chemistry applications. Ultimate performance is not the primary goal. Let's say you get charged for rack space in your data center and there's very little budget to pay for space. I see you can now get 48-cores in one 1U box. What do you think about running all the compute nodes as 1-core virtual machines on the one box? Or, would you just run the machine with one OS and a SGE queue with 47 slots (with 1 core for the frontend)? This is probably not the kind of environment most of you run in, but it does present interesting issues. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From joshua_mora at usa.net Thu May 6 18:22:38 2010 From: joshua_mora at usa.net (Joshua mora acosta) Date: Thu, 06 May 2010 20:22:38 -0500 Subject: [Beowulf] Best Way to Use 48-cores For Undergrad Cluster? Message-ID: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> If you do virtualization you may want at least to pin the guest OS to each core and provide a "quota" of main memory local to that core to that guest OS. In other words avoid remote accesses. In that way you can at least guarantee some quality of service (capacity, performance, security). You could also have other queues that are bigger that would use more cores with proportional amount of memory also local if resources are available. You could even go further by "carving" the number of cores and memory amount and the proper pinning provided when you submit the job it reads those specifications and then if those HW resources are available then you would start that guest OS with those specifications and then run that workload. This last thing would be the dynamic provisioning. Another good experiment would be the migration from one guest OS to another guest OS (checkpoint on guest OS 1 and restart on guest OS 2). You can "carve" resources also based on tasks. For instance you may need some good preprocessing capability in terms of amount of RAM but just single core. If having to "move" around memory for cores that are located on remote die then you want to configure that region as memory interleaved. Finally, knowing a bit about how the application stresses the HW you may want to "carve" those resources in such a way that they do not produce "congestion" on subsystems such as memory controller so other task being run on same die by other guest OS do not suffer from the contention originated by the other very demanding task running in the other guest OS. It is certainly a great exercise to increase productivity rather than increase performance of single tasks. Another idea is to create queues that would be configured as low power consumption where you would downclock the cores as much as you can without affecting the others. 
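A rough sketch of that kind of pinning with stock tools -- the guest name, core number and NUMA node below are only placeholders, not anything from this thread:

    # If the 1-core guests are Xen domains, bind a domain's vCPU to one host core:
    xm vcpu-pin guest01 0 4                         # vCPU 0 of guest01 -> physical core 4
    # See which cores and memory belong to which NUMA node:
    numactl --hardware
    # For an ordinary (non-VM) process, the same "local memory only" idea:
    numactl --cpunodebind=0 --membind=0 ./chem_job  # run on node 0 cores, allocate node-0 memory only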
And before I forget, every device hanging from chipset (Eth, IB NICs,GPUs) can be also virtualized thanks to IOMMU features. Best regards, Joshua Mora. ------ Original Message ------ Received: 07:57 PM CDT, 05/06/2010 From: Jon Forrest To: "beowulf at beowulf.org" Cc: Subject: [Beowulf] Best Way to Use 48-cores For Undergrad Cluster? > Let's say you were going to set up a cluster > for undergraduates to learn how to use > SGE and run typical chemistry applications. > Ultimate performance is not the primary goal. > Let's say you get charged for rack space > in your data center and there's very little > budget to pay for space. > > I see you can now get 48-cores in one 1U box. > What do you think about running all the > compute nodes as 1-core virtual machines on > the one box? Or, would you just run > the machine with one OS and a SGE > queue with 47 slots (with 1 core for the frontend)? > > This is probably not the kind of environment > most of you run in, but it does present > interesting issues. > > Cordially, > -- > Jon Forrest > Research Computing Support > College of Chemistry > 173 Tan Hall > University of California Berkeley > Berkeley, CA > 94720-1460 > 510-643-1032 > jlforrest at berkeley.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From hearnsj at googlemail.com Thu May 6 22:22:44 2010 From: hearnsj at googlemail.com (John Hearns) Date: Fri, 7 May 2010 06:22:44 +0100 Subject: [Beowulf] Best Way to Use 48-cores For Undergrad Cluster? In-Reply-To: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> References: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> Message-ID: Joshua mentions 'pinning' the guest OS - which sounds interesting and we should hear more about that if possible. If you go the route of having one machine with many cores, borrowing a technique from big NUMA you could look at cpusets. And hey, let's be clear - we're talking 48 cores and potentially 100's of gigabytes of RAM in these single boxes - they ARE big NUMA systems. What goes around comes around etc. I know that PBS integrates well with cpusets on the Altix, SGE should also. Googling turns up an interesting paper: http://tinyurl.com/37rq8w9 From john.hearns at mclaren.com Fri May 7 05:31:47 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Fri, 7 May 2010 13:31:47 +0100 Subject: [Beowulf] Liquid cooling from 3M Message-ID: <68A57CCFD4005646957BD2D18E60667B10512636@milexchmb1.mil.tagmclarengroup.com> As this list always seems to be interested in liquid cooling, here's a webinar from the Register on some new liquid cooling from 3M, called passive 2-phase immersion cooling: http://www.theregister.co.uk/2010/05/07/new_data_center_cooling/ The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. 
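Following up the cpusets suggestion above, a rough sketch of "carving" a subset of cores and memory using the legacy cpuset filesystem (the mount point, core list, memory node and PID are all illustrative):

    mkdir /dev/cpuset
    mount -t cpuset cpuset /dev/cpuset
    mkdir /dev/cpuset/teaching
    echo 1-8  > /dev/cpuset/teaching/cpus    # cores this set may run on
    echo 0    > /dev/cpuset/teaching/mems    # memory node(s) it may allocate from
    echo 1234 > /dev/cpuset/teaching/tasks   # move an existing PID into the set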
From robl at mcs.anl.gov Mon May 3 10:38:36 2010 From: robl at mcs.anl.gov (Rob Latham) Date: Mon, 3 May 2010 12:38:36 -0500 Subject: [Beowulf] [hpc-announce] CFP: Workshop on Interfaces and Abstractions for Scientific Data Storage (IASDS 2010) Message-ID: <20100503173836.GH7716@mcs.anl.gov> CALL FOR PAPERS: IASDS 2010 (http://www.mcs.anl.gov/events/workshops/iasds10/) In conjunction with IEEE Cluster 2010 (http://www.cluster2010.org/) High-performance computing simulations and large scientific experiments such as those in high energy physics generate tens of terabytes of data, and these data sizes grow each year. Existing systems for storing, managing, and analyzing data are being pushed to their limits by these applications, and new techniques are necessary to enable efficient data processing for future simulations and experiments. This workshop will provide a forum for engineers and scientists to present and discuss their most recent work related to the storage, management, and analysis of data for scientific workloads. Emphasis will be placed on forward-looking approaches to tackle the challenges of storage at extreme scale or to provide better abstractions for use in scientific workloads. TOPICS OF INTEREST: Topics of interest include, but are not limited to: - parallel file systems - scientific databases - active storage - scientific I/O middleware - extreme scale storage PAPER SUBMISSION Workshop papers will be peer-reviewed and will appear as part of the IEEE Cluster 2010 proceedings. Submissions must follow the Cluster 2010 format: PDF files only. Maximum 10 pages. Single-spaced 8.5x11-inch, Two-column numbered pages in IEEE Xplore format IMPORTANT DATES: Paper Submission Deadline: June 21, 2010 Author Notification: July 16, 2010 Final Manuscript: July 30, 2010 Workshop: September 24, 2010 PROGRAM COMMITTEE: Program Committee Robert Latham, Argonne National Laboratory Quincey Koziol, The HDF Group Pete Wyckoff, Netapp Wei-Keng Liao, Northwestern University Florin Isalia, Universidad Carlos III de Madrid Katie Antypas, NERSC Anshu Dubey, FLASH Dean Hildebrand, IBM Almaden Bradley Settlemyer, Oak Ridge National Laboratory -- Rob Latham Mathematics and Computer Science Division Argonne National Lab, IL USA From brian.ropers.huilman at gmail.com Fri May 7 08:49:00 2010 From: brian.ropers.huilman at gmail.com (Brian D. Ropers-Huilman) Date: Fri, 7 May 2010 10:49:00 -0500 Subject: [Beowulf] Liquid cooling from 3M In-Reply-To: <68A57CCFD4005646957BD2D18E60667B10512636@milexchmb1.mil.tagmclarengroup.com> References: <68A57CCFD4005646957BD2D18E60667B10512636@milexchmb1.mil.tagmclarengroup.com> Message-ID: On Fri, May 7, 2010 at 07:31, Hearns, John wrote: > As this list always seems to be interested in liquid cooling, > here's a webinar from the Register on some new liquid cooling from 3M, > called passive 2-phase immersion cooling: > > http://www.theregister.co.uk/2010/05/07/new_data_center_cooling/ > All, we have been in conversations with 3M about this technology and were even written into a DoE grant with them for potential testing of such a system. I'm still hoping we can move forward with a prototype system regardless of grant status. There technology is exciting. There is another Minnesota-based company, Hardcore Computers http://http://www.hardcorecomputer.com/, who also make a liquid immersed system that we hope to be testing soon as well. -- Brian D. 
Ropers-Huilman, Director Systems Administration and Technical Operations Minnesota Supercomputing Institute 599 Walter Library +1 612-626-5948 (V) 117 Pleasant Street S.E. +1 612-624-8861 (F) University of Minnesota Twin Cities Campus Minneapolis, MN 55455-0255 http://www.msi.umn.edu/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From prentice at ias.edu Mon May 10 12:43:04 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 10 May 2010 15:43:04 -0400 Subject: [Beowulf] looking for good distributed shell program Message-ID: <4BE861C8.9050400@ias.edu> Beowulfers, I'm looking for something that isn't exactly cluster-related, but this is something that most cluster admins would be familiar with. I'm looking for a good distributed shell, something similar to tentakel or gsh. I figure all of you probably have recommendations/opinions on the best ones. I'm familiar with tentakel, but I find it lacking in a few areas, and it's recently been abandoned by its developer. The author of tentakel recommends gsh, but gsh doesn't allow you to create pre-defined groups of hosts in a config file. Here's my wish list: 1. Be able to maintain a central config file with different group definitions within it. 2. Run the commands in parallel and organize the output 3. Be able to specify the user the command runs as on the command-line, so I don't have to become root just to run a single command as root. 4. Be able to subtract systems from a group or add additional ones on the command line. For example, if I have group "cluster" but node05 is down and I want to omit it and add desktop1 instead, I could do something like: -g cluster-node05+desktop1 I used a program with these features about 10 years ago. I think it was gsh or dsh, but the gsh and dsh I've found today are different than what I used 10 years ago. any recommendations? -- Prentice From prentice at ias.edu Mon May 10 13:54:04 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 10 May 2010 16:54:04 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE861C8.9050400@ias.edu> References: <4BE861C8.9050400@ias.edu> Message-ID: <4BE8726C.4040009@ias.edu> I think I found what I was looking for. Not the gsh the tentakel author recommends, http://guichaz.free.fr/gsh/ but this one: http://outflux.net/unix/software/gsh/ It has everything I was looking for (so far). Prentice Prentice Bisbal wrote: > Beowulfers, > > I'm looking for something that isn't exactly cluster-related, but this > is something that most cluster admins would be familiar with. I'm > looking for a good distributed shell, something similar to tentakel or > gsh. I figure all of you probably have recommendations/opinions on the > best ones. > > I'm familiar with tentakel, but I find it lacking in a few areas, and > it's recently been abandoned by it's developer. The author of tentakel > recommends gsh, but gsh doesn't allow to create pre-defined groups of > hosts in a config file. > > Here's my wish list: > > 1. Be able to maintain a central config file with different group > definitiosn with in it. > > 2. Run the commands in parallel and organize the output > > 3. Be able specify the user the command runs as on the command-line, so > I don't have to become root just to run a single command as root. > > 4. Be able to subtract systems from a group or add additional ones on > the commandline.
For example, if I have group "cluster", but node05 is > down, so I want to omit it and add desktop1 instead, I could do > something like. > > -g cluster-node05+desktop1 > > I used a program with these features about 10 years ago. I think it was > gsh or dsh, but the gsh and dsh I've found today, are different than > what I used 10 years ago. > > any recommendations? > > From brs at usf.edu Mon May 10 20:03:50 2010 From: brs at usf.edu (Brian Smith) Date: Mon, 10 May 2010 23:03:50 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE8726C.4040009@ias.edu> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> Message-ID: <1273547030.6950.2.camel@voltaire> Have you ever seen pdsh? https://computing.llnl.gov/linux/pdsh.html Compile it against the genders library (also provided by llnl) and you have all of the features you need. -Brian -- Brian Smith Senior Systems Administrator IT Research Computing, University of South Florida 4202 E. Fowler Ave. ENB308 Office Phone: +1 813 974-1467 Organization URL: http://rc.usf.edu On Mon, 2010-05-10 at 16:54 -0400, Prentice Bisbal wrote: > I think I found what I was looking for. Not the gsh the tentakel author > recommends, > > http://guichaz.free.fr/gsh/ > > but this one: > > http://outflux.net/unix/software/gsh/ > > It has everything I was looking for (so far). > > Prentice > > > Prentice Bisbal wrote: > > Beowulfers, > > > > I'm looking for something that isn't exactly cluster-related, but this > > is something that most cluster admins would be familiar with. I'm > > looking for a good distributed shell, something similar to tentakel or > > gsh. I figure all of you probably have recommendations/opinions on the > > best ones. > > > > I'm familiar with tentakel, but I find it lacking in a few areas, and > > it's recently been abandoned by it's developer. The author of tentakel > > recommends gsh, but gsh doesn't allow to create pre-defined groups of > > hosts in a config file. > > > > Here's my wish list: > > > > 1. Be able to maintain a central config file with different group > > definitiosn with in it. > > > > 2. Run the commands in parallel and organize the output > > > > 3. Be able specify the user the command runs as on the command-line, so > > I don't have to become root just to run a single command as root. > > > > 4. Be able to subtract systems from a group or add additional ones on > > the commandline. For example, if I have group "cluster", but node05 is > > down, so I want to omit it and add desktop1 instead, I could do > > something like. > > > > -g cluster-node05+desktop1 > > > > I used a program with these features about 10 years ago. I think it was > > gsh or dsh, but the gsh and dsh I've found today, are different than > > what I used 10 years ago. > > > > any recommendations? 
> > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From prentice at ias.edu Tue May 11 06:30:57 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 11 May 2010 09:30:57 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <2969652-1273537199-cardhu_decombobulator_blackberry.rim.net-1971957162-@bda517.bisx.prod.on.blackberry> References: <4BE861C8.9050400@ias.edu> <2969652-1273537199-cardhu_decombobulator_blackberry.rim.net-1971957162-@bda517.bisx.prod.on.blackberry> Message-ID: <4BE95C11.4070400@ias.edu> Derek, Thanks for the reply. I have read articles on xCAT and was always intereted in it, I just never got around to actually installing it and trying it myself. You would still need to authenticate to run commands as root. Where ssh is used, this would require the admin to have a key that's in root's authorized_keys file to run a remote command as root. I think that's safer to be able to specify the user to execute the commands as (including root), than to require they log in as root on the local machine first, where they can forget to logout and continue to execute commands as root. Also, it's nice to specify that you want to run a command as root - it eliminates the need to run ALL commands as root. Just checking uptime or kernel version, for example, shouldn't require root privileges, but restarting a daemon on a server should always require root privileges. -- Prentice derekr42 at gmail.com wrote: > Prentice, > In case you haven't dealt with it before or were put off by the fact that parts of xCAT from IBM used to be proprietary, this cluster toolkit contains an excellent parallel shell (psh) that meets most of your requirements, except for the parsing of output, which follows *nix traditions by leaving it to the user's discretion of how to process output with tools such as shell, awk/sed, perl, etc. If you don't want to use the whole toolkit, simply install the parts you want and set up the config file. It's worked well for in managing any large number of machine. > And I think it's a bad idea to run root commands w/out having to authenticate in some manner. Too much security and responsibility issues. > Best of luck! > Derek R. > Sent from my Verizon Wireless BlackBerry > > -----Original Message----- > From: Prentice Bisbal > Date: Mon, 10 May 2010 15:43:04 > To: Beowulf Mailing List > Subject: [Beowulf] looking for good distributed shell program > > Beowulfers, > > I'm looking for something that isn't exactly cluster-related, but this > is something that most cluster admins would be familiar with. I'm > looking for a good distributed shell, something similar to tentakel or > gsh. I figure all of you probably have recommendations/opinions on the > best ones. > > I'm familiar with tentakel, but I find it lacking in a few areas, and > it's recently been abandoned by it's developer. The author of tentakel > recommends gsh, but gsh doesn't allow to create pre-defined groups of > hosts in a config file. > > Here's my wish list: > > 1. Be able to maintain a central config file with different group > definitiosn with in it. > > 2. Run the commands in parallel and organize the output > > 3. Be able specify the user the command runs as on the command-line, so > I don't have to become root just to run a single command as root. > > 4. 
Be able to subtract systems from a group or add additional ones on > the commandline. For example, if I have group "cluster", but node05 is > down, so I want to omit it and add desktop1 instead, I could do > something like. > > -g cluster-node05+desktop1 > > I used a program with these features about 10 years ago. I think it was > gsh or dsh, but the gsh and dsh I've found today, are different than > what I used 10 years ago. > > any recommendations? > > From prentice at ias.edu Tue May 11 06:32:49 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 11 May 2010 09:32:49 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <1273547030.6950.2.camel@voltaire> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> Message-ID: <4BE95C81.908@ias.edu> That's the 3rd or 4th vote for pdsh. I guess I better take a good look at at. Thanks. Brian Smith wrote: > Have you ever seen pdsh? https://computing.llnl.gov/linux/pdsh.html > > Compile it against the genders library (also provided by llnl) and you > have all of the features you need. > > -Brian > From landman at scalableinformatics.com Tue May 11 06:46:44 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 11 May 2010 09:46:44 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE95C81.908@ias.edu> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> Message-ID: <4BE95FC4.6040307@scalableinformatics.com> Prentice Bisbal wrote: > That's the 3rd or 4th vote for pdsh. I guess I better take a good look > at at. Allow me to 5th pdsh. We don't install clusters without it. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From prentice at ias.edu Tue May 11 06:53:04 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 11 May 2010 09:53:04 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE95FC4.6040307@scalableinformatics.com> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> Message-ID: <4BE96140.1000105@ias.edu> Allowed. ;) Joe Landman wrote: > Prentice Bisbal wrote: >> That's the 3rd or 4th vote for pdsh. I guess I better take a good look >> at at. > > Allow me to 5th pdsh. We don't install clusters without it. > > -- Prentice From a.travis at abdn.ac.uk Tue May 11 07:00:06 2010 From: a.travis at abdn.ac.uk (Tony Travis) Date: Tue, 11 May 2010 15:00:06 +0100 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE95FC4.6040307@scalableinformatics.com> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> Message-ID: <4BE962E6.7070004@abdn.ac.uk> On 11/05/10 14:46, Joe Landman wrote: > Prentice Bisbal wrote: >> That's the 3rd or 4th vote for pdsh. I guess I better take a good look >> at at. > > Allow me to 5th pdsh. We don't install clusters without it. Hello, Joe and Prentice. We use Dancer's shell "dsh": http://www.netfort.gr.jp/~dancer/software/dsh.html.en Bye, Tony. -- Dr. 
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From bug at sas.upenn.edu Tue May 11 07:22:40 2010 From: bug at sas.upenn.edu (Gavin Burris) Date: Tue, 11 May 2010 10:22:40 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE962E6.7070004@abdn.ac.uk> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> <4BE962E6.7070004@abdn.ac.uk> Message-ID: <4BE96830.10301@sas.upenn.edu> On 05/11/2010 10:00 AM, Tony Travis wrote: > We use Dancer's shell "dsh": > http://www.netfort.gr.jp/~dancer/software/dsh.html.en As do I. Highly recommended. From tjrc at sanger.ac.uk Tue May 11 08:14:36 2010 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Tue, 11 May 2010 16:14:36 +0100 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE962E6.7070004@abdn.ac.uk> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> <4BE962E6.7070004@abdn.ac.uk> Message-ID: <3A31FEA5-BCC8-401D-9229-FAD5BDCA8841@sanger.ac.uk> On 11 May 2010, at 3:00 pm, Tony Travis wrote: > On 11/05/10 14:46, Joe Landman wrote: >> Prentice Bisbal wrote: >>> That's the 3rd or 4th vote for pdsh. I guess I better take a good look >>> at at. >> >> Allow me to 5th pdsh. We don't install clusters without it. > > Hello, Joe and Prentice. > > We use Dancer's shell "dsh": > > http://www.netfort.gr.jp/~dancer/software/dsh.html.en Second that recommendation - we use that one too. It's pre-packaged for Debian family distros, dunno about RPM flavour distros. We also use clusterssh for more interactive tasks, a tiny Tk application which launches multiple xterms and sends keystrokes to them (or selected subsets of them) synchronously. You can still type into the individual xterms as well, when required. Fabulously useful when the admin task is interactive (such as running something like yast2 or aptitude). Gets pretty unwieldy for more than about 20 servers - depends on how much screen real-estate you have and how small a font your eyes can cope with! Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From prentice at ias.edu Tue May 11 10:20:34 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 11 May 2010 13:20:34 -0400 Subject: [Beowulf] pdsh question Message-ID: <4BE991E2.3040703@ias.edu> Since so many of you use and recommend pdsh, I have a few questions for you: 1. Do you build and RPM from the .spec file, which doesn't support genders, or do you configure/compile yourself? 2. If not using genders, what is the syntax of the /etc/machines file? I assume it's the same as the gender file, but that's just a hunch. 3. Are there any advantages/disadvantages to using machines over genders? 
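For reference, the two configurations being asked about look roughly like this (hostnames and attributes are invented for illustration):

    # /etc/machines: a flat list, one host per line
    node01
    node02

    # /etc/genders: host followed by comma-separated attributes
    node01  compute,rack1
    node02  compute,rack1
    head01  head

    pdsh -g compute uptime | dshbak -c    # run on every host carrying the "compute" attribute
    pdsh -g compute -x node02 uname -r    # same, but exclude node02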
-- Prentice From ashley at pittman.co.uk Tue May 11 13:17:40 2010 From: ashley at pittman.co.uk (Ashley Pittman) Date: Tue, 11 May 2010 22:17:40 +0200 Subject: [Beowulf] pdsh question In-Reply-To: <4BE991E2.3040703@ias.edu> References: <4BE991E2.3040703@ias.edu> Message-ID: On 11 May 2010, at 19:20, Prentice Bisbal wrote: > Since so many of you use and recommend pdsh, I have a few questions for > you: > > 1. Do you build and RPM from the .spec file, which doesn't support > genders, or do you configure/compile yourself? I build it myself. From the top of my head the options I use are --with-ssh --without-rsh. Last time I built it, if both were built the default was to prefer rsh over ssh, which should probably be changed at some point. > 2. If not using genders, what is the syntax of the /etc/machines file? I > assume it's the same as the gender file, but that's just a hunch. It's just a flat list of hosts, one per line, although I believe it can take host-specs as well. e.g. compute[0-1023] > 3. Are there any advantages/disadvantages to using machines over genders? Genders is much more flexible; machines is easier to configure. Two more things of note, "dshbak -c" is worth knowing about, pipe the output of pdsh into this and it'll sort the output by hostname and compress hosts with identical output into a single report. The other really useful aspect of pdsh is the "-R exec" option, instead of running the command on a remote node it runs the command locally but replaces %h with the hostname. One trivial example is "pdsh -a -R exec grep %h /var/log/messages | dshbak -c" but once you get used to it you can use it for much more advanced commands, earlier on today I ran "pdsh -w [0-25] -R exec tune2fs -O extents /dev/mapper/ost_%h" to re-tune all the devices in a Lustre filesystem. Ashley. -- Ashley Pittman, Bath, UK. Padb - A parallel job inspection tool for cluster computing http://padb.pittman.org.uk From d.love at liverpool.ac.uk Tue May 11 15:53:02 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Tue, 11 May 2010 23:53:02 +0100 Subject: [Beowulf] Re: looking for good distributed shell program In-Reply-To: <3A31FEA5-BCC8-401D-9229-FAD5BDCA8841@sanger.ac.uk> (Tim Cutts's message of "Tue, 11 May 2010 16:14:36 +0100") References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> <4BE962E6.7070004@abdn.ac.uk> <3A31FEA5-BCC8-401D-9229-FAD5BDCA8841@sanger.ac.uk> Message-ID: <871vdinj8x.fsf@liv.ac.uk> Tim Cutts writes: >> We use Dancer's shell "dsh": >> >> http://www.netfort.gr.jp/~dancer/software/dsh.html.en > > Second that recommendation - we use that one too. It's pre-packaged for Debian family distros, dunno about RPM flavour distros. Why that rather than pdsh, especially in an HPC setting? (pdsh is in Debian too, for what it's worth.) From d.love at liverpool.ac.uk Tue May 11 15:56:38 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Tue, 11 May 2010 23:56:38 +0100 Subject: [Beowulf] Re: pdsh question In-Reply-To: <4BE991E2.3040703@ias.edu> (Prentice Bisbal's message of "Tue, 11 May 2010 13:20:34 -0400") References: <4BE991E2.3040703@ias.edu> Message-ID: <87zl06m4ih.fsf@liv.ac.uk> Prentice Bisbal writes: > 1. Do you build and RPM from the .spec file, which doesn't support > genders, or do you configure/compile yourself? ?? I'm pretty sure I built it with genders using the supplied spec file. Check the comments at the top about configuration -- I don't have it to hand. > 2.
If not using genders, what is the syntax of the /etc/machines file? I > assume it's the same as the gender file, but that's just a hunch. > > 3. Are there any advantages/disadvantages to using machines over genders? I don't see any advantage to machines. I'd recommend genders, which is definitely useful to select the different node types we have, e.g. with different service processors. (Unfortunately freeipmi doesn't have genders support, and as far as I know, there's no way to associate nodes and their service processors directly with genders, which would make some things easier. I've been meaning to ask Al Chu about implementing that.) From d.love at liverpool.ac.uk Tue May 11 16:19:28 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Wed, 12 May 2010 00:19:28 +0100 Subject: [Beowulf] Re: Best Way to Use 48-cores For Undergrad Cluster? In-Reply-To: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> (Joshua mora acosta's message of "Thu, 06 May 2010 20:22:38 -0500") References: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> Message-ID: <87wrvam3gf.fsf@liv.ac.uk> "Joshua mora acosta" writes: > If you do virtualization you may want at least to pin the guest OS to each > core and provide a "quota" of main memory local to that core to that > guest OS. I don't see how that would help in this case except, maybe, for a frontend VM, to maintain resources for it. It gives less freedom to pack the teaching jobs on. > You could even go further by "carving" the number of cores and memory amount > and the proper pinning provided when you submit the job it reads those > specifications and then if those HW resources are available then you would > start that guest OS with those specifications and then run that workload. > This last thing would be the dynamic provisioning. Such provisioning isn't representative of typical computational chemistry setups, is it? Partitioning the system might be useful to teach SGE resource requests in a distributed system, but different OSes would considerably complicate things. > Another good experiment would be the migration from one guest OS to another > guest OS (checkpoint on guest OS 1 and restart on guest OS 2). Why would you want to, and how could you expect it to work in general (e.g. BLCR, DMTCP checkpointing)? There is experience scheduling Xen VMs with a modified SGE (e.g. Uni Marburg's XGE), but I don't see it helping the teaching. From d.love at liverpool.ac.uk Tue May 11 16:14:58 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Wed, 12 May 2010 00:14:58 +0100 Subject: [Beowulf] Re: Best Way to Use 48-cores For Undergrad Cluster? In-Reply-To: <4BE361AB.5090706@berkeley.edu> (Jon Forrest's message of "Thu, 06 May 2010 17:41:15 -0700") References: <4BE361AB.5090706@berkeley.edu> Message-ID: <87y6fqm3nx.fsf@liv.ac.uk> Jon Forrest writes: > I see you can now get 48-cores in one 1U box. > What do you think about running all the > compute nodes as 1-core virtual machines on > the one box? I don't see how that would help unless you want to teach on a simulated distributed system, but then you probably want multi-core VMs. > Or, would you just run > the machine with one OS and a SGE > queue with 47 slots (with 1 core for the frontend)? I would, modulo the above. That's surely easiest, and probably the most efficient way to run it, assuming the job mixture doesn't make over-subscription useful. 
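A concrete sketch of that "one OS, 47 slots" setup (queue and host names are placeholders, and the exact qconf syntax can differ between SGE releases):

    qconf -mattr queue slots 47 all.q             # give the queue 47 slots overall
    qconf -aattr queue slots '[bigbox=47]' all.q  # or set it per execution host
    qconf -sq all.q | grep slots                  # check the result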
From d.love at liverpool.ac.uk Tue May 11 16:24:19 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Wed, 12 May 2010 00:24:19 +0100 Subject: [Beowulf] Re: Best Way to Use 48-cores For Undergrad Cluster? In-Reply-To: (John Hearns's message of "Fri, 7 May 2010 06:22:44 +0100") References: <437oegBVM4506S08.1273195358@cmsweb08.cms.usa.net> Message-ID: <87vdaum38c.fsf@liv.ac.uk> John Hearns writes: > Joshua mentions 'pinning' the guest OS - which sounds interesting and > we should hear more about that if possible. Isn't it fairly clear you'd want that for efficiency, assuming you haven't lost much with the VM? However, you typically can't use affinity within a multi-core VM (e.g. Xen on RH5). > If you go the route of having one machine with many cores, borrowing a > technique from big NUMA you could look at cpusets. I don't see why that would be relevant in this teaching case, especially if over-subscription is useful there. > I know that PBS integrates well with cpusets on the Altix, SGE should also. As far as I remember, cpusets don't work cleanly because the DRM doesn't have sufficient control. SGE now supports core binding (using the same library as OpenMPI). From d.love at liverpool.ac.uk Tue May 11 16:26:16 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Wed, 12 May 2010 00:26:16 +0100 Subject: [Beowulf] Re: Choosing pxelinux.cfg DEFAULT via dhcpd.conf? In-Reply-To: (David Mathog's message of "Thu, 22 Apr 2010 12:11:06 -0700") References: Message-ID: <87tyqem353.fsf@liv.ac.uk> In case it's still relevant at this stage, as this doesn't seem to have been mentioned: "David Mathog" writes: > Is there a way to set dhcpd.conf so that it changes which pxelinux.cfg > entry (LABEL) starts on a network boot? As Prentice implied, juggle explicit tftpboot files, rather than using labels, assuming you don't want to interact with the boot on the console. For that I use http://subtrac.sara.nl/oss/pxeconfig, which has support for transient configurations, e.g. boot to configure/provision and then reboot from a different PXE image automagically. From mm at yuhu.biz Wed May 12 00:57:23 2010 From: mm at yuhu.biz (Marian Marinov) Date: Wed, 12 May 2010 10:57:23 +0300 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE95C11.4070400@ias.edu> References: <4BE861C8.9050400@ias.edu> <2969652-1273537199-cardhu_decombobulator_blackberry.rim.net-1971957162-@bda517.bisx.prod.on.blackberry> <4BE95C11.4070400@ias.edu> Message-ID: <201005121057.32149.mm@yuhu.biz> Hello, We manage more than 2k Linux installations and we don't use any distributed shells.
The author of tentakel > > recommends gsh, but gsh doesn't allow to create pre-defined groups of > > hosts in a config file. > > > > Here's my wish list: > > > > 1. Be able to maintain a central config file with different group > > definitiosn with in it. > > > > 2. Run the commands in parallel and organize the output > > > > 3. Be able specify the user the command runs as on the command-line, so > > I don't have to become root just to run a single command as root. > > > > 4. Be able to subtract systems from a group or add additional ones on > > the commandline. For example, if I have group "cluster", but node05 is > > down, so I want to omit it and add desktop1 instead, I could do > > something like. > > > > -g cluster-node05+desktop1 > > > > I used a program with these features about 10 years ago. I think it was > > gsh or dsh, but the gsh and dsh I've found today, are different than > > what I used 10 years ago. > > > > any recommendations? -- Best regards, Marian Marinov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From tjrc at sanger.ac.uk Wed May 12 01:06:49 2010 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 12 May 2010 09:06:49 +0100 Subject: [Beowulf] Re: looking for good distributed shell program In-Reply-To: <871vdinj8x.fsf@liv.ac.uk> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> <4BE962E6.7070004@abdn.ac.uk> <3A31FEA5-BCC8-401D-9229-FAD5BDCA8841@sanger.ac.uk> <871vdinj8x.fsf@liv.ac.uk> Message-ID: <6B10FF30-07F8-4CE6-AE66-FD900FBDEF0D@sanger.ac.uk> On 11 May 2010, at 11:53 pm, Dave Love wrote: > Tim Cutts writes: > >>> We use Dancer's shell "dsh": >>> >>> http://www.netfort.gr.jp/~dancer/software/dsh.html.en >> >> Second that recommendation - we use that one too. It's pre-packaged for Debian family distros, dunno about RPM flavour distros. > > Why that rather than pdsh, especially in an HPC setting? (pdsh is in > Debian too, for what it's worth.) No explicit reason - it was the first one we encountered that did what we needed. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From a.travis at abdn.ac.uk Wed May 12 04:23:59 2010 From: a.travis at abdn.ac.uk (Tony Travis) Date: Wed, 12 May 2010 12:23:59 +0100 Subject: [Beowulf] Re: looking for good distributed shell program In-Reply-To: <6B10FF30-07F8-4CE6-AE66-FD900FBDEF0D@sanger.ac.uk> References: <4BE861C8.9050400@ias.edu> <4BE8726C.4040009@ias.edu> <1273547030.6950.2.camel@voltaire> <4BE95C81.908@ias.edu> <4BE95FC4.6040307@scalableinformatics.com> <4BE962E6.7070004@abdn.ac.uk> <3A31FEA5-BCC8-401D-9229-FAD5BDCA8841@sanger.ac.uk> <871vdinj8x.fsf@liv.ac.uk> <6B10FF30-07F8-4CE6-AE66-FD900FBDEF0D@sanger.ac.uk> Message-ID: <4BEA8FCF.1010100@abdn.ac.uk> On 12/05/10 09:06, Tim Cutts wrote: > > On 11 May 2010, at 11:53 pm, Dave Love wrote: > >> Tim Cutts writes: >> >>>> We use Dancer's shell "dsh": >>>> >>>> http://www.netfort.gr.jp/~dancer/software/dsh.html.en >>> >>> Second that recommendation - we use that one too. It's pre-packaged for Debian family distros, dunno about RPM flavour distros. >> >> Why that rather than pdsh, especially in an HPC setting? 
(pdsh is in >> Debian too, for what it's worth.) > > No explicit reason - it was the first one we encountered that did what we needed. Hello, Tim and Dave. I use "dsh" because it's simpler than "pdsh" and, as Tim said, it does the job we want it to do: We used it to crawl around lists of machines checking if things are working, but I'm using Nagios2 for that now :-) Bye, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From d.love at liverpool.ac.uk Wed May 12 07:03:38 2010 From: d.love at liverpool.ac.uk (Dave Love) Date: Wed, 12 May 2010 15:03:38 +0100 Subject: [Beowulf] Re: pdsh question In-Reply-To: <87zl06m4ih.fsf@liv.ac.uk> (Dave Love's message of "Tue, 11 May 2010 23:56:38 +0100") References: <4BE991E2.3040703@ias.edu> <87zl06m4ih.fsf@liv.ac.uk> Message-ID: <87wrv9kyit.fsf@liv.ac.uk> I wrote: > Prentice Bisbal writes: > >> 1. Do you build and RPM from the .spec file, which doesn't support >> genders, or do you configure/compile yourself? > > ?? I'm pretty sure I built it with genders using the supplied spec file. > Check the comments at the top about configuration -- I don't have it to > hand. For what it's worth, it seems I used rpmbuild --without dshgroups --without netgroup --without machines --with nodeupdown --with genders along with the whatsup pingd module. (pingd gives a more reliable -v than ganglia since ganglia gives false negatives if gmond dies and false positives if you simply feed out-of-band metrics with gmetric. It's a bit unclean to have pingd and Nagios both pinging, but I don't think it has a practical effect.) From prentice at ias.edu Wed May 12 08:24:31 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 12 May 2010 11:24:31 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <201005121057.32149.mm@yuhu.biz> References: <4BE861C8.9050400@ias.edu> <2969652-1273537199-cardhu_decombobulator_blackberry.rim.net-1971957162-@bda517.bisx.prod.on.blackberry> <4BE95C11.4070400@ias.edu> <201005121057.32149.mm@yuhu.biz> Message-ID: <4BEAC82F.4090103@ias.edu> Marian Marinov wrote: > Hello, > We manage more then 2k linux installations and we don't use any distributed > shells. Actually we use this script: > > http://sourceforge.net/projects/multy-command/ > > What it does is take a command and executes it in parallel or sequential on a > list of servers. > That's exactly what these other programs do. -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From prentice at ias.edu Wed May 12 14:01:47 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 12 May 2010 17:01:47 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE861C8.9050400@ias.edu> References: <4BE861C8.9050400@ias.edu> Message-ID: <4BEB173B.7030200@ias.edu> Beowulfers, I decided to go with this gsh: http://outflux.net/unix/software/gsh/ I looked at pdsh, and it looks powerful, but more complicated, too. This gsh does everything I need, and has a simple config file syntax as well as exactly the command sytnax I was looking for. I already have it completely configured for my environment. And due to the simpler syntax, it will be easier to get my coworkers to use it, too. ;) Thanks for all your replies. 
Those of you who voted for pdsh, don't worry, you're votes weren't wasted I still plan to tinker with pdsh. -- Prentice Prentice Bisbal wrote: > Beowulfers, > > I'm looking for something that isn't exactly cluster-related, but this > is something that most cluster admins would be familiar with. I'm > looking for a good distributed shell, something similar to tentakel or > gsh. I figure all of you probably have recommendations/opinions on the > best ones. > > I'm familiar with tentakel, but I find it lacking in a few areas, and > it's recently been abandoned by it's developer. The author of tentakel > recommends gsh, but gsh doesn't allow to create pre-defined groups of > hosts in a config file. > > Here's my wish list: > > 1. Be able to maintain a central config file with different group > definitiosn with in it. > > 2. Run the commands in parallel and organize the output > > 3. Be able specify the user the command runs as on the command-line, so > I don't have to become root just to run a single command as root. > > 4. Be able to subtract systems from a group or add additional ones on > the commandline. For example, if I have group "cluster", but node05 is > down, so I want to omit it and add desktop1 instead, I could do > something like. > > -g cluster-node05+desktop1 > > I used a program with these features about 10 years ago. I think it was > gsh or dsh, but the gsh and dsh I've found today, are different than > what I used 10 years ago. > > any recommendations? > > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From prentice at ias.edu Thu May 13 07:28:48 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 13 May 2010 10:28:48 -0400 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <72261A3A4E399C41B6A6E7E01BEB0A0C40250B6A91@DCPWVMBXC1VS3.mdanderson.edu> References: <4BE861C8.9050400@ias.edu> <4BEB173B.7030200@ias.edu> <72261A3A4E399C41B6A6E7E01BEB0A0C40250B6A91@DCPWVMBXC1VS3.mdanderson.edu> Message-ID: <4BEC0CA0.6040901@ias.edu> I use clusterssh, too. It's a great tool for interactive commands, but for simple, non-interactive commands, I prefer something like gsh or pdsh. Sjursen,Robert wrote: > Greetings to all. > > I have come across this tool as well and have used it with success. Please see the link (very grateful to contributor for making available on sourceforge also) below. > > http://sourceforge.net/projects/clusterssh/ > > Regards Robert > > > > Robert Sjursen > Department of Imaging Physics > MD Anderson Cancer Center > Houston Texas > > > > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Prentice Bisbal > Sent: Wednesday, May 12, 2010 4:02 PM > To: Beowulf Mailing List > Subject: Re: [Beowulf] looking for good distributed shell program > > Beowulfers, > > I decided to go with this gsh: > > http://outflux.net/unix/software/gsh/ > > I looked at pdsh, and it looks powerful, but more complicated, too. This > gsh does everything I need, and has a simple config file syntax as well > as exactly the command sytnax I was looking for. I already have it > completely configured for my environment. And due to the simpler syntax, > it will be easier to get my coworkers to use it, too. ;) > > Thanks for all your replies. Those of you who voted for pdsh, don't > worry, you're votes weren't wasted I still plan to tinker with pdsh. 
> > -- > Prentice > > > Prentice Bisbal wrote: >> Beowulfers, >> >> I'm looking for something that isn't exactly cluster-related, but this >> is something that most cluster admins would be familiar with. I'm >> looking for a good distributed shell, something similar to tentakel or >> gsh. I figure all of you probably have recommendations/opinions on the >> best ones. >> >> I'm familiar with tentakel, but I find it lacking in a few areas, and >> it's recently been abandoned by it's developer. The author of tentakel >> recommends gsh, but gsh doesn't allow to create pre-defined groups of >> hosts in a config file. >> >> Here's my wish list: >> >> 1. Be able to maintain a central config file with different group >> definitiosn with in it. >> >> 2. Run the commands in parallel and organize the output >> >> 3. Be able specify the user the command runs as on the command-line, so >> I don't have to become root just to run a single command as root. >> >> 4. Be able to subtract systems from a group or add additional ones on >> the commandline. For example, if I have group "cluster", but node05 is >> down, so I want to omit it and add desktop1 instead, I could do >> something like. >> >> -g cluster-node05+desktop1 >> >> I used a program with these features about 10 years ago. I think it was >> gsh or dsh, but the gsh and dsh I've found today, are different than >> what I used 10 years ago. >> >> any recommendations? >> >> > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From ldcaamano at gmail.com Wed May 5 11:44:02 2010 From: ldcaamano at gmail.com (Didier Caamano) Date: Wed, 5 May 2010 12:44:02 -0600 Subject: [Beowulf] Using beowulf to unify or consolidate storage Message-ID: Hello to everyone, I apologize if this email is out of place, but I have the following personal project and I have been searching over the internet to try to find the best possible solution. We are not a big company and I am trying to implement some sort of SAN or NAS to consolidate my storage. I've been reading through Beowulf Book, I have to admit I am just in the first pages, but before I continue on reading I wanted to ask the question so not to waste my time reading in case it is not possible. I have a whole bunch of PCs that are not longer in use and are just collecting dust, I'm trying to (in case it is possible) to somehow put them all to work as a single unit and use their hard drives or add more hard drives, to create a Storage Area Network. Is it possible, using Beowulf to achieve this goal. Are there any recommendations as to where to start with this? I am eager to learn new things, I have experience using BSD and GNU/Linux, I just want to know which direction to go in order to achieve my goal. Thanks and have a good day. -- Didier Caamano -------------- next part -------------- An HTML attachment was scrubbed... URL: From hunteke at earlham.edu Thu May 6 18:18:55 2010 From: hunteke at earlham.edu (Kevin Hunter) Date: Thu, 06 May 2010 21:18:55 -0400 Subject: [Beowulf] Best Way to Use 48-cores For Undergrad Cluster? In-Reply-To: <4BE361AB.5090706@berkeley.edu> References: <4BE361AB.5090706@berkeley.edu> Message-ID: <4BE36A7F.30004@earlham.edu> At 8:41pm -0400 Thu, 06 May 2010, Jon Forrest wrote: > I see you can now get 48-cores in one 1U box. What do you think > about running all the compute nodes as 1-core virtual machines on > the one box? 
Or, would you just run the machine with one OS and a > SGE queue with 47 slots (with 1 core for the frontend)? I'm not an administrative type and haven't yet had to solve this style of problem, so I can't actually respond directly with an answer. I will briefly mention that this has recently become a perhaps-viable option to investigate thanks to the memory deduplication code that was incorporated into the Linux kernel as of 2.6.32. http://kernelnewbies.org/Linux_2_6_32#head-d3f32e41df508090810388a57efce73f52660ccb http://www.ibm.com/developerworks/linux/library/l-kernel-shared-memory/index.html?ca=dgr-lnxw01LX-KSMdth-LX I haven't yet read anything about how it plays with HPC. Would not mind a link, if anyone has one. Another angle you might try is putting a portable cluster in your office or in class everyday. I'm not aware of the chemistry track software requirements, but you may be interested in a couple of ongoing HPC education oriented projects: http://www.calvin.edu/~adams/research/microwulf/ -> cluster attempting to be portable and to reduce the $/GFlop ratio. This article on clustermonkey.net may also be of interest: http://www.clustermonkey.net/content/view/211/1/ Finally, some related projects may also be of interest: http://littlefe.net/ -> inexpensive cluster created specifically with undergraduate (and younger) students in mind. Students can physically handle the components and can get a visceral experience of solving their problem. Hope this helps, Kevin From dsk at ci.uchicago.edu Tue May 11 00:02:10 2010 From: dsk at ci.uchicago.edu (Daniel S. Katz) Date: Tue, 11 May 2010 08:02:10 +0100 Subject: [Beowulf] [hpc-announce] 25th IEEE International Parallel & Distributed Processing Symposium Message-ID: [Our apologies if you receive multiple copies of this CFP] May 10, 2010 Release ---------------------------------------------- 25th IEEE International Parallel & Distributed Processing Symposium ---------------------------------------------- Call for Participation Call for Workshop Proposals Call for Papers NEW: Program Committee Listing ---------------------------------------------- IPDPS 2011 Anchorage (Alaska) USA 16-20 May 2011 www.ipdps.org ---------------------------------------------- * Sponsored by IEEE Computer Society Technical Committee on Parallel Processing * In cooperation with ACM SIGARCH, IEEE Computer Society Technical Committee on Computer Architecture, and IEEE Computer Society Technical Committee on Distributed Processing ---------------------------------------------- Abstracts due...24 September 2010 Papers due...1 October 2010 ---------------------------------------------- IPDPS 2011 CALL FOR PARTICIPATION Anchorage, home to moose, bears, birds and whales, is strategically located at almost equal flying distance from Europe, Asia and the Eastern USA. Embraced by six mountain ranges, with views of Mount McKinley in Denali National Park, and warmed by a maritime climate, the area offers year-round adventure, recreation, and sporting events. It is a fitting destination for IPDPS to mark a quarter century of tracking developments in computer science. IPDPS serves as a forum for engineers and scientists from around the world to present their latest research findings in the fields of parallel processing and distributed computing. The five-day program will follow the usual format of contributed papers, invited speakers, and panels mid week, framed by workshops held on the first and last days. 
To celebrate the 25th year of IPDPS, plan to come early and stay late and also enjoy a modern city surrounded by spectacular wilderness. For updates on IPDPS 2011, visit the Web at www.ipdps.org. GENERAL CHAIR Alan Sussman (University of Maryland, USA) PROGRAM CHAIR Frank Mueller (North Carolina State University, USA) PROGRAM VICE-CHAIRS ALGORITHMS: Olivier Beaumont (INRIA, France) APPLICATIONS: Leonid Oliker (Lawrence Berkeley National Laboratory, USA) ARCHITECTURES: Mahmut Taylan Kandemir (The Pennsylvania State University, USA) SOFTWARE: Dimitrios S. Nikolopoulos (FORTH-ICS and University of Crete, Greece) ---------------------------------------------- WORKSHOPS CHAIR Umit V. Catalyurek (Ohio State University, USA) Call for Workshops: IPDPS workshops, held on the first and last days of the symposium, provide attendees an opportunity to explore special topics. They also broaden the content of the week's presentations by extending the topics of interest beyond those of the main symposium. For more information on organizing a new workshop, contact the Workshops Chair (workshops at ipdps.org) before July 1, 2010. ---------------------------------------------- IPDPS 2011 CALL FOR PAPERS Scope: Authors are invited to submit manuscripts that present original unpublished research in all areas of parallel and distributed processing, including the development of experimental or commercial systems. Work focusing on emerging technologies is especially welcome. Topics of interest include, but are not limited to: * Parallel and distributed algorithms, focusing on issues such as: stability, scalability, and fault-tolerance of algorithms and data structures for parallel and distributed systems, communication and synchronization protocols, network algorithms, scheduling and load balancing. * Applications of parallel and distributed computing, including web applications, peer-to-peer computing, grid computing, scientific applications, and mobile computing. Papers focusing on applications using novel commercial or research architectures, or discussing scalability toward the exascale level are encouraged. * Parallel and distributed architectures, including architectures for instruction-level and thread-level parallelism; petascale and exascale systems designs; special-purpose architectures, including graphics processors, signal processors, network processors, media accelerators and other special purpose processors and accelerators; impact of technology on architecture; network and interconnect architectures; parallel I/O and storage systems; architecture of the memory hierarchy; power-efficient architectures; dependable architectures; and performance modeling and evaluation. * Parallel and distributed software, including parallel and multicore programming languages and compilers, runtime systems, operating systems, resource management, middleware, libraries, performance modeling and evaluation, parallel programming paradigms, and programming environments and tools. IMPORTANT DATES * Abstracts due September 24, 2010 * Submissions due October 1, 2010 (hard deadline, no extensions) * Rebuttal Period: November 10-12, 2010 * Author notification: December 17, 2010 * Camera-ready papers: February 1, 2011 * Conference dates: May 16-20, 2011 Best Papers Awards: Awards will be given for one best paper in each of the four conference technical tracks: algorithms, applications, architectures, and software. 
Selected papers will be considered for possible publication in a special issue of the Journal of Parallel and Distributed Computing. What/Where to Submit: Submitted manuscripts may not exceed 12 single-spaced pages using 12-point size font on 8.5x11 inch pages (IEEE conference style), including figures, tables, and references. More details on submissions and instructions for submitting files are available at www.ipdps.org or may be obtained by sending email to cfp at ipdps.org for an automatic reply. IPDPS will again require submission of abstracts one week before the paper submission deadline without any late exceptions (see above). Review of Manuscripts: All submitted manuscripts will be reviewed. Submissions will be judged on correctness, originality, technical strength, significance, quality of presentation, and interest and relevance to the conference scope. Submitted papers may NOT have appeared in, or be under consideration for, another conference or workshop, or for a journal. Abstracts are due September 24, 2010, and full manuscripts must be received by October 1, 2010. This is a final, hard deadline; to ensure fairness, no extensions will be given. There will be a rebuttal period from November 17-19, 2010. Notification of review decisions will be mailed by December 17, 2010, and camera-ready papers will be due February 1, 2011. PROGRAM COMMITTEE Mark ADAMS (Columbia University) USA Gul AGHA (University of Illinois at Urbana-Champaign) USA Sadaf ALAM (Swiss National Supercomputer Center) Switzerland Hideharu AMANO (Keio University) Japan Henrique ANDRADE (IBM Thomas J. Watson) USA Christos D. ANTONOPOULOS (University of Thessaly) Greece James ASPNES (Yale University) USA Rosa BADIA (Barcelona Supercomputing Center) Spain Amitabha BAGCHI (Indian Institute of Technology Delhi) India David BAILEY (Lawrence Berkeley National Laboratory) USA Ray BAIR (Argonne National Laboratory) USA Pavan BALAJI (Argonne National Lab) USA Anne BENOIT (ENS Lyon) France Petra BERENBRINK (Simon Fraser University) Canada Rupak BISWAS (NASA Ames Research Center) USA Ron BRIGHTWELL (Sandia National Labs) USA Ali R. BUTT (Virginia Tech) USA Wentdong CAI (Nanyang University) Singapore Henri CASANOVA (University of Hawaii at Manoa) USA Calin CASCAVAL (Qualcomm) USA Umit CATALYUREK (Ohio State University) USA Barbara CHAPMAN (University of Houston) USA Amitabh CHAUDHARY (University of Notre Dame) USA Guihai CHEN (University of Nanjing) China Wenguang CHEN (Tsinghua University) China Bruce CHILDERS (University of Pittsburgh) USA Alok CHOUDHARY (Northwestern University) USA Edmond CHOW (D. E. Shaw Research) USA Kei DAVIS (Los Alamos National Laboratory) USA Ewa DEELMAN (Information Sciences Institute) USA Bronis R. 
DE SUPINSKI (Lawrence Livermore National Lab) USA Karen DEVINE (Sandia National Labs) USA Chen DING (University of Rochester) USA Shlomi DOLEV (Ben Gurion University of the Negev) Israel Zhihui DU (Tsinghua University) China Anne ELSTER (Norwegian University of Science & Technology) Norway Robert van ENGELEN (Florida State University) USA St??phane ETHIER (Princeton Plasma Physics Laboratory) USA Rob FARBER (Pacific Northwest Laboratory) USA John FEO (Pacific Northwest Laboratory) USA Paola FLOCCHINI (University of Ottawa) Canada Ian FOSTER (Argonne National Laboratory & The University of Chicago) USA Geoffrey FOX (Indiana University) USA Michael GARLAND (NVIDIA) USA Leszec GASIENIEC (University of Liverpool) UK Ada GAVRILOVSKA (Georgia Tech) USA Maria Engracia GOMEZ (Universidad Politecnica de Valencia) Spain Olga GOUSSEVSKAIA (ETH Zurich and ABB Research) Switzerland Manimaran GOVINDARASU (Iowa State University) USA Laura GRIGORI (INRIA) France John GROSH (Lawrence Livermore National Laboratory) USA Isabelle GUERIN-LASSOUS (University of Lyon 1) France Erik HAGERSTEN (Uppsala University) Sweden Bruce HENDRICKSON (Sandia National Labs) USA Jeff HOLLINGSWORTH (University of Maryland) USA Bo HONG (Georgia Institute of Technology) USA Wen-Jing HSU (Nanyang Technological University) Singapore Engin IPEK (University of Rochester) USA Mary Jane IRWIN (Penn State University) USA Ravishankar IYER (Intel) USA Klaus JANSEN (University of Kiel) Germany Emmanuel JEANNOT (INRIA) France Natalie Enright-JERGER (University of Toronto) Canada Song JIANG (Wayne State University) USA Hai JIN (Huazhong University of Science and Technology) China Gabriele JOST (Unviversity of Texas at Austin) USA Vana KALOGERAKI (Athens University of Economics and Business) Greece Sven KARLSSON (Technical University of Denmark) Denmark George KARYPIS (University of Minnesota) USA Stefanos KAXIRAS (University of Patras) Greece Thilo KIELMANN (Vrije Universiteit) Netherlands Eun Jung KIM (Texas A&M University) USA Hyesoon KIM (Georgia Tech) USA David KONERDING (Google) USA Alice KONIGES (Lawrence Berkeley National Laboratory) USA Goran KONJEVOD (Arizona State University) USA Madhukar KORUPOLU (Google) USA Miroslaw KORZENIOWSKI (Wroclaw University of Technology) Poland Nectarios KOZIRIS (National Technical University of Athens) Greece Uwe KUESTER (High Performance Computing Center Stuttgart) Germany Milind KULKARNI (Purdue University) USA Rakesh KUMAR (UIUC) USA Alexey LASTOVETSKY (University College Dublin) Ireland Patrick Pak-Ching LEE (The Chinese Univ of Hong Kong) China Arnaud LEGRAND (CNRS) France Xiaoming LI (Peking University) China Xiang LONG (Beihang University) China David LOWENTHAL (University of Arizona) USA Bob LUCAS (Information Sciences Institute) USA Xiaosong MA (North Carolina State University) USA Fredrik MANNE (University of Bergen) Norway Loris MARCHAL (CNRS) France Simon MARLOW (Microsoft Research) USA Xavier MARTORELL (Univesitat Politecnica de Catalunya and Barcelona Supercomputing Center) Spain Fabien MATHIEU (Orange Labs) France Satoshi MATSUOKA (Tokyo Institute of Technology) Japan Gokhan MEMIK (Northwestern University) USA Bilha MENDELSON (IBM Haifa Research Labs) Israel Shirley MOORE (University of Tennessee) USA Christine MORIN (INRIA) France Kengo NAKAJIMA (University of Tokyo) Japan Chrysostomos NICOPOULOS (University of Cyprus) Cyprus Boyana NORRIS (Argonne National Lab) USA Ozcan OZTURK (Bilkent University) Turkey Vijay PAI (Purdue University) USA Dhabaleswar PANDA (Ohio State University) USA 
Marina PAPATRIANTAFILOU (Chalmers University of Technology) Sweden Manish PARASHAR (National Science Foundation & Rutgers University) USA Srinivasan PARTHASARATHY (Ohio State University) USA Cynthia A. PHILLIPS (Sandia National Laboratories) USA Beth PLALE (Indiana University Bloomington) USA Padma RAGHAVAN (Pennsylvania State University) USA Alistair RENDELL (Australian National University) Australia Philip ROTH (Oak Ridge National Laboratory) USA Yogish SABHARWAL (IBM Research) India Rizos SAKELLARIOU (University of Manchester) UK Nagiza SAMATOVA (North Carolina State University), USA Vivek SARKAR (Rice University) USA Mitsuhisa SATO (University of Tsukuba) Japan Li SHANG (University of Colorado-Boulder) USA Christian SCHINDELHAUER (University of Freiburg) Germany Stefan SCHMID (Deutsche Telekom Laboratories and TU Berlin) Germany Erik SCHNETTER (Louisiana State University) USA Jennifer SCHOPF (National Science Foundation) USA Martin SCHULZ (Lawrence Livermore National Lab) USA Xipeng SHEN (College of William and Mary) USA John STONE (University of Illinois at Urbana-Champaign) USA Yuzhong SUN (Institute of Computing Technology - Chinese Academy of Sciences) China Nigel TOPHAM (University of Edinburgh) UK Pedro TRANCOSO (University of Cyprus) Cyprus Dan TSAFRIR (Technion - Israel Institute of Technology) Israel Geoffroy VALLEE (Oak Ridge National Lab) USA Laurent VIENNOT (INRIA) France Richard VUDUC (Georgia Institute of Technology) USA Cho-Li WANG (University of Hong Kong) China Jianyong WANG (Tsinghua University) China Gerhard WELLEIN (Erlangen Regional Computing Center) Germany Li XIAO (Michigan State University) USA Ramin YAHYAPOUR (TU Dortmund University) Germany Chia-Lin YANG (National Taiwan University) Taiwan Qing YANG (University of Rhode Island) USA Yuanyuan YANG (Stony Brook University) USA Xiaodong ZHANG (Ohio State University) USA Youtao ZHANG (University of Pittsburgh) USA -- Daniel S. Katz University of Chicago (773) 834-7186 (voice) (773) 834-3700 (fax) d.katz at ieee.org or dsk at ci.uchicago.edu http://www.ci.uchicago.edu/~dsk/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From angelv at iac.es Thu May 13 01:44:55 2010 From: angelv at iac.es (=?utf-8?Q?=C3=81ngel_de_Vicente?=) Date: Thu, 13 May 2010 09:44:55 +0100 Subject: [Beowulf] Re: looking for good distributed shell program In-Reply-To: <201005111900.o4BJ06Yq011948@bluewest.scyld.com> References: <201005111900.o4BJ06Yq011948@bluewest.scyld.com> Message-ID: > On 11 May 2010, at 3:00 pm, Tony Travis wrote: > >> On 11/05/10 14:46, Joe Landman wrote: >>> Prentice Bisbal wrote: >>>> That's the 3rd or 4th vote for pdsh. I guess I better take a good look >>>> at at. >>> >>> Allow me to 5th pdsh. We don't install clusters without it. >> >> Hello, Joe and Prentice. >> >> We use Dancer's shell "dsh": >> >> http://www.netfort.gr.jp/~dancer/software/dsh.html.en > > Second that recommendation - we use that one too. It's pre-packaged for > Debian family distros, dunno about RPM flavour distros. > > We also use clusterssh for more interactive tasks, a tiny Tk application > which launches multiple xterms and sends keystrokes to them (or selected > subsets of them) synchronously. You can still type into the individual > xterms as well, when required. Fabulously useful when the admin task is > interactive (such as running something like yast2 or aptitude). 
Gets pretty unwieldy for more than about 20 servers - depends on how much screen real-estate you have and how small a font your eyes can cope with!

For interactive tasks I use Omnitty (http://omnitty.sourceforge.net/), which doesn't require the screen real-estate of clusterssh and can work with any number of machines. I guess clusterssh would be preferable if you needed to inspect the output of a command on a number of machines simultaneously, but other than that I find that Omnitty is a better option (also, it runs in text mode in a terminal, which is very convenient for working through ssh). Cheers, Ángel de Vicente --
+---------------------------------------------+
|                                             |
| http://www.iac.es/galeria/angelv/           |
|                                             |
| High Performance Computing Support PostDoc  |
| Instituto de Astrofísica de Canarias        |
|                                             |
+---------------------------------------------+
From samuel at unimelb.edu.au Thu May 13 18:09:42 2010 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Fri, 14 May 2010 11:09:42 +1000 Subject: [Beowulf] looking for good distributed shell program In-Reply-To: <4BE95C11.4070400@ias.edu> References: <4BE861C8.9050400@ias.edu> <2969652-1273537199-cardhu_decombobulator_blackberry.rim.net-1971957162-@bda517.bisx.prod.on.blackberry> <4BE95C11.4070400@ias.edu> Message-ID: On 11/05/10 23:30, Prentice Bisbal wrote:
> Thanks for the reply. I have read articles on xCAT and
> was always interested in it, I just never got around to
> actually installing it and trying it myself.
We're using xCAT on our SGI cluster and it works nicely, including its xdsh and xdshbak commands. Version 2 of xCAT is available from SourceForge and installable via a yum repository.
> You would still need to authenticate to run commands
> as root. Where ssh is used, this would require the
> admin to have a key that's in root's authorized_keys
> file to run a remote command as root.
On a cluster set up with xCAT all that is taken care of for you through its kickstart and postscript stuff. cheers, Chris -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computational Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.p.lux at jpl.nasa.gov Fri May 14 12:05:59 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 14 May 2010 12:05:59 -0700 Subject: [Beowulf] Using beowulf to unify or consolidate storage In-Reply-To: References: Message-ID: From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Didier Caamano Sent: Wednesday, May 05, 2010 11:44 AM To: beowulf at beowulf.org Subject: [Beowulf] Using beowulf to unify or consolidate storage Hello to everyone, I apologize if this email is out of place, but I have the following personal project and I have been searching over the internet to try to find the best possible solution. We are not a big company and I am trying to implement some sort of SAN or NAS to consolidate my storage. I've been reading through the Beowulf book; I have to admit I am just in the first pages, but before I continue reading I wanted to ask the question so as not to waste my time in case it is not possible. I have a whole bunch of PCs that are no longer in use and are just collecting dust, and I'm trying (in case it is possible) to somehow put them all to work as a single unit and use their hard drives, or add more hard drives, to create a Storage Area Network.
Is it possible, using Beowulf, to achieve this goal? Are there any recommendations as to where to start with this? I am eager to learn new things, I have experience using BSD and GNU/Linux, I just want to know which direction to go in order to achieve my goal. Thanks and have a good day. -- Didier Caamano ------------ A fine question. Setting up a cluster using old computers is a fine way to get experience, but for an operational system newer hardware is more reliable, cheaper to operate, and less hassle. Imagine using an array of 100 old 2GB hard disks instead of a single 160GB disk. Not only is the single disk probably faster, it's also not at the end of its life, and it consumes less power. Now, if you have a lot of parallelism, so you can take advantage of many disks in parallel, then splitting things up is viable. You need to look at YOUR particular needs. But even there, the hassles of working with old equipment often outweigh the benefits, unless your time is free (e.g. you're fooling with it to learn how to do it). As for whether a Beowulf is good as a SAN... probably not the optimum scheme. From gdjacobs at gmail.com Fri May 14 19:11:33 2010 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Fri, 14 May 2010 21:11:33 -0500 Subject: [Beowulf] Using beowulf to unify or consolidate storage In-Reply-To: References: Message-ID: <4BEE02D5.2000108@gmail.com> Lux, Jim (337C) wrote:
> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Didier Caamano
> Sent: Wednesday, May 05, 2010 11:44 AM
> To: beowulf at beowulf.org
> Subject: [Beowulf] Using beowulf to unify or consolidate storage
>
> Hello to everyone,
>
> I apologize if this email is out of place, but I have the following personal project and I have been searching over the internet to try to find the best possible solution. We are not a big company and I am trying to implement some sort of SAN or NAS to consolidate my storage. I've been reading through the Beowulf book; I have to admit I am just in the first pages, but before I continue reading I wanted to ask the question so as not to waste my time in case it is not possible.
>
> I have a whole bunch of PCs that are no longer in use and are just collecting dust, and I'm trying (in case it is possible) to somehow put them all to work as a single unit and use their hard drives, or add more hard drives, to create a Storage Area Network. Is it possible, using Beowulf, to achieve this goal? Are there any recommendations as to where to start with this? I am eager to learn new things, I have experience using BSD and GNU/Linux, I just want to know which direction to go in order to achieve my goal.
>
> Thanks and have a good day.
New HDDs would be optimal, but it's possible to use older equipment for clustered storage, especially if it's good quality or if you're just testing the software. You could take a hybrid approach. For example, if you have older server or workstation boards, you could equip them with a few new "dumb" 4-port SATA cards (PCIe, PCI-X and PCI in order of preference), 4X SATA drives per card, Intel GigE adaptors, and your UNIX of choice, and use any one of a number of SAN or cluster filesystem schemes (Lustre, Gluster, GFS, XFS over iSCSI, etc.) to connect it all together. You could use port multipliers if you want to trade speed for size, but I doubt it is worth considering in terms of price. Do some reading on different SAN configurations and try to think of what would be most appropriate for what you're doing.
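As a rough illustration of the "XFS over iSCSI" route, here is a minimal sketch of exporting a spare disk from one retired PC and attaching it on another node. It assumes the stock scsi-target-utils (tgtd/tgtadm) and open-iscsi (iscsiadm) packages; the IQN, the /dev/sdb device and the hostname storage01 are made-up examples, not anything from this thread:

    # on the old PC acting as the iSCSI target (tgtd must already be running):
    # export /dev/sdb as LUN 1 of a new target and allow all initiators
    tgtadm --lld iscsi --op new --mode target --tid 1 -T iqn.2010-05.net.example:scratch1
    tgtadm --lld iscsi --op new --mode logicalunit --tid 1 --lun 1 -b /dev/sdb
    tgtadm --lld iscsi --op bind --mode target --tid 1 -I ALL

    # on the node that will consume the storage (the initiator):
    iscsiadm -m discovery -t sendtargets -p storage01
    iscsiadm -m node -T iqn.2010-05.net.example:scratch1 -p storage01 --login
    # the exported disk then appears as a local block device that can be put
    # under md RAID or LVM and formatted with XFS like any other drive

Several such LUNs from several old boxes could then be aggregated on one node, or the same exports could instead feed a cluster filesystem; either way, treat it as a sandbox first, as suggested below.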
It's going to be slow compared to more modern, purpose built hardware, and you will have to devote a lot of time to testing and tweaking until you have confidence in it's performance and reliability. On the other hand, you'll probably learn a lot regarding the technology. One thing you must not do when going down this path is rush the hardware into production. You're undertaking all the risk if your project doesn't pan out, so don't blow too much economic and political capital. Treat it as an R&D project, not as production, and assume that at some point you will move to newer hardware, an outside SAN/NAS solution, or something else as needs change. Size and spend accordingly. If you need production hardware as of yesterday, there are people on this list from industry (including reps from Penguin Computing, our gracious hosts) who would be happy to discuss options with you. -- Geoffrey D. Jacobs From atp at piskorski.com Sat May 15 03:24:54 2010 From: atp at piskorski.com (Andrew Piskorski) Date: Sat, 15 May 2010 06:24:54 -0400 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? Message-ID: <20100515102454.GA99295@piskorski.com> Folks, I could use some advice on which cluster job scheduler (batch queuing system) would be most appropriate for my particular needs. I've looked through docs for SGE, Slurm, etc., but without first-hand experience with each one it's not at all clear to me which I should choose... I've used Sun Grid Engine for this in the past, but the result was very klunky and hard to maintain. SGE seems to have all the necessary features underneath, but no good programming API, and its command-line tools often behave in ways that make them a poor substitute. Here's my current list of needs/wants, starting with the ones that probably make my use case more unusual: 1. I have lots of embarrassingly parallel tree-structured jobs which I dynamically generate and submit from top-level user code (which happens to be written in R). E.g., my user code generates 10 or 100 or 1000 jobs, and each of those jobs might itself generate N jobs. Any given job cannot complete until all its children complete. Also, multiple users may be submitting unrelated jobs at the same time, some of their jobs should have higher priority than others, etc. (The usual reasons for wanting to use a cluster scheduler in the first place, I think.) Thus, merely assigning the individual jobs to compute nodes is not enough, I need the cluster scheduler to also understand the tree relationships between the jobs. Without that, it'd be too easy to get into a live-lock situation, where all the nodes are tied up with jobs, none of which can complete because they are waiting for child jobs which cannot be scheduled. 2. Sometimes I can statically figure out the full tree structure of my jobs ahead of time, but other times I can't or won't, so I definitely need a scheduler that lets me submit new sub-jobs on the fly, from any node in the cluster. 3. The jobs are ultimately all submitted by a small group of people who talk to each other, so I don't really care about any fancy security, cost accounting, "grid" support, or other such features aimed at large and/or loosely coupled organizations. 4. I really, really want a good API for programmably interacting with the cluster scheduler and ALL of its features. I don't care too much what language the API is in as long as it's reasonably sane and I can readily write glue code to interface it to my language of choice. 5. 
Although I don't currently do any MPI programming, I would very much like the option to do so in the future, and integrate it smoothly with the cluster scheduler. I assume pretty much all cluster schedulers have that, though. (Erlang integration might also be nice.) 6. Each of my individual leaf-node jobs will typically take c. 3 to 30 minutes to complete, so my use shouldn't stress the scheduler's own performance too much. However, sometimes I screw that up and submit tons of jobs that each want to run for only a small amount of time, say 2 minutes or less, so it'd be nice if the scheduler is sufficiently efficient and low-latency to keep up with that. 7. When I submit a job, I should be able to easily (and optionally) give the scheduler my estimates of how much RAM and cpu time the job will need. The scheduler should track what resources the job ACTUALLY uses, and make it easy for me to monitor job status for both running and completed jobs, and then use that information to improve my resource estimates for future jobs. (AKA good APIs, yet again.) 8. Of course the scheduler must have a good way to track all the basic information about my nodes: CPU sockets and cores, RAM, etc. Ideally it'd also be straightforward for me to extend the database of node properties as I see fit. Bonus points if it uses a good database (e.g. SQLite, PostgreSQL) and a reasonable data model for that stuff. Thanks in advance for your help and advice! -- Andrew Piskorski http://www.piskorski.com/ From atp at piskorski.com Sat May 15 05:33:39 2010 From: atp at piskorski.com (Andrew Piskorski) Date: Sat, 15 May 2010 08:33:39 -0400 Subject: [Beowulf] Re: cluster scheduler for dynamic tree-structured jobs? In-Reply-To: <20100515102454.GA99295@piskorski.com> References: <20100515102454.GA99295@piskorski.com> Message-ID: <20100515123339.GA87410@piskorski.com> On Sat, May 15, 2010 at 06:24:54AM -0400, Andrew Piskorski wrote: > 1. I have lots of embarrassingly parallel tree-structured jobs which I > dynamically generate and submit from top-level user code (which > happens to be written in R). E.g., my user code generates 10 or 100 > or 1000 jobs, and each of those jobs might itself generate N jobs. > Any given job cannot complete until all its children complete. Condor's "MW" master-worker API and DAGMan both sound potentially useful for my tree-structured jobs. However... Does MW support multiple levels of masters and workers? (That's what I need.) The docs never mention it, not even when discussing the scalability limitations of a single master process, so I presume it does not. MW also requires both Condor and Condor-PVM. http://www.cs.wisc.edu/condor/mw/ http://www.cs.wisc.edu/condor/mw/overview.html http://www.cs.wisc.edu/condor/pvm/ Since Condor does not itself understand inter-job dependencies at all, it seems that two MW master programs running at the same time could readily deadlock each other. At least, I don't see anything in either MW or Condor proper that would prevent or ameliorate that risk. >From its docs, DAGMan is purely static, it has to know about all the jobs ahead of time before any of them start, and cannot dynamically submit new jobs (no good for me). It sits as a separate layer above Condor; Condor itself does not understand inter-job dependencies at all. DAGMan's docs also say it has no way to recover if even a single one of its jobs fail, it aborts the entire DAG. 
That seems strange, as I'd have thought that Condor itself must support some sort of job restart when a node goes down (or is otherwise removed from the Condor pool) - does it really not? http://www.cs.wisc.edu/condor/dagman/ http://www.cs.wisc.edu/condor/manual/v6.1/2_11Inter_job_Dependencies.html The DAGMan stuff sounds like a research hack that's not really fully supported by Condor. AFAICT MW and DAGMan are also entirely unrelated to each other. Does anybody actually use either of those tools? And of course, it's not clear whether Condor in general would really meet the needs I laid out earlier. -- Andrew Piskorski http://www.piskorski.com/ From atp at piskorski.com Sat May 15 06:01:22 2010 From: atp at piskorski.com (Andrew Piskorski) Date: Sat, 15 May 2010 09:01:22 -0400 Subject: [Beowulf] cluster scheduler implemented via MPI? Message-ID: <20100515130122.GA6088@piskorski.com> A cluster scheduler (or resource manager) like SGE, SLURM, or Torque can itself be viewed as a parallel application running on a cluster. So I'm wondering, have any such schedulers been implemented *as* an MPI program? (Or PVM or whatever else?) If not, why not? Does the MPI programming environment not provide a suitable substrate for what a cluster scheduler needs to do? -- Andrew Piskorski http://www.piskorski.com/ From skylar at cs.earlham.edu Sat May 15 07:33:08 2010 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Sat, 15 May 2010 07:33:08 -0700 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? In-Reply-To: <20100515102454.GA99295@piskorski.com> References: <20100515102454.GA99295@piskorski.com> Message-ID: <4BEEB0A4.5080505@cs.earlham.edu> On 05/15/10 03:24, Andrew Piskorski wrote: > Folks, I could use some advice on which cluster job scheduler (batch > queuing system) would be most appropriate for my particular needs. > I've looked through docs for SGE, Slurm, etc., but without first-hand > experience with each one it's not at all clear to me which I should > choose... > > I've used Sun Grid Engine for this in the past, but the result was > very klunky and hard to maintain. SGE seems to have all the necessary > features underneath, but no good programming API, and its command-line > tools often behave in ways that make them a poor substitute. > > Here's my current list of needs/wants, starting with the ones that > probably make my use case more unusual: > > 1. I have lots of embarrassingly parallel tree-structured jobs which I > dynamically generate and submit from top-level user code (which > happens to be written in R). E.g., my user code generates 10 or 100 > or 1000 jobs, and each of those jobs might itself generate N jobs. > Any given job cannot complete until all its children complete. > > Also, multiple users may be submitting unrelated jobs at the same > time, some of their jobs should have higher priority than others, etc. > (The usual reasons for wanting to use a cluster scheduler in the first > place, I think.) > > Thus, merely assigning the individual jobs to compute nodes is not > enough, I need the cluster scheduler to also understand the tree > relationships between the jobs. Without that, it'd be too easy to get > into a live-lock situation, where all the nodes are tied up with jobs, > none of which can complete because they are waiting for child jobs > which cannot be scheduled. > I'm not quite sure I understand what you're doing, but if you make all your execution hosts submit hosts as well you can submit jobs within your running jobs. 
You can use "-now y -sync y" in your jobs to ensure that the parent doesn't exit until its children have exited. > 2. Sometimes I can statically figure out the full tree structure of my > jobs ahead of time, but other times I can't or won't, so I definitely > need a scheduler that lets me submit new sub-jobs on the fly, from any > node in the cluster. > > 3. The jobs are ultimately all submitted by a small group of people > who talk to each other, so I don't really care about any fancy > security, cost accounting, "grid" support, or other such features > aimed at large and/or loosely coupled organizations. > > 4. I really, really want a good API for programmably interacting with > the cluster scheduler and ALL of its features. I don't care too much > what language the API is in as long as it's reasonably sane and I can > readily write glue code to interface it to my language of choice. > I haven't looked at it much, but I think DRMAA will work for that in SGE. > 5. Although I don't currently do any MPI programming, I would very > much like the option to do so in the future, and integrate it smoothly > with the cluster scheduler. I assume pretty much all cluster > schedulers have that, though. (Erlang integration might also be nice.) > SGE does indeed do MPI integration. I doubt it does Erlang integration out of the box but the integration is just a collection of pre- and post-job scripts so you should be able to write it yourself if you have to. > 6. Each of my individual leaf-node jobs will typically take c. 3 to 30 > minutes to complete, so my use shouldn't stress the scheduler's own > performance too much. However, sometimes I screw that up and submit > tons of jobs that each want to run for only a small amount of time, > say 2 minutes or less, so it'd be nice if the scheduler is > sufficiently efficient and low-latency to keep up with that. > SGE's scheduler latency is tunable to a certain degree. As you decrease the maximum latency you increase the load so you might need beefier hardware to accommodate it. > 7. When I submit a job, I should be able to easily (and optionally) > give the scheduler my estimates of how much RAM and cpu time the job > will need. The scheduler should track what resources the job ACTUALLY > uses, and make it easy for me to monitor job status for both running > and completed jobs, and then use that information to improve my > resource estimates for future jobs. (AKA good APIs, yet again.) > SGE can give you this with requestable complexes, although I don't think it'll learn from your estimates. > 8. Of course the scheduler must have a good way to track all the basic > information about my nodes: CPU sockets and cores, RAM, etc. Ideally > it'd also be straightforward for me to extend the database of node > properties as I see fit. Bonus points if it uses a good database > (e.g. SQLite, PostgreSQL) and a reasonable data model for that stuff. > > Thanks in advance for your help and advice! > SGE does this and can make it available as XML. -- -- Skylar Thompson (skylar at cs.earlham.edu) -- http://www.cs.earlham.edu/~skylar/ From atp at piskorski.com Sat May 15 08:44:50 2010 From: atp at piskorski.com (Andrew Piskorski) Date: Sat, 15 May 2010 11:44:50 -0400 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? 
In-Reply-To: <4BEEB0A4.5080505@cs.earlham.edu> References: <4BEEB0A4.5080505@cs.earlham.edu> Message-ID: <20100515154450.GA71472@piskorski.com> On Sat, May 15, 2010 at 07:33:08AM -0700, Skylar Thompson wrote: > I'm not quite sure I understand what you're doing, but if you make all > your execution hosts submit hosts as well you can submit jobs within > your running jobs. You can use "-now y -sync y" in your jobs to ensure Yes, that's what I did with SGE, that part works fine. SGE's other behaviors often leave much to be desired. E.g., "reschedule_unknown". By default, SGE marks a node as down only when the node's execd daemon comes back *up*! So if the node hits a kernel oops, reboots, and successfully restarts its execd, everything is fine - SGE notices that the machine crashed, and reschedules whatever job was running on it at the time. But if the node just stays down permanently, or worse, if it goes entirely catatonic, SGE *never* considers the node down, and will *never* reschedule the job elsewhere! The job remains in limbo indefinitely until some human intervenes. Of course there is a setting to make SGE behave in a more sane way, it's called "reschedule_unknown". It basically defines a timeout, where if SGE can't get a response from a node within that time, SGE restarts that node's jobs elsewhere. This was all exceedingly non-obvious. I only figured it out by reading Templeton's detailed "FridayTutorial.pdf" slides discussing many practical aspects of SGE, which unfortunately have since vanished from the web: http://www.globusworld.org/documents/FridayTutorial.pdf http://gridengine.sunsource.net/nonav/source/browse/~checkout~/gridengine/doc/htmlman/htmlman5/sge_conf.html?pathrev=V62u2_TAG Unfortunately, even after the reschedule_unknown fix I still see occasional job lockups with SGE, where my master process stalls indefinitely until I manually notice and tell SGE to kill and restart some hung child job. I haven't yet sunk the debugging time into figuring out just what the heck is really going on there. (And it could well be something that's not SGE's fault at all, of course.) That isn't the only snafu I've had with SGE, just one of the more memorable one. I am by no means an SGE expert, nor even a particularly experienced user, but it has mostly struck me as klunky and rather programmer unfriendly. Basically, I ended up using SGE due to historical accident, and my hands-on experience with it has encouraged me to take a step back and evaluate other toolkit options. > > 4. I really, really want a good API for programmably interacting with > > the cluster scheduler and ALL of its features. I don't care too much > I haven't looked at it much, but I think DRMAA will work for that in SGE. Not as far as I could tell from reading the SGE docs a while back, no. It looked as if DRMAA only covers a very limited subset of SGE's functionality, not enough to cover the features I need. I did not (yet) check the source to see how SGE's DRMAA support is implemented, but the docs made it sound as if they were rolling it from scratch rather than simply building on top of some clear pre-existing SGE API. > > 8. Of course the scheduler must have a good way to track all the basic > > information about my nodes: CPU sockets and cores, RAM, etc. Ideally > > it'd also be straightforward for me to extend the database of node > SGE does this and can make it available as XML. 
Which reminds me, I need to look harder to figure out WHERE exactly SGE stores its node configuration data, and how I can perhaps extend it with additional information, like the network topology between my nodes. This is probably simple but it wasn't obvious from the (voluminous) SGE docs. -- Andrew Piskorski http://www.piskorski.com/ From raysonlogin at gmail.com Sat May 15 09:34:31 2010 From: raysonlogin at gmail.com (Rayson Ho) Date: Sat, 15 May 2010 11:34:31 -0500 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? In-Reply-To: <20100515154450.GA71472@piskorski.com> References: <4BEEB0A4.5080505@cs.earlham.edu> <20100515154450.GA71472@piskorski.com> Message-ID: On Sat, May 15, 2010 at 10:44 AM, Andrew Piskorski wrote: > Yes, that's what I did with SGE, that part works fine. ?SGE's other > behaviors often leave much to be desired. Just because the default settings of SGE do not follow your workflow does not mean that "SGE's other behaviors often leave much to be desired." There are SGE users who do exactly not want SGE to automatically re-run jobs due to unreachable nodes -- for example, a network failure can partition a single SGE cluster into 2 sub-clusters, and thus every job can be run twice if the default is to re-run whenever nodes are not reachable. The SGE mailing "users" list is always responsive (Thanks to Reuti and others who contribute), so anything you don't like or understand in SGE, you should: 1) Google (very important) 2) Check the SGE manpage, HOWTO, admin guide 3) Ask on the list http://gridengine.sunsource.net/maillist.html >> > 4. I really, really want a good API for programmably interacting with >> > the cluster scheduler and ALL of its features. ?I don't care too much > >> I haven't looked at it much, but I think DRMAA will work for that in SGE. DRMAA is for job submission and some job monitoring, and if you want to interact with your scheduler, like changing the scheduling algorithms, then I don't think it can be easily done with anything available in the free/opensource world or commercial market. Rayson > > Not as far as I could tell from reading the SGE docs a while back, no. > It looked as if DRMAA only covers a very limited subset of SGE's > functionality, not enough to cover the features I need. > > I did not (yet) check the source to see how SGE's DRMAA support is > implemented, but the docs made it sound as if they were rolling it > from scratch rather than simply building on top of some clear > pre-existing SGE API. > >> > 8. Of course the scheduler must have a good way to track all the basic >> > information about my nodes: ?CPU sockets and cores, RAM, etc. ?Ideally >> > it'd also be straightforward for me to extend the database of node > >> SGE does this and can make it available as XML. > > Which reminds me, I need to look harder to figure out WHERE exactly > SGE stores its node configuration data, and how I can perhaps extend > it with additional information, like the network topology between my > nodes. ?This is probably simple but it wasn't obvious from the > (voluminous) SGE docs. 
> > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From skylar at cs.earlham.edu Sat May 15 10:45:58 2010 From: skylar at cs.earlham.edu (Skylar Thompson) Date: Sat, 15 May 2010 10:45:58 -0700 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? In-Reply-To: <20100515154450.GA71472@piskorski.com> References: <4BEEB0A4.5080505@cs.earlham.edu> <20100515154450.GA71472@piskorski.com> Message-ID: <4BEEDDD6.2080402@cs.earlham.edu> On 05/15/10 08:44, Andrew Piskorski wrote: >> SGE does this and can make it available as XML. >> > Which reminds me, I need to look harder to figure out WHERE exactly > SGE stores its node configuration data, and how I can perhaps extend > it with additional information, like the network topology between my > nodes. This is probably simple but it wasn't obvious from the > (voluminous) SGE docs. > > I think it depends on whether you're using text or BDB as your backend. If you're using text, it'll be in $SGE_ROOT/$SGE_CELL, with node-specific customizations in $SGE_ROOT/$SGE_CELL/local_conf. I'm not sure about BDB though. -- -- Skylar Thompson (skylar at cs.earlham.edu) -- http://www.cs.earlham.edu/~skylar/ From peter.st.john at gmail.com Sat May 15 15:40:36 2010 From: peter.st.john at gmail.com (Peter St. John) Date: Sat, 15 May 2010 18:40:36 -0400 Subject: [Beowulf] cluster scheduler implemented via MPI? In-Reply-To: <20100515130122.GA6088@piskorski.com> References: <20100515130122.GA6088@piskorski.com> Message-ID: I just want to remark that I'm curious about peer-to-peer scheduling, where nodes would negotiate among each other according to priorities, needs, and idle resources. Peter On Sat, May 15, 2010 at 9:01 AM, Andrew Piskorski wrote: > A cluster scheduler (or resource manager) like SGE, SLURM, or Torque > can itself be viewed as a parallel application running on a cluster. > So I'm wondering, have any such schedulers been implemented *as* an > MPI program? (Or PVM or whatever else?) If not, why not? Does the > MPI programming environment not provide a suitable substrate for what > a cluster scheduler needs to do? > > -- > Andrew Piskorski > http://www.piskorski.com/ > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.travis at abdn.ac.uk Sun May 16 08:12:27 2010 From: a.travis at abdn.ac.uk (Tony Travis) Date: Sun, 16 May 2010 16:12:27 +0100 Subject: [Beowulf] cluster scheduler implemented via MPI? In-Reply-To: References: <20100515130122.GA6088@piskorski.com> Message-ID: <4BF00B5B.1070907@abdn.ac.uk> On 15/05/10 23:40, Peter St. John wrote: > I just want to remark that I'm curious about peer-to-peer scheduling, > where nodes would negotiate among each other according to priorities, > needs, and idle resources. Hello, Peter. I think SSI (Single System Image) does this to some extent: openMosix, for example, does not have a 'head' node in the sense of a centralised shcheduler and Kerrighed is similar. What both these systems have is a notion of the 'home' node where a job originates. 
Kerrighed can also use an openMosix-based scheduler. If you're interested, take a look at: http://www.kerrighed.org This is an active project - the openMosix project closed two years ago, and MOSIX2 is a commercial product. Apart from the fact that MOSIX2 is not open source, it looks very good. However, Kerrighed has some other interesting and potentially very useful features, such as the ability to aggregate the distributed memory in a way that conventional (non-MPI) programs can access. Kerrighed also uses quite an efficient TIPC-based inter-kernel communication protocol. I'm now using 32-bit Kerrighed, but it's not really stable enough for a production environment, and most of the Kerrighed development work is being done on the 64-bit version. So, I've retired most of our 32-bit kit now and we will soon be running a 64-bit Kerrighed Beowulf :-) Bye, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis at abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From raysonlogin at gmail.com Sun May 16 14:19:49 2010 From: raysonlogin at gmail.com (Rayson Ho) Date: Sun, 16 May 2010 16:19:49 -0500 Subject: [Beowulf] cluster scheduler implemented via MPI? In-Reply-To: <20100515130122.GA6088@piskorski.com> References: <20100515130122.GA6088@piskorski.com> Message-ID: Batch schedulers and message-passing libraries/environments are two totally different animals with different design goals. Using MPI for inter-node communications is not going to gain batch schedulers anything extra, but the limitations, e.g. handling master node failover, dynamic node removal/addition (yes, I know MPI-2 has dynamic tasks), etc., are simply too great to work around. Rayson On Sat, May 15, 2010 at 8:01 AM, Andrew Piskorski wrote:
> A cluster scheduler (or resource manager) like SGE, SLURM, or Torque
> can itself be viewed as a parallel application running on a cluster.
> So I'm wondering, have any such schedulers been implemented *as* an
> MPI program? (Or PVM or whatever else?) If not, why not? Does the
> MPI programming environment not provide a suitable substrate for what
> a cluster scheduler needs to do?
>
> --
> Andrew Piskorski
> http://www.piskorski.com/
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>
From Bill.Rankin at sas.com Mon May 17 10:02:19 2010 From: Bill.Rankin at sas.com (Bill Rankin) Date: Mon, 17 May 2010 13:02:19 -0400 Subject: [Beowulf] cluster scheduler for dynamic tree-structured jobs? In-Reply-To: <20100515102454.GA99295@piskorski.com> References: <20100515102454.GA99295@piskorski.com> Message-ID: Andrew Piskorski wrote:
> Folks, I could use some advice on which cluster job scheduler (batch
> queuing system) would be most appropriate for my particular needs.
> I've looked through docs for SGE, Slurm, etc., but without first-hand
> experience with each one it's not at all clear to me which I should
> choose...
I think that most of the ones out there will do what you need. I am most familiar with PBS Pro (since I used to work for them) and a little SGE. Are you considering commercial offerings or are you restricting yourself to the free ones?
> 1.
I have lots of embarrassingly parallel tree-structured jobs which I > dynamically generate and submit from top-level user code (which > happens to be written in R). E.g., my user code generates 10 or 100 > or 1000 jobs, and each of those jobs might itself generate N jobs. > Any given job cannot complete until all its children complete. So you generate the job list from the top of the tree, but need to process from the bottom? Under PBS Pro, you would use job dependencies to do this. Have a 'meta-job' that does a recursive descent of the tree and 1) generates a job for each node (initially in a 'held' state), then 2) generates jobs for all the children of the node and makes the job generated in #1 dependent upon the completion of the children, and then 3) recursively do #1 & #2 for all the children. Then release all the jobs from their held state. > Also, multiple users may be submitting unrelated jobs at the same > time, some of their jobs should have higher priority than others, etc. > (The usual reasons for wanting to use a cluster scheduler in the first > place, I think.) Yup. Pretty much part and partial of any of the Workload management offerings. > Thus, merely assigning the individual jobs to compute nodes is not > enough, I need the cluster scheduler to also understand the tree > relationships between the jobs. Right. Job dependencies do this in PBS Pro. > 2. Sometimes I can statically figure out the full tree structure of my > jobs ahead of time, but other times I can't or won't, so I definitely > need a scheduler that lets me submit new sub-jobs on the fly, from any > node in the cluster. Most jobs can have additional dependencies added at a later point in time, as long as the job still exists. Remember that once all the children of a job complete, then the job can be run. You can circumvent this by putting a hold on the job if you know that it has additional children you want to submit at a later point. > 3. The jobs are ultimately all submitted by a small group of people > who talk to each other, so I don't really care about any fancy > security, cost accounting, "grid" support, or other such features > aimed at large and/or loosely coupled organizations. If you can get everyone to cooperate and work with each other, that's usually the best solution. For the times you cannot then quotas, fair-share policies, and job prioritizations are the tools you use. > 4. I really, really want a good API for programmably interacting with > the cluster scheduler and ALL of its features. I don't care too much > what language the API is in as long as it's reasonably sane and I can > readily write glue code to interface it to my language of choice. Most of the commercial offerings have API and various GUI portals available. > 5. Although I don't currently do any MPI programming, I would very > much like the option to do so in the future, and integrate it smoothly > with the cluster scheduler. I assume pretty much all cluster > schedulers have that, though. (Erlang integration might also be nice.) Should be no problem for any of the major offerings. > 6. Each of my individual leaf-node jobs will typically take c. 3 to 30 > minutes to complete, so my use shouldn't stress the scheduler's own > performance too much. However, sometimes I screw that up and submit > tons of jobs that each want to run for only a small amount of time, > say 2 minutes or less, so it'd be nice if the scheduler is > sufficiently efficient and low-latency to keep up with that. 
That's actually a fairly challenging issue for many job scheduling engines and really depends on the total system/cluster size and configuration. Most should be able to handle it, but I will say that when you start getting down to that length of job, there are a lot of hidden gotchas that come to the surface, like disk I/O (if you are using shared data) and handling job failures (just to name a couple). The bottom line is that short jobs are very inefficient and you should try to avoid them if possible.
> 7. When I submit a job, I should be able to easily (and optionally)
> give the scheduler my estimates of how much RAM and cpu time the job
> will need. The scheduler should track what resources the job ACTUALLY
> uses, and make it easy for me to monitor job status for both running
> and completed jobs, and then use that information to improve my
> resource estimates for future jobs. (AKA good APIs, yet again.)
Most of that is available through the job submission command line.
> 8. Of course the scheduler must have a good way to track all the basic
> information about my nodes: CPU sockets and cores, RAM, etc. Ideally
> it'd also be straightforward for me to extend the database of node
> properties as I see fit. Bonus points if it uses a good database
> (e.g. SQLite, PostgreSQL) and a reasonable data model for that stuff.
Again, management of node resources is part of pretty much all the offerings. Using a "true" database for this configuration data is usually not done (in my experience) mainly because it's pretty much overkill and has its own set of scaling limitations (in a 10000+ node cluster, you do not want all the nodes repeatedly accessing a single database for their configuration status). Good luck, -bill From rpnabar at gmail.com Tue May 18 08:21:01 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 18 May 2010 10:21:01 -0500 Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases Message-ID: I am trying to get my "ulimit -l" set to unlimited, for normal users, and I'm not sure what's going wrong. I get the correct ulimit in only one particular scenario: if I become root and then su to a normal user. Otherwise it doesn't seem to work. The symptoms are:
rpnabar at eu001>ssh eu002 ulimit -l
32
[rpnabar at eu001 root]$ ssh eu002
[rpnabar at eu002 ~]$ ulimit -l
32
[root at eu001 ~]# ulimit -l
unlimited
[root at eu001 ~]# su rpnabar
[rpnabar at eu001 root]$ ulimit -l
unlimited
eu001 and eu002 are identical compute nodes running CentOS. Here's what I've already tried:
[rpnabar at eu001 root] cat /etc/security/limits.conf
[snip]
* hard memlock unlimited
* soft memlock unlimited
# End of file
[rpnabar at eu001 root]$ cat /etc/ssh/sshd_config
[snip]
UsePrivilegeSeparation no
[snip]
[rpnabar at eu001 root]$ service sshd restart
The last suggestion was on the basis of a RHEL knowledgebase article. Any other things that I can check for this? I'm stumped. -- Rahul From hahn at mcmaster.ca Tue May 18 09:19:14 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 18 May 2010 12:19:14 -0400 (EDT) Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: References: Message-ID:
> [rpnabar at eu001 root]$ service sshd restart
was the shell that executed this already unlimited? I've often found unexpected limits based on spawning daemons (sshd, scheduler) that inherited the limit when they were started... also, modern kernels have a /proc/$pid/limits file which may be useful in debugging.
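To make that suggestion concrete, one quick way to compare the limits the running sshd daemon inherited with what a fresh login shell sees is sketched below. It is only an illustration: the eu002 hostname follows the naming in this thread, and /proc/<pid>/limits needs a reasonably recent kernel, as noted above.

    # limits of the running sshd daemon itself (what child sessions will inherit)
    grep -i 'locked memory' /proc/$(pgrep -o -x sshd)/limits

    # limits of an actual login shell on a compute node, for comparison
    ssh eu002 'grep -i "locked memory" /proc/$$/limits; ulimit -l'

If the daemon still shows the old 32 kB value, restarting sshd from a shell that already has the desired limit (or rebooting once limits.conf/PAM are fixed) is the usual way out, which is where the rest of this thread ends up.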
From gmpc at sanger.ac.uk Tue May 18 09:23:45 2010 From: gmpc at sanger.ac.uk (Guy Coates) Date: Tue, 18 May 2010 17:23:45 +0100 Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: References: Message-ID: <4BF2BF11.4040200@sanger.ac.uk> On 18/05/10 16:21, Rahul Nabar wrote: > I am trying to get my "ulimit -l" set to unlimited, for normal users, > and not sure what's going wrong. I get the correct ulimit under only > once particular scenario: If I become root and then su to a normal > user. Otherwise it doesn't seem to work. The symptoms are: Check /etc/pam.d/su and /etc/pam.d/ssh; you may need add the following line to get ulimits to be set at login: session required pam_limits.so Cheers, Guy -- Dr. Guy Coates, Informatics System Group The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK Tel: +44 (0)1223 834244 x 6925 Fax: +44 (0)1223 496802 -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From rpnabar at gmail.com Tue May 18 09:34:06 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 18 May 2010 11:34:06 -0500 Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: <4BF2BF11.4040200@sanger.ac.uk> References: <4BF2BF11.4040200@sanger.ac.uk> Message-ID: On Tue, May 18, 2010 at 11:23 AM, Guy Coates wrote: > On 18/05/10 16:21, Rahul Nabar wrote: >> I am trying to get my "ulimit -l" set to unlimited, for normal users, >> and not sure what's going wrong. I get the correct ulimit under only >> once particular scenario: If I become root and then su to a normal >> user. Otherwise it doesn't seem to work. The symptoms are: > > Check /etc/pam.d/su and /etc/pam.d/ssh; you may need add the following > line to get ulimits to be set at login: > > > session ? ?required ? ? pam_limits.so Tried adding this line in both the files Guy mentioned and restarted sshd. Didn't work. In any case my sshd_config says: UsePAM no So not sure if PAM is getting used. -- Rahul From rpnabar at gmail.com Tue May 18 09:38:40 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 18 May 2010 11:38:40 -0500 Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: References: Message-ID: On Tue, May 18, 2010 at 11:19 AM, Mark Hahn wrote: >> [rpnabar at eu001 root]$ service sshd restart > > was the shell that executed this already unlimited? ?I've often found > unexpected limits based on spawning daemons (sshd, scheduler) > that inherited the limit when they were started... You are right Mark! When the spawning shell was a ulimit unlimited shell only then does it work. I first had to do a "su -". Never figured that'd be the case. Thanks! I've got to see now how I fix this at reboot time when sshd automatically starts up. Not sure what that spawning shell is at startup time. 
- Rahul From deadline at eadline.org Tue May 18 10:23:37 2010 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 18 May 2010 13:23:37 -0400 (EDT) Subject: [Beowulf] 10 GigE webinar on Thursday In-Reply-To: References: Message-ID: <60213.192.168.1.213.1274203417.squirrel@mail.eadline.org> If you are interested in learning about some of the new features in 10GigE, you may like to attend a webinar I'm hosting on Thursday (1PM Eastern) HPC Clusters in 2010: Moving Forward with Ethernet http://www.linux-mag.com/id/7774 It is sponsored by IBM and Force10 Note: this is not a IB vs. Ethernet webinar. -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From hahn at mcmaster.ca Tue May 18 10:57:49 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 18 May 2010 13:57:49 -0400 (EDT) Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: References: Message-ID: > I've got to see now how I fix this at reboot time when sshd > automatically starts up. Not sure what that spawning shell is at > startup time. the shell is, ultimately, started by init. if none of the rc scripts contain a ulimit, you should be fine. another option is to ulimit unlimited _in_ the sshd rc script... From dnlombar at ichips.intel.com Wed May 19 11:14:22 2010 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Wed, 19 May 2010 11:14:22 -0700 Subject: [Beowulf] strange ulimit+ssh problem: ulimit unlimited only works in certain cases In-Reply-To: References: Message-ID: <20100519181422.GA22677@nlxcldnl2.cl.intel.com> On Tue, May 18, 2010 at 08:21:01AM -0700, Rahul Nabar wrote: > I am trying to get my "ulimit -l" set to unlimited, for normal users, > and not sure what's going wrong. I get the correct ulimit under only > once particular scenario: If I become root and then su to a normal > user. Otherwise it doesn't seem to work. Did you look at PAM limits and /etc/security/limits.conf? -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From i.n.kozin at googlemail.com Fri May 21 08:15:06 2010 From: i.n.kozin at googlemail.com (Igor Kozin) Date: Fri, 21 May 2010 16:15:06 +0100 Subject: [Beowulf] bandwidth to GPU Message-ID: Hello everyone, I'm quite curios about the bandwidth to GPUs people are getting especially with NVIDIA C1060 or Fermi on Intel hosts with two 5520 chipsets. Using bandwidthTest from CUDA SDK and averaging the results over all cores and GPUs (we have S1070) I'm getting with memory=pageable 3672 MB/s host to device and 3023 MB/s device to host. With memory=pinned the numbers increase to 5499 MB/s and 5291 MB/s respectively which look okay too me. On a two chipset host 1) there is obviously asymmetry resulting in low and high numbers depending on affinity and, worryingly, 2) pinned bandwidth is a bit too low. memory=pageable host to device: 3702/3716 device to host: 2880/1807 memory=pinned host to device: 5751/4709 device to host: 3264/1873 If you happen to have numbers for ATI GPUs and/or AMD based hosts please post them too. Thanks, Igor -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpnabar at gmail.com Mon May 24 13:21:07 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 24 May 2010 15:21:07 -0500 Subject: [Beowulf] Bugfix for Broadcom NICs losing connectivity: Dell R410-610-710 affected Message-ID: In case it helps anyone using Dell R410 / 610 / 710 etc. 
From rpnabar at gmail.com Mon May 24 13:21:07 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 24 May 2010 15:21:07 -0500 Subject: [Beowulf] Bugfix for Broadcom NICs losing connectivity: Dell R410-610-710 affected Message-ID: In case it helps anyone using Dell R410 / 610 / 710 etc. servers: I have had machines lose their eth connections periodically (CentOS 5.4 bnx2 driver). Seems like a bug with the Broadcom NIC drivers. [luckily I read of it on a Dell mailing list] Bug Reports: http://kbase.redhat.com/faq/docs/DOC-26837 http://patchwork.ozlabs.org/patch/51106 Not sure yet if this is exactly my issue but I'm giving it a shot now. Thought I'd post since, anecdotally I've seen many people use these servers on the list. -- Rahul From rpnabar at gmail.com Thu May 27 15:25:56 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 27 May 2010 17:25:56 -0500 Subject: [Beowulf] Is it necessary to increase size of the ARP cache? Message-ID: I have seen a few recommendations to increase the ARP cache size for networks where more than 512 hosts are on the same IP subnet, e.g. here: https://wiki.fysik.dtu.dk/niflheim/System_administration#kernel-arp-cache But is there a way of knowing from logs etc. if this fix is indeed needed for my network? I have two different subnets, 192.168 and 10.0, each with approximately 300 servers on it (each server has a low-speed and a high-speed eth card, hence the twin subnets). What is the effect of a suboptimal ARP cache size? Does that affect the latency of messages too? -- Rahul From lindahl at pbm.com Thu May 27 15:52:04 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 27 May 2010 15:52:04 -0700 Subject: [Beowulf] Is it necessary to increase size of the ARP cache? In-Reply-To: References: Message-ID: <20100527225204.GB10366@bx9.net> On Thu, May 27, 2010 at 05:25:56PM -0500, Rahul Nabar wrote: > But is there a way of knowing from logs etc. if this fix is indeed needed > for my network? If you tcpdump and see a crapload of arp broadcasts, you know. > What is the effect of a suboptimal ARP cache size? Does that affect > the latency of messages too? It hurts latency because an arp reply has to come back before you can send your packet. -- greg From rpnabar at gmail.com Fri May 28 12:23:03 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 28 May 2010 14:23:03 -0500 Subject: [Beowulf] Is it necessary to increase size of the ARP cache? In-Reply-To: <20100527225204.GB10366@bx9.net> References: <20100527225204.GB10366@bx9.net> Message-ID: On Thu, May 27, 2010 at 5:52 PM, Greg Lindahl wrote: > On Thu, May 27, 2010 at 05:25:56PM -0500, Rahul Nabar wrote: > >> But is there a way of knowing from logs etc. if this fix is indeed needed >> for my network? > > If you tcpdump and see a crapload of arp broadcasts, you know. Thanks Greg! -- Rahul From bart at attglobal.net Thu May 27 21:06:58 2010 From: bart at attglobal.net (Bart Jennings) Date: Thu, 27 May 2010 21:06:58 -0700 Subject: [Beowulf] Is it necessary to increase size of the ARP cache? In-Reply-To: <20100527225204.GB10366@bx9.net> References: <20100527225204.GB10366@bx9.net> Message-ID: <4BFF4162.2040202@attglobal.net> You could always write a script to capture the arp caches of some of your machines over time. If your caches are constantly full, then increasing the size might be warranted. You also might want to consider looking at your arp cache timeout value, since all that new space won't matter if entries are gone before you need to use them. On 5/27/2010 3:52 PM, Greg Lindahl wrote: > On Thu, May 27, 2010 at 05:25:56PM -0500, Rahul Nabar wrote: > >> But is there a way of knowing from logs etc. if this fix is indeed needed >> for my network? >> > If you tcpdump and see a crapload of arp broadcasts, you know. > >> What is the effect of a suboptimal ARP cache size? Does that affect >> the latency of messages too? >> > It hurts latency because an arp reply has to come back before you can > send your packet. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >
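To put numbers on that: a quick sketch of how one could watch the neighbour table and, only if it really is saturating, raise the kernel thresholds along the lines of the Niflheim page Rahul linked; the interface name and the threshold values are just examples sized for a few hundred hosts, not a recommendation:

    # current number of ARP/neighbour entries on this host
    ip -4 neigh show | wc -l

    # look for a flood of ARP broadcasts on the cluster network
    tcpdump -n -i eth0 arp

    # gc_thresh1 is the floor below which the kernel never prunes entries;
    # gc_thresh3 is the hard ceiling on the table size
    cat >> /etc/sysctl.conf <<'EOF'
    net.ipv4.neigh.default.gc_thresh1 = 1024
    net.ipv4.neigh.default.gc_thresh2 = 2048
    net.ipv4.neigh.default.gc_thresh3 = 4096
    EOF
    sysctl -p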
From crhea at mayo.edu Tue May 25 12:40:56 2010 From: crhea at mayo.edu (Cris Rhea) Date: Tue, 25 May 2010 14:40:56 -0500 Subject: [Beowulf] Re: Bugfix for Broadcom NICs losing connectivity In-Reply-To: <201005251900.o4PJ0ElP016422@bluewest.scyld.com> References: <201005251900.o4PJ0ElP016422@bluewest.scyld.com> Message-ID: <20100525194056.GB16022@kaizen.mayo.edu> > In case it helps anyone using Dell R410 / 610 / 710 etc. servers: I have had > machines lose their eth connections periodically (CentOS 5.4 bnx2 driver). > Seems like a bug with the Broadcom NIC drivers. [luckily I read of it on a > Dell mailing list] > > Bug Reports: > > http://kbase.redhat.com/faq/docs/DOC-26837 > http://patchwork.ozlabs.org/patch/51106 > > Not sure yet if this is exactly my issue but I'm giving it a shot now. > Thought I'd post since, anecdotally I've seen many people use these servers > on the list. > > -- > Rahul I've been following this on the Dell list as I have approx. 50 R410s in our cluster. One thing that isn't clear: when this happens, do you lose all connectivity to the node (i.e., do you have to reboot the node to re-establish eth0)? My R410s are running CentOS 5.2 - 5.4 and I rarely have one go down. --- Cris -- Cristopher J. Rhea Mayo Clinic - Research Computing Facility 200 First St SW, Rochester, MN 55905 crhea at Mayo.EDU (507) 284-0587 From rreis at aero.ist.utl.pt Wed May 26 07:07:25 2010 From: rreis at aero.ist.utl.pt (Ricardo Reis) Date: Wed, 26 May 2010 15:07:25 +0100 (WEST) Subject: [Beowulf] recommendations for parallel IO Message-ID: Hi all, We have a small cluster but some users need to use MPI-IO. We have an NFSv3 shared partition, but it would need to be mounted with special options that would hurt performance. We are looking into a nice parallel file system to deploy in this context. We have 4 boxes with a 500 GB disk in each, for the moment, connected with gigabit Ethernet. We have another gigabit connection dedicated to the MPI traffic. We need an open source solution; we are looking into PVFS and Gluster (but from what we see, Gluster doesn't quite fit the bill? It's more a distributed filesystem than a parallel filesystem... or are we taking a wrong turn in our reasoning somewhere?). Anyway, your thoughts and experience would be very welcome so we can make a better decision. many thanks, Ricardo Reis 'Non Serviam' PhD candidate @ Lasef Computational Fluid Dynamics, High Performance Computing, Turbulence http://www.lasef.ist.utl.pt Cultural Instigator @ Rádio Zero http://www.radiozero.pt Keep them Flying! Ajude a/help Aero Fénix! http://www.aeronauta.com/aero.fenix http://www.flickr.com/photos/rreis/ < sent with alpine 2.00 >
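On the "special options" point: the option usually meant for MPI-IO over NFS is noac, which disables attribute caching so ROMIO sees consistent file state across clients, and that is exactly what costs performance; hence the interest in a real parallel filesystem. A hedged example of such a mount (server name and paths are made up):

    # NFSv3 mount that is safe for MPI-IO (ROMIO) but slow: attribute caching off
    mount -t nfs -o vers=3,noac,hard,intr fileserver:/export/scratch /scratch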
From rigved.sharma123 at gmail.com Mon May 31 11:57:23 2010 From: rigved.sharma123 at gmail.com (rigved sharma) Date: Tue, 1 Jun 2010 00:27:23 +0530 Subject: [Beowulf] tracejob error Message-ID: tracejob -n 6 123 /opt/PBS/server_priv/accounting/20100601: No such file or directory /opt/PBS/server_logs/20100601: No matching job records located /opt/PBS/mom_logs/20100601: No such file or directory /opt/PBS/sched_logs/20100601: No such file or directory *** glibc detected *** tracejob: malloc(): memory corruption: 0x0000000019919170 *** ======= Backtrace: ========= /lib64/libc.so.6[0x3845871cd1] /lib64/libc.so.6(__libc_malloc+0x7d)[0x3845872e8d] /lib64/libc.so.6(popen+0x23)[0x3845862a63] tracejob[0x401218] tracejob[0x401bcf] /lib64/libc.so.6(__libc_start_main+0xf4)[0x384581d8b4] tracejob[0x400e09] ======= Memory map: ========
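The backtrace shows tracejob dying inside a popen() call after failing to find any log files for that date; until the tool itself is sorted out, the PBS accounting and server logs are plain text, so the same records can be pulled out by hand. A rough sketch, assuming job id 123 and PBS_HOME=/opt/PBS as in the output above:

    # search the last 6 days of accounting and server logs for job 123
    for d in $(seq 0 6); do
        day=$(date -d "-$d day" +%Y%m%d)
        grep -h ';123\.' /opt/PBS/server_priv/accounting/$day \
                         /opt/PBS/server_logs/$day 2>/dev/null
    done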