mpich and scheduling issues
Alexis Zubrow azubrow at galton.uchicago.edu
Tue Oct 29 11:24:16 PST 2002
- Previous message: Small cluster
- Next message: Bizarre problems when adding a PPC machine...
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all,

I was hoping to get some advice on different scheduling scenarios for dealing with multiple users on a small cluster. I have done some reading of howtos and searching the web, but I wanted to poll your collective experience on how various schemes work in the real world. I know that this is very hardware and application specific, so I have included a small summary of our system:

Hardware:
- 4 nodes (dual Athlon MP 1800+)
- gigabit connection
- may expand to as many as 8 nodes

Number of users:
- presently around 8 active users
- around 30 infrequent to very infrequent users

Applications:
- majority are serial applications: a lot of R, splus, matlab calculations
- aperiodically, some intense parallel programs (e.g. MM5, parallel version of CMAQ, possibly some parallel development)

Present scheduling scheme:
- self-monitoring (i.e. politeness)

Some ideas I had:

1.) Make the head node non-scheduled. Put a scheduling system (openPBS, Grid Engine, etc.) on nodes 2 through N. People can do their debugging and compiling on the head node, and they can run their more intense processes through the scheduler.

2.) Mosix on all nodes. Serial users will not have to worry about anything. The openMosix documentation suggests that mpi and mosix work great together. But I'm a bit concerned about what happens when you mix mpi and serial processes. For example, MM5 tries to run at 100% of the available cpu. So if you have a series of serial processes being distributed across all of your nodes, what happens when you try to run an MM5 job across 3 nodes (i.e. 6 processors)? Will all the other processes get shifted to 1 node, so that the MM5 job chugs along normally? Or will some of the processes get left behind, meaning that the whole program will wait for the slowest node?

3.) Some combination of #1 and #2. For example, if we expanded to 8 nodes, we could have mosix running on the first 2 nodes and a scheduler running on the back 6 nodes.
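To make the worry in idea 2 concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not a measurement of MM5 or openMosix: it assumes each node has 2 CPUs, that the kernel shares CPU time equally among runnable processes on a node, and that a tightly coupled MPI job advances at the pace of its slowest rank. The function names `rank_speed` and `job_speed` are made up for illustration.

```python
# Toy model (assumptions, not measurements):
#   - each node has 2 CPUs and runs 2 MPI ranks of the parallel job
#   - CPU time on a node is shared equally among all runnable processes
#   - the MPI job synchronizes often, so it runs at the pace of its slowest rank


def rank_speed(n_serial_on_node, cpus_per_node=2, ranks_per_node=2):
    """Fraction of a full CPU that each MPI rank gets on one node."""
    runnable = ranks_per_node + n_serial_on_node
    return min(1.0, cpus_per_node / runnable)


def job_speed(serial_per_node):
    """Relative speed of the whole MPI job, limited by its slowest node.

    serial_per_node lists, for each node the job runs on, how many
    CPU-bound serial processes are also running there.
    """
    return min(rank_speed(n) for n in serial_per_node)


if __name__ == "__main__":
    # MPI job on 3 nodes (6 ranks), no serial processes on those nodes.
    print(job_speed([0, 0, 0]))  # 1.0
    # Same job, but two serial processes were left behind on one node.
    print(job_speed([0, 0, 2]))  # 0.5
```

Under these assumptions, two stray serial processes on a single node would halve the speed of the whole 6-processor job, which is why it matters whether the serial work actually gets migrated away.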
Basically, I'm trying to balance ease of use for my users who primarily want to run serial applications against the needs of a few users who run more intense parallel applications. For many of the users, I want to keep it as easy as possible, so the inexperienced user can use the system without much assistance. But I also want to have the control to run big mpich jobs efficiently.

Any suggestions would be much appreciated.

Thanks,
Alexis
More information about the Beowulf mailing list
