beowulf performance with MPI
dek_ml at konerding.com, Sat Jun 24 14:21:48 PDT 2000
- Previous message: Beowulfs can compete with Supercomputers [was Beowulf: A theorical approach]
- Next message: beowulf performance with MPI
Tony Skjellum writes:
> You may find our free MPI - MPI/Pro for TCP+SMP for Linux - interesting.
>
> Anthony Skjellum, PhD, President (tony at mpi-softtech.com)
> MPI Software Technology, Inc., Ste. 33, 101 S. Lafayette, Starkville, MS 39759
> +1-(662)320-4300 x15; FAX: +1-(662)320-4301; http://www.mpi-softtech.com

I should mention that I downloaded this software and found that it worked great. I was getting crappy scaling with my software of interest (AMBER6, see http://www.amber.ucsf.edu).

My cluster is 6 dual P-III 600MHz nodes with 256MB RAM each; one is the master and the other 5 are compute servers. The interconnect is simply 100BT (eepro, kernel 2.2.16), with a 100BT 8-port switch connecting the nodes. The switch was only $150, nothing impressive. AMBER6 is compiled against either LAM or MPICH, the latest version of each.

Using all 10 CPUs of the system, AMBER ran only about 4 times faster than on 1 CPU. I was all but convinced that I needed to upgrade the interconnect to Giganet or Myrinet, at a very high relative cost. However, I downloaded MPI/Pro for TCP+SMP for Linux and gave it a try with AMBER. The scaling is remarkably better! Here are the performance numbers:

SIMULATION SYSTEM: DHFR in water, 23558 atoms
SIMULATION PARAMETERS: PME, 62.2x62.2x62.2 box, 1000 steps
COMPUTER SYSTEM: 6 dual Pentium-III 600MHz (100MHz bus) running Red Hat 6.2, connected by a 100BT switch. Total cost $15,000 at time of purchase, early 2000: each machine approx. $2460, plus one 27GB hard drive ($250) and one 100BT switch ($149).
AMBER COMPILATION: g77 (egcs-1.1.2), flags: -O3 -m486 -malign-double -ffast-math -fno-strength-reduce

CPUs   MPI      Time (sec)   Speedup over 1 g77 CPU
   1   -              5539   1.00
   8   mpich          1429   3.79
  10   mpich          1358   3.99
   8   mpipro          794   6.97
  10   mpipro          692   8.00

"Time" is the wallclock time spent actually calculating the simulation, excluding any setup or I/O time.
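To put the table in standard terms, here is a minimal Python sketch that recomputes speedup (S = T1/TN), parallel efficiency (S/N), and the Karp-Flatt experimentally determined serial fraction from the listed wallclock times. The Karp-Flatt metric is not part of the original measurements, and the function names are illustrative. Dividing the listed times directly reproduces the MPI/Pro speedups to within rounding (6.98 and 8.00), but gives about 3.88 and 4.08 for the MPICH rows rather than the quoted 3.79 and 3.99.

```python
"""Speedup, parallel efficiency and Karp-Flatt serial fraction for the
AMBER6 DHFR timings in the table (wallclock seconds, 1000 PME steps)."""

T1 = 5539.0  # single-CPU g77 baseline, seconds

# (MPI implementation, CPUs, wallclock seconds) copied from the table
RUNS = [
    ("mpich", 8, 1429.0),
    ("mpich", 10, 1358.0),
    ("mpipro", 8, 794.0),
    ("mpipro", 10, 692.0),
]


def speedup(t1: float, tn: float) -> float:
    """Classic speedup S = T1 / TN."""
    return t1 / tn


def efficiency(s: float, n: int) -> float:
    """Parallel efficiency E = S / N; 1.0 would be perfect scaling."""
    return s / n


def karp_flatt(s: float, n: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/N) / (1 - 1/N)."""
    return (1.0 / s - 1.0 / n) / (1.0 - 1.0 / n)


def summarize(t1: float, runs):
    """Return (impl, N, speedup, efficiency, serial fraction) per run."""
    rows = []
    for impl, n, tn in runs:
        s = speedup(t1, tn)
        rows.append((impl, n, s, efficiency(s, n), karp_flatt(s, n)))
    return rows


if __name__ == "__main__":
    print(f"{'MPI':<7}{'CPUs':>5}{'speedup':>9}{'eff.':>7}{'serial':>8}")
    for impl, n, s, e, f in summarize(T1, RUNS):
        print(f"{impl:<7}{n:>5}{s:>9.2f}{e:>7.2f}{f:>8.3f}")
```

Taken at face value, the estimated serial fraction at 10 CPUs drops from about 0.16 with MPICH to about 0.03 with MPI/Pro, which is consistent with a serialized communication step dominating the MPICH runs.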
I compared profiles of the two simulations, and it appears that much of the time savings came from a significantly faster MPI_ALLGATHERV, which AMBER uses to distribute the new particle positions and velocities at the end of each timestep. The allgather occurs in a serialized section of the code, so scaling depends heavily on how well the MPI implementation performs this call.

I have spoken with the MPI/Pro people to find out a little more. Actually, there is no specific SMP optimization as of yet; in fact, communication within a node goes through the localhost network code. However, the design is multithreaded and doesn't poll the way MPICH does. I also suspect that more effort has gone into optimizing some of the MPI routines that MPICH implements with fairly naive code.

Overall the program was quite easy to work with. After downloading the RPM and installing it on the master, I just created a file called "/etc/machines" listing all the client nodes by hostname, recompiled my app with the MPI/Pro-provided "mpicc" and "mpif77" scripts, and ran it with the provided "mpirun" script. The syntax is very similar to MPICH's, and it integrates straightforwardly with our queueing system, PBS, through the PBS_NODEFILE environment variable.

Dave
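To illustrate what MPI_ALLGATHERV has to do, here is a single-process Python simulation of its semantics: every rank contributes a block of possibly different length, and every rank ends up with all blocks concatenated in rank order, placed at displacements computed from the per-rank counts. The ring schedule below is one textbook way to implement an allgather; it is not claimed to be what MPICH or MPI/Pro actually do, and the function names are illustrative.

```python
"""Single-process simulation of MPI_Allgatherv semantics using a ring
schedule. Illustrative only: no real MPI library is involved."""

from typing import List


def displacements(recvcounts: List[int]) -> List[int]:
    """Exclusive prefix sum: where each rank's block lands in recvbuf."""
    displs, offset = [], 0
    for c in recvcounts:
        displs.append(offset)
        offset += c
    return displs


def ring_allgatherv(sendbufs: List[List[float]]) -> List[List[float]]:
    """Return the receive buffer each simulated rank ends up with.

    sendbufs[r] is rank r's contribution; lengths may differ between
    ranks. In step k every rank r forwards block (r - k) mod P to rank
    (r + 1) mod P, so after P - 1 steps each rank holds every block.
    """
    p = len(sendbufs)
    counts = [len(b) for b in sendbufs]
    displs = displacements(counts)
    total = sum(counts)
    recv = [[0.0] * total for _ in range(p)]
    # Each rank first copies its own block into place.
    for r in range(p):
        recv[r][displs[r]:displs[r] + counts[r]] = sendbufs[r]
    for step in range(p - 1):
        # Snapshot outgoing blocks so all "sends" in a step are simultaneous.
        outgoing = []
        for r in range(p):
            block = (r - step) % p
            lo, hi = displs[block], displs[block] + counts[block]
            outgoing.append((block, recv[r][lo:hi]))
        # Deliver each block to the right-hand neighbour.
        for r in range(p):
            block, data = outgoing[r]
            dest = (r + 1) % p
            lo = displs[block]
            recv[dest][lo:lo + len(data)] = data
    return recv


if __name__ == "__main__":
    bufs = [[1.0], [2.0, 3.0], [], [4.0, 5.0, 6.0]]
    for rank, buf in enumerate(ring_allgatherv(bufs)):
        print(rank, buf)
```

A ring needs P - 1 sequential exchange steps, so on a high-latency interconnect such as switched 100BT, any per-message overhead inside the MPI library is paid on every step of a section that cannot overlap with computation. That is one plausible reading of the profiling result, not a measured explanation.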
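The post doesn't show how PBS_NODEFILE ties into the machine list, so here is a minimal Python sketch, not MPI/Pro's own tooling, that turns a PBS job's node file into a one-hostname-per-line machines file. The helper names and the default output name "machines" are hypothetical, and both the node-file layout and the machines-file format are assumptions spelled out in the docstring. If your MPI expects one line per process rather than one per host, skip unique_hosts.

```python
"""Hypothetical helper: build a machines file from a PBS job's node list.

Assumptions (verify against your own PBS and MPI documentation):
  1. $PBS_NODEFILE names a file listing one hostname per allocated CPU,
     so a dual-CPU node usually appears twice.
  2. The MPI machines file is plain text, one hostname per line.
"""

import os
import sys
from typing import List


def parse_nodes(text: str) -> List[str]:
    """Split node-file text into hostnames, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def unique_hosts(hosts: List[str]) -> List[str]:
    """Drop repeated hostnames, keeping first-seen order."""
    return list(dict.fromkeys(hosts))


def format_machines(hosts: List[str]) -> str:
    """Render one hostname per line, each followed by a newline."""
    return "".join(h + "\n" for h in hosts)


def main() -> None:
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set; run this inside a PBS job", file=sys.stderr)
        return
    with open(nodefile) as f:
        hosts = unique_hosts(parse_nodes(f.read()))
    # Hypothetical default output name; pass a path as the first argument.
    out_path = sys.argv[1] if len(sys.argv) > 1 else "machines"
    with open(out_path, "w") as f:
        f.write(format_machines(hosts))
    print(f"wrote {len(hosts)} hosts to {out_path}")


if __name__ == "__main__":
    main()
```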
More information about the Beowulf mailing list
