Performance evaluation

Keith Murphy keith.murphy at
Sat Sep 21 12:32:56 PDT 2002

I agree with Brian: MM5 requires a low-latency interconnect to get the best
out of the cluster and achieve the scalability and performance you want.
Scali, our software partner, has done a great deal of testing with MM5 on
our low-latency interconnect and has seen good numbers.
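
To illustrate why latency rather than raw bandwidth can dominate, here is a
minimal sketch of the usual first-order message cost model (time = latency +
size / bandwidth). The function names are hypothetical and nothing here is
measured from MM5; plug in figures from your own network.

```python
def transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """First-order cost of one message: fixed latency plus wire time."""
    return latency_s + message_bytes / bandwidth_bytes_per_s


def latency_fraction(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Share of the total message time that is spent on latency alone."""
    total = transfer_time(message_bytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    # Illustrative placeholder values only, not measurements.
    for size in (100, 1_000, 100_000):
        frac = latency_fraction(size, latency_s=1e-4, bandwidth_bytes_per_s=1e7)
        print(f"{size:>7} bytes: {frac:.0%} of the time is latency")
```

In this model, the smaller the messages, the larger the latency share, so
adding bandwidth alone does little for a code that exchanges many small
messages.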

MM5 MPP uses MPI as the transport layer between nodes. Scali's MPI (ScaMPI)
must be installed before compiling. The compilation process includes the
necessary ScaMPI files and libraries.
The MM5 configure file is set up to compile the MM5 package with the
Portland Group compilers. The GNU compilers cannot be used, because MM5
uses integer pointers (aka Cray pointers) and byte-swapped I/O in its
source code, which forces the use of a compiler that supports both.

Multi (dual) processor nodes
In addition to MPI, the MM5 MPP package can be compiled with support for
OpenMP. On a cluster with multi (dual) processor nodes it is recommended
to use both OpenMP and MPI: the processors use OpenMP locally within each
node, whereas the nodes communicate with MPI. This reduces the memory
requirements in the nodes and the demand on the interconnect.
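
As a rough sketch of what the two layouts mean in process counts, assuming
one MPI process per node in the hybrid case (the function name and the
dictionary keys are hypothetical, not MM5 settings):

```python
def layout(nodes, cpus_per_node, hybrid):
    """Return MPI process and OpenMP thread counts for a cluster.

    hybrid=True  : one MPI process per node, OpenMP threads inside the node.
    hybrid=False : one single-threaded MPI process per processor.
    """
    if hybrid:
        return {"mpi_ranks": nodes, "omp_threads_per_rank": cpus_per_node}
    return {"mpi_ranks": nodes * cpus_per_node, "omp_threads_per_rank": 1}


if __name__ == "__main__":
    # The 12 node, 24 processor cluster described in the original question.
    print("MPI only:", layout(12, 2, hybrid=False))
    print("Hybrid  :", layout(12, 2, hybrid=True))
```

With half as many MPI processes, fewer processes share each node's memory
and network interface.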

MM5 configuration
The performance of an MM5 model depends on the input data. The processor
grid in configure.user (PROCMIN_NS, PROCMIN_EW) should be set to values that
match the number of nodes. Performance varies with the processor grid, but
if you want to run on any number of nodes a 1x1 grid is fine. However, some
performance is lost compared to a 2x8 processor grid for a 16 node system.
Therefore, experimentation may be needed to get the highest possible
performance.
Keith Murphy
Dolphin Interconnect
T: 818-597-2114
F: 818-597-2119
C: 818-292-5100

----- Original Message -----
From: "Brian Haymore" <brian at>
To: "Paul English" <tallpaul at>
Cc: <beowulf at>
Sent: Friday, September 20, 2002 3:39 PM
Subject: Re: Performance evaluation

> We run MM5 jobs here at the Univ of Utah.  In our initial tests we found
> MM5 to need a lower latency network than Ethernet seems to offer.  We
> currently run over Giganet's VIA hardware at a 2x speedup over our
> initial runs over Ethernet.  We also tested Myrinet, though an older rev
> than the 2000 product, and found similar speedups.  I can get you more
> detailed info if you need it, let me know.
> On Fri, 2002-09-20 at 15:02, Paul English wrote:
> >
> > We are running MM5 on a 12 node, 24 processor cluster. It does not seem
> > to be running at full capacity and I'd like to figure out where we can get
> > the most bang/buck on performance improvements. The network is currently
> > only single 10/100, although the motherboards (Tyan S2462 w/ dual
> > 3c920) have a second 10/100 interface.
> >
> > I need to look at:
> > How saturated is the network?
> > Do I need to double (2nd 10/100) or multiply by 10 (gigabit) the
> > Is the problem latency based?
> >
> >
> > Thanks,
> > Paul
> >
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at
> > To change your subscription (digest mode or unsubscribe) visit
> --
> Brian D. Haymore
> University of Utah
> Center for High Performance Computing
> 155 South 1452 East RM 405
> Salt Lake City, Ut 84112-0190
> Phone: (801) 585-1755, Fax: (801) 585-5366