Fwd: Re: [Beowulf] Why is communication so expensive for very small messages?

Jonathan Boyle j.boyle at manchester.ac.uk
Tue May 1 07:32:26 PDT 2007


Thanks, we're using 1.2.6, so we'll have to look into upgrading.


----------  Forwarded Message  ----------

Subject: Re: [Beowulf] Why is communication so expensive for very small  
messages?
Date: Tue, 24 Apr 2007 18:03:51 -0600
From: "Michael H. Frese" <Michael.Frese at NumerEx.com>
To: Jonathan Boyle <j.boyle at manchester.ac.uk>
Cc: beowulf at beowulf.org

Sorry, the most recent version of mpich1 is 1.2.7.  The older version that
was doing the message aggregation was 1.2.1.

>You don't say which version of mpich you are using, but we found small
>messages taking 1 ms last fall.  Upgrading from an old version of mpich1
>(ca. 2001) to the most recent version (ca. 2005, 1.2.27?) fixed the
>problem.  The problem was probably one of the OS holding the teensy little
>messages hoping for more data to send it with -- message aggregation, I
>suppose it is called.  The newer version of mpich must have set the OS
>flags properly to prevent that.
>
>I can't tell you about mpich2, as we have no experience with that yet.

Mike Frese
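
The "message aggregation" described above resembles Nagle's algorithm in TCP, which holds back small writes in the hope of coalescing them with later data. Whether that is exactly what affected the old mpich is an assumption; nothing in this thread confirms it. As a rough sketch, assuming a POSIX TCP socket, this is how an application can ask the OS to send small messages immediately. The helper names disable_nagle and nagle_disabled are hypothetical, chosen for illustration.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: ask the OS not to hold back small writes on this
// TCP socket (sets TCP_NODELAY, which disables Nagle's algorithm).
// Returns true if the option was set successfully.
bool disable_nagle(int fd) {
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == 0;
}

// Hypothetical helper: report whether TCP_NODELAY is currently set.
// Returns false if the option is off or could not be queried.
bool nagle_disabled(int fd) {
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return false;
    }
    return value != 0;
}
```

A caller would create the socket with socket(AF_INET, SOCK_STREAM, 0), call disable_nagle(fd) before exchanging small messages, and could confirm the setting with nagle_disabled(fd). An MPI library that runs over TCP would normally do this internally, so user code does not usually need it.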

At 09:54 AM 4/24/2007, you wrote:
>I apologise if this is a naive question, but I'm new to this world of
>beowulfs.
>
>I'm using C++/MPI. To get a feel for communication costs, I ran tests using
>mpptest and my own programs.
>
>For 2 processor blocking calls, mpptest indicates a latency of about 30
>microseconds.
>
>However when I measure communication times in my own program using a loop as
>follows....
>
>MPI_Barrier(MPI_COMM_WORLD);
>start = MPI_Wtime();
>for (unsigned t=1; t<=5000; t++)
>{
>  if (my_rank==0)
>  {
>   MPI_Send(data, size, MPI_INT, 1, tag, MPI_COMM_WORLD);
>  }
>  else
>  {
>   MPI_Recv(data, size, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
>  }
>}
>end = MPI_Wtime();
>
>For size>=4, I get a latency of about 30 microseconds as expected. However,
>for size<4, communication costs increase massively, and latency now appears
>to be 1 ms!
>
>Firstly, I assume this isn't normal?
>
>Secondly, can anyone suggest what's going on, or where I can go for more
>information?
>
>Many thanks.
>
>We're using mpich.
>
>Processors are Intel(R) Xeon(TM) CPU 3.60GHz.
>
>Interconnects are Dell PowerConnect 5324 24-port gigabit switches.
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

-------------------------------------------------------
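
The loop in the quoted question sends 5000 messages in one direction only, so the time seen by the sender is not necessarily a round-trip latency. A common alternative is a ping-pong test, where each message is answered and half the average round trip is taken as the one-way time. The sketch below shows that pattern only. It uses a local socketpair as a stand-in for the MPI link, which is an assumption made so the example runs anywhere, and its numbers are not comparable to cluster timings. The names ping_pong_seconds, read_all and write_all are hypothetical.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <chrono>
#include <cstddef>
#include <vector>

// Read exactly n bytes from fd; returns false on error or end of stream.
static bool read_all(int fd, char* buf, std::size_t n) {
    std::size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r <= 0) return false;
        got += static_cast<std::size_t>(r);
    }
    return true;
}

// Write exactly n bytes to fd; returns false on error.
static bool write_all(int fd, const char* buf, std::size_t n) {
    std::size_t sent = 0;
    while (sent < n) {
        ssize_t w = write(fd, buf + sent, n - sent);
        if (w <= 0) return false;
        sent += static_cast<std::size_t>(w);
    }
    return true;
}

// Estimate the one-way time in seconds as half the average round trip.
// Both ends live in one thread, so this is only suitable for messages
// that fit in the socket buffer. Returns -1.0 on failure or bad input.
double ping_pong_seconds(std::size_t message_bytes, unsigned iterations) {
    if (message_bytes == 0 || iterations == 0) return -1.0;
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1.0;

    std::vector<char> buf(message_bytes, 'x');
    bool ok = true;
    auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < iterations && ok; ++t) {
        ok = write_all(fds[0], buf.data(), buf.size())   // "ping" from end 0
          && read_all(fds[1], buf.data(), buf.size())    // end 1 receives it
          && write_all(fds[1], buf.data(), buf.size())   // "pong" from end 1
          && read_all(fds[0], buf.data(), buf.size());   // end 0 receives it
    }
    auto end = std::chrono::steady_clock::now();
    close(fds[0]);
    close(fds[1]);
    if (!ok) return -1.0;

    std::chrono::duration<double> elapsed = end - start;
    return elapsed.count() / (2.0 * iterations);
}
```

In MPI terms, rank 0 would call MPI_Send followed by MPI_Recv inside the loop and rank 1 would do the reverse, with MPI_Wtime taken around the loop on rank 0 and the elapsed time divided by twice the iteration count.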




