[Beowulf] wall clock time for mpi_allreduce?

Sun Sep 12 05:52:53 PDT 2010

On 09/10/2010 10:46 PM, xingqiu yuan wrote:
> Hi
>
> I found that use of mpi_allreduce to calculate the global maximum and
> minimum takes very long time, any better alternatives to calculate the
> global maximum/minimum values?

There are several variations on this theme you can try, and some might 
work better than others.  All will be more verbose than the allreduce, 
and all rely on repeated vector reductions.

1) Take M vectors of length N, so that the vector you are reducing is 
indexed 1:N in F90 (or 0:N-1 in C/C++), and do a maximum and a minimum 
reduction on each.

2) Take vectors of length 2, and use pair reductions.  Every iteration 
leaves you with 1/2 of the values of the previous generation, so this 
would require something on the order of log_2(Vector_length) iterations.
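
To make the pair reduction in (2) concrete, here is a minimal 
single-process sketch in Python.  It is only one reading of the idea: 
it makes no MPI calls, the function name pairwise_minmax is made up for 
illustration, and it assumes each entry carries a (min, max) pair that 
is combined with its neighbour every round.

```python
def pairwise_minmax(values):
    """Reduce a sequence to (min, max, rounds) by repeatedly combining pairs.

    Hypothetical helper for illustration only; a real code would exchange
    the pairs between ranks instead of looping over a local list.
    """
    if not values:
        raise ValueError("need at least one value")

    # Each entry carries a running (min, max) pair, i.e. a vector of length 2.
    current = [(v, v) for v in values]
    rounds = 0

    while len(current) > 1:
        nxt = []
        # Combine neighbours (0,1), (2,3), ... so the list halves each round.
        for i in range(0, len(current) - 1, 2):
            lo_a, hi_a = current[i]
            lo_b, hi_b = current[i + 1]
            nxt.append((min(lo_a, lo_b), max(hi_a, hi_b)))
        # With an odd count, the unpaired entry passes through unchanged.
        if len(current) % 2 == 1:
            nxt.append(current[-1])
        current = nxt
        rounds += 1

    lo, hi = current[0]
    return lo, hi, rounds


if __name__ == "__main__":
    print(pairwise_minmax([3, -1, 7, 4, 0, 9, 2, 5]))
```

For 8 input values this takes 3 rounds, and for 5 values it also takes 
3, which is the log_2(Vector_length) behaviour (rounded up) described 
above.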

That said, while allreduce is a collective and something of a 
heavyweight operation, you might be dealing with slowness due to 
something else.  I'd suggest placing timing calipers around the 
relevant sections of your code and measuring carefully, to help you 
determine where the time is actually being spent.  Allreduce and other 
collectives do require synchronization, so if something is delaying a 
rank on its way to the synchronization point, the collective itself 
will appear slower.
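
As a rough illustration of both points, here is a small Python sketch.  
The names caliper and apparent_collective_times are hypothetical, and 
the second function is a deliberately simplified model, not a 
measurement: it assumes no rank can leave the collective before the 
last rank has arrived, and that the reduction itself then costs a fixed 
reduce_cost seconds.

```python
import time
from contextlib import contextmanager


@contextmanager
def caliper(totals, name):
    """Accumulate wall-clock seconds spent inside the with-block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[name] = totals.get(name, 0.0) + (time.perf_counter() - start)


def apparent_collective_times(arrival_times, reduce_cost):
    """Simplified model of what each rank would measure around a collective.

    Assumption: the collective completes at (latest arrival + reduce_cost)
    for everybody, so early ranks also pay for the time they spend waiting.
    """
    finish = max(arrival_times) + reduce_cost
    return [finish - t for t in arrival_times]


if __name__ == "__main__":
    totals = {}
    with caliper(totals, "compute"):
        sum(range(100000))  # stand-in for the real work between collectives
    print(totals)

    # Three ranks arrive at t=0 s, one straggler arrives at t=2 s.
    print(apparent_collective_times([0.0, 0.0, 0.0, 2.0], 0.5))
```

In this model the three early ranks each see 2.5 seconds "inside" the 
collective, while the straggler sees only 0.5 seconds, even though the 
reduction cost is the same for all of them.  In a real MPI code the 
calipers would typically be MPI_Wtime calls, and putting an MPI_Barrier 
in front of the allreduce, with its own calipers, is one way to 
separate the waiting time from the reduction time.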

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/jackrabbit
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615