[Beowulf] Parallel Programming Question

Tue Jul 17 04:39:43 PDT 2007

Hi James:

James Roberson wrote:
> Hello,
> 
> I'm new to parallel programming and MPI.  I've developed a simulator in 
> C++ for which I would
> like to decrease the running time by using a Beowulf cluster.  I'm not 
> interested in optimizing
> speed, I'm just looking for a quick and easy way to significantly 
> improve the speed over running
> the program on a single machine.

I am not sure I grasp "I'm not interested in optimizing speed, I'm just 
looking for a quick and easy way to significantly improve the speed".

Basically I think you are saying you want to run it in parallel, in 
order to speed up the program.  Parallelization is an optimization.

> Basically, I need to learn how to parallelize a C++ function.  The 
> following functions in particular
> take the longest to run in my simulator.  The first implements LU 
> decomposition on a large matrix and
> the second implements the backsubstitution method to solve matrix division.

First, how large are your matrices?  Second, how many times is this 
called?

> 
> void NR::ludcmp(Mat_IO_DP &a, Vec_O_INT &indx, DP &d)
> {
>     const DP TINY=1.0e-20;
>     int i,imax,j,k;
>     DP big,dum,sum,temp;
> 
>     int n=a.nrows();
>     Vec_DP vv(n);
>     d=1.0;
>     for (i=0;i<n;i++) {
>         big=0.0;
>         for (j=0;j<n;j++)
>             if ((temp=fabs(a[i][j])) > big) big=temp;

(warning: coffee impaired, may not be correct in all aspects, notation, 
typing, rambling ....)

abs is a slow function.  Think about mathematical equivalents.  abs(x) 
maps x into the [0,) range.  Multiplication is much faster than abs.  If 
your (x) is small enough,

	temp=fabs(a[i][j]);
	if ( (temp*temp) > (big*big) ) big = temp;

might be a bit faster.

>         if (big == 0.0) nrerror("Singular matrix in routine ludcmp");
>         vv[i]=1.0/big;

Second, you might want to define delta = some_small_number  and compare 
big to that.  You would get a catastrophic loss of precision if you 
divided by a very small number, and amplify error in the process. 
Renormalization/scaling can be your friend, as well as row manipulation.

That said, I would suggest avoiding re-inventing wheels.  There are a 
number of excellent LU-decomposition and back substitution codes out 
there ...

> 
> (The functions are borrowed from the library provided by "Numerical 
> Recipes in C++")
> I'm currently calling these functions from the main loop with the lines:

ah.... that explains it...

> 
> NR::ludcmp(c,indx,d);
> 
> and
> 
> NR::lubksb(c,indx,xv);
> 
> where the variable 'c' is a large matrix (holding image pixel values) 
> and 'xv' is a vector
> used in backsubstitution. 
> All of the variables passed into these functions are of types defined in 
> " nr.h"
> 'c' is a Mat_DP  (double-precision matrix)
> 'indx' is a Vec_INT  (integer vector)
> 'd' is a DP    (double precision)
> 'xv' is a Vec_DP  (double precision vector)
> 
> Is there a simple way to call these functions which will cause the 
> cluster to distribute the
> load of operations?  Currently, when I run the program with a 50x50 
> array on a single machine,
> it takes about 5 minutes to process a single iteration through the 
> matrix division.

Huh?  What machine?  What compiler?  What options?

(more lack of coffee induced stupor here)

This should take well under 1 second.  O(N**3) operations (50*50*50 = 
0.125E+6 operations) on a 1 GFLOP machine ( 1E+9 FLOPS) should be 
achievable in well under 1 second (ball park of 0.125/1000 seconds times 
a constant)

Maybe I am missing something.  This *should* be fast.  Unless you are 
running on a Palm pilot or something like that ...

In octave, using an interpreted LU decomposition on a hilbert matrix

	h=hilb(50);
	a=time ; [l,u,p] = lu(h); b=time; b-a
	ans = 0.00063705

this is ball-park what I had estimated, within a constant factor 
(cough).  This is an Opteron 242 with 4 GB ram, so it isn't a speed demon...

-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615