Archives


- Beowulf
- Beowulf Announce
- Scyld-users
- Beowulf on Debian

[Beowulf] Parallel Programming Question

Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.

Search

James Roberson jamesisnotaturtle at gmail.com
Mon Jul 16 12:35:39 PDT 2007


Hello,

I'm new to parallel programming and MPI.  I've developed a simulator in C++
for which I would
like to decrease the running time by using a Beowulf cluster.  I'm not
interested in optimizing
speed, I'm just looking for a quick and easy way to significantly improve
the speed over running
the program on a single machine.
Basically, I need to learn how to parallelize a C++ function.  The following
functions in particular
take the longest to run in my simulator.  The first implements LU
decomposition on a large matrix and
the second implements the backsubstitution method to solve matrix division.

void NR::ludcmp(Mat_IO_DP &a, Vec_O_INT &indx, DP &d)
{
    const DP TINY=1.0e-20;
    int i,imax,j,k;
    DP big,dum,sum,temp;

    int n=a.nrows();
    Vec_DP vv(n);
    d=1.0;
    for (i=0;i<n;i++) {
        big=0.0;
        for (j=0;j<n;j++)
            if ((temp=fabs(a[i][j])) > big) big=temp;
        if (big == 0.0) nrerror("Singular matrix in routine ludcmp");
        vv[i]=1.0/big;
    }
    for (j=0;j<n;j++) {
        for (i=0;i<j;i++) {
            sum=a[i][j];
            for (k=0;k<i;k++) sum -= a[i][k]*a[k][j];
            a[i][j]=sum;
        }
        big=0.0;
        for (i=j;i<n;i++) {
            sum=a[i][j];
            for (k=0;k<j;k++) sum -= a[i][k]*a[k][j];
            a[i][j]=sum;
            if ((dum=vv[i]*fabs(sum)) >= big) {
                big=dum;
                imax=i;
            }
        }
        if (j != imax) {
            for (k=0;k<n;k++) {
                dum=a[imax][k];
                a[imax][k]=a[j][k];
                a[j][k]=dum;
            }
            d = -d;
            vv[imax]=vv[j];
        }
        indx[j]=imax;
        if (a[j][j] == 0.0) a[j][j]=TINY;
        if (j != n-1) {
            dum=1.0/(a[j][j]);
            for (i=j+1;i<n;i++) a[i][j] *= dum;
        }
    }
}



and...



void NR::lubksb(Mat_I_DP &a, Vec_I_INT &indx, Vec_IO_DP &b)
{
    int i,ii=0,ip,j;
    DP sum;

    int n=a.nrows();
    for (i=0;i<n;i++) {
        ip=indx[i];
        sum=b[ip];
        b[ip]=b[i];
        if (ii != 0)
            for (j=ii-1;j<i;j++) sum -= a[i][j]*b[j];
        else if (sum != 0.0)
            ii=i+1;
        b[i]=sum;
    }
    for (i=n-1;i>=0;i--) {
        sum=b[i];
        for (j=i+1;j<n;j++) sum -= a[i][j]*b[j];
        b[i]=sum/a[i][i];
    }
}

(The functions are borrowed from the library provided by "Numerical Recipes
in C++")
I'm currently calling these functions from the main loop with the lines:

NR::ludcmp(c,indx,d);

and

NR::lubksb(c,indx,xv);

where the variable 'c' is a large matrix (holding image pixel values) and
'xv' is a vector
used in backsubstitution.
All of the variables passed into these functions are of types defined in "
nr.h"
'c' is a Mat_DP  (double-precision matrix)
'indx' is a Vec_INT  (integer vector)
'd' is a DP    (double precision)
'xv' is a Vec_DP  (double precision vector)

Is there a simple way to call these functions which will cause the cluster
to distribute the
load of operations?  Currently, when I run the program with a 50x50 array on
a single machine,
it takes about 5 minutes to process a single iteration through the matrix
division.

Any help would be greatly appreciated.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070716/49875fff/attachment.html


More information about the Beowulf mailing list