[Beowulf] large array to run

Joe Landman landman at scalableinformatics.com
Thu Dec 13 18:01:27 PST 2007

Peter Skomoroch wrote:
> This reminds me of a similar issue I had.  What approaches do you take for
> large dense matrix multiplication in MPI, when the matrices are too large to
> fit into cluster memory?  If I hack up something to cache intermediate

Hi Peter:

> results to disk, the IO seems to drag everything to a halt and I'm looking
> for a better solution.  I'd like to use some libraries like PETSc, but how

   Disk memory has a latency of 10^-3 seconds or so, and a bandwidth of 
from 10^7 to 10^8 bytes/second.  Compare that to physical ram. 
Latencies of 10^-7 seconds or less, and bandwidths of 10^9 to 10^10 seconds.

   If you are going to do disk IO, pay that latency cost once for many 
pages, not once per page with seeks.

   Just like with other streaming calculations, you likely need to do 
some sort of double buffering.  That said, disk IO is not really the answer.

> would you work around memory limitations like this (short of building a
> bigger cluster)?

   20+ years ago I worked on a large dense Markov matrix calculation 
where after computing the relevant matrix elements and using them in the 
calculation, I would throw them away.  It was cheaper (less time 
consuming) than spilling them to disk and then trying to recover them 
later.  Then again, this was an IBM 3090 VF 180 ... so ...

   Since you are doing matrix multiplication, I might suggest looking at 
the Golub and Van Loan bible on Matrix Computations for some ideas. 
That said, Matrix multiplications are decomposable.  If you can 
reconstruct matrix elements easily (more quickly than storage/retrieval) 
this might be a good method.  Or if you can decompose it far enough, or 
if the problem has some sort of essential symmetry you can exploit in 
the matrix structure, this could help.  Symmetries not only imply 
conservation laws, they tend to reduce storage requirements.

   Out of curiousity, what size matrices are you using?  I know some of 
the structural folks can, with large enough DoF problems hit 10^8 or so 
on a side.  Not dense (usually with specific banded structure).

   And that brings up another possibility.  If you can perform various 
transforms on your matrix to get it into a well known form (banded, ...) 
this could make multiplications go much faster.

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

More information about the Beowulf mailing list