[Beowulf] large array to run
Joe Landman
landman at scalableinformatics.com
Thu Dec 13 18:01:27 PST 2007
Peter Skomoroch wrote:
> This reminds me of a similar issue I had. What approaches do you take for
> large dense matrix multiplication in MPI, when the matrices are too large to
> fit into cluster memory? If I hack up something to cache intermediate
Hi Peter:
> results to disk, the IO seems to drag everything to a halt and I'm looking
> for a better solution. I'd like to use some libraries like PETSc, but how
Disk memory has a latency of 10^-3 seconds or so, and a bandwidth of
from 10^7 to 10^8 bytes/second. Compare that to physical ram.
Latencies of 10^-7 seconds or less, and bandwidths of 10^9 to 10^10 seconds.
If you are going to do disk IO, pay that latency cost once for many
pages, not once per page with seeks.
Just like with other streaming calculations, you likely need to do
some sort of double buffering. That said, disk IO is not really the answer.
> would you work around memory limitations like this (short of building a
> bigger cluster)?
20+ years ago I worked on a large dense Markov matrix calculation
where after computing the relevant matrix elements and using them in the
calculation, I would throw them away. It was cheaper (less time
consuming) than spilling them to disk and then trying to recover them
later. Then again, this was an IBM 3090 VF 180 ... so ...
Since you are doing matrix multiplication, I might suggest looking at
the Golub and Van Loan bible on Matrix Computations for some ideas.
That said, Matrix multiplications are decomposable. If you can
reconstruct matrix elements easily (more quickly than storage/retrieval)
this might be a good method. Or if you can decompose it far enough, or
if the problem has some sort of essential symmetry you can exploit in
the matrix structure, this could help. Symmetries not only imply
conservation laws, they tend to reduce storage requirements.
Out of curiousity, what size matrices are you using? I know some of
the structural folks can, with large enough DoF problems hit 10^8 or so
on a side. Not dense (usually with specific banded structure).
And that brings up another possibility. If you can perform various
transforms on your matrix to get it into a well known form (banded, ...)
this could make multiplications go much faster.
--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615
More information about the Beowulf
mailing list