[Beowulf] HPL Benchmarking and Optimization
dberkholz at gentoo.org
Fri Apr 4 19:14:22 PDT 2008
On 16:48 Fri 04 Apr, Ellis Wilson wrote:
> Thanks for your suggestions. I had already begun testing Goto BLAS
> when I got your email, and so far it has been the most beneficial
> option for my particular application, which resides on a CD-ROM. MKL
> proved far too heavyweight (and I try to avoid closed source as often
> as possible). The only difficulties have come with compiling Goto
> BLAS (or anything, for that matter) on a static system such as a
> LiveCD. Because I leave the portage tree off my LiveCD (to keep its
> total size and loaded initrd size as low as possible), nothing can be
> emerged.
You could include only the small subset of the tree you need.
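As a rough illustration of the subset idea, here is a minimal Python
sketch. It assumes the usual category/package layout of the tree with
shared eclass and profiles directories at the top level. The function
name copy_tree_subset and the package names passed to it are
hypothetical, and it does not resolve dependencies, so anything your
packages depend on would have to be listed by hand.

```python
import shutil
from pathlib import Path

# Shared directories assumed to be needed alongside any package (an assumption).
SUPPORT_DIRS = ("eclass", "profiles")


def copy_tree_subset(src_tree, dst_tree, packages, support_dirs=SUPPORT_DIRS):
    """Copy only the listed category/package directories plus support dirs.

    Returns (copied, missing) so the caller can see what was skipped.
    Dependencies are NOT resolved; list them explicitly in `packages`.
    """
    src_tree, dst_tree = Path(src_tree), Path(dst_tree)
    copied, missing = [], []
    for rel in [*support_dirs, *packages]:
        src = src_tree / rel
        if src.is_dir():
            # copytree creates any missing parent directories of the target.
            shutil.copytree(src, dst_tree / rel, dirs_exist_ok=True)
            copied.append(rel)
        else:
            missing.append(rel)
    return copied, missing
```

Calling it with the source tree path, a destination inside the LiveCD
build directory, and a list of "category/package" strings would leave
only those directories in the destination. dirs_exist_ok needs
Python 3.8 or later.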
> This has required me to pursue a number of solutions, the first being
> to copy the full version of it directly into the tmpfs from a USB pen,
> uncompress it, chroot into that environment, compile on the
> architecture desired, exit the chroot, recompress, and put it back on
> the USB pen for later burning. Obviously, this requires a ton of work,
> so I came up with an easier fix that has interesting repercussions I'd
> like to hear from this list on:
> An NFS directory is mounted onto my system; I chroot into it,
> compile Goto BLAS or ATLAS there, and exit the chroot. Since the
> directory remains on my development system (which does use a
> hard drive), I have no issues with running out of RAM, moving things
> from one place to another, etc. However, upon compiling Goto BLAS on
> an older P4 without HT and with 256MB RAM, it reported "clock-skew"
> warnings I've never seen previously. Is this due to the NFS mount?
> And if so, will it hurt my optimization of Goto BLAS or ATLAS?
> I still achieved 4 GFlops on the P4 I had used that methodology on,
> which was way above my previous findings using the reference library
> (obviously), but I am still concerned that better optimization
> might occur with local compilation.
> Does anyone think that's true or false?
In the case of Goto, the optimization will be a function of the compiler
and compiler flags you use, not whether you compile it on a local or
remote disk. For ATLAS, I suppose the difference in I/O could affect its
automatic tuning if for some reason it's not doing it all in memory, but
that seems unlikely.
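On the "clock-skew" warnings quoted above: make typically reports
these when a file's modification time is ahead of the local clock,
which can happen on an NFS mount when the server and client clocks
disagree. A minimal Python sketch for checking this is below. The
function name measure_mtime_skew is hypothetical, and the result is
only approximate, since NFS attribute caching and timestamp
granularity can affect it.

```python
import os
import tempfile
import time


def measure_mtime_skew(directory):
    """Return (mtime of a freshly written file - local clock), in seconds.

    A clearly positive value means files in `directory` are stamped in the
    future relative to this machine, the situation make warns about.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x")  # force a write so the mtime is freshly set
        os.close(fd)
        local_now = time.time()
        return os.stat(path).st_mtime - local_now
    finally:
        os.remove(path)  # clean up the probe file
```

Running it against the NFS-mounted directory and against a local
directory, then comparing the two values, should show whether the
mount is the source of the offset. If it is, synchronizing the client
and server clocks (for example with NTP) is the usual remedy.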