[Beowulf] GPU diagnostics?

Joe Landman landman at scalableinformatics.com
Mon Mar 30 16:09:31 PDT 2009


Greg Lindahl wrote:
> On Mon, Mar 30, 2009 at 06:31:17PM -0400, Joe Landman wrote:
> 
>> This said, there really isn't a memory checker for GPUs just yet.  Could  
>> be done, and probably should be ...
> 
> But will it be like memtest86, which isn't as good as HPL at finding
> problems? If you've got DGEMM for your GPU, you're there.

Heh. I erased the paragraph where I tore into using memtest* as
anything other than a gross checker; I felt it wasn't too relevant.

We run a few parallel codes as our testers.  They beat the heck out of
the system (you can hear the fans spin up on variable-speed systems).
Specifically, we deliberately overload the unit computationally and
make sure it doesn't throw any EDAC or MCE errors.
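A minimal sketch of the "make sure we don't throw EDACs" step, assuming a Linux host with the EDAC driver loaded. That driver exposes per-memory-controller corrected (ce_count) and uncorrected (ue_count) error counters under /sys/devices/system/edac/mc/. The idea is to snapshot the counters before and after a stress run and flag any controller whose counts went up. This covers host memory only, not memory on the GPU card. MCEs are logged separately (e.g. by mcelog) and are not read here. The stress run itself is left out, and the function names are made up for illustration.

```python
import glob
import os

# Standard sysfs location of the Linux EDAC memory-controller counters.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each mcN under root.

    Returns an empty dict if EDAC isn't loaded (no controllers found).
    """
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        with open(os.path.join(mc, "ce_count")) as f:
            ce = int(f.read().strip())
        with open(os.path.join(mc, "ue_count")) as f:
            ue = int(f.read().strip())
        counts[os.path.basename(mc)] = (ce, ue)
    return counts


def new_errors(before, after):
    """Return {controller: (new_ce, new_ue)} for controllers whose counts changed."""
    changed = {}
    for mc, (ce, ue) in after.items():
        old_ce, old_ue = before.get(mc, (0, 0))
        if (ce, ue) != (old_ce, old_ue):
            changed[mc] = (ce - old_ce, ue - old_ue)
    return changed
```

Usage: take `before = read_edac_counts()`, run the stress codes, take `after = read_edac_counts()`, and treat any non-empty `new_errors(before, after)` as a failed burn-in.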

Yeah, *GEMM is good (some GPU cards can't do DGEMMs, though; older
nVidia/ATI cards lack double-precision support).
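A rough sketch of a GEMM-based checker in the spirit of the DGEMM suggestion. It multiplies the same pseudo-random matrices repeatedly and compares every pass bit-for-bit against the first result. It assumes the multiply is deterministic: the same inputs, summation order, and hardware should give identical bits, so any mismatch points at a flaky ALU or memory. The naive pure-Python multiply (Python floats are doubles, so this is DGEMM-like) stands in for a real GPU SGEMM/DGEMM call such as cuBLAS, which is not shown. The names `gemm` and `gemm_consistency_check` are hypothetical.

```python
import random


def gemm(a, b):
    """Naive C = A * B with a fixed summation order (stand-in for a GPU *GEMM)."""
    k = len(b)
    m = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(m)]
            for row in a]


def gemm_consistency_check(n=64, passes=10, seed=1234):
    """Repeat the same multiply `passes` times; return how many passes differ
    bit-for-bit from the reference result (0 on healthy, deterministic hardware)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    reference = gemm(a, b)
    mismatches = 0
    for _ in range(passes):
        if gemm(a, b) != reference:
            mismatches += 1
    return mismatches
```

Comparing against a self-computed reference rather than a CPU result avoids tolerance questions. The tradeoff is that a fault that shows up identically on every pass goes unnoticed, so a periodic check against an independent reference is still worthwhile.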

Too bad CUDA won't run on the ATI cards.  That would make maintaining
this thing really easy.

If people can live with SGEMMs, FFTs, and the like, we can probably
leverage (and make available) an older code we used a while ago.
Actually, for another project, we just ported DGETF and a few other
routines.  Let me know if you want me to clean it up and make it available.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615


