[Beowulf] GPU diagnostics?

Joe Landman landman at scalableinformatics.com
Mon Mar 30 10:10:17 PDT 2009


David Mathog wrote:
> Have any of you CUDA folks produced diagnostic programs you run during
> "burn in" of new GPU-based systems, in order to weed out problem units
> before putting them into service?  Minimally, something resembling
> memtest86, to be used to find buggy memory associated with the GPU?
> Optimally, it would also more directly exercise the GPU's capabilities.
> 
> I asked on the NV linux forum if there were any official Nvidia graphics
> card diagnostic programs, and nobody there answered with one.  This was
> originally with respect to some VDPAU issues, where it looked at first
> like there might be a hardware problem on a small set of systems,
> including mine, although in the end it turned out to be an uninitialized
> variable (it was not my code).  There was no objective way to
> demonstrate for VDPAU-based software that "this graphics card is
> functioning normally" to help sort this out.  I figured the CUDA folks
> should have something like this, else how could you trust the results
> from the GPU calculations?

Vendors have an NVIDIA-supplied *GEMM-based burn-in test.  I have been
thinking about a set of diagnostics that end users can run as a sanity check.
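
To make the memtest86-style idea concrete, here is a minimal host-side
sketch of a pattern write/readback check.  It is not the vendor test and
it does not touch the GPU: it runs against an ordinary Python bytearray,
and a real diagnostic would presumably do the same writes and readbacks on
device memory.  The names check_buffer and stuck_bit, the pattern set, and
the corrupt hook are all assumptions made up for this illustration.

```python
# Host-side sketch of a memtest86-style pattern check on a byte buffer.
# Assumption: a GPU version would perform the same write/readback on
# device memory; nothing here talks to a GPU.

PATTERNS = (0x00, 0xFF, 0x55, 0xAA)  # all zeros, all ones, alternating bits


def check_buffer(buf, patterns=PATTERNS, corrupt=None):
    """Fill buf with each pattern, read it back, and return mismatches.

    corrupt is a hypothetical hook (called with buf) that runs between the
    write and the readback, so a fault can be simulated on healthy memory.
    """
    errors = []  # list of (offset, expected, observed)
    for pattern in patterns:
        # Write phase: every byte gets the current pattern.
        for i in range(len(buf)):
            buf[i] = pattern
        if corrupt is not None:
            corrupt(buf)
        # Readback phase: any byte that differs is reported.
        for i, value in enumerate(buf):
            if value != pattern:
                errors.append((i, pattern, value))
    return errors


def stuck_bit(buf):
    """Simulate a cell at offset 3 whose lowest bit is stuck at 1."""
    buf[3] |= 0x01


if __name__ == "__main__":
    memory = bytearray(1024)
    print("healthy:", len(check_buffer(memory)), "errors")
    print("faulty :", check_buffer(memory, corrupt=stuck_bit))
```

The simulated stuck bit is only caught by the patterns that expect a 0 in
that position (0x00 and 0xAA), which is why such tests use several
complementary patterns rather than one.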

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


