[Beowulf] GPU diagnostics?

David Mathog mathog at caltech.edu
Mon Mar 30 15:02:00 PDT 2009


Donald Becker wrote:
> On Mon, 30 Mar 2009, David Mathog wrote:
> 
> > Joe Landman wrote:
> > > Vendors have an nVidia supplied *GEMM based burn in test.  Been
> > > thinking about a set of diagnostics end users can run as a sanity
> > > check.
> > 
> > My suspicion is that vendors run such burn in tests only for a very
> > brief time.  That time being "the minimum time required to find the
> > percentage of failed units above which it would cost us more if they
> > were found to be bad in the field" - and not a second longer.
> 
> I don't know about other vendors, but that's not Penguin's approach.

By "vendor" I meant graphics card vendors, not cluster or HPC vendors. 
My interest in this sort of diagnostic arose in relation to an
inexpensive graphics card bought at Newegg.  I was asking here
specifically because it seemed likely that HPC vendors _would_ have
the sort of GPU diagnostic I was seeking, and might be willing to share
it.  (As opposed to the tool Joe referred to, which seems not to be
generally available.)

Regards,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech


