[Beowulf] GPU diagnostics?
David Mathog mathog at caltech.edu
Mon Mar 30 09:56:02 PDT 2009
Have any of you CUDA folks produced diagnostic programs that you run during "burn in" of new GPU-based systems, in order to weed out problem units before putting them into service? Minimally, something resembling memtest86, to be used to find buggy memory associated with the GPU? Optimally, it would also more directly exercise the GPU's capabilities.

I asked on the NV Linux forum if there were any official Nvidia graphics card diagnostic programs, and nobody there answered with one. This was originally with respect to some VDPAU issues, where it looked at first like there might be a hardware problem on a small set of systems, including mine, although in the end it turned out to be an uninitialized variable (it was not my code). There was no objective way to demonstrate for VDPAU-based software that "this graphics card is functioning normally" to help sort this out.

I figured the CUDA folks should have something like this, else how could you trust the results from the GPU calculations?

Thanks,

David Mathog
mathog at caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
