[Beowulf] gpu benchmark

Paul McIntosh paul.mcintosh at monash.edu
Wed Aug 31 16:00:24 PDT 2016


We developed this after a single GPU out of a bunch started returning "0".
It is very old, so I am not sure how easy it will be to bring back to life
in a new environment.

https://launchpad.net/clamity

Cheers,

Paul

-----Original Message-----
From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Michael Di Domenico
Sent: Wednesday, 31 August 2016 10:39 PM
To: Beowulf Mailing List <Beowulf at beowulf.org>
Subject: [Beowulf] gpu benchmark

I'm looking for a benchmark that can keep a GPU busy (ideally both compute
and memory) for 2 or 3 hours. But here's the kicker: at the end of the
benchmark it needs to check its answers.
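A minimal sketch of the "busy for hours, then check the answers" idea, assuming the workload is deterministic so that every repetition should reproduce a reference result. `run_workload` is a hypothetical callback standing in for whatever kernel you would actually launch on the card (a large matrix multiply, say); `fake_workload` is a plain-Python stand-in so the harness logic can be read and tested without a GPU. One design choice differs slightly from the request: each iteration is compared as it completes and the mismatches are reported at the end, so a transient fault early in the run is not hidden by correct runs afterwards.

```python
import time
from typing import Callable, List, Sequence


def soak_test(
    run_workload: Callable[[int], Sequence[float]],
    reference: Sequence[float],
    duration_s: float,
    tol: float = 0.0,
) -> List[int]:
    """Repeat a deterministic workload until duration_s seconds have elapsed.

    run_workload(i) is a hypothetical callback that runs iteration i on the
    device under test and returns its result vector. Every result is compared
    element-wise with `reference`; the indices of iterations that disagree by
    more than `tol` (or return the wrong length) are collected and returned.
    The workload always runs at least once.
    """
    bad_iterations: List[int] = []
    deadline = time.monotonic() + duration_s
    i = 0
    while True:
        result = run_workload(i)
        mismatch = len(result) != len(reference) or any(
            abs(r - e) > tol for r, e in zip(result, reference)
        )
        if mismatch:
            bad_iterations.append(i)
        i += 1
        if time.monotonic() >= deadline:
            return bad_iterations


def fake_workload(i: int) -> List[float]:
    """Stand-in for a GPU kernel: ignores i and returns the same sums of squares."""
    return [float(sum(k * k for k in range(n))) for n in (10, 100, 1000)]


# Runs for ~0.1 s; a healthy, deterministic workload never mismatches.
print(soak_test(fake_workload, fake_workload(0), duration_s=0.1))  # prints []
```

Exact comparison (`tol=0.0`) only makes sense if the kernel is bit-for-bit reproducible. GPU reductions that use atomics can reorder floating-point additions between runs, in which case a small tolerance is needed.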

I'm trying to hunt down some potentially bad hardware. Linpack works great
for the bulk of it, but my nodes have more GPU power than RAM available,
so a Linpack run using the full RAM of a single box doesn't run long
enough. And successive runs one after another don't seem to trigger the
fault.

I can trigger the error using large MPI jobs across many nodes, but that
doesn't let me isolate which GPU was at fault. And since these are GTX-level
cards, there is no ECC and no error diagnostics on the console...
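One way to narrow a multi-node failure down to a single card, sketched under stated assumptions: run the same deterministic workload separately on each GPU (for example one process per card, pinned with the `CUDA_VISIBLE_DEVICES` environment variable), reduce each card's output to a checksum, and flag any device that disagrees with the majority. The device names and result values below are made up for illustration, and the GPU-side workload itself is not shown.

```python
import hashlib
import struct
from collections import Counter
from typing import Dict, List, Sequence


def checksum(values: Sequence[float]) -> str:
    """SHA-256 over the exact bit patterns of a result vector (little-endian doubles)."""
    packed = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(packed).hexdigest()


def find_outliers(results: Dict[str, Sequence[float]]) -> List[str]:
    """Return the devices whose checksum differs from the most common one.

    Assumes healthy cards of the same model produce bit-identical results for
    the same deterministic workload. With only one or two devices a "majority"
    is not meaningful, so run at least three.
    """
    if not results:
        return []
    sums = {dev: checksum(vals) for dev, vals in results.items()}
    majority, _count = Counter(sums.values()).most_common(1)[0]
    return sorted(dev for dev, s in sums.items() if s != majority)


# Hypothetical per-device results: one card has started returning zeros.
good = [1.0, 2.5, 3.25]
results = {"gpu0": good, "gpu1": good, "gpu2": [0.0, 0.0, 0.0], "gpu3": good}
print(find_outliers(results))  # prints ['gpu2']
```

Comparing cards against each other rather than against a CPU reference sidesteps the fact that GPU and CPU floating-point results often differ in the last bits; the trade-off is that it needs several cards of the same model and cannot catch a fault that every card shares.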

I'm going through my bag of tricks, but haven't come up with anything just
yet.
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf