[Beowulf] Cannot run offload codes with Intel Xeon Phi cards deployed via xCAT
samuel at unimelb.edu.au
Tue Aug 20 20:51:10 PDT 2013
Hi xCAT and Beowulf folks,
Has anyone successfully run an offload code (say Intel's
xhpl_offload_intel64) using Xeon Phis that have been deployed via xCAT?
We've got a user trying to run a prerelease version of NAMD that supports
offload, but it fails, saying it can't find the cards.
Setting OFFLOAD_REPORT=2 shows errors of:
and strace reveals that an ioctl() on the file handle returned by opening
/dev/mic/scif fails with ECONNREFUSED:
5672 open("/dev/mic/scif", O_RDWR) = 3
5672 ioctl(3, 0xc0087303, 0x7fff1c780f20) = -1 ECONNREFUSED
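For anyone reading the strace output, the ioctl request number itself can be decoded. The sketch below assumes the generic Linux _IOC() bit layout used on x86-64 (nr in bits 0-7, type in 8-15, argument size in 16-29, direction in 30-31). Applied to 0xc0087303 it gives _IOWR('s', 3) with an 8-byte argument. In the upstream Linux SCIF ioctl header that encoding appears to match SCIF_CONNECT, which would fit an ECONNREFUSED result, but whether the MPSS driver of this era used the same numbering is an assumption.

```python
"""Decode a Linux ioctl request number into its _IOC() fields."""

# Assumption: the generic asm-generic/ioctl.h layout used on x86-64,
# i.e. bits 0-7 = nr, 8-15 = type, 16-29 = size, 30-31 = direction.
IOC_NRBITS, IOC_TYPEBITS, IOC_SIZEBITS, IOC_DIRBITS = 8, 8, 14, 2
IOC_NRSHIFT = 0
IOC_TYPESHIFT = IOC_NRSHIFT + IOC_NRBITS      # 8
IOC_SIZESHIFT = IOC_TYPESHIFT + IOC_TYPEBITS  # 16
IOC_DIRSHIFT = IOC_SIZESHIFT + IOC_SIZEBITS   # 30

# Direction bits: _IOC_WRITE = 1, _IOC_READ = 2, both set = _IOWR.
DIR_MACROS = {0: "_IO", 1: "_IOW", 2: "_IOR", 3: "_IOWR"}


def field(request: int, shift: int, bits: int) -> int:
    """Extract `bits` bits of `request` starting at bit `shift`."""
    return (request >> shift) & ((1 << bits) - 1)


def decode_ioctl(request: int) -> dict:
    """Split an ioctl request number into direction, type, nr and size."""
    return {
        "dir": DIR_MACROS[field(request, IOC_DIRSHIFT, IOC_DIRBITS)],
        "type": chr(field(request, IOC_TYPESHIFT, IOC_TYPEBITS)),
        "nr": field(request, IOC_NRSHIFT, IOC_NRBITS),
        "size": field(request, IOC_SIZESHIFT, IOC_SIZEBITS),
    }


if __name__ == "__main__":
    # The request number from the strace line above.
    print(decode_ioctl(0xC0087303))
    # -> {'dir': '_IOWR', 'type': 's', 'nr': 3, 'size': 8}
```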
Reading the Intel "mic" kernel driver source code, ECONNREFUSED is only
returned at one point in the driver, and the include file explains it as:
* - The destination was not listening for connections or refused the
* connection request.
So it sounds like the uOS that's being deployed on the MICs is
incomplete, or something isn't getting loaded at boot.
I've replicated the same error with xhpl_offload_intel64 (which used
to work before we switched to deploying the Phis with xCAT), so it's
not a NAMD-specific problem.
Can I ask someone with a Xeon Phi working for offload purposes to send
me the output of "lsmod" and "ps -aef" please? That way I can diff
it with what I have to see what (if anything) I'm missing.
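A minimal sketch of how such a diff could be done, comparing module names from lsmod and the command column from ps rather than whole lines (PIDs and start times will always differ between hosts). It assumes the procps `ps -ef` column layout (UID PID PPID C STIME TTY TIME CMD); a BusyBox ps on the card itself may print different columns. The helper names and the sample input below are hypothetical placeholders, not real captures.

```python
"""Diff `lsmod` and `ps -aef` output from a working and a failing host."""
from __future__ import annotations


def lsmod_modules(text: str) -> set[str]:
    """Module names from `lsmod` output: first column, header line skipped."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def ps_commands(text: str) -> set[str]:
    """CMD column from procps `ps -ef` / `ps -aef` output.

    Columns are UID PID PPID C STIME TTY TIME CMD; CMD can contain spaces,
    so split at most 7 times and keep the remainder intact.
    """
    cmds = set()
    for line in text.splitlines():
        fields = line.split(None, 7)
        if len(fields) == 8 and fields[0] != "UID":
            cmds.add(fields[7])
    return cmds


def missing(working: set[str], ours: set[str]) -> list[str]:
    """Entries present on the working system but absent from ours, sorted."""
    return sorted(working - ours)


if __name__ == "__main__":
    # Hypothetical placeholder output; substitute the real captures.
    working_lsmod = "Module  Size  Used by\nmod_a  1024  0\nmod_b  2048  1 mod_a\n"
    our_lsmod = "Module  Size  Used by\nmod_a  1024  0\n"
    print(missing(lsmod_modules(working_lsmod), lsmod_modules(our_lsmod)))
    # -> ['mod_b']

    working_ps = (
        "UID  PID  PPID  C STIME TTY  TIME     CMD\n"
        "root 1    0     0 10:00 ?    00:00:01 /sbin/init\n"
        "root 42   1     0 10:00 ?    00:00:00 /usr/sbin/daemon_x --foreground\n"
    )
    our_ps = working_ps.splitlines()[0] + "\nroot 1 0 0 10:00 ? 00:00:01 /sbin/init\n"
    print(missing(ps_commands(working_ps), ps_commands(our_ps)))
    # -> ['/usr/sbin/daemon_x --foreground']
```

Anything listed is loaded or running on the working system but not on ours, which narrows down what the xCAT-deployed uOS may be missing.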
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545