[Beowulf] Infinipath memory parity errors

Nifty niftyompi Mitch niftyompi at niftyegg.com
Wed Aug 13 17:12:40 PDT 2008


On Wed, Aug 13, 2008 at 05:03:46PM +0100, Dave Love wrote:
> [I know in an ideal world the vendor between us and PathScale^WQlogic
> would sort this out.]
> 
> I'm interested in the cause (and possible cure!) of intermittent errors
> on various nodes in our Infinipath system which stop MPI jobs with
> kernel messages like this, in case anyone's familiar with them:
> 
>   lvinfi095:21.Hardware problem: {[RXE EAGERTID Memory Parity]}
> 
> They seem to be new with an upgrade to Linux 2.6.22 from 2.6.11, but
> probably just manifested themselves in some other way previously.
> 
> Google didn't produce any leads, and a brief look in the source suggests
> that tracking it down where it's generated in the ib_ipath module is
> non-trivial and likely won't tell me a lot.
> 
> For what it's worth, the adaptors are
> 
>   06:00.0 InfiniBand: PathScale, Inc InfiniPath HT-400 (rev 02)
> 
> in two different sorts of Supermicro whose model numbers I don't know.
> 
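To see whether the errors cluster on particular nodes, the messages can be tallied per node. The following is a minimal Python sketch, not a QLogic tool. It assumes every message has the same shape as the single example quoted above ("<node>:<number>.Hardware problem: {[... Parity ...]}"), treats the text before the colon as the node name, and leaves the number uninterpreted because the post does not say what it means. Which files you feed it (saved job output, kernel logs) is up to you.

```python
import re
import sys
from collections import Counter

# Shape taken from the one example in the post:
#   lvinfi095:21.Hardware problem: {[RXE EAGERTID Memory Parity]}
# Assumption: the token before ':' is the node name; the number after it
# is captured but not interpreted.
PARITY_RE = re.compile(
    r"(?P<node>[\w.-]+):(?P<num>\d+)\.Hardware problem: "
    r"\{\[(?P<what>[^\]]*Parity[^\]]*)\]\}"
)


def count_parity_errors(lines):
    """Return a Counter mapping (node, error text) to occurrences."""
    counts = Counter()
    for line in lines:
        match = PARITY_RE.search(line)
        if match:
            counts[(match.group("node"), match.group("what"))] += 1
    return counts


def main(paths):
    """Tally matching messages across the given text files and print them."""
    totals = Counter()
    for path in paths:
        with open(path, errors="replace") as handle:
            totals.update(count_parity_errors(handle))
    for (node, what), n in sorted(totals.items()):
        print(f"{node}\t{n}\t{what}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it with one or more log files as arguments; it prints one tab-separated line per node and error type, which makes it easy to see whether the same adaptors keep reporting parity errors.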

Dave,

Which driver is active, and which InfiniPath software release
is installed?  The tool "ipath_control -i" can show which driver is in use.

The kernel.org/OFED driver does not have as rich a set of error-recovery
code for this card as the shipped driver.  The recovery code was considered
undesirable and was not accepted by the kernel.org developers.

After a kernel update, the shipped driver will not have been recompiled
for the new kernel, so the kernel.org driver becomes active instead.
The rebuild procedure is described in the Install Guide:

	#   To rebuild the drivers, do the following (as root):
	# cd /usr/src/infinipath/drivers
	# ./make-install.sh
	# /etc/init.d/infinipath restart
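Before rebuilding, it can help to check which ib_ipath module file the module tools resolve for the running kernel. This is a hedged Python sketch that relies on "modinfo -F filename". It assumes that in-tree (kernel.org) modules sit under /lib/modules/<release>/kernel/ while a separately built driver is installed elsewhere under /lib/modules/<release>/ (often updates/ or extra/); where make-install.sh actually installs the module is not stated here, so check the path it reports. It shows the file that would be loaded, not necessarily the one currently loaded, so "ipath_control -i" remains the check recommended above.

```python
import os
import platform
import subprocess

MODULES_ROOT = "/lib/modules"


def module_origin(path, release):
    """Classify a module file path for the given kernel release.

    Assumption: in-tree (kernel.org) modules live under
    /lib/modules/<release>/kernel/; anything else under
    /lib/modules/<release>/ (e.g. updates/ or extra/) is treated as a
    separately built driver. Paths outside that tree return "other".
    """
    rel = os.path.relpath(path, os.path.join(MODULES_ROOT, release))
    if rel == os.pardir or rel.startswith(os.pardir + os.sep):
        return "other"
    return "in-tree" if rel.split(os.sep)[0] == "kernel" else "out-of-tree"


def main():
    release = platform.release()  # same string as `uname -r`
    try:
        # Ask the module tools which ib_ipath file they resolve for this kernel.
        result = subprocess.run(
            ["modinfo", "-F", "filename", "ib_ipath"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not resolve ib_ipath for {release}: {exc}")
        return
    path = result.stdout.strip()
    origin = module_origin(path, release)
    print(f"{release}: {path} ({origin})")
    if origin == "in-tree":
        print("This looks like the kernel.org driver; the shipped driver "
              "may need rebuilding with make-install.sh.")


if __name__ == "__main__":
    main()
```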

-- 
	T o m  M i t c h e l l 
	Got a great hat... now what.