[Beowulf] Anyone with really large clusters seeing memory leaks with OFED 1.5 for tcp based apps?
Skylar Thompson skylar at cs.earlham.edu
Sun Jan 31 17:17:27 PST 2010
- Previous message: [Beowulf] Anyone with really large clusters seeing memory leaks with OFED 1.5 for tcp based apps?
- Next message: [Beowulf] GPU Beowulf Clusters
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Joe Landman wrote:
> Hi folks
>
> Trying to trace something annoying down, and see if we are running
> into something that is known.
>
> OFED 1.5 on a 2.6.30.10 kernel. Running a file system atop IPoIB
> (many reasons, none I care to get into here at the moment). Under
> light load, the file system gradually grabs memory. Possibly a leak,
> not entirely sure. Could be the OFED stack underneath. Backing file
> system is xfs. That has been (on this hardware in other
> situations) rock solid stable. Here, xfs, OFED/IPoIB all toss their
> cookies (and fail allocations) under moderate to heavy load.
>
> Working with the file system vendor on this. I am not sure we have
> the answer nailed, so I wanted to see who out there is running a
> big (>512 nodes) cluster, doing large data transfers (preferably over
> IPoIB), for data storage, and running a late model OFED. If you fall
> into this category, please let me know, as I'd like to ask a few
> questions offline about any observed OFED/IPoIB failure modes. I am
> not convinced it is OFED/IPoIB, but I'd like to see what other people
> have run into ... if anything.
>
> Thanks!

We're running OFED 1.4 on our GPFS cluster, with RDMA used for data and
IPoIB used for metadata and backups. We're looking at an upgrade to 1.5,
so if you do find anything out I'd be very interested in knowing.

--
-- Skylar Thompson (skylar at cs.earlham.edu)
-- http://www.cs.earlham.edu/~skylar/
