[Beowulf] Can one Infiniband net support MPI and a parallel filesystem?
Gerry Creager gerry.creager at tamu.edu
Mon Sep 1 21:38:52 PDT 2008
- Previous message: [Beowulf] gpgpu
- Next message: [Beowulf] Stroustrup regarding multicore
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Craig Tierney wrote:
> Joe Landman wrote:
>> Craig Tierney wrote:
>>> Chris Samuel wrote:
>>>> ----- "I Kozin (Igor)" <i.kozin at dl.ac.uk> wrote:
>>>>
>>>>>> Generally speaking, MPI programs will not be fetching/writing data
>>>>>> from/to storage at the same time they are doing MPI calls so there
>>>>>> tends to not be very much contention to worry about at the node
>>>>>> level.
>>>>> I tend to agree with this.
>>>>
>>>> But that assumes you're not sharing a node with other
>>>> jobs that may well be doing I/O.
>>>>
>>>> cheers,
>>>> Chris
>>>
>>> I am wondering, who shares nodes in cluster systems with
>>> MPI codes? We never have shared nodes for codes that need
>>
>> The vast majority of our customers/users do. Limited resources, they
>> have to balance performance against cost and opportunity cost.
>>
>> Sadly not every user has an infinite budget to invest in contention
>> free hardware (nodes, fabrics, or disks). So they have to maximize
>> the utilization of what they have, while (hopefully) not trashing the
>> efficiency too badly.
>>
>>> multiple cores since we built our first SMP cluster
>>> in 2001. The contention for shared resources (like memory
>>> bandwidth and disk IO) would lead to unpredictable code performance.
>>
>> Yes it does. As does OS jitter and other issues.
>>
>>> Also, a poorly behaved program can cause the other codes on
>>> that node to crash (which we don't want).
>>
>> Yes this happens as well, but some users simply have no choice.
>>
>>>
>>> Even at TACC (62000+ cores) with 16 cores per node, nodes
>>> are dedicated to jobs.
>>
>> I think every user would love to run on a TACC like system. I think
>> most users have a budget for something less than 1/100th the size.
>> It's easy to forget how much resource (un)availability constrains
>> actions when you have very large resources to work with.
>>
>
> TACC probably wasn't a good example for the "rest of us". It hasn't been
> difficult to dedicate nodes to jobs when the number of cores was 2 or 4.
> We now have some 8 core nodes, and we are wondering if the policy of
> not sharing nodes is going to continue, or at least modified to minimize
> waste.

Last time I asked (recently...) TACC intends to continue scheduling
per-node, even with 16 cores/node.

Sorry to be late with this but the hurricane season is getting
interesting and e-mail's taken a bit of a hit.

--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
