From kus at free.net Mon Sep 1 10:34:53 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Mon, 01 Sep 2008 21:34:53 +0400 Subject: [Beowulf] gpgpu In-Reply-To: Message-ID: I made a simple estimate of the possible performance improvement from "dgemm on FireStream 9250". It is an extremely favorable example for GPGPU. The source data for the 9250: peak DP performance 200 GFLOPS, 1 GB of GDDR3 RAM. 1 GB can hold three DP (64-bit) n x n matrices for n=6000; they require 864 MB (3 x 6000^2 x 8 bytes). Let me suppose that the real performance of the FireStream will be 90% of peak (I'm afraid reality will be worse), i.e. 180 GFLOPS. dgemm requires 2*n^3 FP operations (I neglect the n^2 operations for matrix addition and scaling), i.e. 432 GFLOP. The calculation time will be 432/180 = 2.4 sec. The dgemm calculation also needs 4 matrix transfers: 3 to the GPGPU and 1 from the GPGPU back to the server's main memory. That is 1152 MB of data (4 x 288 MB). The peak throughput of PCIe 2.0 x16 is 8 GB/s, so the transfer time will be about 0.144 sec (I don't know what the real PCIe throughput may be). The total time is therefore about 2.54 sec. On a dual-socket quad-core Xeon server with 3 GHz E5472s (8 cores) the peak performance is 96 GFLOPS. Parallelized dgemm will give, I believe, about 80% of peak, i.e. 77 GFLOPS; the calculation time is therefore 432/77 = 5.6 sec. The speedup is 2.2 times (5.6/2.54). The price increase I don't know; for example, from $4500 to $6500 (if the FireStream costs $2000, though it may be $1000 as Igor Kozin wrote here), which is about 1.4 times. But I think there will not be many jobs that require multiplication of *dense* matrices of such large (6000 x 6000) sizes; for sparse matrices, I believe, the dimensions will be lower.
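The arithmetic above can be checked with a short Python sketch. The defaults are the figures quoted in the post (200 GFLOPS peak at 90% efficiency, 8 GB/s PCIe, a 96 GFLOPS Xeon peak at 80% efficiency); the function names are hypothetical, and the 4 DP flops/cycle per core is an assumption chosen only because it reproduces the 96 GFLOPS figure. Each 6000 x 6000 double matrix is 288 MB, so the four PCIe copies move 1152 MB.

```python
"""Back-of-the-envelope dgemm timing: GPU (plus PCIe copies) vs. dual Xeon.

Defaults follow the figures quoted in the post, except flops_per_cycle,
which is an assumption; the efficiency factors are guesses, not measurements.
"""

BYTES_PER_DOUBLE = 8


def matrix_bytes(n: int) -> int:
    """Bytes needed for one dense n x n double-precision matrix."""
    return n * n * BYTES_PER_DOUBLE


def dgemm_flops(n: int) -> int:
    """Approximate cost of C = alpha*A*B + beta*C (O(n^2) terms neglected)."""
    return 2 * n ** 3


def gpu_seconds(n: int, peak_gflops: float = 200.0, efficiency: float = 0.90,
                pcie_gb_per_s: float = 8.0, transfers: int = 4) -> tuple[float, float]:
    """Return (compute, transfer) seconds: A, B, C copied in and C copied back."""
    compute = dgemm_flops(n) / (peak_gflops * efficiency * 1e9)
    transfer = transfers * matrix_bytes(n) / (pcie_gb_per_s * 1e9)
    return compute, transfer


def cpu_seconds(n: int, cores: int = 8, ghz: float = 3.0,
                flops_per_cycle: int = 4, efficiency: float = 0.80) -> float:
    """Assumes 4 DP flops/cycle/core, which gives the quoted 96 GFLOPS peak."""
    peak_gflops = cores * ghz * flops_per_cycle
    return dgemm_flops(n) / (peak_gflops * efficiency * 1e9)


if __name__ == "__main__":
    n = 6000
    compute, transfer = gpu_seconds(n)
    cpu = cpu_seconds(n)
    print(f"3 matrices on the card: {3 * matrix_bytes(n) / 1e6:.0f} MB")  # 864 MB
    print(f"PCIe traffic: {4 * matrix_bytes(n) / 1e6:.0f} MB")             # 1152 MB
    print(f"GPU: {compute:.2f} s compute + {transfer:.3f} s PCIe")         # 2.40 + 0.144
    print(f"CPU: {cpu:.1f} s, speedup {cpu / (compute + transfer):.1f}x")  # 5.6 s, 2.2x
```

Because both compute terms grow as n^3 while the PCIe term grows only as n^2, the copy overhead shrinks relative to the arithmetic as n increases; for small matrices the transfers can dominate and erase the GPU's advantage.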
Mikhail From libo at buaa.edu.cn Mon Sep 1 17:43:16 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Tue, 2 Sep 2008 08:43:16 +0800 Subject: [Beowulf] gpgpu References: Message-ID: <000f01c90c94$e84e2ba0$6300a8c0@LIBO> Hello, It seemed that you had got a very good example for GPGPU. As I said before, it's not the time for GPGPU to do the DP calculation at the moment. If you can bear SP computation, you will find more about it. NVidia just sent me some special offer about their Tesla platforms, which said that the workstation equipped with two GTX280 level professional cards costs about $5000, not bad. But my intention is still to lower the core frequency of a gaming card, and use it for computation. Regards, Li, Bo
From matt at technoronin.com Mon Sep 1 18:44:53 2008 From: matt at technoronin.com (Matt Lawrence) Date: Mon, 1 Sep 2008 20:44:53 -0500 (CDT) Subject: [Beowulf] gpgpu In-Reply-To: <000f01c90c94$e84e2ba0$6300a8c0@LIBO> References: <000f01c90c94$e84e2ba0$6300a8c0@LIBO> Message-ID: On Tue, 2 Sep 2008, Li, Bo wrote: > It seemed that you had got a very good example for GPGPU. As I said > before, it's not the time for GPGPU to do the DP calculation at the > moment. If you can bear SP computation, you will find more about it. > NVidia just sent me some special offer about their Tesla platforms, > which said that the workstation equipped with two GTX280 level > professional cards costs about $5000, not bad. But my intention is still > to lower the core frequency of a gaming card, and use it for > computation. Are those the chips that overheat and pull loose from the carrier? -- Matt It's not what I know that counts. It's what I can remember in time to use. From libo at buaa.edu.cn Mon Sep 1 19:14:08 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Tue, 2 Sep 2008 10:14:08 +0800 Subject: [Beowulf] gpgpu References: <000f01c90c94$e84e2ba0$6300a8c0@LIBO> Message-ID: <001c01c90ca1$99966c90$6300a8c0@LIBO> Hello, Not at all.
I lowered the frequency for stability; it actually works fine at the default frequency, but I don't want to take any risks. Regards, Li, Bo From libo at buaa.edu.cn Mon Sep 1 19:19:52 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Tue, 2 Sep 2008 10:19:52 +0800 Subject: [Beowulf] gpgpu References: <000f01c90c94$e84e2ba0$6300a8c0@LIBO> Message-ID: <001f01c90ca2$66d9e100$6300a8c0@LIBO> A gaming card is not expected to be as stable as a professional card at the default frequency, but with a 10x price difference it is still a very good choice. Our two-card system cost only $1,000 and provides about 1.6 TFLOPS of SP capability. Regards, Li, Bo
From gerry.creager at tamu.edu Mon Sep 1 21:38:52 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Mon, 01 Sep 2008 23:38:52 -0500 Subject: [Beowulf] Can one Infiniband net support MPI and a parallel filesystem? In-Reply-To: <48A2F5D7.9080406@noaa.gov> References: <1278943052.86771218524490230.JavaMail.root@mail.vpac.org> <48A1D1D8.6040405@noaa.gov> <48A2C50F.5010305@scalableinformatics.com> <48A2F5D7.9080406@noaa.gov> Message-ID: <48BCC35C.6010204@tamu.edu> Craig Tierney wrote: > Joe Landman wrote: >> Craig Tierney wrote: >>> Chris Samuel wrote: >>>> ----- "I Kozin (Igor)" wrote: >>>> >>>>>> Generally speaking, MPI programs will not be fetching/writing data >>>>>> from/to storage at the same time they are doing MPI calls so there >>>>>> tends to not be very much contention to worry about at the node >>>>>> level. >>>>> I tend to agree with this. >>>> >>>> But that assumes you're not sharing a node with other >>>> jobs that may well be doing I/O. >>>> >>>> cheers, >>>> Chris >>> >>> I am wondering, who shares nodes in cluster systems with >>> MPI codes? We never have shared nodes for codes that need >> >> The vast majority of our customers/users do.
Limited resources, they >> have to balance performance against cost and opportunity cost. >> >> Sadly not every user has an infinite budget to invest in contention >> free hardware (nodes, fabrics, or disks). So they have to maximize >> the utilization of what they have, while (hopefully) not trashing the >> efficiency too badly. >> >>> multiple cores since we built our first SMP cluster >>> in 2001. The contention for shared resources (like memory >>> bandwidth and disk IO) would lead to unpredictable code performance. >> >> Yes it does. As does OS jitter and other issues. >> >>> Also, a poorly behaved program can cause the other codes on >>> that node to crash (which we don't want). >> >> Yes this happens as well, but some users simply have no choice. >> >>> >>> Even at TACC (62000+ cores) with 16 cores per node, nodes >>> are dedicated to jobs. >> >> I think every user would love to run on a TACC like system. I think >> most users have a budget for something less than 1/100th the size. >> It's easy to forget how much resource (un)availability constrains >> actions when you have very large resources to work with. >> > > TACC probably wasn't a good example for the "rest of us". It hasn't been > difficult to dedicate nodes to jobs when the number of cores was 2 or 4. > We now have some 8 core nodes, and we are wondering if the policy of > not sharing nodes is going to continue, or at least be modified to minimize > waste. Last time I asked (recently...) TACC intends to continue scheduling per-node, even with 16 cores/node. Sorry to be late with this, but the hurricane season is getting interesting and e-mail's taken a bit of a hit.
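The "waste" Craig mentions under a no-sharing policy can be quantified with a small Python sketch: with whole-node allocation, a job asking for N cores on nodes with c cores each gets ceil(N/c) nodes and leaves ceil(N/c)*c - N cores idle. The function name and job sizes are hypothetical, and the sketch counts only idle cores; it ignores the memory-bandwidth and I/O contention that motivate dedicating nodes in the first place.

```python
"""Idle cores when whole nodes are dedicated to each job (no node sharing)."""


def exclusive_allocation(cores_requested: int, cores_per_node: int) -> tuple[int, int]:
    """Return (nodes allocated, cores left idle) under per-node scheduling."""
    if cores_requested < 1 or cores_per_node < 1:
        raise ValueError("core counts must be positive")
    nodes = -(-cores_requested // cores_per_node)  # integer ceiling division
    idle = nodes * cores_per_node - cores_requested
    return nodes, idle


if __name__ == "__main__":
    # Hypothetical job sizes, chosen only for illustration.
    for per_node in (2, 4, 8, 16):
        for job in (1, 6, 20):
            nodes, idle = exclusive_allocation(job, per_node)
            share = idle / (nodes * per_node)
            print(f"{per_node:2d} cores/node, {job:2d}-core job: "
                  f"{nodes:2d} node(s), {idle:2d} idle ({share:.0%})")
```

The worst case is a serial job, which idles c - 1 cores, so the potential waste grows with cores per node; that is the cost traded against the predictability of dedicated nodes.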
-- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From lindahl at pbm.com Wed Sep 3 02:04:17 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 3 Sep 2008 02:04:17 -0700 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> Message-ID: <20080903090417.GA15987@bx9.net> On Thu, Aug 28, 2008 at 11:54:05AM -0400, Peter St. John wrote: > I think a physicist programming is like an astronomer grinding lenses (maybe > nobody does that anymore). Some astronomers (in the old days) ground their > own lenses and ended up contributing to optics; others never looked through > telescopes, they do math on the measurements taken by others. This is the 2nd funniest posting in this thread. Did you notice that ground-based telescopes recently started being much, much bigger? These new lenses were invented and made in Arizona by an astronomer, who figured out how to spin molten glass into roughly the right shape, instead of taking a huge, flat, thick piece of glass and grinding it into the shape of a mirror. http://www.npr.org/templates/story/story.php?storyId=4773461 Our community does this kind of stuff because it wouldn't happen otherwise. The funniest posting in this thread was when rgb failed to notice that Perry had compared the difficulty of directing physics research to the difficulty of writing a program. Some computer programs are hard. Most aren't. So it's a dumb comparison.
I don't know what to make of Vincent saying that I sound like an average guy who watches TV. I haven't watched TV much since 1983, but I have spent a lot of time as an astronomy graduate student doing supercomputing, and then working with scientific programmers. This isn't meant to encourage anyone to continue discussing any of this. I did want to point out how misinformed most of the "discussion" was. That's in addition to being pointless. Yeah, I'm probably a bit grouchy because my car's parking lights don't turn off anymore after the final dust storm at Burning Man. The owner's manual says it can't happen. Must have been written by a computer scientist :-) -- greg From rgb at phy.duke.edu Wed Sep 3 04:20:16 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 3 Sep 2008 07:20:16 -0400 (EDT) Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <20080903090417.GA15987@bx9.net> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> Message-ID: On Wed, 3 Sep 2008, Greg Lindahl wrote: > The funniest posting in this thread was when rgb failed to notce that > Perry had compared the difficulty of directing physics research to the > difficulty of writing a program. Some computer programs are hard. Most > aren't. So it's a dumb comparison. I didn't quite fail to notice;-) I just offered to explain my own research if anybody was interested. 
No takers, of course -- which is good as it would take me a LONG time as the science is nontrivial:-) I was also getting a bit tired of the thread as this particular thesis (that scientists make poor computer programmers and/or must hire programmers in order to do good science using computers) was so absurd that -- after writing out a longish response and just throwing up my hands in disgust and deleting it instead of posting -- I tried to gently bow out. > I don't know what to make of Vincent saying that I sound like an > average guy who watches TV. I haven't watched TV much since 1983, but It just means that Vincent is a narrowly brilliant wacko. Narrowly possibly brilliant -- I never know quite what to make of chess or go masters who never do anything constructive. Clearly requires some serious neurons, but isn't there ANYTHING in the world that they can turn all that grey matter to, for the benefit of humankind? But you know that. > I have spent a lot of time as an astronomy graduate student doing > supercomputing, and then working with scientific programmers. > > This isn't meant to encourage anyone to continue discussing any of > this. I did want to point out how misinformed most of the "discussion" > was. That's in addition to being pointless. I still don't think it was originally pointless. People read the list and then go write proposals. Twenty proposals budgeting one grad student and a computer programmer are twenty proposals that won't get funded. So who knows, MAYBE it saved some poor soul's research program. But probably not -- people aren't that stupid. > Yeah, I'm probably a bit grouchy because my car's parking lights don't > turn off anymore after the final dust storm at Burning Man. The > owner's manual says it can't happen. Must have been written by a > computer scientist :-) Or Murphy.
I just like to think of matter as being, y'know, this collection of spinning clouds of "stuff" that is all really soft, ultimately, and fails to hold its shape, structure, form, and purpose a whole lot faster than people realize. The key cylinder in my son's junker jaguar ('92) decided yesterday to ignore the jag's bizarre key for the same reason. I'm sure it will cost me a bunch of money, sigh. rgb > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From rgb at phy.duke.edu Wed Sep 3 05:01:08 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 3 Sep 2008 08:01:08 -0400 (EDT) Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> Message-ID: On Wed, 3 Sep 2008, Robert G. Brown wrote: >> I don't know what to make of Vincent saying that I sound like an >> average guy who watches TV. I haven't watched TV much since 1983, but > > It just means that Vincent is a narrowly brilliant wacko. Narrowly Jesus, I shouldn't be allowed near a keyboard before I have my coffee. Vincent, I apologize. This isn't funny (although somehow, at the time...) 
This is clearly uncalled for ad hominem crap and a product of a mix of pre-coffee crankiness and a profound lack of sleep. I love chess. I love go. I suck at both of them, and they are very, very hard problems and humans learn a lot from trying to solve them. If I offended you (and I don't see how I could miss, sorry) I apologize. If I offended Greg, Peter, or anyone else, I apologize again. I think I'll go crawl back under a rock for a while with the rest of the exoskeletal mindless creatures. rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From james.p.lux at jpl.nasa.gov Wed Sep 3 06:44:44 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 3 Sep 2008 06:44:44 -0700 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <20080903090417.GA15987@bx9.net> Message-ID: On 9/3/08 2:04 AM, "Greg Lindahl" wrote: > On Thu, Aug 28, 2008 at 11:54:05AM -0400, Peter St. John wrote: > >> I think a physicist programming is like an astronomer grinding lenses (maybe >> nobody does that anymore). Some astronomers (in the old days) ground their >> own lenses and ended up contributing to optics; others never looked through >> telescopes, they do math on the measurements taken by others. > > This is the 2nd funniest posting in this thread. Did you notice that > ground-based telescopes recently started being much, much bigger? > These new lenses were invented and made in Arizona by an astronomer, > who figured out how to spin molten glass into roughly the right shape, > instead of taking a huge, flat, thick piece of glass and grinding it > into the shape of a mirror. > > http://www.npr.org/templates/story/story.php?storyId=4773461 > > > > > ---- Ahem.. 
Reflectors, not lenses. And, actually, the fact that a spinning body of liquid assumes a parabolic shape has been known for centuries (Kepler?), and, in fact, as early as 1850, an astronomer (Ernesto Capocci) proposed and built a telescope using liquid metal (e.g. mercury) for a reflector. He probably wasn't unique, as there are mentions of a Mr. Buchan in notes by Brewster (as in Brewster angle) from about the same time. There's a fascinating thesis by Brad Gibson from Univ of Vancouver that gives a dozen or so pages on all the problems faced by liquid metal telescopes (ripples, etc.). What Dr Angel and the folks in Arizona have done is build an enormous spinning oven and work out the process controls (more of an engineering task than a science one, I might add. Being an Engineer, I think these distinctions are important, not that new science isn't being done here). They also still have to do a conventional polishing step, but at least the general figure of the mirror's surface is already close to what it needs to be. (Interestingly, there's apparently an article about this in Science News back in Feb 1985, which is when the latest work in LMTs got going at Laval.) Jim Lux From perry at piermont.com Wed Sep 3 07:12:04 2008 From: perry at piermont.com (Perry E.
Metzger) Date: Wed, 03 Sep 2008 10:12:04 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <20080903090417.GA15987@bx9.net> (Greg Lindahl's message of "Wed\, 3 Sep 2008 02\:04\:17 -0700") References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> Message-ID: <878wu9xqrv.fsf@snark.cb.piermont.com> Greg Lindahl writes: > The funniest posting in this thread was when rgb failed to notce that > Perry had compared the difficulty of directing physics research to the > difficulty of writing a program. Some computer programs are hard. Most > aren't. So it's a dumb comparison. If you say so. Most of the programmers I know go through three stages. When they're starting out, as they're writing their very first programs, they think writing software is complicated and that they don't know nearly enough. Then, when they've gotten to the point where they have been doing it a while and are reasonably familiar with a language or two, they think writing software is straightforward. As with the time where new pilots know enough to fly reasonably but don't have a lot of hours, this is when the programmer is the most dangerous to himself and to others. Finally, if they're really good programmers, after a few years they begin to think writing good software is monstrously difficult, as hard as the hardest human endeavors, and that they only understand enough to muddle through it. This is when they can finally be trusted. You can tell the incompetent people by the fact that they never get past stage 2. Perry From perry at piermont.com Wed Sep 3 07:14:43 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Wed, 03 Sep 2008 10:14:43 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: (Robert G. Brown's message of "Wed\, 3 Sep 2008 07\:20\:16 -0400 $EDT$") References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> Message-ID: <874p4xxqng.fsf@snark.cb.piermont.com> "Robert G. Brown" writes: > I was also getting a bit tired of the thread as this particular > thesis (that scientists make poor computer programmer and/or must > hire programmers in order to do good science using computers) You totally got my point wrong. I said exactly the opposite. I believe that scientists must spend enough time to become good computer programmers -- they must neither leave the task to others nor can they underestimate the amount of difficulty involved in the software. How it is possible that people managed to read that much and hear exactly the inverse of my central thesis, I don't understand at all. Perhaps everyone just hears what they want to. 
Perry From smulcahy at aplpi.com Wed Sep 3 07:31:31 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Wed, 03 Sep 2008 15:31:31 +0100 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <874p4xxqng.fsf@snark.cb.piermont.com> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <874p4xxqng.fsf@snark.cb.piermont.com> Message-ID: <48BE9FC3.4080306@aplpi.com> Perry E. Metzger wrote: > How it is possible that people managed to read that much and hear > exactly the inverse of my central thesis, I don't understand at > all. Perhaps everyone just hears what they want to. Sheesh, I resisted for a long time but... The scenario above pretty much sums up the situation I see with one of the softer sides of software engineering - the requirements gathering, which I'd see as fundamental to a successful project (software, or indeed general IT). IMHO, the most important part of most projects is figuring out what the heck the "stakeholder"[1] wants in the first place. No matter how good your programming is, if your requirements are wrong - you're heading in the wrong direction entirely (a bit like building a really neat spacecraft and then launching it towards Pluto instead of Mars[2]). This is SoftwareAnalysisAndDesign at beowulf.org, right? -stephen [1] Am I the only one that can't help using that word and visualizing a Van Helsing type waving a wooden stake around? Whether the typical project stakeholder is trying to drive the stake through the heart of the project or the heart of nasties trying to drag the project down is an exercise for the reader.
[2] Some of those with a background in directing spacecraft lurking on this list may poke holes in my analogy by noting that a trajectory to Pluto would take you right by Mars, which would really detract from my point. -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway) From larry.stewart at sicortex.com Wed Sep 3 08:20:02 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Wed, 03 Sep 2008 11:20:02 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <874p4xxqng.fsf@snark.cb.piermont.com> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <874p4xxqng.fsf@snark.cb.piermont.com> Message-ID: <48BEAB22.9080406@sicortex.com> This discussion of letting scientists program reminds me of something that really impressed me about an earlier generation of folks at, I think, CERN. They had, for those days, a big real-time problem processing detector data, and they couldn't afford commercial computers to do it, so they built their own racks full of limited 360 clones to do the job. The programming AND the iron were completely incidental to their true goals. They regarded computers and programming as means rather than ends in themselves, yet were no more afraid to step outside their box than a woodworker is afraid to build a jig or grind a chisel to achieve her ends.
-- -Larry / Sector IX From james.p.lux at jpl.nasa.gov Wed Sep 3 08:58:21 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 3 Sep 2008 08:58:21 -0700 Subject: Software engineering Re: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BE9FC3.4080306@aplpi.com> Message-ID: On 9/3/08 7:31 AM, "stephen mulcahy" wrote: Perry E. Metzger wrote: > How it is possible that people managed to read that much and hear > exactly the inverse of my central thesis, I don't understand at > all. Perhaps everyone just hears what they want to. Sheesh, I resisted for a long time but .... The scenario above pretty much sums up the situation I see with one of the softer sides of software engineering - the requirements gathering, which I'd see as fundamental to a successful (software, or indeed general IT project). IMHO, the most important part of most projects is figuring out what the heck the "stakeholder"[1] wants in the first place. --- And that's assuming the stakeholder really understands what they want.. Often it evolves as understanding improves (this is one of the arguments for RAD and XP). No matter how good your programming is, if your requirements are wrong - you're heading in the wrong direction entirely (a bit like building a really neat spacecraft and then launching it towards Pluto instead of Mars[2]). ----- All depends on the alignments of planets and stars.. I wouldn't go so far as to say things are planned using astrology, but we (JPL) are probably one of the few businesses around that can use the motions of heavenly bodies to predict our business base and workforce requirements. Every 26 months as Earth comes into trine with Mars is an auspicious time for launch (you want to launch at a time that is roughly half the trip length before closest approach) This is SoftwareAnalysisAndDesign at beowulf.org right? --- you betcha.. When it's not HardwareAnalysisAndDesign... 
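The 26-month cadence of Mars launch opportunities mentioned above is the Earth-Mars synodic period, 1/(1/T_E - 1/T_M), i.e. the time it takes Earth to gain one full lap on Mars. A short Python sketch, assuming rounded sidereal periods of 365.256 and 686.980 days and treating both orbits as circular (real launch windows vary somewhat from one opportunity to the next because Mars's orbit is eccentric):

```python
"""Earth-Mars synodic period: how often the same launch geometry recurs."""

EARTH_PERIOD_DAYS = 365.256  # sidereal orbital period of Earth (rounded)
MARS_PERIOD_DAYS = 686.980   # sidereal orbital period of Mars (rounded)
DAYS_PER_MONTH = 365.25 / 12


def synodic_period(inner_days: float, outer_days: float) -> float:
    """Time for the faster inner planet to gain one full lap on the outer one."""
    return 1.0 / (1.0 / inner_days - 1.0 / outer_days)


if __name__ == "__main__":
    days = synodic_period(EARTH_PERIOD_DAYS, MARS_PERIOD_DAYS)
    print(f"{days:.0f} days, about {days / DAYS_PER_MONTH:.1f} months")  # 780 days, 25.6 months
```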
Jim Lux -stephen [1] Am I the only one that can't help using that word and visualing a Van Helsing type waving a wooden stake around? --- Cecil Adams of "The Straight Dope" says that wooden stakes only work on some kinds of beasts. It's apparently a geographic thing.. Other places you need silver bullets, garlic, or something else. From prentice at ias.edu Wed Sep 3 09:10:52 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 03 Sep 2008 12:10:52 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <878wu9xqrv.fsf@snark.cb.piermont.com> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <878wu9xqrv.fsf@snark.cb.piermont.com> Message-ID: <48BEB70C.40306@ias.edu> This discussion is still completely off-topic. This is a list about computing issues relating to beowulf clusters, not software engineering at large, sociology or psychology.
-- Prentice From kyron at neuralbs.com Wed Sep 3 09:27:10 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 03 Sep 2008 12:27:10 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEB70C.40306@ias.edu> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <878wu9xqrv.fsf@snark.cb.piermont.com> <48BEB70C.40306@ias.edu> Message-ID: <48BEBADE.3090202@neuralbs.com> Prentice Bisbal wrote: > This discussion is still completely off-topic. This is a list about > computing issues relating to beowulf clusters, not software engineering > at large, sociology or psychology. > Well, it seems the Beowulf mailing list is more vivid on those issues, my recent posts _were_ about HPC technicality and my Google Summer of Code project, the Gentoo Beowulf Clustering LiveCD, and both got completely ignored (well...almost) Eric From gerry.creager at tamu.edu Wed Sep 3 09:39:25 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed, 03 Sep 2008 11:39:25 -0500 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEB70C.40306@ias.edu> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <878wu9xqrv.fsf@snark.cb.piermont.com> <48BEB70C.40306@ias.edu> Message-ID: <48BEBDBD.5030903@tamu.edu> Prentice Bisbal wrote: > This discussion is still completely
off-topic. This is a list about > computing issues relating to beowulf clusters, not software engineering > at large, sociology or psychology. Actually, it's approached software engineering on a socio-pathological level by now... -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From kyron at neuralbs.com Wed Sep 3 10:06:19 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 03 Sep 2008 13:06:19 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEBADE.3090202@neuralbs.com> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <878wu9xqrv.fsf@snark.cb.piermont.com> <48BEB70C.40306@ias.edu> <48BEBADE.3090202@neuralbs.com> Message-ID: <48BEC40B.3010208@neuralbs.com> Eric Thibodeau wrote: > Prentice Bisbal wrote: >> This discussion is still completely off-topic. This is a list about >> computing issues relating to beowulf clusters, not software engineering >> at large, sociology or psychology. >> > Well, it seems the Beowulf mailing list is more vivid on those issues, > my recent posts _were_ about HPC technicality and my Google Summer of > Code project, the Gentoo Beowulf Clustering LiveCD, and both got > completely ignored (well...almost) > > Eric Hehe, I pulled an RGB and CTRL-ENTERed too quickly...
I forgot to mention that, nonetheless, I do enjoy reading these sometimes soliloquy-ish responses and, as a "student being sucked dry by getting him to do all the HPC clustering stuff" for the department, I find many of the comments pertinent and, at the very least, encouraging in the "hey, I'm not alone" sense of it. Cheers, Eric From james.p.lux at jpl.nasa.gov Wed Sep 3 10:22:12 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 3 Sep 2008 10:22:12 -0700 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEB70C.40306@ias.edu> Message-ID: I would say that the single biggest problem in HPC today is not in getting sufficient hardware horsepower, but in effectively using that power. 10 years ago, just getting a cluster going was a bit of a challenge, in terms of knowing what hardware to get, how to interconnect it, etc., but now, a lot of that is cookbook (or available turnkey from a variety of vendors... A very different matter from when Sterling et al. wrote their book back in 98/99). Sure, there are still hardware issues that are worthy of discussion on this list (details of interconnects, etc.), but one doesn't see the discussions about topologies that one saw back then. The hardware is now to the point where you rack up the computers, hook them all to a very fast switch with huge bisection bandwidth, and you're done. However, the topic of taking a simple problem and effectively parallelizing it (either at an EP level, as can be done with some Monte Carlo or systematic simulations, or at a fine-grained level, as with matrix numerical modeling) is very much grist for the mill. After all, what are all those folks building parallelizing/vectorizing compilers trying to do but reduce the substantial software engineering/design problem, so that a scientist or engineer can just write their problem out in simple form, and have "the backend" figure out how to do it efficiently (or at all)?
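The embarrassingly parallel (EP) case mentioned above can be illustrated with a minimal sketch. It assumes a Monte Carlo estimate of pi as a stand-in workload, since the thread names no specific code. The samples are split into chunks, and each chunk uses its own seeded generator and shares nothing with the others until a single final sum, so each chunk could run on a separate core or node. The `mapper` parameter is a hypothetical hook that lets serial `map` be swapped for a parallel one.

```python
# Embarrassingly parallel (EP) Monte Carlo: estimate pi from random points in the
# unit square. Each chunk uses its own seeded generator and shares no state with
# the others, so chunks need no communication until the final sum (the reduction).
import random
from typing import Callable, Iterable


def count_hits(seed: int, samples: int) -> int:
    """Count points that fall inside the quarter circle for one independent chunk."""
    rng = random.Random(seed)  # per-chunk generator: no shared state between chunks
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(chunks: int, samples_per_chunk: int,
                mapper: Callable[..., Iterable[int]] = map) -> float:
    """Farm chunks out via `mapper` (serial map by default), then reduce with a sum."""
    seeds = range(chunks)
    sizes = [samples_per_chunk] * chunks
    total_hits = sum(mapper(count_hits, seeds, sizes))
    return 4.0 * total_hits / (chunks * samples_per_chunk)


if __name__ == "__main__":
    print(estimate_pi(chunks=8, samples_per_chunk=100_000))
```

Because the chunks are independent, serial `map` could be replaced by the `map` method of a `concurrent.futures.ProcessPoolExecutor` (inside an `if __name__ == "__main__":` guard), or each chunk could become one MPI rank on a cluster, with no change to `count_hits`. Seeding chunks with consecutive integers is a simplification. Production codes usually use generators designed for independent parallel streams.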
There are many problems whose software design is, by its nature, complex enough that it is not reasonable to have the person "asking the question" also be knowledgeable enough to manage the substantial software development project. This would be true, if for no other reason than that managing a software development effort takes a different skill set than asking good science or engineering questions. So, the real challenge facing builders (in the larger sense) of Beowulfs is in developing methods to get the work actually done, and if that requires developing skills in "eliciting requirements" or, more probably, "communicating between software speak and science speak", then this is an appropriate place to do it (if not here, then where *would* be a place where it's more germane? I can't think of one offhand). It's sort of like our discussions about communicating with the facilities folks about power requirements or HVAC. Someone building a cluster needs to know something about this to be an intelligent consumer, but nobody expects the scientist to be down there sweating copper pipes for the chiller or cabling up the EPO button for the UPS. The list is valuable because there *are* folks here who do know how to sweat pipes, manage software projects, and interpret the electrical code, and you can ask a question about such things and get a host of responses, some more useful than others. Jim On 9/3/08 9:10 AM, "Prentice Bisbal" wrote: > This discussion is still completely off-topic. This is a list about > computing issues relating to beowulf clusters, not software engineering > at large, sociology or psychology. > > > From peter.st.john at gmail.com Wed Sep 3 10:34:49 2008 From: peter.st.john at gmail.com (Peter St.
John) Date: Wed, 3 Sep 2008 13:34:49 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: <48BEB70C.40306@ias.edu> Message-ID: I'm thinking that multicore will make topology interesting again, because of the difference between intercore on a common chip vs going through a nic to even the fastest fabric. Peter On 9/3/08, Lux, James P wrote: > > I would say that the single biggest problem in HPC today is not in getting > sufficient hardware horsepower, but in effectively using that power. 10 > years ago, just getting a cluster going was a bit of a challenge, in terms > of knowing what hardware to get, how to interconnect it, etc., but now, a > lot > of that is cookbook (or available turnkey from a variety of vendors... A > very different matter from when Sterling et al. wrote their book back in > 98/99). Sure, there are still hardware issues that are worthy of discussion > on this list (details of interconnects, etc.), but one doesn't see the > discussions about topologies that one saw back then. The hardware is now > to > the point where you rack up the computers, hook them all to a very fast > switch with huge bisection bandwidth, and you're done. > > However, the topic of taking a simple problem and effectively parallelizing > it (either at an EP level, as can be done with some Monte Carlo or systematic > simulations, or at a fine-grained level, as with matrix numerical modeling) > is very much grist for the mill. > > After all, what are all those folks building parallelizing/vectorizing > compilers trying to do but reduce the substantial software > engineering/design problem, so that a scientist or engineer can just write > their problem out in simple form, and have "the backend" figure out how to > do it efficiently (or at all)? > > There are many problems whose software design is, by its nature, complex > enough that it is not reasonable to have the person "asking the question"
> also be knowledgeable enough to manage the substantial software development > project. This would be true, if for no other reason than that managing a > software > development effort takes a different skill set than asking good science or > engineering questions. > > So, the real challenge facing builders (in the larger sense) of Beowulfs is > in developing methods to get the work actually done, and if that requires > developing skills in "eliciting requirements" or, more probably, > "communicating between software speak and science speak", then this is an > appropriate place to do it (if not here, then where *would* be a place > where > it's more germane? I can't think of one offhand). > > It's sort of like our discussions about communicating with the facilities > folks about power requirements or HVAC. Someone building a cluster needs > to > know something about this to be an intelligent consumer, but nobody expects > the scientist to be down there sweating copper pipes for the chiller or > cabling up the EPO button for the UPS. > > The list is valuable because there *are* folks here who do know how to > sweat > pipes, manage software projects, and interpret the electrical code, and you > can ask a question about such things and get a host of responses, some more > useful than others. > > Jim > > > > On 9/3/08 9:10 AM, "Prentice Bisbal" wrote: > > > This discussion is still completely off-topic. This is a list about > > computing issues relating to beowulf clusters, not software engineering > > at large, sociology or psychology. From ispmarin at gmail.com Wed Sep 3 10:35:21 2008 From: ispmarin at gmail.com (Ivan Marin) Date: Wed, 03 Sep 2008 14:35:21 -0300 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEC40B.3010208@neuralbs.com> References: <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> <20080903090417.GA15987@bx9.net> <878wu9xqrv.fsf@snark.cb.piermont.com> <48BEB70C.40306@ias.edu> <48BEBADE.3090202@neuralbs.com> <48BEC40B.3010208@neuralbs.com> Message-ID: <48BECAD9.6030502@gmail.com> I second Eric. I've been following this discussion, and identified with it in several different ways... RGB's definition of a physicist was just hilarious (and sadly true). I learned a lot from this thread, and the "not alone" feeling is at least comforting. Ivan Eric Thibodeau wrote: > Eric Thibodeau wrote: >> Prentice Bisbal wrote: >>> This discussion is still completely off-topic. This is a list about >>> computing issues relating to beowulf clusters, not software engineering >>> at large, sociology or psychology. >>> >> Well, it seems the Beowulf mailing list is more vivid on those >> issues, my recent posts _were_ about HPC technicality and my Google >> Summer of Code project, the Gentoo Beowulf Clustering LiveCD, and both >> got completely ignored (well...almost) >> >> Eric > Hehe, I pulled an RGB and CTRL-ENTERed too quickly... I forgot to > mention that, nonetheless, I do enjoy reading these sometimes > soliloquy-ish responses and, as a "student being sucked dry by getting > him to do all the HPC clustering stuff" for the department, I find > many of the comments pertinent and, at the very least, encouraging in > the "hey, I'm not alone" sense of it.
> > Cheers, > > Eric From james.p.lux at jpl.nasa.gov Wed Sep 3 10:39:27 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 3 Sep 2008 10:39:27 -0700 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: Message-ID: On 9/3/08 10:34 AM, "Peter St. John" wrote: I'm thinking that multicore will make topology interesting again, because of the difference between intercore on a common chip vs going through a nic to even the fastest fabric. Peter Yes, indeed. Actually, it may be more like déjà vu, because the core has its own little address space, and then there is the space available through the fabric (which looks a lot, conceptually, like the pile of 386 PCs with 10BaseT). Even more interesting: to use them effectively, some conceptual thought will have to go into the techniques for communicating among processes, which don't necessarily run in lockstep, systolic-array fashion (or SIMD). Jim From larry.stewart at sicortex.com Wed Sep 3 11:42:26 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Wed, 03 Sep 2008 14:42:26 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: Message-ID: <48BEDA92.6070504@sicortex.com> Lux, James P wrote: > > > > On 9/3/08 10:34 AM, "Peter St. John" wrote: > > I'm thinking that multicore will make topology interesting again, > because of the difference between intercore on a common chip vs > going through a nic to even the fastest fabric. > Peter > It is probably worth putting numbers on statements like this. For example, a main memory reference on a fast processor these days is around 80 nanoseconds.
Sending a message to a process on another node on a fast IB network is getting to 1.2 microseconds. Communicating to another thread on the same socket is probably not much faster than a memory reference since you have to thrash a cache-line or two back and forth between cores. The numbers for SiCortex stuff are similar: 80 ns for memory, 1 microsecond for MPI nearest-neighbor, 1.3 microseconds for max-diameter. Core to core via shared memory is about 300 ns, IIRC. We think of messaging to other nodes as taking a long time, but it isn't really so. It is perfectly reasonable to think of programs that communicate every 1000 flops or so, in the same way we think of 15-50 flops per cache miss as "reasonable". So I am deeply skeptical of the current furor about how we need new programming models for "multicore chips". We have models that work perfectly well for 100-1000 core clusters; let's use them. -- -Larry / Sector IX From perry at piermont.com Wed Sep 3 12:59:46 2008 From: perry at piermont.com (Perry E. Metzger) Date: Wed, 03 Sep 2008 15:59:46 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BEDA92.6070504@sicortex.com> (Lawrence Stewart's message of "Wed, 03 Sep 2008 14:42:26 -0400") References: <48BEDA92.6070504@sicortex.com> Message-ID: <878wu9t2z1.fsf@snark.cb.piermont.com> Lawrence Stewart writes: >> On 9/3/08 10:34 AM, "Peter St. John" wrote: >> >> I'm thinking that multicore will make topology interesting again, >> because of the difference between intercore on a common chip vs >> going through a nic to even the fastest fabric. >> Peter > > It is probably worth putting numbers on statements like this. For > example, a main memory reference on a fast processor these days is > around 80 nanoseconds. Sending a message to a process on another > node on a fast IB network is getting to 1.2 microseconds.
> Communicating to another thread on the same socket is probably not > much faster than a memory reference since you have to thrash a > cache-line or two back and forth between cores. Quite. It is possible that future generations of multi-core architectures will behave differently, but right now, a multi-core chip looks a lot (to software) like a normal SMP setup. (I do wonder a lot whether a return to vector architectures might make more sense than multi-core -- there is at least a lot of precedent for making use of vector silicon with good compilers.) > So I am deeply skeptical of the current furor about how we need new > programming models for "multicore chips". We have models that work > perfectly well for 100-1000 core clusters; let's use them. Well, not quite. The HPC community is very good at using such things, so it isn't going to have trouble. The issue is not people doing scientific computing, but people doing "normal" applications. Beyond the scope of this mailing list, of course. -- Perry E. Metzger perry at piermont.com From herborn at usna.edu Wed Sep 3 13:19:56 2008 From: herborn at usna.edu (Steve Herborn) Date: Wed, 3 Sep 2008 16:19:56 -0400 Subject: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: <6.2.5.6.2.20080826081513.04f14298@swcp.com> <87prnv20ky.fsf@snark.cb.piermont.com> <87d4jv1zwo.fsf@snark.cb.piermont.com> <20080826140446.0479b7b3@localhost.localdomain> <87zlmzy6iw.fsf@snark.cb.piermont.com> <49D2DE14-1F0A-44B9-B05F-BA49D3766C7E@sanger.ac.uk> <9E8A6C2B-DB97-4BCC-97E2-A97EE5A11FA7@xs4all.nl> <0974FCFCC905405B8278F2D805667336@dynamic.usna.edu> <87prntyydv.fsf@snark.cb.piermont.com> Message-ID: <2196A5AE3175497A9788E9523217FB75@dynamic.usna.edu> I guess that is why I have always preferred a RAD/JAD environment to a strict Waterfall one. One can spend eons creating the world's most perfect spec, but then the problem changes.
From my perspective, ideally the Subject Matter Expert & the Programmer become tied at the hip, with each taking away a little of the other's knowledge by the time the "project" is over & done with. When I first started programming in support of Electronic Warfare I didn't know diddly about RADAR parametric data; now I at least know how to spell it. :-) _____ From: Peter St. John [mailto:peter.st.john at gmail.com] Sent: Thursday, August 28, 2008 11:54 AM To: Perry E. Metzger Cc: Steve Herborn; Beowulf at beowulf.org Subject: Re: [Beowulf] Stroustrup regarding multicore I agree entirely with Perry here. I'd take it further: even in the case of giving the machinist instructions, "12x12 with holes here and here", it would help if the machinist has some sense of what you are building. Will the product be hot enough that the metal expands and contracts? Humidity? Should the finish be gloss or matte? Copper, aluminum...? The machinist will help you build a better gizmo if he has some feel for what you are building, what you need the part for. He will have relevant experience from his machinist perspective. Mixed unit tactics are the path to victory, but the mixed units perform better together if they have some sense of each other's jobs. Some of us specialize in a very focused way, some of us generalize, but to work as a team we need to learn some of each other's jobs. I think a physicist programming is like an astronomer grinding lenses (maybe nobody does that anymore). Some astronomers (in the old days) ground their own lenses and ended up contributing to optics; others never looked through telescopes and just did math on the measurements taken by others. Some computer scientists don't program; some mathematicians can hardly use email. But most of us learn bits and pieces of each other's jobs, in varying degrees; it's necessary to communicate effectively, and what's the subject for one guy is a tool for another guy. Peter On 8/28/08, Perry E.
Metzger wrote: "Steve Herborn" writes: > However, that being said I would think that it is usually easier to teach a > Scientist to code, than a coder the PhD level of the science. I think either is fine -- you wind up with someone who knows both. The problem is when you try to segregate the two skills. I think I finally have the right analogy. A physicist is interested in advancing physics, not in advancing mathematics, but as the tools of physics are all made of math, he cannot ignore the math or hope to turn to a specialized hired mathematician who knows no physics to do his math thinking for him. The math and the physics are integrated -- you need one mind to see both in order to get anywhere. Writing good software for physics problems is no different. The physics and the software are one. You can complain that you want to do physics, not computing, but that's exactly like complaining you want to do physics and not math. Indeed, software pretty much *IS* math. The attempt seems to be to somehow treat the computer science as though it were the software equivalent of machine shop work. You're building a new instrument, so you draw up the parts, and then you ask a machinist to make them. "I need a sheet of metal 12cm by 12cm with holes here and here." It is somehow imagined that you can do that with the software -- you make some vague guesses about what you might need and write a spec (which is imagined to be like a blueprint for a part) and ask a "software machinist" to make it for you. Unfortunately, this misses the point -- the computer programming is not like machining the parts for the instrument; it is like *designing* the instrument. That requires knowledge of both fields, not just of one. It is not at all like machining. This is of course a serious problem.
It takes at least several years of effort to become facile with computer software just as it takes several years of effort to become facile with calculus, differential equations, etc., etc., and fundamentally one wants to be doing science, not math or computer programming, but I can't see any real way around it in the long term if progress is to be made. Incidentally, THIS IS NOT A NEW ARGUMENT. It was only a little over a century ago that people scoffed at the idea that engineers needed to learn higher mathematics. "I'm trying to build a bridge, not to do math!" was the general sort of attitude that was common. Eventually, people realized that there was no way around it, you just had to spend the time to learn the math or you couldn't be productive. I expect something similar is going to happen here. Perry -- Perry E. Metzger perry at piermont.com From gerry.creager at tamu.edu Wed Sep 3 15:46:14 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed, 03 Sep 2008 17:46:14 -0500 Subject: Software engineering Re: [Beowulf] Stroustrup regarding multicore In-Reply-To: References: Message-ID: <48BF13B6.5060101@tamu.edu> Lux, James P wrote: > > > > On 9/3/08 7:31 AM, "stephen mulcahy" wrote: > > > > Perry E. Metzger wrote: > > How it is possible that people managed to read that much and hear > > exactly the inverse of my central thesis, I don't understand at > > all. Perhaps everyone just hears what they want to. > > Sheesh, I resisted for a long time but ....
> > The scenario above pretty much sums up the situation I see with one of > the softer sides of software engineering - the requirements gathering, > which I'd see as fundamental to a successful (software, or indeed > general IT project). IMHO, the most important part of most projects is > figuring out what the heck the "stakeholder"[1] wants in the first > place. > > --- And that's assuming the stakeholder really understands what they > want.. Often it evolves as understanding improves (this is one of > the arguments for RAD and XP). Rule 1: Never let an oceanographer with 2 FORTRAN courses design or maintain any software project with more than 16 lines of code checked into the repository. Rule 2: Oceanographers with fewer than 2 formal FORTRAN classes have decided they're really nascent software engineers because they've mastered most of the buzzwords, and thus will give you all sorts of "requirements" which are typically orthogonal to any software design or engineering training you've had. Rule 3: If you finally convince the denizens of Rule 2 that their application cannot be written as spec'd, they are suddenly experts in "Service Oriented Architecture" and Software as a Service. And you're not. So there. Just believe me. Really. 'Cause I said so. Don't ask me how I learned these truths. Oh, and the bar is slightly higher for meteorologists, by a couple of additional formal software classes. But in the grand scheme of things, it's not much higher... > No matter how good your programming is, if your requirements are > wrong - you're heading in the wrong direction entirely (a bit like > building a really neat spacecraft and then launching it towards Pluto > instead of Mars[2]). ibid. Been there, done that. But not for spacecraft. Or, I could talk about computer scientists just now discovering what discipline experts do for the Data-Net NSF call that's out now, but that's another thread... > ----- All depends on the alignments of planets and stars..
I > wouldn't go so far as to say things are planned using astrology, but > we (JPL) are probably one of the few businesses around that can use > the motions of heavenly bodies to predict our business base and > workforce requirements. Every 26 months as Earth comes into trine > with Mars is an auspicious time for launch (you want to launch at a > time that is roughly half the trip length before closest approach) Show-off :-) > This is SoftwareAnalysisAndDesign at beowulf.org right? > > > --- you betcha.. When it's not HardwareAnalysisAndDesign... > > Jim Lux > > > -stephen > > [1] Am I the only one that can't help using that word and visualizing a > Van Helsing type waving a wooden stake around? They're grasping it to keep me from driving it through their hearts... Oh, yeah. That meeting is already over. > --- Cecil Adams of "The Straight Dope" says that wooden stakes only > work on some kinds of beasts. It's apparently a geographic thing.. > Other places you need silver bullets, garlic, or something else. My personal preference is a garlic-flavored wooden stake. I keep the silver bullets as backup for when I miss with the stake. -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From james.p.lux at jpl.nasa.gov Wed Sep 3 15:50:39 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 3 Sep 2008 15:50:39 -0700 Subject: Software engineering Re: [Beowulf] Stroustrup regarding multicore In-Reply-To: <48BF13B6.5060101@tamu.edu> References: <48BF13B6.5060101@tamu.edu> Message-ID: > --- Cecil Adams of "The Straight Dope" says that wooden stakes only > work on some kinds of beasts. It's apparently a geographic thing.. > Other places you need silver bullets, garlic, or something else. My personal preference is a garlic-flavored wooden stake.
I keep the silver bullets as backup for when I miss with the stake. Because it's really, really important... http://www.straightdope.com/columns/read/37/whats-the-best-way-to-kill-a-vampire From libo at buaa.edu.cn Wed Sep 3 23:34:00 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Thu, 4 Sep 2008 14:34:00 +0800 Subject: [Beowulf] Re: Beowulf Digest, Vol 55, Issue 2 References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> Message-ID: <002001c90e58$3c20c020$6300a8c0@LIBO> Hello, Isn't that too expensive for such a platform? The easy solution is: an X48-level motherboard with CF support, about $150; Q6600 processor, about $170; two 4870X2s, $1,100; two Seagate 500 GB SATA hard disks for RAID 1, about $140; 4*2 GB DDR2 RAM, about $150; 1000 W PSU, about $200; a big box, about $100. That's all, in total, $2,010. Regards, Li, Bo ----- Original Message ----- From: Maurice Hilarius To: beowulf at beowulf.org Cc: kus at free.net ; libo at buaa.edu.cn ; i.kozin at dl.ac.uk Sent: Thursday, September 04, 2008 6:51 AM Subject: Re: Beowulf Digest, Vol 55, Issue 2 Li, bo wrote: .. From: "Li, Bo" Subject: Re: [Beowulf] gpgpu Hello, It seemed that you had got a very good example for GPGPU. As I said before, it's not the time for GPGPU to do the DP calculation at the moment. If you can bear SP computation, you will find more about it. NVidia just sent me some special offer about their Tesla platforms, which said that the workstation equipped with two GTX280 level professional cards costs about $5000, not bad. But my intention is still to lower the core frequency of a gaming card, and use it for computation. Regards, Li, Bo Looking at AMD/ATI Firestream and 4850 pricing, it is not too bad: AMD FIRESTREAM 9250 STREAM PROCESSOR (P/N: 100-505563) $880 VISIONTEK RADEON HD4870X2 2GB PCI-E (P/N: 900250) $575 VISIONTEK RADEON HD 4870 512MB PCI-E (P/N: 900244) $355 The 4870 and X2 also run the AMD code.
So, given a decent machine, with 4 cores and a pair of the 4870X2, one can achieve some pretty amazing GPU performance levels for a system well under $4,000. With dual X2s (4 GPU engines) it is around $4,700 (extra PSU capacity and cooling is needed for that level). I hear that AMD have a new Firestream coming, with the 48x0 family chips on it, but that will likely be a bit on the pricier side. Anyway, the Firestream has GPUs with Double-Precision Floating Point, something the nVidia offerings do not have. Worth considering. http://ati.amd.com/technology/streamcomputing/product_firestream_9250.html SDK: http://ati.amd.com/technology/streamcomputing/sdkdwnld.html -- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 From maurice at harddata.com Wed Sep 3 15:51:52 2008 From: maurice at harddata.com (Maurice Hilarius) Date: Wed, 03 Sep 2008 16:51:52 -0600 Subject: [Beowulf] Re: Beowulf Digest, Vol 55, Issue 2 In-Reply-To: <200809021901.m82J0IaS029547@bluewest.scyld.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> Message-ID: <48BF1508.7000406@harddata.com> Li, bo wrote: > .. > From: "Li, Bo" > Subject: Re: [Beowulf] gpgpu > > Hello, > It seemed that you had got a very good example for GPGPU. As I said before, it's not the time for GPGPU to do the DP calculation at the moment. If you can bear SP computation, you will find more about it. > NVidia just sent me some special offer about their Tesla platforms, which said that the workstation equipped with two GTX280 level professional cards costs about $5000, not bad. But my intention is still to lower the core frequency of a gaming card, and use it for computation.
> Regards, > Li, Bo > Looking at AMD/ATI Firestream and 4850 pricing, it is not too bad: AMD FIRESTREAM 9250 STREAM PROCESSOR (P/N: 100-505563) $880 VISIONTEK RADEON HD4870X2 2GB PCI-E (P/N: 900250) $575 VISIONTEK RADEON HD 4870 512MB PCI-E (P/N: 900244) $355 The 4870 and X2 also run the AMD code. So, given a decent machine, with 4 cores and a pair of the 4870X2, one can achieve some pretty amazing GPU performance levels for a system well under $4,000. With dualX2s ( 4 GPU engines) around $4700 ( extra PSU capacity and cooling is needed for that level). I hear that AMD have a new Firestream coming, with the 48x0 family chips on it, but that will likely be a bit on the pricier side.. Anyway, the Firestream has GPUs with Double-Precision Floating Point. Something the nVidia offerings do not. Worth considering. http://ati.amd.com/technology/streamcomputing/product_firestream_9250.html SDK: http://ati.amd.com/technology/streamcomputing/sdkdwnld.html -- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 From Craig.Tierney at noaa.gov Thu Sep 4 08:56:13 2008 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Thu, 04 Sep 2008 09:56:13 -0600 Subject: [Beowulf] Re: Beowulf Digest, Vol 55, Issue 2 In-Reply-To: <48BF1508.7000406@harddata.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> Message-ID: <48C0051D.2020700@noaa.gov> ... stuff deleted > > AMD FIRESTREAM 9250 STREAM PROCESSOR (P/N: 100-505563) $880 > VISIONTEK RADEON HD4870X2 2GB PCI-E (P/N: 900250) $575 > VISIONTEK RADEON HD 4870 512MB PCI-E (P/N: 900244) $355 > > The 4870 and X2 also run the AMD code.
> > So, given a decent machine, with 4 cores and a pair of the 4870X2, one > can achieve some pretty amazing GPU > performance levels for a system well under $4,000. > > With dualX2s ( 4 GPU engines) around $4700 ( extra PSU capacity and > cooling is needed for that level). > > I hear that AMD have a new Firestream coming, with the 48x0 family chips > on it, but that will likely be a bit on the pricier side.. > > Anyway, the Firestream has GPUs with Double-Precision Floating Point. > Something the nVidia offerings do not. > > Worth considering. > This is not correct. The NVIDIA GT200 series supports IEEE DP FP in hardware. NVIDIA only has 1 DP FP unit per streaming processor (24 on the GTX280) which is 1/8 the number of units of single-precision floating point (each thread has its own unit). So the max DP FP rate on a GTX280 is about 90 Gflops. Does anyone know the peak bandwidth of the new Firestream cards? I looked around and all I could find is that it uses GDDR3. The wikipedia entry says the max bandwidth of the 9250 is 63.5 GB/s. This is less than half the GTX280 (max at 140 GB/s, measured using stream like app at 115 GB/s). If it is true, the GTX280 may be better for memory bound codes. That is, if we can write efficient code for them and leave the whole problem on the GPU to avoid memory bandwidth issues across the bus. 
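Craig's bandwidth argument can be made concrete with a roofline-style estimate (a standard performance model, not something proposed in this thread): a kernel's attainable rate is the smaller of peak FLOPS and memory bandwidth times arithmetic intensity (flops per byte moved). The sketch below plugs in the figures quoted above (about 90 GFLOPS DP and 140 GB/s for the GTX280; 200 GFLOPS DP and 63.5 GB/s for the FireStream 9250). The STREAM-triad-like kernel and its intensity of 2 flops per 24 bytes are assumptions chosen for illustration.

```python
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline estimate: a kernel is limited either by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# (peak DP GFLOPS, memory bandwidth in GB/s) as quoted in the thread.
cards = {
    "GTX280 (DP)": (90.0, 140.0),
    "FireStream 9250 (DP)": (200.0, 63.5),
}

# Assumed kernel: DP triad a[i] = b[i] + s * c[i]
# -> 2 flops per (2 reads + 1 write) * 8 bytes = 24 bytes moved.
triad_intensity = 2.0 / 24.0

for name, (peak, bw) in cards.items():
    print(f"{name}: {attainable_gflops(peak, bw, triad_intensity):.1f} GFLOPS attainable")
```

For this kernel both cards sit far below their compute peaks and the attainable rate simply tracks bandwidth (roughly 11.7 vs 5.3 GFLOPS), which is the sense in which the GTX280 may be better for memory-bound codes. A large dense dgemm, with much higher intensity, would instead hit the compute roof, where the 9250's higher DP peak matters.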
Craig > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Craig Tierney (craig.tierney at noaa.gov) From kus at free.net Thu Sep 4 09:51:51 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 04 Sep 2008 20:51:51 +0400 Subject: [Beowulf] Re: Beowulf Digest, Vol 55, Issue 2 In-Reply-To: <002001c90e58$3c20c020$6300a8c0@LIBO> Message-ID: In message from "Li, Bo" (Thu, 4 Sep 2008 14:34:00 +0800): >Hello, >Is it too expensive for the platform? >The easy solution is: >And X48 level motherboard with CF support, about $150 >Q6600 Processor, about $170 >Two 4870X2 $1,100 Does anybody know whether the ACML routines are parallelized to use several GPGPUs? Mikhail >Two Seagate SATA Harddisk 500G for Raid1, about $140 >4*2G DDR2 RAM, about $150 >PSU 1000W, about $200 >A big box, about $100 > >That's all, in total, $2,010. >Regards, >Li, Bo > ----- Original Message ----- > From: Maurice Hilarius > To: beowulf at beowulf.org > Cc: kus at free.net ; libo at buaa.edu.cn ; i.kozin at dl.ac.uk > Sent: Thursday, September 04, 2008 6:51 AM > Subject: Re: Beowulf Digest, Vol 55, Issue 2 > > > Li, bo wrote: >.. >From: "Li, Bo" >Subject: Re: [Beowulf] gpgpu > >Hello, >It seemed that you had got a very good example for GPGPU. As I said >before, it's not the time for GPGPU to do the DP calculation at the >moment. If you can bear SP computation, you will find more about it. >NVidia just sent me some special offer about their Tesla platforms, >which said that the workstation equipped with two GTX280 level >professional cards costs about $5000, not bad. But my intention is >still to lower the core frequency of a gaming card, and use it for >computation.
>Regards, >Li, Bo > Looking at AMD/ATI Firestream and 4850 pricing, it is not too bad: > > AMD FIRESTREAM 9250 STREAM PROCESSOR (P/N: 100-505563) $880 > VISIONTEK RADEON HD4870X2 2GB PCI-E (P/N: 900250) > $575 > VISIONTEK RADEON HD 4870 512MB PCI-E (P/N: 900244) > $355 > > The 4870 and X2 also run the AMD code. > > So, given a decent machine, with 4 cores and a pair of the 4870X2, >one can achieve some pretty amazing GPU > performance levels for a system well under $4,000. > > With dualX2s ( 4 GPU engines) around $4700 ( extra PSU capacity and >cooling is needed for that level). > > I hear that AMD have a new Firestream coming, with the 48x0 family >chips on it, but that will likely be a bit on the pricier side.. > > Anyway, the Firestream has GPUs with Double-Precision Floating >Point. > Something the nVidia offerings do not. > > Worth considering. > > http://ati.amd.com/technology/streamcomputing/product_firestream_9250.html > > SDK: > http://ati.amd.com/technology/streamcomputing/sdkdwnld.html > > > > > -- > With our best regards, > > Maurice W. Hilarius Telephone: 01-780-456-9771 > Hard Data Ltd. FAX: 01-780-456-9772 > 11060 - 166 Avenue email:maurice at harddata.com > Edmonton, AB, Canada http://www.harddata.com/ > T5X 1Y3 From prentice at ias.edu Thu Sep 4 11:37:23 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 04 Sep 2008 14:37:23 -0400 Subject: [Beowulf] Infiniband Subnet Manager In-Reply-To: <48B69CEE.3040802@ias.edu> References: <48B69CEE.3040802@ias.edu> Message-ID: <48C02AE3.9040806@ias.edu> Prentice Bisbal wrote: > Since an infiniband fabric needs a subnet mananger, should the master > node have an IB HCA and be connected to the IB network in order to run > the subnet manager? > > My logic behind this is that the master node will be full > enterprise-level hardware (redundant every thing), and should never go > down or be rebooted during normal use. 
> I expect the nodes to go down > more frequently (not fully redundant hardware, higher operating loads, > etc.). > > Exactly what functions does the subnet manager perform, and what happens > if it disappears from the IB fabric? > > I've been doing research into IB all day yesterday, and I'm continuing > today, so please no RTFM answers. > I've gotten a lot of responses to the IB questions I posed to the list. Thanks for all your help. All of my questions have been answered. It turns out, as some of you pointed out, that my switch will have a built-in subnet manager, so I won't need to run one on a node. -- Prentice From atp at piskorski.com Thu Sep 4 12:54:20 2008 From: atp at piskorski.com (Andrew Piskorski) Date: Thu, 4 Sep 2008 15:54:20 -0400 Subject: [Beowulf] Nvidia GT200, double precision vs. native pair In-Reply-To: <48C0051D.2020700@noaa.gov> References: <48C0051D.2020700@noaa.gov> Message-ID: <20080904195420.GC34060@piskorski.com> On Thu, Sep 04, 2008 at 09:56:13AM -0600, Craig Tierney wrote: > Subject: Re: [Beowulf] Re: Beowulf Digest, Vol 55, Issue 2 > This is not correct. The NVIDIA GT200 series supports IEEE DP FP in > hardware. NVIDIA only has 1 DP FP unit per streaming processor (24 > on the GTX280) which is 1/8 the number of units of single-precision > floating point (each thread has its own unit). So the max DP FP > rate on a GTX280 is about 90 Gflops. So has anyone taken those 8 single-precision floating point units and tried using them to get double-precision or better accuracy? Perhaps using the "native-pair" and "speculative precision" approaches discussed here: http://aggregate.org/NPAR/ The 2006 paper there talks about doing so on an Nvidia GeForce 6800 Ultra, on which a (c. 64-bit) native-pair calculation took about 10x the clock cycles of a single 32-bit flop (better for sqrt).
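For readers unfamiliar with the idea: native-pair arithmetic stores one value as the unevaluated sum of two native (here 32-bit) floats, a "hi" part and a "lo" correction. The sketch below is the classic double-float construction (Knuth's TwoSum plus Dekker's FastTwoSum), written with NumPy float32 scalars on the CPU. It illustrates the general technique only; it is not the NPAR authors' code, and it assumes IEEE round-to-nearest float32 arithmetic as NumPy provides it.

```python
import numpy as np

def two_sum(a: np.float32, b: np.float32):
    """Knuth's TwoSum: s = fl(a + b) and err such that a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def fast_two_sum(a: np.float32, b: np.float32):
    """Dekker's FastTwoSum: same contract as two_sum, valid when |a| >= |b|."""
    s = a + b
    err = b - (s - a)
    return s, err

def pair_add(x, y):
    """Add two (hi, lo) float32 pairs and renormalize the result."""
    s, e = two_sum(x[0], y[0])
    e = e + x[1] + y[1]          # fold in the low parts (rounded in float32)
    return fast_two_sum(s, e)    # |s| >= |e| here, so FastTwoSum is safe

def pair_value(p) -> float:
    """Collapse a pair into a Python float (binary64) for inspection."""
    return float(p[0]) + float(p[1])

tiny = np.float32(1e-8)          # smaller than half an ulp of 1.0 in float32
plain = np.float32(1.0)
pair = (np.float32(1.0), np.float32(0.0))
for _ in range(10_000):
    plain = plain + tiny         # plain float32: every add rounds back to 1.0
    pair = pair_add(pair, (tiny, np.float32(0.0)))

print(plain)                     # stays 1.0
print(pair_value(pair))          # close to 1 + 10_000 * 1e-8
```

Each pair operation costs several native flops (about ten here once the adds in all three helpers are counted), which is consistent in spirit with the roughly 10x cycle cost reported for the 6800 Ultra. In exchange, the pair carries close to twice the float32 significand.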
-- Andrew Piskorski http://www.piskorski.com/ From niftyompi at niftyegg.com (Nifty niftyompi Mitch) Date: Thu, 4 Sep 2008 15:16:36 -0700 Subject: [Beowulf] Infiniband Subnet Manager In-Reply-To: <48C02AE3.9040806@ias.edu> References: <48B69CEE.3040802@ias.edu> <48C02AE3.9040806@ias.edu> Message-ID: <20080904221636.GA4234@hpegg.wr.niftyegg.com> On Thu, Sep 04, 2008 at 02:37:23PM -0400, Prentice Bisbal wrote: > Prentice Bisbal wrote: > > Since an infiniband fabric needs a subnet mananger, should the master > > node have an IB HCA and be connected to the IB network in order to run > > the subnet manager? > > > > My logic behind this is that the master node will be full > > enterprise-level hardware (redundant every thing), and should never go > > down or be rebooted during normal use. I expect the nodes to go down > > more frequently (not fully redundant hardware, higher operating loads, > > etc.). > > > > Exactly what functions does the subnet manager perform, and what happens > > if it disappears from the IB fabric? > > > > I've been doing research into IB all day yesterday, and I'm continuing > > today, so please no RTFM answers. > > > > I've gotten a lot of response to my IB questions that I posed to the > list. Thanks for all your help. All of my questions have been answered. > It turns out, as some as you pointed out, that my switch will have a > built-in subnet manager, so I won't need to run one on a node. > I should add that a built-in subnet manager costs extra $$. Also, they tend to run on a modest dedicated processor card. These modest dedicated-card solutions have limited RAM and will not support a gonzo big fabric. As a rule of thumb, depending on the subnet manager on the card, the point at which it runs out of memory resources is about 144 ports. The richer the statistics gathered and retained, the larger the footprint. Large fabrics will need a host-based subnet manager.
-- T o m M i t c h e l l Got a great hat... now what. From libo at buaa.edu.cn Thu Sep 4 22:06:49 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Fri, 5 Sep 2008 13:06:49 +0800 Subject: [Beowulf] Re: GPU boards and cluster servers. References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> Message-ID: <002401c90f15$38e48500$6300a8c0@LIBO> Hello, It seems your platform is more suitable for a cluster. Great, and when are the products available? And is there any software support from you? Regards, Li, Bo ----- Original Message ----- From: Maurice Hilarius To: Li, Bo Cc: kus at free.net ; i.kozin at dl.ac.uk ; Beowulf Mailing List Sent: Friday, September 05, 2008 9:36 AM Subject: GPU boards and cluster servers. Li, Bo wrote: Hello, Is it too expensive for the platform? The easy solution is: And X48 level motherboard with CF support, about $150 Q6600 Processor, about $170 Two 4870X2 $1,100 Two Seagate SATA Harddisk 500G for Raid1, about $140 4*2G DDR2 RAM, about $150 PSU 1000W, about $200 A big box, about $100 That's all, in total, $2,010. Regards, Li, Bo True, to a point. Most people will not use a desktop board for a cluster. Too I/O bound. Finally the memory capacity of these desktop boards is pretty limiting. Typically 8GB maximum. Generally a XEON or Opteron chipset and CPUs will be the choice. Also, for most GPU/FPU performance work, the memory bandwidth bottleneck on the Intel product is too much of a negative factor. Lastly, for clusters, most want a rackmount chassis. We developed a 2U designed for a server board and 2 GPU boards. The big challenge there is power. We use dual 600W PSUs. One for motherboard, and one for dual GPU boards. -- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. 
FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 From andrew at moonet.co.uk Fri Sep 5 00:51:55 2008 From: andrew at moonet.co.uk (andrew holway) Date: Fri, 5 Sep 2008 08:51:55 +0100 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <002401c90f15$38e48500$6300a8c0@LIBO> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> Message-ID: The new Dell R5400 Rackmount workstation is ideal for this. You can slip two Xeons, 16GB ram and two chunky graphics cards in there. ta Andy On Fri, Sep 5, 2008 at 6:06 AM, Li, Bo wrote: > Hello, > It seems your platform is more suitable for a cluster. Great, and when are > the products available? And is there any software support from you? > Regards, > Li, Bo > > ----- Original Message ----- > From: Maurice Hilarius > To: Li, Bo > Cc: kus at free.net ; i.kozin at dl.ac.uk ; Beowulf Mailing List > Sent: Friday, September 05, 2008 9:36 AM > Subject: GPU boards and cluster servers. > Li, Bo wrote: > > Hello, > Is it too expensive for the platform? > The easy solution is: > And X48 level motherboard with CF support, about $150 > Q6600 Processor, about $170 > Two 4870X2 $1,100 > Two Seagate SATA Harddisk 500G for Raid1, about $140 > 4*2G DDR2 RAM, about $150 > PSU 1000W, about $200 > A big box, about $100 > > That's all, in total, $2,010. > Regards, > Li, Bo > > True, to a point. > Most people will not use a desktop board for a cluster. > Too I/O bound. > Finally the memory capacity of these desktop boards is pretty limiting. > Typically 8GB maximum. > > > Generally a XEON or Opteron chipset and CPUs will be the choice.
> > Also, for most GPU/FPU performance work, the memory bandwidth bottleneck on > the Intel product is too much of a negative factor. > > Lastly, for clusters, most want a rackmount chassis. > We developed a 2U designed for a server board and 2 GPU boards. > The big challenge there is power. > > We use dual 600W PSUs. One for motherboard, and one for dual GPU boards. > > > -- > With our best regards, > > Maurice W. Hilarius Telephone: 01-780-456-9771 > Hard Data Ltd. FAX: 01-780-456-9772 > 11060 - 166 Avenue email:maurice at harddata.com > Edmonton, AB, Canada http://www.harddata.com/ > T5X 1Y3 > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From i.kozin at dl.ac.uk Fri Sep 5 04:36:53 2008 From: i.kozin at dl.ac.uk (Kozin, I (Igor)) Date: Fri, 5 Sep 2008 12:36:53 +0100 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: Message-ID: > The new Dell R5400 Rackmount workstation is ideal for this. You can > slip two Xeons, 16GB ram and two chunky graphics cards in there. The slots in the R5400 are PCIe Gen1, and 300 W total for the graphics might be a bit too low. The best I've seen so far are the 1U HP DL160G5 servers, which offer two PCIe x16 Gen2 slots. Granted, you will not be able to fit a powerful graphics card in there, but a Tesla setup works quite well. There is a very interesting recent report published by HP: http://www.hp.com/techservers/hpccn/hpccollaboration/ADCatalyst/downloads/accelerating-HPCUsing-GPUs.pdf They benchmarked a DL160G5 (with a single processor, so the host server is fairly cheap) with an S870 attached to it. Observed peak performance on SGEMM was about 200 GFLOPS, which is much lower than the theoretical peak of 512 GFLOPS (and well below the 350 GFLOPS sustained claimed by Nvidia). When they factor in I/O, the performance rapidly approaches that of an Intel quad-core.
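Why I/O drags the accelerator back toward CPU levels can be seen with the same first-order model used for the dgemm estimate earlier in the thread: GEMM work grows as n^3 while host-to-card traffic grows only as n^2, so transfers dominate for small matrices. The sketch below is a back-of-the-envelope model under stated assumptions, not HP's methodology. It assumes no overlap of transfers with compute, four n x n matrices crossing the bus (A, B and C in, C out), nominal PCIe x16 rates of 4 GB/s (Gen1) and 8 GB/s (Gen2) per direction, and the 200 GFLOPS SGEMM figure above as the kernel rate.

```python
def effective_gflops(n: int, kernel_gflops: float, link_gbytes_per_s: float,
                     bytes_per_elem: int = 4, matrices_moved: int = 4) -> float:
    """First-order model of C = alpha*A*B + beta*C offloaded to an accelerator.

    Assumes 2*n**3 flops at a fixed kernel rate plus a serial (non-overlapped)
    copy of `matrices_moved` n x n matrices across the host link.
    """
    flops = 2.0 * n**3
    compute_s = flops / (kernel_gflops * 1e9)
    transfer_s = matrices_moved * n * n * bytes_per_elem / (link_gbytes_per_s * 1e9)
    return flops / (compute_s + transfer_s) / 1e9

# Single precision (4-byte elements), 200 GFLOPS kernel, Gen1 vs Gen2 links.
for n in (512, 2048, 8192):
    gen1 = effective_gflops(n, 200.0, 4.0)
    gen2 = effective_gflops(n, 200.0, 8.0)
    print(f"n={n}: {gen1:.0f} GFLOPS (Gen1), {gen2:.0f} GFLOPS (Gen2)")
```

With 8-byte elements, a 180 GFLOPS kernel and an 8 GB/s link, the n=6000 double-precision case from the start of the thread comes out at about 170 GFLOPS (2.4 s compute plus 0.144 s transfer). The gap between the Gen1 and Gen2 columns at small n is the effect the HP paper attributes to PCIe Gen2.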
That's not to say GPUs are useless even at single precision; some results are pretty good. The team promised to benchmark FireStream next. > Generally a XEON or Opteron chipset and CPUs will be the choice. > > Also, for most GPU/FPU performance work, the memory bandwidth bottleneck > on the Intel product is too much of a negative factor. Yes, memory bandwidth can be a problem for Intel servers, for now. But we all know this is going to change soon. More surprisingly, Opteron-based servers do not offer PCIe Gen2 just yet, although it has been a while since I last checked. The paper cited above indicates a very significant impact of PCIe Gen2 on the bandwidth. From gerry.creager at tamu.edu Fri Sep 5 05:51:58 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Fri, 05 Sep 2008 07:51:58 -0500 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> Message-ID: <48C12B6E.4090903@tamu.edu> At $6k US, and requiring me to get Vista, I'd rather build a system starting with, e.g., an Asus motherboard. I save one-third the price and I don't have to file the environmental impact statement on the flawed OS. I also get NICs I can easily set to accommodate Jumbo Frames. gerry andrew holway wrote: > The new Dell R5400 Rackmount workstation is ideal for this. You can > slip two Xeons, 16GB ram and two chunky graphics cards in there. > > ta > > Andy > > On Fri, Sep 5, 2008 at 6:06 AM, Li, Bo wrote: >> Hello, >> It seems your platform is more suitable for a cluster. Great, and when are >> the products available? And is there any software support from you?
>> Regards, >> Li, Bo >> >> ----- Original Message ----- >> From: Maurice Hilarius >> To: Li, Bo >> Cc: kus at free.net ; i.kozin at dl.ac.uk ; Beowulf Mailing List >> Sent: Friday, September 05, 2008 9:36 AM >> Subject: GPU boards and cluster servers. >> Li, Bo wrote: >> >> Hello, >> Is it too expensive for the platform? >> The easy solution is: >> And X48 level motherboard with CF support, about $150 >> Q6600 Processor, about $170 >> Two 4870X2 $1,100 >> Two Seagate SATA Harddisk 500G for Raid1, about $140 >> 4*2G DDR2 RAM, about $150 >> PSU 1000W, about $200 >> A big box, about $100 >> >> That's all, in total, $2,010. >> Regards, >> Li, Bo >> >> True, to a point. >> Most people will not use a desktop board for a cluster. >> Too I/O bound. >> Finally the memory capacity of these desktop boards is pretty limiting. >> Typically 8GB maximum. >> >> >> Generally a XEON or Opteron chipset and CPUs will be the choice. >> >> Also, for most GPU/FPU performance work, the memory bandwidth bottleneck on >> the Intel product is too much of a negative factor. >> >> Lastly, for clusters, most want a rackmount chassis. >> We developed a 2U designed for a server board and 2 GPU boards. >> The big challenge there is power. >> >> We use dual 600W PSUs. One for motherboard, and one for dual GPU boards. >> >> >> -- >> With our best regards, >> >> Maurice W. Hilarius Telephone: 01-780-456-9771 >> Hard Data Ltd. 
FAX: 01-780-456-9772 >> 11060 - 166 Avenue email:maurice at harddata.com >> Edmonton, AB, Canada http://www.harddata.com/ >> T5X 1Y3 >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From prentice at ias.edu Fri Sep 5 07:48:16 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 05 Sep 2008 10:48:16 -0400 Subject: [Beowulf] Infiniband Subnet Manager In-Reply-To: <20080904221636.GA4234@hpegg.wr.niftyegg.com> References: <48B69CEE.3040802@ias.edu> <48C02AE3.9040806@ias.edu> <20080904221636.GA4234@hpegg.wr.niftyegg.com> Message-ID: <48C146B0.1020102@ias.edu> Nifty niftyompi Mitch wrote: >> I've gotten a lot of response to my IB questions that I posed to the >> list. Thanks for all your help. All of my questions have been answered. >> It turns out, as some as you pointed out, that my switch will have a >> built-in subnet manager, so I won't need to run one on a node. >> > > I should add that a built in subnet manager is extra $$. Also they tend > to run on a modest dedicated processor card. The modest dedicated card > solutions have limited RAM and will not support a gonzo big fabric. > A rule of thumb, depending on the subnet manager on the card the run out of memory > recources trip point is about 144 ports. > > The richer the statistics gathered and retained the larger the footprint is. > > Large fabrics will need a host based subnet manager. 
Thanks. My cluster is only 64 nodes, so that shouldn't be a problem. -- Prentice From rgb at phy.duke.edu Fri Sep 5 08:29:02 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri, 5 Sep 2008 11:29:02 -0400 (EDT) Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C12B6E.4090903@tamu.edu> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu> Message-ID: On Fri, 5 Sep 2008, Gerry Creager wrote: > At $6k US, and requiring me to get Vista, I'd rather build a system starting > with, e.g., an Asus motherboard. I save one-third the price and I don't have > to file the environmental impact statement on the flawed OS. I also get NICs > I can easily set to accommodate Jumbo Frames. If you talk to a Dell rep, you can ALMOST invariably get any server-class system they sell without an operating system or with Linux installed, especially if you are ordering in quantity. Just FYI -- otherwise I don't disagree with anything you said, especially Vista of Evil. Although hey, it runs great on 4 GB and up systems, at least if you don't run large applications on it... or so I'm told. rgb > > gerry > > andrew holway wrote: >> The new Dell R5400 Rackmount workstation is ideal for this. You can >> slip two Xeons, 16GB ram and two chunky graphics cards in there. >> >> ta >> >> Andy >> >> On Fri, Sep 5, 2008 at 6:06 AM, Li, Bo wrote: >>> Hello, >>> It seems your platform is more suitable for a cluster. Great, and when are >>> the products available? And is there any software support from you? >>> Regards, >>> Li, Bo >>> >>> ----- Original Message ----- >>> From: Maurice Hilarius >>> To: Li, Bo >>> Cc: kus at free.net ; i.kozin at dl.ac.uk ; Beowulf Mailing List >>> Sent: Friday, September 05, 2008 9:36 AM >>> Subject: GPU boards and cluster servers.
>>> Li, Bo wrote: >>> >>> Hello, >>> Is it too expensive for the platform? >>> The easy solution is: >>> And X48 level motherboard with CF support, about $150 >>> Q6600 Processor, about $170 >>> Two 4870X2 $1,100 >>> Two Seagate SATA Harddisk 500G for Raid1, about $140 >>> 4*2G DDR2 RAM, about $150 >>> PSU 1000W, about $200 >>> A big box, about $100 >>> >>> That's all, in total, $2,010. >>> Regards, >>> Li, Bo >>> >>> True, to a point. >>> Most people will not use a desktop board for a cluster. >>> Too I/O bound. >>> Finally the memory capacity of these desktop boards is pretty limiting. >>> Typically 8GB maximum. >>> >>> >>> Generally a XEON or Opteron chipset and CPUs will be the choice. >>> >>> Also, for most GPU/FPU performance work, the memory bandwidth bottleneck >>> on >>> the Intel product is too much of a negative factor. >>> >>> Lastly, for clusters, most want a rackmount chassis. >>> We developed a 2U designed for a server board and 2 GPU boards. >>> The big challenge there is power. >>> >>> We use dual 600W PSUs. One for motherboard, and one for dual GPU boards. >>> >>> >>> -- >>> With our best regards, >>> >>> Maurice W. Hilarius Telephone: 01-780-456-9771 >>> Hard Data Ltd. FAX: 01-780-456-9772 >>> 11060 - 166 Avenue email:maurice at harddata.com >>> Edmonton, AB, Canada http://www.harddata.com/ >>> T5X 1Y3 >>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 
27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From diep at xs4all.nl Fri Sep 5 11:23:58 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 5 Sep 2008 20:23:58 +0200 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu> Message-ID: AFAIK only rich government departments do business with companies such as DELL. If you're really big, you HAVE to sign some sort of deal with a big vendor anyway. DELL usually delivers very old junk for the price at which you can get newer junk elsewhere. Big companies have big overhead, maybe sometimes simply because they decided they WANT x% profit on all deals. I remember that at a company I worked for for a few months, I got hardware from DELL that wasn't even being sold anymore. Obviously that means the service contract in question was just wasting government money. The big trick that all salesmen understand, and that civil-servant-type managers and directors do not understand so well, is that something that's new this year is totally outdated if you deliver it 2 years from now. Not long ago I heard that a specific company had bought a service contract delivering XT machines. Those here who want to run artificial intelligence software all have a big need for crunching power at a low price, in contrast to the rest here, who usually want double precision AND big bandwidth AND big RAM AND 100% reliability. At 100% reliability and BIG RAM AND big bandwidth to huge RAM, there is a big price.
Optimizations in that category, tree-searching software and such (encryption is just a subsection of it), happen at a level that most guys on this list will never understand. It is much better optimized than other software. Sometimes there are the kind of optimizations that make hardware engineers, not exactly laymen, say "oh dear, is that the case?" when you talk to them. Suppose you search for the holy grail on a GPU and one GPU's RAM is bad, so all your calculations there fail. Heh, you won't even notice it anytime soon, as you have no 'result' that is deterministically verifiable. An example is a parameter optimization I want to perform for my chess program. I would need to write a new program for it, which is a huge job on a GPU; basically that program would only be the evaluation function of my program. Such optimization runs are embarrassingly parallel. What matters is simply how many instructions a cycle I can push through. Having a few GPUs at home do that crunching work is very attractive. The biggest problem with GPUs is that I have no money to buy a machine to put a GPU in, let alone buy GPUs just to toy with. So that's why a friend of mine is hopefully going to run it on about 160 Xeon cores at TU-Delft, when the machines are idle and not being used by others. Probably the best project name for this would be Ikarus, were it not already the name of an existing chess program, as the final goal is to explore possibilities after auto-recognizing new patterns in a later phase. It will run for years. The big difference with this type of crunching is that if something goes wrong, that's not a problem; if a bit flips or whatever, it doesn't matter. I just need the best parameter set it can find for me when searching for the holy grail. I understand Bo Li very well there. He wants the maximum amount of crunching power and can make do with 32 bits. A good 1000 watt PSU here is around 100 euro. For under 500 euro you can assemble a great box; then just add the GPUs.
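The embarrassingly parallel parameter search Vincent describes has a very simple shape: every candidate parameter set is scored independently, and only the best result is kept. The sketch below shows that shape with a grid search. The `score` function is a hypothetical toy stand-in for a real evaluation-function benchmark, and the thread pool only illustrates the structure. For CPU-bound Python work, a `ProcessPoolExecutor`, or one process per core or GPU across machines as in the TU-Delft plan, would be needed for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def score(params):
    """Hypothetical stand-in for an evaluation-function benchmark.

    A real tuning run would play or analyse test positions with this
    parameter set; this toy function simply peaks at (3, -1).
    """
    a, b = params
    return -((a - 3) ** 2) - ((b + 1) ** 2)

def best_parameters(candidates, workers=4):
    """Score every candidate independently and return (best_params, best_score)."""
    candidates = list(candidates)
    # No task depends on another task's result: this independence is what
    # makes the search embarrassingly parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, candidates))
    best_score, best = max(zip(scores, candidates))
    return best, best_score

grid = product(range(-5, 6), range(-5, 6))
print(best_parameters(grid))     # ((3, -1), 0)
```

Because a lost or corrupted task only removes one candidate from consideration, this kind of workload tolerates the occasional bad result in the way Vincent describes.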
Getting 40% performance out of a video card is very impressive, by the way, especially considering that no one around me with different types of software (from statistical software to Monte Carlo to multimedia encoding) gets anywhere near that performance out of it. Yet the difference is that all the people here who are cheering for GPU crunching power are the same type of guys. Though on paper their software does something totally different, they are all searching for some sort of holy grail in an embarrassingly parallel manner. The failed attempts are usually game-tree searches that need to somehow combine results using hashtables, and/or FFT-type attempts. Vincent On Sep 5, 2008, at 5:29 PM, Robert G. Brown wrote: > On Fri, 5 Sep 2008, Gerry Creager wrote: > >> At $6k US, and requiring me to get Vista, I'd rather build a >> system starting with, e.g., an Asus motherboard. I save one-third >> the price and I don't have to file the environmental impact >> statement on the flawed OS. I also get NICs I can easily set to >> accommodate Jumbo Frames. > > If you talk to a Dell rep, you can ALMOST invariably get any > server-class system they sell without an operating system or with > Linux > installed, especially if you are ordering in quantity. > > Just FYI -- otherwise I don't disagree with anything you said, > especially Vista of Evil. Although hey, it runs great on 4 GB and up > systems, at least if you don't run large applications on it... or > so I'm > told. > > rgb > >> >> gerry >> >> andrew holway wrote: >>> The new Dell R5400 Rackmount workstation is ideal for this. You can >>> slip two Xeons, 16GB ram and two chunky graphics cards in there. >>> ta >>> Andy >>> On Fri, Sep 5, 2008 at 6:06 AM, Li, Bo wrote: >>>> Hello, >>>> It seems your platform is more suitable for a cluster. Great, >>>> and when are >>>> the products available? And is there any software support from you?
>>>> Regards, >>>> Li, Bo >>>> ----- Original Message ----- >>>> From: Maurice Hilarius >>>> To: Li, Bo >>>> Cc: kus at free.net ; i.kozin at dl.ac.uk ; Beowulf Mailing List >>>> Sent: Friday, September 05, 2008 9:36 AM >>>> Subject: GPU boards and cluster servers. >>>> Li, Bo wrote: >>>> Hello, >>>> Is it too expensive for the platform? >>>> The easy solution is: >>>> And X48 level motherboard with CF support, about $150 >>>> Q6600 Processor, about $170 >>>> Two 4870X2 $1,100 >>>> Two Seagate SATA Harddisk 500G for Raid1, about $140 >>>> 4*2G DDR2 RAM, about $150 >>>> PSU 1000W, about $200 >>>> A big box, about $100 >>>> That's all, in total, $2,010. >>>> Regards, >>>> Li, Bo >>>> True, to a point. >>>> Most people will not use a desktop board for a cluster. >>>> Too I/O bound. >>>> Finally the memory capacity of these desktop boards is pretty >>>> limiting. >>>> Typically 8GB maximum. >>>> Generally a XEON or Opteron chipset and CPUs will be the choice. >>>> Also, for most GPU/FPU performance work, the memory bandwidth >>>> bottleneck on >>>> the Intel product is too much of a negative factor. >>>> Lastly, for clusters, most want a rackmount chassis. >>>> We developed a 2U designed for a server board and 2 GPU boards. >>>> The big challenge there is power. >>>> We use dual 600W PSUs. One for motherboard, and one for dual GPU >>>> boards. >>>> -- >>>> With our best regards, >>>> Maurice W. Hilarius Telephone: 01-780-456-9771 >>>> Hard Data Ltd. 
FAX: 01-780-456-9772 >>>> 11060 - 166 Avenue email:maurice at harddata.com >>>> Edmonton, AB, Canada http://www.harddata.com/ >>>> T5X 1Y3 >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf at beowulf.org >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > > -- > Robert G. Brown Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > From perry at piermont.com Fri Sep 5 13:19:16 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri, 05 Sep 2008 16:19:16 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: (Vincent Diepeveen's message of "Fri\, 5 Sep 2008 20\:23\:58 +0200") References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu>

Message-ID: <871vzys5vf.fsf@snark.cb.piermont.com> Vincent Diepeveen writes: > AFAIK only rich government departments do business with companies > such as DELL. I often buy their equipment when I'm just looking for one ordinary 1U rackmount and such -- they're often the lowest-price vendor or nearly so. If I have to do something unusual, I don't talk to them, but that's not surprising as they specialize in providing standard stuff cheap, not in providing unusual things. Perry From csamuel at vpac.org Sun Sep 7 01:58:32 2008 From: csamuel at vpac.org (Chris Samuel) Date: Sun, 7 Sep 2008 18:58:32 +1000 (EST) Subject: [Beowulf] hang-up of HPC Challenge In-Reply-To: <1152250.21220687590853.JavaMail.csamuel@ubuntu> Message-ID: <13080772.41220687850031.JavaMail.csamuel@ubuntu> ----- "Mikhail Kuzminsky" wrote: Hi Mikhail, Sorry for the delay in getting back to you, work has been keeping me very occupied! > In message from Chris Samuel (Wed, 20 Aug 2008 > 11:12:52 +1000 (EST)): > > >Does the code crash, does it just stop & idle, does it > >busy loop, does the node oops, does it lockup, etc ? > > I beleive that program crash is not hangup. When I wrote > about Linux hangup, I means that Linux don't response to > any interrupts - from keyboard, from ssh client requests etc. That really sounds like you're hitting either a kernel or a hardware issue - it might be worth trying out the BreakIn tool that Jason posted about elsewhere on the list: http://www.advancedclustering.com/software/breakin.html > I use 2.6.22.5-31 kernel from SuSE 10.3 distribution. That's pretty old now; I'd strongly suggest trying out the current mainline kernel on there. It works pretty well on our SuperMicro-based Barcelona cluster. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O.
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From maurice at harddata.com Fri Sep 5 07:32:50 2008 From: maurice at harddata.com (Maurice Hilarius) Date: Fri, 05 Sep 2008 08:32:50 -0600 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: Message-ID: <48C14312.4010502@harddata.com> Kozin, I (Igor) wrote: > .. > Yes, memory bandwidth can be a problem for Intel servers. Now. But we > all know this is going to change soon. > "Soon"? We are hearing about a year. In the meantime, AMD "Shanghai", with 4 cores on a 45nm process and 6MB of L3 cache, ships this year. This bump basically puts the AMD line even with Intel on clock speeds, meaning Intel will have to drop prices on the higher clocks to be competitive: a good thing for all of us. Come mid-2009, AMD releases "Istanbul" with 6 cores and 6MB of L3 cache. HT3 means the memory bandwidth interconnects double in speed, pulling them well ahead of the new Intel designs in terms of memory bandwidth. And remember, these still use simple DDR2: no FBDIMMs, DDR3, or other expensive tricks. Looking further ahead, expect 8- and 12-core Opterons by the end of 2009. The Fiorano chipset comes then too, with 4 x PCI-E @ 2.0. > More surprisingly Opteron based servers do not offer PCIe Gen2 just yet. > No, but it is not really needed (yet). Barcelona already offers 2 links to the CPUs at 8GB/sec, so supporting enough PCI-E lanes for any board design is easy. When you can support 4 x PCI-E 16 lanes RIGHT NOW (as opposed to 4 x PCI-E 8 lanes on Intel chipsets), why do you need PCI-E 2.0? Especially when you have the memory bandwidth to support it. Until Intel chipsets get better bandwidth to RAM, all the slots in the world are irrelevant. > Perhaps it was long time ago when I checked it last time. The paper > cited above indicates very significant impact of PCIe Gen2 on the > bandwidth. > > They do, on the newer chipsets.
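The lane arithmetic behind this argument can be checked with a quick back-of-the-envelope sketch. It assumes the nominal per-lane payload rates of 250 MB/s per direction for PCIe 1.x (2.5 GT/s) and 500 MB/s for PCIe 2.0 (5 GT/s), both after 8b/10b encoding, and it ignores packet overhead, chipset limits and host memory bandwidth, so the figures are theoretical link peaks, not measured throughput. The helper names are illustrative only.

```python
# Nominal per-lane, per-direction payload rates in MB/s after 8b/10b encoding
# (8 data bits per 10 line bits): PCIe 1.x at 2.5 GT/s, PCIe 2.0 at 5 GT/s.
PER_LANE_MB_S = {1: 250, 2: 500}


def slot_gb_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth of one slot, in GB/s (1 GB = 1000 MB)."""
    return PER_LANE_MB_S[gen] * lanes / 1000.0


def total_gb_s(gen: int, lanes: int, slots: int) -> float:
    """Theoretical one-direction bandwidth summed over identical slots."""
    return slot_gb_s(gen, lanes) * slots


if __name__ == "__main__":
    configs = [
        ("4 x16 slots, PCIe 1.x", 1, 16, 4),
        ("4 x8 slots, PCIe 1.x", 1, 8, 4),
        ("4 x8 slots, PCIe 2.0", 2, 8, 4),
        ("1 x16 slot, PCIe 2.0", 2, 16, 1),
    ]
    for label, gen, lanes, slots in configs:
        print(f"{label}: {total_gb_s(gen, lanes, slots):.0f} GB/s per direction")
```

On these assumptions, four x16 Gen1 slots (16 GB/s) match four x8 Gen2 slots and double four x8 Gen1 slots (8 GB/s). Whether any of that is usable still depends on the chipset and memory bandwidth behind the slots, which is the point being made above.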
-- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 From timchipman at myrealbox.com Fri Sep 5 11:13:30 2008 From: timchipman at myrealbox.com (Tim Chipman) Date: Fri, 05 Sep 2008 14:13:30 -0400 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 Message-ID: <1220638410.4385019ctimchipman@myrealbox.com> Very likely a hopeless question, with this little information, but just in case: does anyone have any 'real world' experience with 'both' of these CPUs, in terms of relative performance for 'whatever work you do'? I realize the Xeon is a faster-clocked part (3.16 GHz Xeon vs 2.3 GHz Opteron), so I'm more concerned with "relative performance per MHz". I'm involved with a cluster project, and we have 2 options from our vendor: - 25 compute nodes, dual quad-core Intel 5460, or - 32 compute nodes, dual quad-core AMD 2356. From what I've been able to glean (spec.org / SPEC CPU2006), - the Intel chips have better integer performance - the AMD chips have better FPU performance, so the likely real-world performance result will depend on how a given application blends / balances things. Any comments / thoughts are certainly appreciated. --Tim Chipman From thpierce at gmail.com Sat Sep 6 08:26:35 2008 From: thpierce at gmail.com (Tom Pierce) Date: Sat, 6 Sep 2008 11:26:35 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu>

Message-ID: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> On Fri, Sep 5, 2008 at 2:23 PM, Vincent Diepeveen wrote: > AFAIK only rich government departments do business with companies such as > DELL. I buy DELL servers for a cluster at a commercial chemical company. They are a good price for standard systems. They also have a great Linux support organization in Austin, Texas. Good equipment and high-quality support for issues that arise over time. It is a cost-effective solution, and Dell clusters keep popping up at US universities as well. Tom From carsten.aulbert at aei.mpg.de Mon Sep 8 11:16:31 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Mon, 08 Sep 2008 20:16:31 +0200 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: <1220638410.4385019ctimchipman@myrealbox.com> References: <1220638410.4385019ctimchipman@myrealbox.com> Message-ID: <48C56BFF.6000304@aei.mpg.de> Hi Tim, Tim Chipman wrote: > Does anyone have any 'real world' experience with 'both' of these CPUs, in terms of relative performance for 'whatever work you do' ? > > I realize the xeon is a faster mhz part (3.16ghz xeon vs 2.3ghz opteron) so I'm more concerned with "relative performance per mhz" > > I'm involved with a cluster project, and we have 2 options from our vendor, > > - 25 compute nodes, dual-quadcore intel 5460, or > - 32 compute nodes, dual-quadcore amd 2356 > > From what I've been able to glean (Spec.org / SpecCPU 2006), > > - the intel chips have better integer performance > - the amd chips have better FPU performance > > so the likely anticipated real-world performance result .. will depend on how a given application blends / balances things.
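Since the two quotes differ in node count as well as clock speed, it can help to compare whole-cluster totals instead of per-MHz figures alone. The sketch below assumes both chips retire 4 double-precision FLOPs per core per cycle (a two-wide SSE add plus a two-wide multiply), the same assumption behind the 96 GFLOPS figure for an 8-core 3 GHz Xeon earlier in the thread. Peak numbers say nothing about how a real code behaves, so the hypothetical `jobs_per_hour` helper is meant to be fed with runtimes from your own benchmark, as the reply suggests.

```python
from dataclasses import dataclass


@dataclass
class ClusterOption:
    name: str
    nodes: int
    sockets_per_node: int
    cores_per_socket: int
    clock_ghz: float
    # Assumption: 4 double-precision FLOPs per core per cycle for both chips.
    dp_flops_per_cycle: int = 4

    @property
    def cores(self) -> int:
        return self.nodes * self.sockets_per_node * self.cores_per_socket

    @property
    def aggregate_ghz(self) -> float:
        """Total core-GHz: a crude proxy for clock-bound (e.g. integer) work."""
        return self.cores * self.clock_ghz

    @property
    def peak_dp_gflops(self) -> float:
        """Theoretical peak; real codes reach only a fraction of this."""
        return self.aggregate_ghz * self.dp_flops_per_cycle


def jobs_per_hour(nodes: int, seconds_per_job: float) -> float:
    """Cluster throughput if each node runs one benchmark job at a time.

    seconds_per_job should come from running your own code on one node.
    """
    return nodes * 3600.0 / seconds_per_job


if __name__ == "__main__":
    options = [
        ClusterOption("25 x dual Xeon 5460", 25, 2, 4, 3.16),
        ClusterOption("32 x dual Opteron 2356", 32, 2, 4, 2.3),
    ]
    for o in options:
        print(f"{o.name}: {o.cores} cores, {o.aggregate_ghz:.1f} core-GHz, "
              f"{o.peak_dp_gflops:.0f} peak DP GFLOPS")
```

On these assumptions the Xeon option comes to 200 cores and about 2528 peak DP GFLOPS, the Opteron option to 256 cores and about 2355, a gap of roughly 7%. That is small enough that the measured `seconds_per_job` on each node type will likely decide the question.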
We have "only" the Quad-Xeon boxes with E5435, and these are quite fast; indeed, FFTW seems to run faster on Xeons, but I have not run any benchmarks for the past ~9 months, so I don't know about the latest Opterons. I think you need to come up with a real-world scenario of what will be run on the cluster, maybe compile a little benchmark yourself, and ask the vendor to run it on both (or get your hands on both boxes). I think that's the only "fair" comparison possible. HTH Carsten From tom.elken at qlogic.com Mon Sep 8 11:23:48 2008 From: tom.elken at qlogic.com (Tom Elken) Date: Mon, 8 Sep 2008 11:23:48 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tom Pierce ... It is a cost effective solution, and Dell clusters keep popping up at US Universities as well. Tom The same is true at UK Universities. -Tom From prentice at ias.edu Mon Sep 8 11:58:36 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 08 Sep 2008 14:58:36 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> Message-ID: <48C575DC.2000608@ias.edu> Tom Elken wrote: > From: beowulf-bounces at beowulf.org > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tom Pierce > > > ... It is a cost effective solution, and Dell clusters keep > popping up at US Universities as well. > > Tom > > The same is true at UK Universities. > > -Tom I think these trends have more to do with the low cost of Dell hardware and Dell's sales force and marketing to upper management than they do with any technical advantages Dell has over the competition. I have no problem with Dell hardware. There's nothing wrong with it, and Beowulf clusters are *supposed* to be based on affordable commodity hardware. I didn't see much basis for the earlier post disparaging Dell hardware. However, if you're buying a "turn-key" cluster solution based on advertised "clustering services", I'd be cautious with *any* of the big vendors, where you're known mostly as a customer ID in their CRM database, and you're dealing with salespeople and not technical people. This is especially true if they offer any kind of "customization": more than likely, any customization will come at additional cost, and you'll still have to do some reconfiguration to get it set up the way you want (at which point it is no longer a turn-key system). I've got a few stories, but the guilty shall remain nameless. Getting back to hardware, I've always been impressed with the robustness of HP ProLiant hardware. -- Prentice From landman at scalableinformatics.com Mon Sep 8 12:01:59 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 08 Sep 2008 15:01:59 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers.
In-Reply-To: <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> Message-ID: <48C576A7.3020900@scalableinformatics.com> Tom Elken wrote: > From: beowulf-bounces at beowulf.org > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tom Pierce > > > ... It is a cost effective solution, and Dell clusters keep > popping up at US Universities as well. > > Tom > > The same is true at UK Universities. > > -Tom Don't know about the UK universities, but more than a couple US ones have signed single source agreements with Dell (or HP or [insert the large tier 1 vendor of your choice]). Makes selling things to these folks often a little more of a challenge ... not for pricing reasons, but purely for paperwork reasons. I find it amusing on good days when we are asked to write a sole-source memo for our customers. I won't comment on whether or not this is the right thing to do, or even a good thing to do for the universities. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From apittman at concurrent-thinking.com Mon Sep 8 12:05:37 2008 From: apittman at concurrent-thinking.com (Ashley Pittman) Date: Mon, 08 Sep 2008 20:05:37 +0100 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C575DC.2000608@ias.edu> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> Message-ID: <1220900737.4641.31.camel@bruce.priv.wark.uk.streamline-computing.com> On Mon, 2008-09-08 at 14:58 -0400, Prentice Bisbal wrote: > I think these trends have more to do with the cheap cost of Dell > Hardware and Dell's sales force and marketing to upper management than > they do with any technical advantages Dell has over the competition. > > I have no problem with Dell Hardware. There's nothing wrong with it, and > Beowulf clusters are *supposed* to be based on affordable commodity > hardware. I didn't see much basis for the earlier post disparaging Dell > hardware. > > However, If you're buying a "turn-key" cluster solution based on > advertised "clustering services", I'd be cautious with *any* of the big > vendors, where you're known mostly as a customer ID in their CRM > database, and you're dealing with salespeople and not technical people. > Especially if they offer any kind of "customization" -- more than > likely, any customization will be at additional costs, and you'll still > have to do some reconfiguration to get it set up the way yo want (and is > therefore no longer a turn-key system). I've got a few stories, but the > guilty shall remain nameless. You don't have to buy Dell hardware direct from Dell; there are plenty of people who will sell you Dell nodes with value-add hardware and software. Ashley, From landman at scalableinformatics.com Mon Sep 8 12:26:40 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 08 Sep 2008 15:26:40 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C575DC.2000608@ias.edu> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> Message-ID: <48C57C70.3010704@scalableinformatics.com> Prentice Bisbal wrote: [...] > Getting back to hardware, I've always been impressed with the robustness > of HP Proliant hardware Of course, the dirty little (not so) secret of tier-1 systems is that they are all built by the same 2-3 contract manufacturers, from the same parts troughs ...... .... there is an economic reason for this. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From prentice at ias.edu Mon Sep 8 13:18:43 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 08 Sep 2008 16:18:43 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C57C70.3010704@scalableinformatics.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <48C57C70.3010704@scalableinformatics.com> Message-ID: <48C588A3.5090606@ias.edu> Joe Landman wrote: > Prentice Bisbal wrote: > > [...] > >> Getting back to hardware, I've always been impressed with the robustness >> of HP Proliant hardware > > Of course, the dirty little (not so) secret of tier 1 systems are that > they are all built by the same 2-3 contract manufacturers, from the same > parts troughs ...... > > .... there is an economic reason for this. I'm sure. The bike business is the same way. For the most part (there are exceptions), a handful of bike factories make all the bikes for the different bike companies, and the country of manufacture determines the cost of manufacturing and the price: Japan: highest quality and price; Taiwan: middle quality and price; China: lowest quality and price. Usually the highest-end models are made in-house, so this doesn't apply to them. Even at the same factory, each vendor can specify the quality of manufacture (quality of materials/components, tolerances, and thoroughness of Q/A testing), and pay accordingly. I'm sure even in the computer world a similar rule applies. $ = cheap components, $$ = better components, etc. -- Prentice From landman at scalableinformatics.com Mon Sep 8 14:07:22 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 08 Sep 2008 17:07:22 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C588A3.5090606@ias.edu> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <48C57C70.3010704@scalableinformatics.com> <48C588A3.5090606@ias.edu> Message-ID: <48C5940A.606@scalableinformatics.com> Prentice Bisbal wrote: > I'm sure even in the computer world a similar rule applies. $ = cheap > components, $$= better components, etc. A Xeon is a Xeon is a Xeon. Some RAM DIMM builders use ... ah ... less than spectacular ... parts. But peel off some of the carefully applied labels on the tier-1 units and you find some ... interesting ... things beneath (usually the labels that say you void your warranty if you remove them). -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From prentice at ias.edu Mon Sep 8 14:13:07 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 08 Sep 2008 17:13:07 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C5940A.606@scalableinformatics.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <48C57C70.3010704@scalableinformatics.com> <48C588A3.5090606@ias.edu> <48C5940A.606@scalableinformatics.com> Message-ID: <48C59563.2080404@ias.edu> Joe Landman wrote: > Prentice Bisbal wrote: > >> I'm sure even in the computer world a similar rule applies. $ = cheap >> components, $$= better components, etc. > > A Xeon is a Xeon is a Xeon. > > Some RAM DIMM builders use ... ah ... less than spectacular ... parts. > > But peel off some of the carefully applied labels on the tier-1 units > and you find some ... interesting ... things beneath (usually the labels > that say you void your warranty if you remove them). > Been there, done that. I love that SGI wanted $4k for a fibre-channel HBA for an Origin 350. It was made by QLogic, and I bought the exact same thing for only $800 from CDW. Of course, SGI would refuse to support it if I ever had a tech support issue, but that never happened. I know Sun and the other big Unix companies did the same thing, which is why we all use Linux on commodity hardware these days. -- Prentice From perry at piermont.com Mon Sep 8 14:15:27 2008 From: perry at piermont.com (Perry E. Metzger) Date: Mon, 08 Sep 2008 17:15:27 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C5940A.606@scalableinformatics.com> (Joe Landman's message of "Mon\, 08 Sep 2008 17\:07\:22 -0400") References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <48C57C70.3010704@scalableinformatics.com> <48C588A3.5090606@ias.edu> <48C5940A.606@scalableinformatics.com> Message-ID: <877i9mfifk.fsf@snark.cb.piermont.com> Joe Landman writes: > Prentice Bisbal wrote: > >> I'm sure even in the computer world a similar rule applies. $ = cheap >> components, $$= better components, etc. > > A Xeon is a Xeon is a Xeon. > > Some RAM DIMM builders use ... ah ... less than spectacular ... parts. > > But peel off some of the carefully applied labels on the tier-1 units > and you find some ... interesting ... things beneath (usually the > labels that say you void your warranty if you remove them). There is considerable difference in quality between different motherboards, even if all the Xeons you put in them are the same. Another big price/quality tradeoff: ECC vs. non-ECC memory. -- Perry E. Metzger perry at piermont.com From landman at scalableinformatics.com Mon Sep 8 14:15:56 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 08 Sep 2008 17:15:56 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C59563.2080404@ias.edu> References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <48C57C70.3010704@scalableinformatics.com> <48C588A3.5090606@ias.edu> <48C5940A.606@scalableinformatics.com> <48C59563.2080404@ias.edu> Message-ID: <48C5960C.4080807@scalableinformatics.com> Prentice Bisbal wrote: > Been there, done that. > > I love that SGI wanted $4k for a fibre-channel HBA for an Origin 350, Imagine what it was like on the "splainin" end of that ... some of us kept saying "you can't paint a box purple and charge 3x the price" while some of them were saying "yes we can". Made for some fun internal discussions. > which was made by QLogic, and bought the exact same thing for only $800 > from CDW. Of course, SGI would refuse to support it if I ever had a tech > support issue, but that never happened. I know Sun and the other big > Unix Co's did the same thing, which is why we all use Linux on commodity > hardware these days. :) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From andrew at moonet.co.uk Mon Sep 8 14:18:35 2008 From: andrew at moonet.co.uk (andrew holway) Date: Mon, 8 Sep 2008 22:18:35 +0100 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> References: <200809021901.m82J0IaS029547@bluewest.scyld.com> <48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO> <48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO> <48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> Message-ID: As an integrator, I often find myself on the path of least resistance when specifying kit. With certain vendors I am forced to do a lot of trudging around to get the solution I need, or there is a myriad of partner-scheme rubbish to wade through to get to the man who knows the kit and can give me the answer and price I need quickly enough for the return date. I mean, it's all the same $417 served with different shovels, right? :) On Sat, Sep 6, 2008 at 4:26 PM, Tom Pierce wrote: > > > On Fri, Sep 5, 2008 at 2:23 PM, Vincent Diepeveen wrote: >> >> AFAIK only rich government departments do business with companies such as >> DELL. > > > I buy DELL servers for a cluster at a commercial chemical company. They are > a good price for standard systems. They also have a great Linux support > organization in Austin Texas. Good equipment and high quality support for > issues that arise over time. It is a cost effective solution, and Dell > clusters keep popping up at US Universities as well. > > Tom > > From jeff.johnson at wsm.com Mon Sep 8 16:14:58 2008 From: jeff.johnson at wsm.com (Jeff Johnson) Date: Mon, 08 Sep 2008 16:14:58 -0700 Subject: [Beowulf] Re: Re: GPU boards and cluster servers. In-Reply-To: <200809082117.m88LHAOd032147@bluewest.scyld.com> References: <200809082117.m88LHAOd032147@bluewest.scyld.com> Message-ID: <48C5B1F2.4080606@wsm.com> > A Xeon is a Xeon is a Xeon. > This is a very true statement. Unfortunately for many, the commonality ends where the processor and socket meet. There is a great deal of deviation in motherboard designs. Some are much better than others, and it is not always based on the location of the factory (China versus Japan, USA, etc).
Intel, as an example, releases a reference design and BIOS definition that is usually the gold standard for a particular type of platform. Many companies will take this design and modify it to fit their needs, and in the process streamline the design. That is usually a cost-driven exercise. The end result is usually a decent server platform that performs well in a file/print or even enterprise space. Unfortunately, as those who have been bitten know, a well-tuned HPC environment puts very different demands on hardware: it challenges tolerances and exposes any of the cost-driven "changes" made to a motherboard. I have seen HPC sites that can make a Dell 1950 stumble with little difficulty. Granted, that requires a well-tuned environment and skilled, savvy users who can place the kind of demands on machines that expose the various shortcomings. A fair number of the previously mentioned sole-source contracts end up being penny-wise and dollar-foolish as well as detrimental to the people actually relying on the hardware for productive research. -- Best Regards, Jeff Johnson President / CTO Western Scientific, Inc jeff.johnson at wsm.com http://www.wsm.com 5444 Napa Street - San Diego, CA 92110 Tel 800.443.6699 +001.619.220.6580 Fax +001.619.220.6590 "Braccae tuae aperiuntur" From landman at scalableinformatics.com Mon Sep 8 16:46:47 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 08 Sep 2008 19:46:47 -0400 Subject: [Beowulf] Re: Re: GPU boards and cluster servers. In-Reply-To: <48C5B1F2.4080606@wsm.com> References: <200809082117.m88LHAOd032147@bluewest.scyld.com> <48C5B1F2.4080606@wsm.com> Message-ID: <48C5B967.5070801@scalableinformatics.com> Jeff Johnson wrote: > >> A Xeon is a Xeon is a Xeon. >> > This is a very true statement. > > Unfortunately for many, the commonality ends where the processor and > socket meet. There is a great deal of deviation in motherboard designs.
> Some are much better than others and it is not always based on the > location of the factory (China versus Japan, USA, etc). Absolutely. > Intel, as an example, releases a reference design and bios definition > that is usually the gold standard for a particular type of platform. > Many companies will take this design and modify it to fit their needs > and in the process streamline the design. That is usually a cost driven > exercise. Let's all say it together now ... Broadcom NICs on motherboards ... :( > The end result is usually a decent server platform that performs well in > a file/print or even enterprise space. Unfortunately for those who have > been bitten, we all know there is a difference between how a well tuned > HPC environment can put demands on hardware, challenge tolerances and > expose any of the cost driven "changes" made to a motherboard. Yeah. Not just motherboards. All components. Things that work really well in an office context, that an IT person wouldn't give a second thought to deploying, do not work well, if at all, in an HPC environment. We have seen this again and again, usually in the contexts (curiously enough) of storage design and networking, as well as computing system design. > I have seen HPC sites that can make a Dell 1950 stumble with little > difficulty. Granted, that requires a well tuned environment and skilled > savvy users that can place demands on machines where the various > shortcomings are exposed. A fair number of the previously mentioned Nothing helps you debug a design like a real use case, live, in action, on your system. > sole-source contracts end up being penny-wise and dollar-foolish as well > as detrimental to the people actually relying on the hardware for > productive research. :( I have never seen a reduction in competition as productive for the consumer in that market. Reduction in choice, increase in price, decrease in consumer leverage. This is good ... how?
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From lindahl at pbm.com Mon Sep 8 17:11:05 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 8 Sep 2008 17:11:05 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C575DC.2000608@ias.edu> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> Message-ID: <20080909001105.GE997@bx9.net> On Mon, Sep 08, 2008 at 02:58:36PM -0400, Prentice Bisbal wrote: > I think these trends have more to do with the cheap cost of Dell > Hardware and Dell's sales force and marketing to upper management than > they do with any technical advantages Dell has over the competition. I was involved in the top 2 Dell systems in the UK. Both were competitive bids, with a cluster integrator other than Dell, and Dell hardware. So no, marketing wasn't a big driver, but low cost was. -- greg From libo at buaa.edu.cn Mon Sep 8 18:17:52 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Tue, 9 Sep 2008 09:17:52 +0800 Subject: [Beowulf] Re: GPU boards and cluster servers. References: <200809021901.m82J0IaS029547@bluewest.scyld.com><48BF1508.7000406@harddata.com> <002001c90e58$3c20c020$6300a8c0@LIBO><48C08D34.1060704@harddata.com> <002401c90f15$38e48500$6300a8c0@LIBO><48C12B6E.4090903@tamu.edu>

<25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com><6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> Message-ID: <003401c91219$eacc9c10$6300a8c0@LIBO> Hello, I share your opinion. Dell's systems may be cheap, but they lack robustness and reliability. I have experienced a few hardware failures with Dell's servers, and they failed to give me any helpful support except re-installing the OS. Regards, Li, Bo ----- Original Message ----- From: "Prentice Bisbal" To: "Beowulf Mailing List" Sent: Tuesday, September 09, 2008 2:58 AM Subject: Re: [Beowulf] Re: GPU boards and cluster servers. > Tom Elken wrote: >> From: beowulf-bounces at beowulf.org >> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tom Pierce >> >> >> ... It is a cost effective solution, and Dell clusters keep >> popping up at US Universities as well. >> >> Tom >> >> The same is true at UK Universities. >> >> -Tom > > I think these trends have more to do with the cheap cost of Dell > Hardware and Dell's sales force and marketing to upper management than > they do with any technical advantages Dell has over the competition. > > I have no problem with Dell Hardware. There's nothing wrong with it, and > Beowulf clusters are *supposed* to be based on affordable commodity > hardware. I didn't see much basis for the earlier post disparaging Dell > hardware. > > However, if you're buying a "turn-key" cluster solution based on > advertised "clustering services", I'd be cautious with *any* of the big > vendors, where you're known mostly as a customer ID in their CRM > database, and you're dealing with salespeople and not technical people. > Especially if they offer any kind of "customization" -- more than > likely, any customization will be at additional costs, and you'll still > have to do some reconfiguration to get it set up the way you want (and is > therefore no longer a turn-key system).
I've got a few stories, but the > guilty shall remain nameless. > > Getting back to hardware, I've always been impressed with the robustness > of HP ProLiant hardware > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From carsten.aulbert at aei.mpg.de Tue Sep 9 00:53:48 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Tue, 09 Sep 2008 09:53:48 +0200 Subject: [Beowulf] Monitoring crashing machines Message-ID: <48C62B8C.7060102@aei.mpg.de> Hi all, I would tend to guess this problem is fairly common and many solutions are already in place, so I would like to enquire about your solutions to the problem: In our large cluster we have certain nodes going down with I/O hard disk errors. We have some suspicion about the causes but would like to investigate this further. However, the log files don't show much if anything at all (which is understandable given that the log files reside on disk and we are hitting I/O disk errors). The console does show some interesting messages, but it cannot scroll back far enough. My question now: is there a cute little way to gather all the console outputs of > 1000 nodes? The nodes don't have physical serial cables attached to them - nor do we want to use many concentrators to achieve this - but the off-the-shelf Supermicro boxes all have an IPMI card installed and SoL works quite ok. Initially, conserver.com looked nice and we also found an IPMI interface for it, but that comes with two downsides: (1) it blocks IPMI access (I have yet to find out if a secondary user can use SoL when another user is using this already, but I doubt it) and (2) it simply does not catch messages appearing in dmesg (simple ones like plugging in a USB keyboard), but that may be a configuration problem on our side.
We also tried (r)syslog but somehow this does not get all the messages either, even when using something like *.* @loghost. For the time being we are experimenting with using "script" in many "screen" environments which should be able to monitor ipmitool's SoL output, but somehow that strikes me as inefficient as well. So, my question boils down to: How do people solve this problem? Thanks a lot Cheers Carsten -- Dr. Carsten Aulbert - Max Planck Institute for Gravitational Physics Callinstrasse 38, 30167 Hannover, Germany Phone/Fax: +49 511 762-17185 / -17193 http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31 From reuti at staff.uni-marburg.de Tue Sep 9 01:59:14 2008 From: reuti at staff.uni-marburg.de (Reuti) Date: Tue, 9 Sep 2008 10:59:14 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C62B8C.7060102@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de> Message-ID: <1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> Hi, On 09.09.2008, at 09:53, Carsten Aulbert wrote: > Hi all, > > I would tend to guess this problem is fairly common and many solutions > are already in place, so I would like to enquire about your solutions > to the problem: > > In our large cluster we have certain nodes going down with I/O hard disk errors. We have some suspicion about the causes but would like to > investigate this further. However, the log files don't show much if > anything at all (which is understandable given that the log files reside on disk and we are hitting I/O disk errors). The console does show some interesting messages, but it cannot scroll back far enough. > > My question now: is there a cute little way to gather all the console > outputs of > 1000 nodes? The nodes don't have physical serial cables > attached to them - nor do we want to use many concentrators to achieve > this - but the off-the-shelf Supermicro boxes all have an IPMI card > installed and SoL works quite ok.
I set up syslog-ng on the nodes to log to the headnode. There, each node has a distinct file, e.g. "/var/log/nodes/node42.messages". If you are interested, I could post my configuration files for headnode and clients. -- Reuti From carsten.aulbert at aei.mpg.de Tue Sep 9 02:00:39 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Tue, 09 Sep 2008 11:00:39 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> References: <48C62B8C.7060102@aei.mpg.de> <1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> Message-ID: <48C63B37.5020402@aei.mpg.de> Hi, thanks for the reply. Reuti wrote: > I set up syslog-ng on the nodes to log to the headnode. There, each node > has a distinct file, e.g. "/var/log/nodes/node42.messages". If you > are interested, I could post my configuration files for headnode and > clients. Does this capture (almost) everything that happens on a machine? We have not yet looked into syslog-ng, but a look at your config files would be very nice. Thanks Carsten From geoff at galitz.org Tue Sep 9 03:40:21 2008 From: geoff at galitz.org (Geoff Galitz) Date: Tue, 9 Sep 2008 12:40:21 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C63B37.5020402@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> Message-ID: >Does this capture (almost) everything that happens on a machine? We have >not yet looked into syslog-ng, but a look at your config files would >be very nice. You can also configure any standard (distribution shipped) syslog to log remotely to your head node or even a separate logging master.
Anything that gets reported to the syslog facility can be reported/archived in this manner; you just need to dig into the documentation (e.g. man syslog, man syslog.conf, man 5 syslog, etc.) to figure out the configuration you need. It is actually pretty straightforward. Logging the I/O errors or any other kernel driver output should be no problem. Most standard syslog mechanisms will not let you cleanly create a hierarchy such as what syslog-ng will give you, but I find that simply grepping one or two central log files works better for me, anyway. -geoff From carsten.aulbert at aei.mpg.de Tue Sep 9 04:16:28 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Tue, 09 Sep 2008 13:16:28 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> Message-ID: <48C65B0C.2000208@aei.mpg.de> Hi, Geoff Galitz wrote: > You can also configure any standard (distribution shipped) syslog to log > remotely to your head node or even a separate logging master. Anything that > gets reported to the syslog facility can be reported/archived in this > manner; you just need to dig into the documentation (e.g. man syslog, man > syslog.conf, man 5 syslog, etc.) to figure out the configuration you need. > It is actually pretty straightforward. Logging the I/O errors or any other > kernel driver output should be no problem. > > Most standard syslog mechanisms will not let you cleanly create a hierarchy > such as what syslog-ng will give you, but I find that simply grepping one or > two central log files works better for me, anyway. That's what we tried yesterday, and it did not work nicely. The client machine got an entry like *.* @loghost (and syslogd was restarted) and the loghost got the -r flag added to enable listening for remote messages. We did get a few messages, albeit not from the kernel when an error happened.
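A minimal sketch of what the receiving side of such a setup does may help when checking whether kernel messages reach the loghost at all. It assumes classic BSD-style syslog over UDP (RFC 3164 framing, port 514), where each packet starts with a `<PRI>` field and PRI = facility × 8 + severity. Kernel messages use facility 0, so a `kern.err` message arrives as `<3>`. The one-file-per-sender layout and the names `decode_pri`, `log_path` and `serve` are hypothetical, chosen to mimic the per-node files described for syslog-ng; this is not how any particular syslog daemon is implemented.

```python
import os
import re
import socket

# Partial facility table (RFC 3164 numbering); kern is facility 0.
FACILITIES = {0: "kern", 1: "user", 3: "daemon", 4: "auth"}
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_RE = re.compile(rb"^<(\d{1,3})>")


def decode_pri(packet: bytes):
    """Split a raw syslog packet into (facility, severity, text).

    PRI = facility * 8 + severity, so kernel errors arrive as "<3>...".
    Packets without a PRI field return (None, None, text).
    """
    m = PRI_RE.match(packet)
    if not m:
        return None, None, packet.decode("utf-8", "replace")
    pri = int(m.group(1))
    text = packet[m.end():].decode("utf-8", "replace")
    return pri // 8, pri % 8, text


def log_path(logdir: str, host: str) -> str:
    """Hypothetical layout: one file per sender, e.g. <logdir>/node42.messages."""
    return os.path.join(logdir, f"{host}.messages")


def serve(logdir: str, port: int = 514) -> None:
    """Receive UDP syslog forever (binding port 514 usually needs root)."""
    os.makedirs(logdir, exist_ok=True)
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    while True:
        packet, (addr, _port) = sock.recvfrom(8192)
        facility, severity, text = decode_pri(packet)
        if facility is None:
            label = "?"
        else:
            label = f"{FACILITIES.get(facility, facility)}.{SEVERITIES[severity]}"
        # Files are keyed by sender IP here; a real setup would map IPs to node names.
        # Open, append and close per packet so little is lost if the receiver dies.
        with open(log_path(logdir, addr), "a") as fh:
            fh.write(f"{label} {text.rstrip()}\n")
```

If no `kern.*` labels ever show up in the per-node files, the messages are being lost before they leave the node. That fits the concern that syslogd runs in user space and may never get to forward anything once the disk or kernel is in trouble.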
I'll have another look today; maybe I did something wrong. Thanks! Carsten From perry at piermont.com Tue Sep 9 04:45:46 2008 From: perry at piermont.com (Perry E. Metzger) Date: Tue, 09 Sep 2008 07:45:46 -0400 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C62B8C.7060102@aei.mpg.de> (Carsten Aulbert's message of "Tue, 09 Sep 2008 09:53:48 +0200") References: <48C62B8C.7060102@aei.mpg.de> Message-ID: <87ljy1czkl.fsf@snark.cb.piermont.com> Carsten Aulbert writes: > For the time being we are experimenting with using "script" in many > "screen" environments which should be able to monitor ipmitool's SoL > output, but somehow that strikes me as inefficient as well. First, you probably never want script+screen -- use expect instead. It's the Swiss Army chainsaw of sysadmin tools. Second, I suspect you can use ipmish or a similar tool to simply "cat out" the console output and just redirect it to a file. > Initially, conserver.com looked nice and we also found an IPMI interface > for it, but that comes with two downsides: (1) it blocks IPMI access (I > have yet to find out if a secondary user can use SoL when another user > is using this already, but I doubt it) The whole point of conserver is that *conserver* allows multiple users to connect at once. They connect to conserver, not via IPMI, and conserver multiplexes the one IPMI connection. > and (2) it simply does not catch messages appearing in dmesg (simple > ones like plugging in a USB keyboard), but that may be a > configuration problem on our side. Probably. -- Perry E.
Metzger perry at piermont.com From larry.stewart at sicortex.com Tue Sep 9 05:29:15 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Tue, 09 Sep 2008 08:29:15 -0400 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C62B8C.7060102@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de> Message-ID: <48C66C1B.2030503@sicortex.com> Carsten Aulbert wrote: > So, my question boils down to: How do people solve this problem? We use conserver here at SiCortex, but it doesn't talk to node consoles directly. Instead, we've written a kind of intermediary between conserver and the real console access. The situation isn't exactly parallel, but if you wind up writing your own "intermediary" the structure and code might be useful. Node Linux -> custom char device driver -> scan chain hardware -> embedded uClinux board-level microprocessor -> "scan daemon", which concentrates the terminals from 27 nodes -> TCP/IP socket -> x86 service processor -> "scconserver", which speaks the idiosyncratic terminal protocol on one side and demultiplexes the consoles into individual TCP sockets -> conserver, which does all the usual conserver stuff. This works well enough at the 972-node scale. In your situation, the intermediary could export IPMI sockets which it would multiplex in with its connection to the real IPMI access on the node. We used libevent to write scconserver, which makes all the bookkeeping for a zillion connections fairly straightforward. If you head this way, you might get some benefit from http://downloads.sicortex.com/distfiles/sicortex-scconserver-5.0.0.9.50831.tbz2 All open source. Regarding dmesg vs console, this is all according to node logging settings, which I don't know much about.
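The demultiplexing step in that chain, where one TCP stream carries the consoles of 27 nodes and gets split back into one stream per console, can be sketched in a few lines of Python. The wire format of the SiCortex terminal protocol isn't given here, so this sketch assumes a hypothetical framing: a 1-byte node id, a 2-byte big-endian payload length, then the payload. `frame` and `ConsoleDemux` are illustrative names, not part of scconserver. The point it shows is that socket reads arrive in arbitrary chunks, so a frame split across reads has to be buffered until the rest of it arrives.

```python
import struct
from collections import defaultdict

# Hypothetical framing: 1-byte node id, 2-byte big-endian payload length.
HEADER = struct.Struct(">BH")


def frame(node: int, payload: bytes) -> bytes:
    """Encode one chunk of console output in the assumed framing."""
    return HEADER.pack(node, len(payload)) + payload


class ConsoleDemux:
    """Split one multiplexed console stream into per-node byte streams."""

    def __init__(self):
        self._buf = bytearray()                 # bytes not yet parsed into frames
        self.consoles = defaultdict(bytearray)  # node id -> console output so far

    def feed(self, data: bytes) -> list:
        """Consume a chunk read from the socket; return node ids that got output."""
        self._buf.extend(data)
        touched = []
        while len(self._buf) >= HEADER.size:
            node, length = HEADER.unpack_from(self._buf)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # frame split across reads: wait for the rest
            self.consoles[node].extend(self._buf[HEADER.size:end])
            del self._buf[:end]
            touched.append(node)
        return touched
```

In a real intermediary, the bytes appended for each node id would be written to that node's own TCP socket for conserver to connect to, with reads driven by an event loop such as the libevent one described above.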
-- -Larry / Sector IX From tortay at cc.in2p3.fr Tue Sep 9 08:52:16 2008 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Tue, 09 Sep 2008 17:52:16 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C62B8C.7060102@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de> Message-ID: <48C69BB0.2010803@cc.in2p3.fr> Carsten Aulbert wrote: [server console management for many servers with conserver] > We use conserver to get serial console access to almost all our machines. Below is the forwarded answer to your messages from my coworker who's in charge of this. The tools he created for interfacing IPMI and conserver are in the conserver "contrib" section (this may be what you referred to as the IPMI interface for conserver). If you want to contact him directly, his e-mail address is similar to mine; just replace 'tortay' with 'wernli'. Loïc. -------- Original Message -------- > Initially, conserver.com looked nice and we also found an IPMI > interface for it, but that comes with two downsides: (1) it blocks > IPMI access (I have yet to find out if a secondary user can use SoL > when another user is using this already, but I doubt it) and (2) it > simply does not catch messages appearing in dmesg (simple ones like > plugging in a USB keyboard), but that may be a configuration problem > on our side. We are using conserver(.com) on 6 Linux boxes (quite old horses) for managing more than 1500 servers. Most of the latter are being handled by ipmitool SOL. On some - however rare - servers, I believe IPMI access is indeed restricted to one open connection. If you happen to be unlucky on this side (which I seriously doubt), it won't be an issue for the console access, as conserver is designed to let you share these (while logging all their output, which is what we're doing).
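Because conserver logs everything each console prints, finding out afterwards which nodes hit a given error becomes a search over those logs. Here is a minimal Python sketch of that search. It assumes one log file per console named `<node>.log` in a single directory; the actual file names depend on how conserver is configured, so treat the layout as an assumption. It uses "I/O error" as an example pattern because that is the symptom in the original question. `matching_lines` and `scan_console_logs` are hypothetical helper names.

```python
import re
from pathlib import Path

# Example pattern only; adjust it to the messages your hardware actually prints.
DEFAULT_PATTERN = re.compile(r"I/O error", re.IGNORECASE)


def matching_lines(lines, pattern=DEFAULT_PATTERN):
    """Return the lines (without trailing newline) that match pattern."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def scan_console_logs(logdir, pattern=DEFAULT_PATTERN, suffix=".log"):
    """Map node name -> matching lines for every <node><suffix> file in logdir.

    The one-file-per-console layout is an assumption for this sketch.
    A missing directory simply yields no hits.
    """
    root = Path(logdir)
    if not root.is_dir():
        return {}
    hits = {}
    for path in sorted(root.glob(f"*{suffix}")):
        # Console logs can contain garbage bytes; replace rather than fail.
        with path.open(errors="replace") as fh:
            found = matching_lines(fh, pattern)
        if found:
            node = path.name[: len(path.name) - len(suffix)]
            hits[node] = found
    return hits
```

For example, `scan_console_logs("/var/consoles")` (a hypothetical path) returns something like `{"node17": [...]}` listing only the nodes whose consoles printed a matching line. This is the same job as a recursive grep, but it leaves the result in a form that is easy to feed into further checks.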
As for the dmesg issue, you're just missing the "console=ttySx,baudrate" kernel parameter, which should come after "console=tty0" if you want init to talk to the serial line, or before for speaking to the monitor. > Also we tried (r)syslog but somehow this does not get all the messages > either, even when using something like *.* @loghost. This, however, is true, and is one of the reasons we went to the trouble of having consoles (IPMI or other) open for all our servers at all times. It can be invaluable to grep through all the console logfiles to catch that error message which was hidden everywhere else. > For the time being we are experimenting with using "script" in many > "screen" environments which should be able to monitor ipmitool's SoL > output, but somehow that strikes me as inefficient as well. conserver scales extremely well and will be your best friend (if you don't have a dog, that is). > So, my question boils down to: How do people solve this problem? Feel free to email me privately if you need the details. -- | Loïc Tortay - IN2P3 Computing Centre | From hahn at mcmaster.ca Tue Sep 9 09:12:24 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 9 Sep 2008 12:12:24 -0400 (EDT) Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C65B0C.2000208@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> <48C65B0C.2000208@aei.mpg.de> Message-ID: > We did get a few messages, albeit not from the kernel when an error > happened. I'll have another look today; maybe I did something wrong. there's a semi-recent kernel feature which lets the kernel bypass user space by putting console traffic onto the net directly; see Documentation/networking/netconsole.txt From prentice at ias.edu Tue Sep 9 10:15:29 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 09 Sep 2008 13:15:29 -0400 Subject: [Beowulf] Re: Re: GPU boards and cluster servers.
In-Reply-To: <48C5B1F2.4080606@wsm.com> References: <200809082117.m88LHAOd032147@bluewest.scyld.com> <48C5B1F2.4080606@wsm.com> Message-ID: <48C6AF31.8070005@ias.edu> Jeff Johnson wrote: > >> A Xeon is a Xeon is a Xeon. >> > This is a very true statement. > > Unfortunately for many, the commonality ends where the processor and > socket meet. There is a great deal of deviation in motherboard designs. > Some are much better than others and it is not always based on the > location of the factory (China versus Japan, USA, etc). > > Intel, as an example, releases a reference design and bios definition > that is usually the gold standard for a particular type of platform. > Many companies will take this design and modify it to fit their needs > and in the process streamline the design. That is usually a cost driven > exercise. > A few years ago, when the 64-bit Opterons were still relatively new, I read an article on the HyperTransport (HT) architecture in SysAdmin magazine. At the time, according to the article, out of all the different companies selling Opteron-based systems, the HP ProLiant DL585 was one of the few that fully implemented the HT architecture. http://www.samag.com/documents/s=9408/sam0411b/0411b.htm So yeah, there's a lot more to computer performance than the type of processor and socket on the motherboard. -- Prentice From rgb at phy.duke.edu Tue Sep 9 11:12:02 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 9 Sep 2008 14:12:02 -0400 (EDT) Subject: [Beowulf] Re: GPU boards and cluster servers.
In-Reply-To: <20080909001105.GE997@bx9.net> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> Message-ID: On Mon, 8 Sep 2008, Greg Lindahl wrote: > On Mon, Sep 08, 2008 at 02:58:36PM -0400, Prentice Bisbal wrote: > >> I think these trends have more to do with the cheap cost of Dell >> Hardware and Dell's sales force and marketing to upper management than >> they do with any technical advantages Dell has over the competition. > > I was involved in the top 2 Dell systems in the UK. Both were > competitive bids, with a cluster integrator other than Dell, and Dell > hardware. So no, marketing wasn't a big driver, but low cost was. And there's nothing wrong with that. One thing that is appealing about Dells in any professional operation is that they usually come with hot and cold running service, pay for what you need. If I buy e.g. a Dell laptop (as I have for six or seven years now) I pay a single, easily budgeted price and if it breaks (as it has six or seven times now over the years -- I USE my laptop, run hard and put up wet), a nice man comes to my house and fixes it on the spot, sitting at my dining room table. Nearly anywhere else I could buy a laptop leaves me with depot repair or worse. They provide similar levels of coverage on server/cluster systems. This can in turn let you prebudget all maintenance costs at the time of original purchase and be CERTAIN that your cluster or server room will not require additional emergency repair funds for the next 3-4 years, the expected useful life of the hardware anyway. It also saves you tremendously on your OWN opportunity cost time if something breaks, as a phone call is a lot cheaper than slogging down to the server room, pulling a box, benching it, and messing with it for a few hours to figure out which part has gone bad.
At the very least the "few hours" part can be relegated to somebody else. This is hardly unique, of course -- penguincomputing.com does as well and is arguably more reliable out of the box as they tend to use higher-grade parts (from my own strictly anecdotal experience). But Dells aren't "bad", and are often cheap(est). With onsite service guaranteed and a certain amount of not-in-the-configurator pricing and configuration flexibility when one buys in bulk, it is by no means "obvious" that Dells are a poor choice for a cluster or server room, and I personally think they are an actively GOOD choice for 2-4 laptops (he says, typing this reply into his brand new XPS M1530 1900x1200, 320 GB HD, 4 GB dual 64 bit core laptop with full accidental damage coverage on top of regular extended service...:-). rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From rgb at phy.duke.edu Tue Sep 9 11:19:24 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 9 Sep 2008 14:19:24 -0400 (EDT) Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C62B8C.7060102@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de> Message-ID: On Tue, 9 Sep 2008, Carsten Aulbert wrote: > My question now: is there a cute little way to gather all the console > outputs of > 1000 nodes? The nodes don't have physical serial cables > attached to them - nor do we want to use many concentrators to achieve > this - but the off-the-shelf Supermicro boxes all have an IPMI card > installed and SoL works quite ok. Syslog-ng? Popping a USB flash disk on them to use as an alternative log location (if the kernel doesn't actively lock up on the disk error)? Booting from a USB flash image or diskless, so that a disk crash is just a disk crash?
rgb From rgb at phy.duke.edu Tue Sep 9 11:26:36 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 9 Sep 2008 14:26:36 -0400 (EDT) Subject: [Beowulf] Monitoring crashing machines In-Reply-To: <48C65B0C.2000208@aei.mpg.de> References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> <48C65B0C.2000208@aei.mpg.de> Message-ID: On Tue, 9 Sep 2008, Carsten Aulbert wrote: > We did get a few messages, albeit not from the kernel when an error > happened. I'll have another look today, maybe I did something wrong. If your kernel is out and out crashing, you might not get anything at all. In that case, let me add: "putting a cheap monitor on a suspect or crashed node" Or even after a crash.
If the primary graphics card is being used as a console, the frame buffer will probably retain the last kernel oops written to it (if any) even after it locks up the system proper. Just plug a monitor into the framebuffer of a machine that has crashed and see if there is anything there. One last method (from back in the dark ages): "putting a tty-output printer on as a console printer" This was actually standard practice for servers through the end of the 80s, anyway, because it was COMMON for servers to crash -- or be cracked -- and a hard copy of syslog/console output was often your only clue as to the cause, your only evidence of the intrusion. You still will have the problem of a kernel crash not infrequently being, well, "instant death". Some problems just lock up your system "now", without passing Go or collecting $200. Nothing will then help you, although modern kernels have settings and setups that SHOULD die with oops and some sort of message, most of the time. Or some of the time. Or heck, who knows of the time? rgb From lindahl at pbm.com Tue Sep 9 11:35:00 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 9 Sep 2008 11:35:00 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers.
In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> Message-ID: <20080909183500.GA21946@bx9.net> On Tue, Sep 09, 2008 at 02:12:02PM -0400, Robert G. Brown wrote: > If I buy e.g. a Dell > laptop (as I have for six or seven years now) I pay a single, easily > budgeted price and if it breaks (as it has six or seven times now over > the years -- I USE my laptop, run hard and put up wet), a nice man comes > to my house and fixes it on the spot, sitting at my dining room table. > Nearly anywhere else I could buy a laptop leaves me with depot repair or > worse. Well, there's a reason for that, and it's because Dell has always made fragile laptops. A decade ago there was no way anyone would buy a Dell laptop without a great repair plan, and so it was bundled. As for on-site and/or long-term repair contracts for clusters, they're pretty common. -- greg From prentice at ias.edu Tue Sep 9 12:38:38 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 09 Sep 2008 15:38:38 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> Message-ID: <48C6D0BE.5090201@ias.edu> Robert G. Brown wrote: > On Mon, 8 Sep 2008, Greg Lindahl wrote: > >> On Mon, Sep 08, 2008 at 02:58:36PM -0400, Prentice Bisbal wrote: >> >>> I think these trends have more to do with the cheap cost of Dell >>> Hardware and Dell's sales force and marketing to upper management than >>> they do with any technical advantages Dell has over the competition. >> >> I was involved in the top 2 Dell systems in the UK. Both were >> competitive bids, with a cluster integrator other than Dell, and Dell >> hardware.
So no, marketing wasn't a big driver, but low cost was. > > And there's nothing wrong with that. One thing that is appealing about > Dells in any professional operation is that they usually come with hot > and cold running service, pay for what you need. If I buy e.g. a Dell > laptop (as I have for six or seven years now) I pay a single, easily > budgeted price and if it breaks (as it has six or seven times now over > the years -- I USE my laptop, run hard and put up wet), a nice man comes > to my house and fixes it on the spot, sitting at my dining room table. > Nearly anywhere else I could buy a laptop leaves me with depot repair or > worse. > > They provide similar levels of coverage on server/cluster systems. This > can in turn let you prebudget all maintenance costs at the time of > original purchase and be CERTAIN that your cluster or server room will > not require additional emergency repair funds for the next 3-4 years, > the expected useful life of the hardware anyway. It also saves you > tremendously on your OWN opportunity cost time if something breaks as a > phone call is a lot cheaper than slogging down to the server room, > pulling a box, benching it, and messing with it for a few hours to > figure out which part has gone bad. At the very least the "few hours" > part can be relegated to somebody else. > My experience with tech support from Dell and other large vendors contradicts yours. Even when I have "on-site" support, they will not send out an on-site technician until the problem has been accurately pinpointed through phone support, so they know what hardware to ship. Most recently, I had a Dell PowerEdge something-or-other that wouldn't boot up - it was completely dead. The phone technician kept me on the phone for several hours diagnosing the problem, LONG after my day normally ends. I repeatedly asked the phone technician to just send out an on-site technician (and pointed out that we are PAYING for that service), and he refused.
After finally diagnosing the problem, the phone support then scheduled a technician to come out with a new PERC card and motherboard to replace one or both of them. At that point, they could have skipped the on-site technician and let me replace those parts myself. When the technician showed up a couple days later, he was in and out in less than an hour. Again, I'm not picking on Dell specifically. I've seen this behavior with other large vendors. My point is that "on-site support" isn't always on-site, so don't believe the hype. -- Prentice From hahn at mcmaster.ca Tue Sep 9 14:46:50 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 9 Sep 2008 17:46:50 -0400 (EDT) Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C6D0BE.5090201@ias.edu> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> Message-ID: > Again, I'm not picking on Dell specifically. I've seen this behavior > with other large vendors. My point is that "on-site support" > isn't always on-site, so don't believe the hype. I think highly of HP service and HP hardware in general. we always spec onsite/NBD support. at first, we spent a lot of time talking through tier-1 support, but fairly quickly were given ways to bypass that. I don't think we paid extra, but I also don't think HP's giving it as charity, since an HPC site will typically have just a few, clueful contacts opening cases, who can efficiently interact with higher-level support. I don't really have significant data for any other vendor. for small sites or individuals, it makes a lot of sense (for the vendor) to try to filter out some of the randomness of support calls before committing a person. of course, a good CRM system would help this - perhaps that's why RGB gets satisfaction from Dell... I _do_ wish it was a bit more common to have onsite spares.
not sure why vendors (HP at least) don't like to do this. maybe just that it might get kicked around or otherwise abused... regards, mark hahn. From lindahl at pbm.com Tue Sep 9 15:10:25 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 9 Sep 2008 15:10:25 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> Message-ID: <20080909221025.GB14178@bx9.net> On Tue, Sep 09, 2008 at 05:46:50PM -0400, Mark Hahn wrote: > I _do_ wish it was a bit more common to have onsite spares. not sure > why vendors (HP at least) don't like to do this. maybe just that it > might > get kicked around or otherwise abused... You don't have your own spares kit? For big clusters like yours, it doesn't cost much. -- greg From james.p.lux at jpl.nasa.gov Tue Sep 9 15:25:20 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Tue, 9 Sep 2008 15:25:20 -0700 Subject: [Beowulf] SLAs was Re: GPU boards and cluster servers. In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> Message-ID: > Again, I'm not picking on Dell specifically. I've seen this behavior > with other large vendors. My point is that "on-site support" > isn't always on-site, so don't believe the hype. I think highly of HP service and HP hardware in general. we always spec onsite/NBD support. at first, we spent a lot of time talking through tier-1 support, but fairly quickly were given ways to bypass that.
I don't think we paid extra, but I also don't think HP's giving it as charity, since an HPC site will typically have just a few, clueful contacts opening cases, who can efficiently interact with higher-level support. I don't really have significant data for any other vendor. for small sites or individuals, it makes a lot of sense (for the vendor) to try to filter out some of the randomness of support calls before committing a person. of course, a good CRM system would help this - perhaps that's why RGB gets satisfaction from Dell... --- Not exactly CRM... he calls up and says "this is rgb, and I need assistance"... The level zero phone weenie thinks "TLA (three-letter acronym/agency)! Important!" and presses the red button on the call distributor keyboard. If he called and identified himself as "Professor Brown from Duke", then he'd be in the queue with the rest of us, speculating about the weather in Hyderabad or Bangalore. Jim From rgb at phy.duke.edu Tue Sep 9 15:23:38 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 9 Sep 2008 18:23:38 -0400 (EDT) Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> Message-ID: On Tue, 9 Sep 2008, Mark Hahn wrote: > for small sites or individuals, it makes a lot of sense (for the vendor) > to try to filter out some of the randomness of support calls before > committing a person. of course, a good CRM system would help this - perhaps > that's why RGB gets satisfaction from Dell... It's not just me -- "Duke" does a lot of business with Dell. A good campus rep whom you can call and talk to, and who will drop the Windowscrap from systems that "always" come with Windows via their browser interface, goes a long way. Don't get me wrong -- I'm not advertising Dell.
If anything, I'd advertise Penguin, as I like their nodes better than Dell's. I'm just stating that it isn't fair to cross off Dell from any consideration because there is something "crazy" about buying from Dell. I buy stuff from Dell all the time. Lots of departments at Duke, including physics, get lots of stuff from Dell (not usually cluster nodes, but we even have a few of those and don't AUTOMATICALLY refuse to consider them). I've had mixed experiences with Dell. As Greg pointed out somewhat wryly, Dell laptops haven't been absolutely reliable -- IBMs in the past have often been better built (although my M1530 seems rock solid, compared to my past Latitudes and Inspirons -- even the hinge is finally "firm" instead of flimsy and compares well with my Lenovo). But Dell has fixed them, quickly and with minimal hassle. Overall, I'd give Dell a B in terms of hardware quality averaged over many years -- not terrible, not the best (where Penguin would get an easy A). It would get an A in terms of service, where Penguin also gets an A although it is really hard to tell when nothing really breaks. Dells are a bit cheaper, IIRC from my last comparison, per expected compute operation. So one is left judging the marginal differential cost of slightly more reliable hardware (saving human time and hassle when it doesn't break) vs slightly cheaper hardware that MIGHT break slightly more often. Not a knee-jerk decision. > I _do_ wish it was a bit more common to have onsite spares. not sure why > vendors (HP at least) don't like to do this. maybe just that it might > get kicked around or otherwise abused... Partly this is a choice of the site admin. Some things we "have" to keep onsite, because e.g. disks now have to be formally destroyed by the gods of privacy instead of returned to vendors, so swap-out replacements are forbidden anyway.
Others we keep onsite even though they are warrantied because a spare is cheaper and faster than four-hour response service (even if we keep both on e.g. mission-critical servers). Cluster nodes we usually don't, simply because we'd use the spares anyway as long as we had rackspace, right? Why keep idle systems hot or on a shelf? rgb > > regards, mark hahn. From hahn at mcmaster.ca Tue Sep 9 15:41:01 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 9 Sep 2008 18:41:01 -0400 (EDT) Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <20080909221025.GB14178@bx9.net> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <20080909221025.GB14178@bx9.net> Message-ID: >> I _do_ wish it was a bit more common to have onsite spares. not sure >> why vendors (HP at least) don't like to do this. maybe just that it >> might >> get kicked around or otherwise abused... > > You don't have your own spares kit? For big clusters like yours, it > doesn't cost much. could be we don't know how to ask; I'm not aware of HP actually offering such a kit. or how much we'd be willing to pay. it is an interesting question: not just how much does downtime cost you, but what are the kinds of failures you see and expect? our clusters have been remarkably robust, in spite of having pretty mundane hardware.
plain old SATA disks, for instance. we have several instances (sites) with a ~400-disk filesystem, but I think we're around 1-2% annual failure rate. we use RAID6, but spares for those disks are the most obvious thing I'd want. the failure rates for PSUs, motherboards, DIMMs, etc. are quite a lot lower (maybe 2 PSUs out of 768 nodes per year.) OTOH, most of this hardware is approaching its third birthday. magic warranty-related number there :| From lindahl at pbm.com Tue Sep 9 16:02:05 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 9 Sep 2008 16:02:05 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <20080909221025.GB14178@bx9.net> Message-ID: <20080909230205.GB17243@bx9.net> On Tue, Sep 09, 2008 at 06:41:01PM -0400, Mark Hahn wrote: >> You don't have your own spares kit? For big clusters like yours, it >> doesn't cost much. > > could be we don't know how to ask; I'm not aware of HP actually offering > such a kit. or how much we'd be willing to pay. Well, I always buy a couple of extra nodes, and of course some extra disks. That way most failures have spares at hand, and then you can do on-site or depot repair at your leisure. For the less common failures it costs more to have parts, but for my current business I have an extra of everything: switches, routers, PDUs, yadda yadda, at each of my sites. (Anyone else notice that tribe.net was down for days 'waiting for a spare part'? Must have been something they couldn't get at Fry's...) -- greg From jmdavis1 at vcu.edu Tue Sep 9 16:10:33 2008 From: jmdavis1 at vcu.edu (Mike Davis) Date: Tue, 09 Sep 2008 19:10:33 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers.
In-Reply-To: References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <20080909221025.GB14178@bx9.net> Message-ID: <48C70269.2050901@vcu.edu> > > could be we don't know how to ask; I'm not aware of HP actually > offering such a kit. or how much we'd be willing to pay. > > it is an interesting question: not just how much does downtime cost you, > but what are the kinds of failures you see and expect? our clusters > have been remarkably robust, in spite of having pretty mundane hardware. > plain old SATA disks, for instance. we have several instances (sites) > with a ~400-disk filesystem, but I think we're around 1-2% annual failure > rate. we use RAID6, but spares for those disks are the most obvious > thing I'd want. the failure rates for PSUs, motherboards, DIMMs, etc. > are quite a lot lower (maybe 2 PSUs out of 768 nodes per year.) > > OTOH, most of this hardware is approaching its third birthday. magic > warranty-related number there :| The last I checked, neither HP nor Sun offered a spares kit. Apple does. I have built my own for the Suns (at least those that are not under NBD support). Disks are the obvious thing. We have had several (3) power supplies fail, and in one case we had both bad RAM and a bad daughter board in a 4600. This is with machines that range from the v60 (PIV 3.0 GHz) to the v20z (2.4 GHz Opteron), to the x4100 (2.6 GHz Opteron), to the x2200 (2.6 GHz Opteron), to the x4600 (2.8 GHz Opteron). We have also had a couple of issues with motherboards (actually usually a built-in controller on the motherboard, often SCSI). Our real problem now is heat-related failures on our 2004 v20z machines.
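Mark Hahn's figures (roughly 400 disks at a 1-2% annual failure rate, and about 2 PSU failures per year across 768 nodes) are enough to size a spares kit. A minimal sketch, assuming failures are independent so that the number of failures over a restocking period is Poisson-distributed with mean N × AFR × T; that model is an assumption, not something measured in this thread, and the function names are hypothetical.

```python
import math


def expected_failures(units, annual_failure_rate, years=1.0):
    """Mean number of failures over the period: units * AFR * years."""
    return units * annual_failure_rate * years


def spares_needed(mean, confidence=0.95):
    """Smallest k such that P(X <= k) >= confidence, for X ~ Poisson(mean).

    Builds the Poisson CDF term by term: P(X = k) = P(X = k - 1) * mean / k.
    """
    if mean < 0 or not 0 < confidence < 1:
        raise ValueError("need mean >= 0 and 0 < confidence < 1")
    if mean > 700:
        # exp(-mean) underflows to 0.0 here; this simple summation breaks down.
        raise ValueError("mean too large for this simple summation")
    k = 0
    term = math.exp(-mean)  # P(X = 0)
    cumulative = term
    while cumulative < confidence:
        k += 1
        term *= mean / k
        cumulative += term
        if term == 0.0:
            break  # float precision exhausted; cumulative can no longer grow
    return k
```

For example, 400 disks at a 2% AFR restocked quarterly gives a mean of 2.0 failures per quarter, and `spares_needed(2.0)` returns 5 for 95% confidence of not running out before restocking. The Poisson model ignores correlated failures (a bad batch, or the heat events described here), so treat its output as a floor rather than a guarantee.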
These machines run hot; our raised-floor machine room can't keep the upper nodes in the rack cool, and we have had several failures that appear CPU/MB-related in the past year after several heat events. I am hoping to get these machines replaced as soon as possible, since they use as much power and require as much cooling as newer machines with dual dual-core or dual quad-core processors. NBD support can make sense for certain systems (particularly systems that are managed for another department). I like to have it and some spares for my machines. Mike From mathog at caltech.edu Tue Sep 9 16:28:16 2008 From: mathog at caltech.edu (David Mathog) Date: Tue, 09 Sep 2008 16:28:16 -0700 Subject: [Beowulf] Re: Monitoring crashing machines Message-ID: "Robert G. Brown" wrote: > > One last method (from back in the dark ages): > > "putting a tty-output printer on as a console printer" Better yet, set up the serial port as a console, then attach another machine via a serial line, and just have the 2nd machine log everything. Then you can use text tools to search through the resulting log file, rather than having to dig through the paper. Dig being the operative word. In the old days some of those crash events spewed garbage to the printer, and that resulted in a ream of nonsense on the floor, and more often than not, the paper mashed into an accordion behind a pinfeed jam. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rgb at phy.duke.edu Tue Sep 9 16:41:28 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 9 Sep 2008 19:41:28 -0400 (EDT) Subject: [Beowulf] Re: Monitoring crashing machines In-Reply-To: References: Message-ID: On Tue, 9 Sep 2008, David Mathog wrote: > word. In the old days some of those crash events spewed garbage to the > printer, and that resulted in a ream of nonsense on the floor, and more > often than not, the paper mashed into an accordion behind a pinfeed jam.
Nobody said it was EASY back then, right? Even when a system DIDN'T crash, it dumped reams of fanfold into the takeup box, most of it never examined by human mind. ;-) The real issue is whether or not the kernel dies a hard death or dies gently enough to issue messages. Some crashes give you a hint at the console, in log files, wherever. If the kernel lives long enough to do this, you can find SOME way to get access to it. If it dies hard, though, it doesn't really matter what you put on the system; there won't be any messages no matter what medium you manage to attach. Beyond that there are many ways to get a non-dead kernel to write something to where you can see it on a crash. If one is difficult, try another. rgb From larry.stewart at sicortex.com Tue Sep 9 18:48:46 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Tue, 9 Sep 2008 21:48:46 -0400 Subject: [Beowulf] Re: Monitoring crashing machines In-Reply-To: References: Message-ID: <0CF3F6B9-85BF-4D07-82B8-9B85DDF236C7@sicortex.com> On Sep 9, 2008, at 7:41 PM, Robert G. Brown wrote: > On Tue, 9 Sep 2008, David Mathog wrote: > >> word. In the old days some of those crash events spewed garbage to >> the >> printer, and that resulted in a ream of nonsense on the floor, and >> more >> often than not, the paper mashed into an accordion behind a pinfeed >> jam. > > Nobody said it was EASY back then, right? Even when a system DIDN'T > crash, it dumped reams of fanfold into the takeup box, most of it never > examined by human mind. ;-) A non-HPC story... from someone who used to work at the Stanford IT shop way back when.
He was a systems analyst or programmer working on upgrading various department JCL decks and batch jobs for some systems conversion, maybe new DASD or something. While testing a job for one department, the report seemed to come out correctly, but it was immediately followed by a five-inch-thick abend dump. Evidently, the space allocated on the old disk was longer than the file data, but shorter than the program was expecting. It would process the report, and then run off the end of the file and crash. The analyst converted the file for the new disk, set the length correctly, and went on to the next job. A month or two later, the department calls in to inquire "Where's the numbers report?" After some confusion back and forth, it seems that the department had been dutifully filing the abend dumps in a row of file cabinets, and wanted to know why they had gone missing after the upgrade... -Larry PS I never did work with old-style big iron myself. I probably would have gotten fired for leaving my coffee cup on top of one of the printers when it opened for more paper. PPS When I got started, we had a printer whose "0" was worn out. I had to patch the device driver to substitute a capital "O". From carsten.aulbert at aei.mpg.de Tue Sep 9 23:10:09 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 10 Sep 2008 08:10:09 +0200 Subject: [Beowulf] Re: Monitoring crashing machines In-Reply-To: <0CF3F6B9-85BF-4D07-82B8-9B85DDF236C7@sicortex.com> References: <0CF3F6B9-85BF-4D07-82B8-9B85DDF236C7@sicortex.com> Message-ID: <48C764C1.90403@aei.mpg.de> Hi all Lawrence Stewart wrote: > [...] > A month or two later, the department calls in to inquire "Where's the > numbers > report?" After some confusion back and forth, it seems that the department > had been dutifully filing the abend dumps in a row of file cabinets, and > wanted > to know why they had gone missing after the upgrade... OMG.
I'm too young to have experienced that live (I only started with a ZX Spectrum+ and a C64, with nice assembler programs usually well below 4 kB) and can only say that I've seen 8-inch floppies during a one-week work-experience placement at a company during my school days. You see, I'm coming more from the PC side than from big iron. Before getting too off-topic here, I would like to thank everyone who replied to my question (more answers to come). If someone invented a thing which could distill the knowledge from people (without harming them and of course leaving them with theirs) and produce a true answering machine, I guess that would be *the* reference for HPC stuff (and sometimes lengthy discussions drifting through the ether) :) Thanks again Cheers Carsten From carsten.aulbert at aei.mpg.de Tue Sep 9 23:22:50 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 10 Sep 2008 08:22:50 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> <48C65B0C.2000208@aei.mpg.de> Message-ID: <48C767BA.7030606@aei.mpg.de> Robert G. Brown wrote: > > "putting a cheap monitor on a suspect or crashed node" > One monitor for each of more than 1300 1U servers is not practical :) > Or even after a crash. If the primary graphics card is being used as a > console, the frame buffer will probably retain the last kernel oops > written to it (if any) even after it locks up the system proper. Just > plug a monitor into the framebuffer of a machine that has crashed and > see if there is anything there. Yes, that's already what we are doing (with a so-called "crash cart"). We do see some related messages, but usually there is no scroll buffer available anymore, so the important lines are mostly lost.
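Both remedies raised in this thread for lost console lines, a serial console logged by a second machine and the kernel's netconsole feature, amount to capturing console output somewhere that survives the crash. A minimal Python sketch of the receiving side, assuming netconsole's default UDP target port (6666) on the log host; `open_listener`, `capture_console` and `find_crashes` are hypothetical helper names, and the marker list is a handful of common Linux crash strings, not an exhaustive catalogue.

```python
import re
import socket

# A few common Linux kernel crash strings (an assumption; extend as needed).
CRASH_MARKERS = re.compile(
    r"Oops|Kernel panic|BUG:|general protection fault|Call Trace:"
)


def open_listener(port=6666, host="0.0.0.0"):
    """Bind a UDP socket for netconsole traffic (6666 is netconsole's default target port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture_console(sock, write, max_packets):
    """Receive up to max_packets console datagrams and pass each one to write().

    write is any callable taking a str, such as an open log file's .write.
    """
    for _ in range(max_packets):
        data = sock.recv(65535)  # one datagram holds one or more console lines
        write(data.decode("utf-8", errors="replace"))


def find_crashes(lines, after=5):
    """Return (line_number, excerpt) for each crash marker in a captured log.

    Each excerpt is the matching line plus up to `after` following lines;
    lines already included in an excerpt are not reported again.
    """
    hits = []
    i = 0
    while i < len(lines):
        if CRASH_MARKERS.search(lines[i]):
            end = min(len(lines), i + 1 + after)
            hits.append((i + 1, "".join(lines[i:end])))
            i = end
        else:
            i += 1
    return hits
```

On the log host, something like `with open("console.log", "a") as log: capture_console(open_listener(), log.write, max_packets=10000)` collects the output (the file name is only an example), and `find_crashes` applied to the log's lines pulls out the records worth reading. The same `find_crashes` works unchanged on a log captured from a serial console.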
> > One last method (from back in the dark ages): > > "putting a tty-output printer on as a console printer" > Again, can you imagine (1) getting 1300 of these and (2) then employing enough students to refill the paper ;) *chuckling* Cheers Carsten From carsten.aulbert at aei.mpg.de Tue Sep 9 23:23:58 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 10 Sep 2008 08:23:58 +0200 Subject: [Beowulf] Monitoring crashing machines In-Reply-To: References: <48C62B8C.7060102@aei.mpg.de><1E161467-2661-414F-BF92-8D28E53D537F@staff.uni-marburg.de> <48C63B37.5020402@aei.mpg.de> <48C65B0C.2000208@aei.mpg.de> Message-ID: <48C767FE.7010006@aei.mpg.de> Mark Hahn wrote: > there's a semi-recent kernel feature which allows the kernel to avoid > user-space by putting console traffic onto the net directly > see Documentation/networking/netconsole.txt Now that looks very interesting. Thanks for the pointer! Cheers Carsten From andrew at moonet.co.uk Wed Sep 10 04:13:11 2008 From: andrew at moonet.co.uk (Andrew Holway) Date: Wed, 10 Sep 2008 12:13:11 +0100 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C6D0BE.5090201@ias.edu> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> Message-ID: <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com> > After finally diagnosing the problem, the phone support then > scheduled a > technician to come out with a new PERC card and motherboard to replace > one or both of them. At that point, they could have skipped the on-site > technician and let me replace those parts myself. When the technician > showed up a couple days later, he was in and out in less than an hour. > > Again, I'm not picking on Dell specifically. I've seen this behavior > with other large vendors.
My point is that "on-site support" > isn't always on-site, so don't believe the hype. Again this is down to the service level you buy. Dell has an "on-site after phone diagnosis" level, which is marginally cheaper than the standard one. What was on your quote? Andy > > > > -- > Prentice From andrew at moonet.co.uk Wed Sep 10 04:41:18 2008 From: andrew at moonet.co.uk (andrew holway) Date: Wed, 10 Sep 2008 12:41:18 +0100 Subject: [Beowulf] Lustre failover Message-ID: From the Lustre manual:- With OST servers it is possible to have a load-balanced active/active configuration. Each node is the primary node for a group of OSTs, and the failover node for other groups. To expand the simple two-node example, we add ost2 which is primary on nodeB, and is on the LUNs nodeB:/dev/sdc1 and nodeA:/dev/sdd1. This demonstrates that the /dev/ identity can differ between nodes, but both devices must map to the same physical LUN. In this type of failover configuration, you can mount two OSTs on two different nodes, and format them from either node. With failover, two OSSs provide the same service to the Lustre network in parallel. In case of disaster or a failure in one of the nodes, the other OSS can provide uninterrupted filesystem services. For an active/active configuration, mount one OST on one node and another OST on the other node. You can format them from either node. Anyone done this on a production system? Experiences? Comments? Cheers Andy From hahn at mcmaster.ca Wed Sep 10 06:02:17 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 10 Sep 2008 09:02:17 -0400 (EDT) Subject: [Beowulf] Lustre failover In-Reply-To: References: Message-ID: > With OST servers it is possible to have a load-balanced active/active > configuration.
> Each node is the primary node for a group of OSTs, and the failover > node for other ... > Anyone done this on a production system? we have a number of HP's Lustre (SFS) clusters, which use dual-homed disk arrays, but in active/passive configuration. it works reasonably well. > Experiances? Comments? active/active seems strange to me - it implies that the bottleneck is the OSS (OST server), rather than the disk itself. and a/a means each OSS has to do more locking for the shared disk, which would seem to make the problem worse... From hahn at mcmaster.ca Wed Sep 10 07:34:45 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 10 Sep 2008 10:34:45 -0400 (EDT) Subject: [Beowulf] Lustre failover In-Reply-To: <200809101622.14020.bs@q-leap.de> References: <200809101622.14020.bs@q-leap.de> Message-ID: >> active/active seems strange to me - it implies that the bottleneck >> is the OSS (OST server), rather than the disk itself. and a/a means >> each OSS has to do more locking for the shared disk, which would seem >> to make the problem worse... > > > No, you can do active/active with several systems > > Raid1 > / \ > OSS1 OSS2 > \ / > Raid2 > > > (Raid1 and Raid2 are hardware raid systems). > > Now OSS1 will primarily serve Raid1 and OSS2 will primarily serve Raid2. So yes, I know - that's how HP SFS is set up. the OP was talking active-active, though, meaning that IO at any instant can go to either OSS and still make it onto a particular raid. otherwise it's active/passive, what SFS does. From cap at nsc.liu.se Wed Sep 10 09:09:32 2008 From: cap at nsc.liu.se (Peter Kjellstrom) Date: Wed, 10 Sep 2008 18:09:32 +0200 Subject: [Beowulf] Lustre failover In-Reply-To: References: <200809101622.14020.bs@q-leap.de> Message-ID: <200809101809.37551.cap@nsc.liu.se> On Wednesday 10 September 2008, Mark Hahn wrote: > >> active/active seems strange to me - it implies that the bottleneck > >> is the OSS (OST server), rather than the disk itself. 
and a/a means > >> each OSS has to do more locking for the shared disk, which would seem > >> to make the problem worse... > > > > No, you can do active/active with several systems > > > > Raid1 > > / \ > > OSS1 OSS2 > > \ / > > Raid2 > > > > > > (Raid1 and Raid2 are hardware raid systems). > > > > Now OSS1 will primarily serve Raid1 and OSS2 will primarily serve Raid2. > > So > > yes, I know - that's how HP SFS is set up. the OP was talking > active-active, though, meaning that IO at any instant can go to either OSS > and still make it onto a particular raid. otherwise it's active/passive, > what SFS does. I have a real hard time understanding how lustre could manage an active/active OST. This based on the fact that an OST is essentially a ldiskfs(ext4) filesystem on a device and this setup does not work in a situation where more than one entity modifies the data. I think that what the lustre manual is refering to is a setup with two OSTs on a pair of servers. In this config one server would be active for one OST and passive for the other (and vice versa). /Peter -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part. URL: From gerry.creager at tamu.edu Wed Sep 10 15:08:10 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed, 10 Sep 2008 17:08:10 -0500 Subject: [Beowulf] Re: GPU boards and cluster servers. 
In-Reply-To: <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com> Message-ID: <48C8454A.4030208@tamu.edu> Andrew Holway wrote: > >> After finally diagnosing the problem, the phone support then scheduled a >> technician to come out with a new PERC card and motherboard to replace >> one or both of them. At that point, they could have skipped the on-site >> technician and let me replace those parts myself. When the technician >> showed up a couple days later, he was in and out in less than an hour. >> >> Again, I'm not picking on Dell specifically. I've seen this behavior >> with other large vendors. My point is that "on-site support" >> isn't always on-site, so don't believe the hype. > > Again this is down to the service level you buy. Dell has an "on-site > after phone diagnosis" level, which is marginally cheaper than the standard one. > What was on your quote? I can't speak for Prentice, but ours claimed "On-Site, NBD", which I'd interpreted to mean they were going to work with us for on-site diagnosis and support. I was wrong. -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From jlforrest at berkeley.edu Wed Sep 10 15:19:10 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed, 10 Sep 2008 15:19:10 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers.
In-Reply-To: <48C8454A.4030208@tamu.edu> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com> <48C8454A.4030208@tamu.edu> Message-ID: <48C847DE.3060704@berkeley.edu> Going back to the original topic (I think), I'm planning on starting a pilot project to get people around here interested in CUDA GPU computing. Since this is a pilot, I'm not interested in spending a lot of money. What do people think is the best bang for the buck as far as NVidia CUDA graphics boards go? Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From diep at xs4all.nl Wed Sep 10 16:27:35 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu, 11 Sep 2008 01:27:35 +0200 (CEST) Subject: [Beowulf] GPU boards and cluster servers Message-ID: <20080911012039.E29466-100000@xs2.xs4all.nl> John, I'd go for the AMD option. Think about it: more than 4x as much cache per stream processor, as they have 64 of them doing 5 instructions a cycle or so, versus Nvidia's 240. Seymour Cray's law gives it a better balance than Nvidia. Additionally, it will be easier to find documentation and information about AMD, as they are a processor manufacturer used to giving out information about their hardware; Nvidia still has to learn that. As for speed, today Nvidia is faster with its new GPU, by the end of this year it may be AMD, and next year who knows; usually it is a coin toss each time. You can assume each newer GPU will be faster, since adding cores is relatively easy for those guys, in contrast to CPUs. In case you plan to write an algorithm that's not embarrassingly parallel, Nvidia has a problem that AMD doesn't: it has 2 layers of parallelism, versus just a single one for AMD.
AFAIK in AMD/ATI you've got 64 processors that each get the same instruction stream, just like 1 block in Nvidia; but Nvidia additionally has a grid of blocks. That means you also have to write a special parallel algorithm between blocks, which is different from the parallelism of just 64 stream processors that each execute instructions on 5 units at a time. Additionally, debugging many blocks is going to be tougher than debugging 1 block: if you have 1 block that all executes the same code at the same time, then that's reasonably deterministic (except that memory writes to the same address aren't deterministic, in case you plan to do those). Think about it: 4x more cache per stream processor (assuming the cards have the same amount of cache and potential, which averaged over a few years will be the same). That is crucial for FFT-type workloads. Vincent From jlforrest at berkeley.edu Wed Sep 10 16:46:39 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed, 10 Sep 2008 16:46:39 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <20080911012039.E29466-100000@xs2.xs4all.nl> References: <20080911012039.E29466-100000@xs2.xs4all.nl> Message-ID: <48C85C5F.3030808@berkeley.edu> Vincent Diepeveen wrote: > John, > > I'd go for AMD thing. [justification for above snipped] This all may be true, but I don't see anything on the AMD web site that's anywhere near as complete as the CUDA development tools on the NVidia site. What I'm hearing from putting my ears to the railroad tracks is that OpenCL is going to be the Great Unification of all the various approaches to GPU computing. This is because the application vendors can't stand doing a CUDA port, an AMD port, ... With OpenCL in theory they'll only have to do one port, and the tool chain creators will only have to create one tool chain. Anyway, I wasn't really asking about all this. I was only wondering which board provides the most power for the least amount of money for a near-term pilot project.
Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From lindahl at pbm.com Wed Sep 10 17:00:30 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 10 Sep 2008 17:00:30 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <48C85C5F.3030808@berkeley.edu> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> Message-ID: <20080911000030.GA9264@bx9.net> On Wed, Sep 10, 2008 at 04:46:39PM -0700, Jon Forrest wrote: > Anyway, I wasn't really asking about all this. > I was only wondering which board provides the > most power for the least amount of money for > a near-term pilot project. Well, then, why don't you run it on a low-end card that you already have (finite/free = infinity)? If you aren't going to bother to constrain the problem, you're going to get bogus answers. -- greg From libo at buaa.edu.cn Wed Sep 10 17:16:26 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Thu, 11 Sep 2008 08:16:26 +0800 Subject: [Beowulf] Re: GPU boards and cluster servers. References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com><48C8454A.4030208@tamu.edu> <48C847DE.3060704@berkeley.edu> Message-ID: <001301c913a3$a60ff370$6300a8c0@LIBO> Just for development, there is no difference between choosing AMD or NVidia. CUDA has a somewhat better programming model than Brook+/CAL: only thread-level parallelism (TLP) needs to be considered. And some applications or workloads are suited to CUDA, while others are suited to Brook+/CAL. Buying cheap gaming cards costs little. For CUDA, a GTX280 costing 300 Euro is OK; for AMD, a 4870X2 costing 400 Euro is the top one.
Regards, Li, Bo ----- Original Message ----- From: "Jon Forrest" To: Sent: Thursday, September 11, 2008 6:19 AM Subject: Re: [Beowulf] Re: GPU boards and cluster servers. > Going back to the original topic (I think), I'm > planning on starting a pilot project to get > people around here interested in CUDA GPU computing. > Since this is a pilot I'm not interested in spending > a lot of money. > > What do people think the best bang for the buck is > as far as NVidia CUDA graphics boards? > > Cordially, > -- > Jon Forrest > Research Computing Support > College of Chemistry > 173 Tan Hall > University of California Berkeley > Berkeley, CA > 94720-1460 > 510-643-1032 > jlforrest at berkeley.edu > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From jlforrest at berkeley.edu Wed Sep 10 17:38:09 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Wed, 10 Sep 2008 17:38:09 -0700 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <20080911000030.GA9264@bx9.net> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> Message-ID: <48C86871.7000602@berkeley.edu> Greg Lindahl wrote: > > Well, then, why don't you run it on a low-end card that you already > have (finite/free = infinity)? If you aren't going to bother to > constrain the problem, you're going to get bogus answers. Easy. Because I don't already have a low-end card. What I'm going to try to do is to be able to show the faculty and grad students around here how easy it is to get a significant performance improvement by using CUDA as compared to using their normal i386 or x86_64 processors. The actual performance improvement isn't that important because even if it's just a 2X improvement it will be easy to justify. 
I'm expecting it to be a lot more because much of what goes on around here has already been ported and summarized on the CUDA web site with >=10X improvements. Then, once I've hooked the faculty I'll get them to buy a high-end card to get maximum performance. -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From libo at buaa.edu.cn Wed Sep 10 20:27:14 2008 From: libo at buaa.edu.cn (Li, Bo) Date: Thu, 11 Sep 2008 11:27:14 +0800 Subject: [Beowulf] Re: GPU boards and cluster servers References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu><20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> Message-ID: <002801c913be$4e07d560$6300a8c0@LIBO> Hello, For workloads where single precision (SP) is sufficient and which suit GPGPU, it can bring a 20X improvement or more, for less money and power consumption. If not, maybe only 4X, or it may even be slower than the CPU. Regards, Li, Bo ----- Original Message ----- From: "Jon Forrest" To: "Greg Lindahl" Cc: Sent: Thursday, September 11, 2008 8:38 AM Subject: Re: [Beowulf] Re: GPU boards and cluster servers > Greg Lindahl wrote: >> >> Well, then, why don't you run it on a low-end card that you already >> have (finite/free = infinity)? If you aren't going to bother to >> constrain the problem, you're going to get bogus answers. > > Easy. Because I don't already have a low-end card. > > What I'm going to try to do is to be able to show > the faculty and grad students around here how > easy it is to get a significant performance improvement > by using CUDA as compared to using their normal > i386 or x86_64 processors. The actual performance > improvement isn't that important because even if it's > just a 2X improvement it will be easy to justify.
> I'm expecting it to be a lot more because much > of what goes on around here has already been ported > and summarized on the CUDA web site with >=10X improvements. > > Then, once I've hooked the faculty I'll get them to buy > a high-end card to get maximum performance. > > -- > Jon Forrest > Research Computing Support > College of Chemistry > 173 Tan Hall > University of California Berkeley > Berkeley, CA > 94720-1460 > 510-643-1032 > jlforrest at berkeley.edu From gdjacobs at gmail.com Thu Sep 11 00:20:21 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Thu, 11 Sep 2008 02:20:21 -0500 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <48C86871.7000602@berkeley.edu> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> Message-ID: <48C8C6B5.9040805@gmail.com> Jon Forrest wrote: > Greg Lindahl wrote: >> >> Well, then, why don't you run it on a low-end card that you already >> have (finite/free = infinity)? If you aren't going to bother to >> constrain the problem, you're going to get bogus answers. > > Easy. Because I don't already have a low-end card. > > What I'm going to try to do is to be able to show > the faculty and grad students around here how > easy it is to get a significant performance improvement > by using CUDA as compared to using their normal > i386 or x86_64 processors. The actual performance > improvement isn't that important because even if it's > just a 2X improvement it will be easy to justify. > I'm expecting it to be a lot more because much > of what goes on around here has already been ported > and summarized on the CUDA web site with >=10X improvements.
> > Then, once I've hooked the faculty I'll get them to buy > a high-end card to get maximum performance. Well, since you're starting from zero, you can start with either an AMD card, such as the 4850, or an NVidia card, such as one of the 9800 series. The 4850 does DP, while the NVidia cards have a slightly more mature tool set. Take your pick for less than $200 either way. Heck, you can purchase an AMD 4670 or NVidia 9500GT (same provisos for both). Neither performs at the level of the cards above, but either one can be had for <$100, so you can afford to purchase both and do some experiments. If you need DP, the cheapest NVidia card is the GTX 260 at ~$270. -- Geoffrey D. Jacobs From ajt at rri.sari.ac.uk Thu Sep 11 04:03:40 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Thu, 11 Sep 2008 12:03:40 +0100 Subject: [Beowulf] MOSIX2 Message-ID: <48C8FB0C.8000901@rri.sari.ac.uk> Is anyone using MOSIX2? I'm still running openMosix on our Beowulf cluster; we planned to migrate (pun intended!) to Kerrighed but, although it looks very promising, it is still too fragile for a production cluster. If one node crashes, the entire cluster goes down... I ruled out MOSIX2 as an alternative at first, because it is not FLOSS. However, it is open source and free for non-commercial use. I'd be very interested to hear from anyone who is using MOSIX2 on a production cluster or just evaluating it as I am about to do. Thanks, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:ajt at rri.sari.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From franz.marini at mi.infn.it Thu Sep 11 04:56:39 2008 From: franz.marini at mi.infn.it (Franz Marini) Date: Thu, 11 Sep 2008 13:56:39 +0200 Subject: [Beowulf] Re: GPU boards and cluster servers.
Message-ID: <1221134199.9640.6.camel@merlino.mi.infn.it> Hi, On Wed, 2008-09-10 at 15:19 -0700, Jon Forrest wrote: > Going back to the original topic (I think), I'm > planning on starting a pilot project to get > people around here interested in CUDA GPU computing. > Since this is a pilot I'm not interested in spending > a lot of money. > > What do people think the best bang for the buck is > as far as NVidia CUDA graphics boards? If you're not interested in (or don't need) DP, you can start with a cheap 9800GT. I wouldn't recommend a 9800GTX because for a little more you can get a GTX260, which is faster and has DP support. If you want to get an idea of how a "pro" CUDA board (that is, a Tesla) will perform, you can get a GTX280, which is top of the line. F. --------------------------------------------------------- Franz Marini Prof. R. A. Broglia Theoretical Physics of Nuclei, Atomic Clusters and Proteins Research Group Dept. of Physics, University of Milan, Italy. email : franz.marini at mi.infn.it phone : +39 02 50317226 --------------------------------------------------------- From prentice at ias.edu Thu Sep 11 07:01:24 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu, 11 Sep 2008 10:01:24 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers. In-Reply-To: <48C8454A.4030208@tamu.edu> References: <25e9e5ad0809060826n33904e52pb9f3325fa254cc20@mail.gmail.com> <6DB5B58A8E5AB846A7B3B3BFF1B4315A02504889@AVEXCH1.qlogic.org> <48C575DC.2000608@ias.edu> <20080909001105.GE997@bx9.net> <48C6D0BE.5090201@ias.edu> <35AA6914-6FBC-4ECF-9FD3-035DCC432937@gmail.com> <48C8454A.4030208@tamu.edu> Message-ID: <48C924B4.40706@ias.edu> Gerry Creager wrote: > Andrew Holway wrote: >> >>> After finally diagnosing the problem, the phone support then scheduled a >>> technician to come out with a new PERC card and motherboard to replace >>> one or both of them. At that point, they could have skipped the on-site >>> technician and let me replace those parts myself.
When the technician >>> showed up a couple days later, he was in and out in less than an hour. >>> >>> Again, I'm not picking on Dell specifically. I've seen this behavior >>> with other large vendors. My point is that "on-site support" usually >>> isn't always, so don't believe the hype. >> >> Again this is down to the service level you buy. Dell have an "on site >> after phone diagnosis" which is marginally cheaper than the standard. >> What was on your quote? > > I can't speak for Prentice, but ours claimed, "On-Site, NBD" which I'd > interpreted to mean they were going to work with us for on-site > diagnosis and support. I was wrong. Ditto. We pay for the on-site support. -- Prentice From gus at ldeo.columbia.edu Thu Sep 11 08:23:10 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 11 Sep 2008 11:23:10 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <48C86871.7000602@berkeley.edu> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> Message-ID: <48C937DE.4040504@ldeo.columbia.edu> Hi Jon and list I am trying to work on a similar project here. With tight funding, I had to go convince the director to buy one NVidia GeForce 9800 GTX (512MB memory) for testing. The card cost about $200 (before taxes). You may find better prices on Newegg and other places. If your budget is tighter than this, you may buy a GeForce 8800 GT for less, and still do useful work. Beware of the number of processors and the "compute capability" (see below) of the card you buy. They vary even within the same card series (the 8800 has a lot of different models). I am hoping to get a bit of spare time to work on a few programs to make the case for it. Or not, as these things seem to suck a lot of power, and the utility bill may drive the sponsor of my project mad if we outfit all workstations here with these video cards.
:) The CUDA API doesn't look as friendly as, say, OpenMP or MPI, and the time invested in programming may only be worthwhile for a pilot project. Or things may get better if a holy-grail type of great unification API, such as the OpenCL that you mentioned, comes true. Anyway, if, say, I can accelerate Matlab, that is already a big deal for a lot of people here who use Matlab for small programming projects and data analysis. Like it or not, Matlab is the "de facto" programming language for the vast majority of scientists and science students. For code that uses FFTs or BLAS in a regular pattern / loop, porting it to use CUDA/GPU shouldn't be very hard, as CUDA has libraries for both. You need to make sure your computer supports the card you buy (unless you want to buy a new computer). Requirements vary according to the card, and most vendors / manufacturers post the specs and requirements on their web sites. For the card I bought, the minimum was a 500W power supply with two PCIe 6-pin connectors. However, this card came with two molex-to-PCIe-6-pin adapters (which I didn't use; my PS had the needed PCIe connectors). Higher-end cards (9800 GX2, GTX 260 and 280) seem to require 8-pin PCIe connectors, and probably require a beefed-up power supply. I don't know if there are molex adapters for PCIe 8-pin connectors. My card also required one available PCIe 16x slot. I'd guess this is what most cards require. However, the card is thick and blocks the next PCI(e) slot (on my mobo a PCIe 8x). Make sure you have this much room to spare in your chassis. Mine is a workstation tower, but for a rackmount chassis you may need a riser card, etc. I would guess these cards won't fit a 1U chassis, but I may be wrong. The motherboard can have either a PCIe 1.1 or 2.0 bus, but the card will work at the lower of the two data rates.
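To put the PCIe generations into numbers: PCIe 1.x signals at 2.5 GT/s per lane and PCIe 2.0 at 5 GT/s. Both use 8b/10b line coding, so a x16 link peaks at roughly 4 GB/s and 8 GB/s per direction, respectively. The Python sketch below only does that arithmetic, plus an ideal transfer time at the peak rate. Sustained throughput in practice is lower. The 512 MB payload is just an example, matching the 9800 GTX memory size mentioned above.

```python
# Peak PCIe x16 bandwidth per direction, and the ideal transfer time.
# Real-world throughput is lower (protocol overhead, chipset, driver).

GT_PER_S = {"1.1": 2.5e9, "2.0": 5.0e9}  # transfers per second per lane


def peak_bytes_per_s(gen: str, lanes: int = 16) -> float:
    # 8b/10b coding: 8 payload bits for every 10 bits on the wire
    payload_bits_per_s = GT_PER_S[gen] * lanes * 8 / 10
    return payload_bits_per_s / 8


def transfer_seconds(n_bytes: float, gen: str, lanes: int = 16) -> float:
    return n_bytes / peak_bytes_per_s(gen, lanes)


if __name__ == "__main__":
    payload = 512 * 2**20  # e.g. filling a 512 MB card once
    for gen in ("1.1", "2.0"):
        gb_per_s = peak_bytes_per_s(gen) / 1e9
        ms = transfer_seconds(payload, gen) * 1e3
        print(f"PCIe {gen} x16: {gb_per_s:.0f} GB/s peak, {ms:.0f} ms for 512 MB")
```

A PCIe 2.0 card in a 1.1 slot therefore sees roughly half the peak host-to-card bandwidth. That matters most for codes that shuttle data across the bus often.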
Check the FAQ on the PCIe site about this: http://www.pcisig.com/news_room/faqs/pcie2.0_faq/ My mobo has PCIe 1.1, and only very recent ones seem to be 2.0. So, performance may not be stellar, but hopefully it will be OK. You may want to check the "CUDA enabled" card capabilities on the NVidia site. The CUDA Programming Guide Appendix A has the details such as number of processors and "compute capability" (in a nutshell, 1.0 is the basic 32-bit capability, 1.1 and 1.2 add a bit of functionality, 1.3 adds double precision support). See: http://www.nvidia.com/object/cuda_develop.html under "Documentation". Also, it is worth taking a look at the NVidia "CUDA on Linux" forum: http://forums.nvidia.com/index.php?s=13c4ff7c2bee768ff185e581fa17ff24&showforum=68 I posted questions about hardware requirements and software compatibility there (my card is under Fedora Core 8), and have got very helpful answers: http://forums.nvidia.com/index.php?showtopic=72798 I hope this helps. Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- Jon Forrest wrote: > Greg Lindahl wrote: > >> >> Well, then, why don't you run it on a low-end card that you already >> have (finite/free = infinity)? If you aren't going to bother to >> constrain the problem, you're going to get bogus answers. > > > Easy. Because I don't already have a low-end card. > > What I'm going to try to do is to be able to show > the faculty and grad students around here how > easy it is to get a significant performance improvement > by using CUDA as compared to using their normal > i386 or x86_64 processors. The actual performance > improvement isn't that important because even if it's > just a 2X improvement it will be easy to justify. 
> I'm expecting it to be a lot more because much > of what goes on around here has already been ported > and summarized on the CUDA web site with >=10X improvements. > > Then, once I've hooked the faculty I'll get them to buy > a high-end card to get maximum performance. > From coutinho at dcc.ufmg.br Thu Sep 11 09:55:29 2008 From: coutinho at dcc.ufmg.br (Bruno Coutinho) Date: Thu, 11 Sep 2008 13:55:29 -0300 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <48C937DE.4040504@ldeo.columbia.edu> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> <48C937DE.4040504@ldeo.columbia.edu> Message-ID: 2008/9/11 Gus Correa > Hi Jon and list > > I am trying to work on a similar project here. > With tight funding, I had to go convince the director to buy one NVidia > GeForce 9800 GTX > (512MB memory) for testing. > The card cost about $200 (before taxes). You may find better prices on > Newegg and other places. > If your budget is tighter than this, you may buy an GeForce 8800 GT for > less, and still do useful work. > Beware of the number of processors and the "compute capability" (see > below) of the card you buy. > They vary even within a same card series (8800 has a lot of different > models). > > I recommend the GeForce GTX 260. It costs $300, but it has compute capability 1.3. This adds double-precision support, and makes coalesced memory access much easier to achieve (see section 5.1.2.1 of the CUDA manual). From rpnabar at gmail.com Mon Sep 8 17:30:03 2008 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 8 Sep 2008 19:30:03 -0500 Subject: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing In-Reply-To: References: Message-ID: I was experimenting with channel-bonding my twin eth ports to get a combined bandwidth of (close to) 2 Gbps.
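Two pieces of arithmetic bear on the bonding comparison described in the rest of this message. First, the reported 4 GB copy in about 1 min 26 s (86 s) works out to roughly 0.37 to 0.40 Gbit/s, depending on whether 4 GB means 4×10^9 or 4×2^30 bytes. That is below what a single gigabit link can carry, so the test did not saturate even one of the bonded ports. Second, the Linux bonding documentation describes mode 4 (802.3ad) as choosing the outgoing slave with a transmit hash. With the default layer2 policy, that hash places all traffic to a particular peer on the same slave, so a single node-to-node copy uses one port. The Python sketch below models a simplified version of that hash (XOR of the last MAC byte, modulo slave count). The MAC addresses are made up, and other kernel versions and hash policies mix in more fields.

```python
# Simplified model of Linux bonding's default "layer2" transmit hash:
# (source MAC XOR destination MAC) modulo slave count, here using only
# the last byte of each address. The MAC addresses below are invented.


def mac_last_byte(mac: str) -> int:
    return int(mac.split(":")[-1], 16)


def layer2_slave(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    return (mac_last_byte(src_mac) ^ mac_last_byte(dst_mac)) % n_slaves


def gbit_per_s(n_bytes: float, seconds: float) -> float:
    return n_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    node1 = "00:1a:a0:00:00:10"
    node2 = "00:1a:a0:00:00:21"
    # Every frame from node1 to node2 maps to the same slave index,
    # so one node-to-node stream is limited to one 1 Gbit/s port.
    print({layer2_slave(node1, node2, 2) for _ in range(1000)})
    # The reported test: 4 GB in 1 min 26 s.
    print(round(gbit_per_s(4 * 2**30, 86), 2))
```

Under this model, aggregation helps only when several peers (or, with other hash policies, several flows) share the bond. It does not speed up a single point-to-point copy.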
The two relevant modes were 4 (802.3ad) and 6 (alb=Adaptive Load Balancing). I was trying to compare performance for both. Before running any sophisticated tests with netperf etc., I just tried to copy a large file via scp and timed the two file copies. Option1: from node1 to node2. Both nodes have their twin ports bonded together as bond0 with mode=4 (802.3ad). They are connected via a Dell PowerConnect 6248 switch. I configured the switch so that I have two LAG groups, each combining the two ports coming from the same node. LACP was turned on. Option2: from node3 to node4. Use mode=6 (alb=Adaptive Load Balancing). No special switch config. No LAG. No LACP. Result: for a 4GB file transfer, both modes took the same time: approx. 1 min 26 sec. These results are very mystifying to me. I was expecting mode 4 (802.3ad) to be almost twice as fast, since it is the only mode which truly aggregates the twin channels. It ought to be the only one effective for peer-to-peer communication (mode 6 would only help while talking with more than one peer). Any comments? Also, the net file transfer speed seems way lower than what I'd expect from a close-to-2 Gbps connection, even accounting for the protocol overheads. Do other people have some numbers for me from their systems? -- Rahul From maurice at harddata.com Tue Sep 9 16:39:43 2008 From: maurice at harddata.com (Maurice Hilarius) Date: Tue, 09 Sep 2008 17:39:43 -0600 Subject: [Beowulf] Hijacking topics In-Reply-To: <200809092313.m89NCr1b002366@bluewest.scyld.com> References: <200809092313.m89NCr1b002366@bluewest.scyld.com> Message-ID: <48C7093F.6010203@harddata.com> This is classic for this list. The topic: "Re: Re: GPU boards and cluster servers." Gets turned into a discourse on Dell hardware and related.
Meanwhile, the (useful) question: "Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 Very likely a hopeless question, with this little information, but just in case: Does anyone have any 'real world' experience with 'both' of these CPUs, in terms of relative performance for 'whatever work you do' ? I realize the xeon is a faster mhz part (3.16ghz xeon vs 2.3ghz opteron) so I'm more concerned with "relative performance per mhz" " Gets totally ignored. Pardon me, ignored except for the guy who wrote: "We have "only" the Quad-Xeon boxes with E5435 and these are quite fast, indeed it seems that FFTW seems to run faster on Xeons but I have not made any benchmarks for the past ~ 9 months, so I don't know about the latest Opterons." Sorry to disturb.. let the babbling continue. -- With our best regards, Maurice W. Hilarius Telephone: 01-780-456-9771 Hard Data Ltd. FAX: 01-780-456-9772 11060 - 166 Avenue email:maurice at harddata.com Edmonton, AB, Canada http://www.harddata.com/ T5X 1Y3 From eric.l.2046 at gmail.com Wed Sep 10 01:36:51 2008 From: eric.l.2046 at gmail.com (Eric.L) Date: Wed, 10 Sep 2008 16:36:51 +0800 Subject: [Beowulf] stace_analyzer.pl can't work. Message-ID: Hi, Jeffery, I've read the article you posted in Linux Magazine: http://www.linux-mag.com/id/6711. I believe that the tool strace_analyzer.pl will be very useful, but I've run into some problems. It throws an error like: Argument "" isn't numeric in addition (+) at ./strace_analyzer.pl line 310, <> line 409105. Have you experienced this, and did you resolve it? BTW: I've installed the Perl modules strict and Getopt::long, and the tool exhausts my laptop's memory (1.2G) while running. TIA. Eric From bernd.schubert at fastmail.fm Wed Sep 10 05:37:12 2008 From: bernd.schubert at fastmail.fm (Bernd Schubert) Date: Wed, 10 Sep 2008 14:37:12 +0200 Subject: [Beowulf] Lustre failover In-Reply-To: References: Message-ID: <200809101437.13051.bernd.schubert@fastmail.fm> On Wednesday 10 September 2008 13:41:18 andrew holway wrote: > From the Lustre manual:- > > With OST servers it is possible to have a load-balanced active/active > configuration. > Each node is the primary node for a group of OSTs, and the failover > node for other > groups. To expand the simple two-node example, we add ost2 which is primary > on nodeB, and is on the LUNs nodeB:/dev/sdc1 and nodeA:/dev/sdd1. This > demonstrates that the /dev/ identity can differ between nodes, but both > devices must map to the same physical LUN. In this type of failover > configuration, you can > mount two OSTs on two different nodes, and format them from either node. > With failover, two OSSs provide the same service to the Lustre network in > parallel. In case > of disaster or a failure in one of the nodes, the other OSS can > provide uninterrupted > filesystem services. > For an active/active configuration, mount one OST on one node and another > OST on the other node. You can format them from either node. > > Anyone done this on a production system? Yes, sure we do this all the time. > > Experiances? Comments? You should use either careful manual failover or heartbeat + stonith to prevent accidental double mounts.
Usually we have a setup like this: http://www.pci.uni-heidelberg.de/tc/usr/bernd/downloads/lustre/MDS.jpg Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From bs at q-leap.de Wed Sep 10 07:22:13 2008 From: bs at q-leap.de (Bernd Schubert) Date: Wed, 10 Sep 2008 16:22:13 +0200 Subject: [Beowulf] Lustre failover In-Reply-To: References: Message-ID: <200809101622.14020.bs@q-leap.de> On Wednesday 10 September 2008 15:02:17 Mark Hahn wrote: > > With OST servers it is possible to have a load-balanced active/active > > configuration. > > Each node is the primary node for a group of OSTs, and the failover > > node for other > > ... > > > Anyone done this on a production system? > > we have a number of HP's Lustre (SFS) clusters, which use > dual-homed disk arrays, but in active/passive configuration. > it works reasonably well. > > > Experiances? Comments? > > active/active seems strange to me - it implies that the bottleneck > is the OSS (OST server), rather than the disk itself. and a/a means > each OSS has to do more locking for the shared disk, which would seem > to make the problem worse... No, you can do active/active with several systems:

       Raid1
      /     \
   OSS1     OSS2
      \     /
       Raid2

(Raid1 and Raid2 are hardware raid systems). Now OSS1 will primarily serve Raid1 and OSS2 will primarily serve Raid2. So you have an active/active situation. We usually do this with even more hardware raid systems, mirrored as software raid1 for optimal high availability.
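Bernd's layout, together with his earlier advice to use "heartbeat + stonith to prevent accidental double mounts", can be modeled in a few lines. The Python sketch below is a toy illustration, not Lustre or Heartbeat code. Each OSS starts as primary for its own OSTs. The surviving OSS takes over the other node's OSTs only after a fencing step confirms that the failed peer is powered off, so an OST is never mounted on two servers at once. The Cluster class, the node and OST names, and the fence callback are all hypothetical.

```python
# Toy model of active/active OSS failover guarded by fencing (STONITH).
# Cluster, the node/OST names and the fence callback are hypothetical;
# real setups use HA software plus a controllable power switch.
from typing import Callable, Dict, List


class Cluster:
    def __init__(self, primary: Dict[str, List[str]]):
        # primary: OSS name -> OSTs it normally serves (active/active)
        self.mounted: Dict[str, str] = {
            ost: oss for oss, osts in primary.items() for ost in osts
        }
        self.alive = set(primary)

    def fail_over(self, dead: str, survivor: str,
                  fence: Callable[[str], bool]) -> List[str]:
        """Move dead's OSTs to survivor, but only if fencing succeeded."""
        if not fence(dead):
            # Not sure the node is really down: mounting now could let two
            # servers write the same OST, so leave the decision to a human.
            raise RuntimeError(f"could not confirm {dead} is powered off")
        self.alive.discard(dead)
        moved = [ost for ost, oss in self.mounted.items() if oss == dead]
        for ost in moved:
            self.mounted[ost] = survivor
        return moved


if __name__ == "__main__":
    c = Cluster({"oss1": ["ost0", "ost1"], "oss2": ["ost2", "ost3"]})
    print(c.fail_over("oss1", "oss2", fence=lambda node: True))
    print(c.mounted)
```

The key design point is the order of operations: fence first, then mount. If fencing cannot be confirmed, the sketch refuses to act rather than risk a double mount.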
Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From Greg at keller.net Wed Sep 10 08:08:27 2008 From: Greg at keller.net (Greg Keller) Date: Wed, 10 Sep 2008 10:08:27 -0500 Subject: [Beowulf] Re: Lustre failover In-Reply-To: <200809101305.m8AD5avL019850@bluewest.scyld.com> References: <200809101305.m8AD5avL019850@bluewest.scyld.com> Message-ID: <04A6C36B-5518-4491-9EC9-049887648CE4@Keller.net> Re: Lustre failover I've worked on a number of large-ish Lustre configs over the years, and all of them have been configured with Active/Active type mappings. There are a few issues being confused here: 1) Active/Active does not mean both OSSes are accessing the same LUNs at the same time. Each "pair" of OSS nodes can see the same group of OST LUNs on shared storage, but each OSS normally accesses only its half of them until an OSS dies. 2) Active/Active does not mean *automatic* failover. In all but one case I have worked on, the choice was made to have a rational, human-being-like creature decide whether the safest/fastest repair was to bring back the original OSS or to fail over the orphaned LUNs to their alternate OSS node. When the phone rings at 3am the "rational" and "human-being-like" descriptors are diminished, but such a creature is still smarter than most scripts at assessing the best response to a failure. 3) Automatic failover is completely doable if you can STONITH the failed node (Shoot The Other Node In The Head). With a good network-controlled power strip you can kill the failed node so it can't come back and continue writing to the OST LUNs it used to own (which turns your 1s and 0s into confetti). Linux HA with heartbeats over serial and TCP/GbE is the most common approach to automation. Once the failed/suspect node is guaranteed not to make a surprise comeback, the OSTs it left behind will need to be started by the surviving OSS.
4) IPMI power control has just enough lag/inconsistency that the "shooting draw" between 2 functional OSS servers can result in BOTH servers (or neither) powering down... don't depend on it unless your IPMI implementation is ultra-responsive and reliable. Make sure your script verifies it's "triple-dog sure" the other node can't come back before taking over the abandoned OSTs. **Shameful Plug** DataDirect has a whitepaper that demonstrates many Lustre failover best practices, complete with pictures etc., that is spot-on in my experience. Here's the link: http://www.datadirectnet.com/resource-downloads/best-practices-for-architecting-a-lustre-based-storage-environment-download ***** Cheers! Greg On Sep 10, 2008, at 8:05 AM, beowulf-request at beowulf.org wrote: >> >> With OST servers it is possible to have a load-balanced active/active >> configuration. >> Each node is the primary node for a group of OSTs, and the failover >> node for other > ... >> Anyone done this on a production system? > > we have a number of HP's Lustre (SFS) clusters, which use > dual-homed disk arrays, but in active/passive configuration. > it works reasonably well. > >> Experiances? Comments? > > active/active seems strange to me - it implies that the bottleneck > is the OSS (OST server), rather than the disk itself. and a/a means > each OSS has to do more locking for the shared disk, which would seem > to make the problem worse... > From bernd.schubert at fastmail.fm Wed Sep 10 08:42:52 2008 From: bernd.schubert at fastmail.fm (Bernd Schubert) Date: Wed, 10 Sep 2008 17:42:52 +0200 Subject: [Beowulf] Lustre failover In-Reply-To: References: <200809101622.14020.bs@q-leap.de> Message-ID: <200809101742.52527.bernd.schubert@fastmail.fm> On Wednesday 10 September 2008 16:34:45 Mark Hahn wrote: > >> active/active seems strange to me - it implies that the bottleneck > >> is the OSS (OST server), rather than the disk itself.
and a/a means > >> each OSS has to do more locking for the shared disk, which would seem > >> to make the problem worse... > > > > No, you can do active/active with several systems
> >
> >        Raid1
> >       /     \
> >    OSS1     OSS2
> >       \     /
> >        Raid2
> >
> > (Raid1 and Raid2 are hardware raid systems). > > > > Now OSS1 will primarily serve Raid1 and OSS2 will primarily serve Raid2. > > So > > yes, I know - that's how HP SFS is set up. the OP was talking > active-active, though, meaning that IO at any instant can go to either OSS > and still make it onto a particular raid. otherwise it's active/passive, > what SFS does. You mean to either OSS, but still on the very same OST ;) No, that won't work, simply not the way Lustre works. Cheers, Bernd -- Bernd Schubert Q-Leap Networks GmbH From ljdursi at gmail.com Wed Sep 10 18:42:02 2008 From: ljdursi at gmail.com (Jonathan Dursi) Date: Wed, 10 Sep 2008 21:42:02 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <48C86871.7000602@berkeley.edu> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> Message-ID: <384c5c0a0809101842l7ecf6917reb19d8a86c469edc@mail.gmail.com> 2008/9/10 Jon Forrest : > > What I'm going to try to do is to be able to show > the faculty and grad students around here how > easy it is to get a significant performance improvement > by using CUDA as compared to using their normal > i386 or x86_64 processors. So I agree with you about nvidia & CUDA -- there's a huge programming toolset and lots of example code for the CUDA environment, so in terms of getting your feet wet, and getting up and running quickly, that's almost certainly the way to go, even if OpenCL is going to come in and take over down the road. Any of the GeForce cards listed at http://www.nvidia.com/object/cuda_learn_products.html will do; they'll be single precision, but for testing purposes in many situations that'll be fine.
They have different specs, but for the purposes of learning some CUDA, just get the beefiest one you can find for under $(line at which you start to need permission) from your favourite reseller. For many PDE-solving type applications, getting factors of 10x over the CPU is fairly straightforward, and then you really have to start thinking -- but the payoffs for some problems can be well worth it. If you haven't found it already, gpgpu and the forumns there can be useful sources of info. Jonathan -- Jonathan Dursi ljdursi at gmail.com From ljdursi at gmail.com Wed Sep 10 18:44:03 2008 From: ljdursi at gmail.com (Jonathan Dursi) Date: Wed, 10 Sep 2008 21:44:03 -0400 Subject: [Beowulf] Re: GPU boards and cluster servers In-Reply-To: <384c5c0a0809101842l7ecf6917reb19d8a86c469edc@mail.gmail.com> References: <20080911012039.E29466-100000@xs2.xs4all.nl> <48C85C5F.3030808@berkeley.edu> <20080911000030.GA9264@bx9.net> <48C86871.7000602@berkeley.edu> <384c5c0a0809101842l7ecf6917reb19d8a86c469edc@mail.gmail.com> Message-ID: <384c5c0a0809101844i27aaa544l30b576335da1fc2e@mail.gmail.com> 2008/9/10 Jonathan Dursi : > If you haven't found it already, gpgpu and the forumns there can be Uh, that should be gpgpu.org. ( and forums). Jonathan -- Jonathan Dursi ljdursi at gmail.com From lindahl at pbm.com Mon Sep 15 17:20:04 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 15 Sep 2008 17:20:04 -0700 Subject: [Beowulf] 10gig CX4 switches Message-ID: <20080916002004.GB1539@bx9.net> I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 ports) and it's time to buy a 10gig switch. Has anyone done a recent survey of the market? I don't need any layer-3 features, just layer-2. I see that HP has a 6-port switch for ~ $4k, too small. Arastra looks nice, except that their inexpensive 10base-CR only does SFP+, not X2. And CX4 SFP+ doesn't even seem to exist (lack of space for the connector?) I know some of the big boys have expensive layer-3 10gig switches. 
Can anyone give me a clue? -- greg From landman at scalableinformatics.com Mon Sep 15 17:38:17 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 15 Sep 2008 20:38:17 -0400 Subject: [Beowulf] 10gig CX4 switches In-Reply-To: <20080916002004.GB1539@bx9.net> References: <20080916002004.GB1539@bx9.net> Message-ID: <48CEFFF9.4080504@scalableinformatics.com> Greg Lindahl wrote: > I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 > ports) and it's time to buy a 10gig switch. Has anyone done a recent > survey of the market? I don't need any layer-3 features, just layer-2. > > I see that HP has a 6-port switch for ~ $4k, too small. > > Arastra looks nice, except that their inexpensive 10base-CR only does > SFP+, not X2. And CX4 SFP+ doesn't even seem to exist (lack of space > for the connector?) > > I know some of the big boys have expensive layer-3 10gig switches. > > Can anyone give me a clue? The Myricom units are pretty reasonable/nice. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From bill at Princeton.EDU Mon Sep 15 17:42:00 2008 From: bill at Princeton.EDU (Bill Wichser) Date: Mon, 15 Sep 2008 20:42:00 -0400 Subject: [Beowulf] 10gig CX4 switches In-Reply-To: <20080916002004.GB1539@bx9.net> References: <20080916002004.GB1539@bx9.net> Message-ID: <48CF00D8.3080208@princeton.edu> Have you looked at Fujitsu? Anything using the Fulcrum chip (this one does) will have the best latency numbers. And their switches are just layer 2. Bill Greg Lindahl wrote: >I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 >ports) and it's time to buy a 10gig switch. Has anyone done a recent >survey of the market? I don't need any layer-3 features, just layer-2. > >I see that HP has a 6-port switch for ~ $4k, too small. 
> >Arastra looks nice, except that their inexpensive 10base-CR only does >SFP+, not X2. And CX4 SFP+ doesn't even seem to exist (lack of space >for the connector?) > >I know some of the big boys have expensive layer-3 10gig switches. > >Can anyone give me a clue? > >-- greg > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From atchley at myri.com Mon Sep 15 17:53:51 2008 From: atchley at myri.com (Scott Atchley) Date: Mon, 15 Sep 2008 20:53:51 -0400 Subject: [Beowulf] 10gig CX4 switches In-Reply-To: <48CEFFF9.4080504@scalableinformatics.com> References: <20080916002004.GB1539@bx9.net> <48CEFFF9.4080504@scalableinformatics.com> Message-ID: On Sep 15, 2008, at 8:38 PM, Joe Landman wrote: > Greg Lindahl wrote: >> I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 >> ports) and it's time to buy a 10gig switch. Has anyone done a recent >> survey of the market? I don't need any layer-3 features, just >> layer-2. >> I see that HP has a 6-port switch for ~ $4k, too small. >> Arastra looks nice, except that their inexpensive 10base-CR only does >> SFP+, not X2. And CX4 SFP+ doesn't even seem to exist (lack of space >> for the connector?) >> I know some of the big boys have expensive layer-3 10gig switches. >> Can anyone give me a clue? > > The Myricom units are pretty reasonable/nice. We have 32 port switches which can be configured with a combination of Myrinet and Ethernet ports, but the Ethernet ports are either XFP or SFP+. We do not offer Ethernet CX4 ports. If you are looking for 12 or 20 ports, check out the Fujitsu XG switches. They have an optional cut-through mode and <400 us latency. There may be others, but I have used these. 
Scott

From hahn at mcmaster.ca Mon Sep 15 21:13:53 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue, 16 Sep 2008 00:13:53 -0400 (EDT)
Subject: [Beowulf] stace_analyzer.pl can't work.
In-Reply-To:
References:
Message-ID:

> It throws an error like: Argument "" isn't numeric in addition (+) at
> ./strace_analyzer.pl line 310, <> line 409105.
> Have you experienced this? and resolved?

I think the parser could be improved. in this case, it seems to be choking on lines like:

23:34:13.502857 read(7, 0xbfffc670, 32) = -1 EAGAIN (Resource temporarily unavailable)

on line 310, junk="unavailable" - the code should handle -1 specially, since it shouldn't be accumulated into $ReadBytesTotal.

> BTW: I've installed the perl module strict and Getopt::long, and the tool
> exhausts my laptop's memory (1.2G) while running.

I guess the code isn't intended to profile >409105-line traces on a laptop ;) it's a WIP, and stores a lot of stuff that it doesn't currently use, and could probably be stored more compactly...

regards, mark hahn.

From jan.heichler at gmx.net Tue Sep 16 00:32:24 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Tue, 16 Sep 2008 09:32:24 +0200
Subject: [Beowulf] Lustre failover
In-Reply-To: <200809101622.14020.bs@q-leap.de>
References: <200809101622.14020.bs@q-leap.de>
Message-ID: <1865088910.20080916093224@gmx.net>

Hello Bernd,

on Wednesday, 10 September 2008, you wrote:

BS> No, you can do active/active with several systems

BS>       Raid1
BS>      /     \
BS>   OSS1     OSS2
BS>      \     /
BS>       Raid2

BS> (Raid1 and Raid2 are hardware raid systems).

BS> Now OSS1 will primarily serve Raid1 and OSS2 will primarily serve Raid2. So
BS> you have an active active situation.

This is not what you would call "active/active" in an HA environment. If Server1 is a DHCP server and Server2 is an NFS server, and each can take over the other's service, you have an "active/passive" DHCP and an "active/passive" NFS. For Lustre you have "active/passive" for the data on RAID1 and "active/passive" for the data on RAID2.
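The failover layout in Bernd's diagram, and Jan's point that each array is really active/passive, can be illustrated with a minimal Python sketch. The names `LAYOUT` and `serving_oss`, and the idea of a precomputed set of live servers, are assumptions for illustration only, not Lustre configuration. In practice a server should only be dropped from the live set after it has been positively fenced, as Greg's point 4 warns.

```python
from typing import Dict, List, Optional, Set

# Hypothetical layout mirroring the diagram: both OSS nodes are cabled to both
# hardware RAID arrays; the first name in each list is the primary server,
# the second is the failover server.
LAYOUT: Dict[str, List[str]] = {
    "Raid1": ["OSS1", "OSS2"],
    "Raid2": ["OSS2", "OSS1"],
}


def serving_oss(layout: Dict[str, List[str]], alive: Set[str]) -> Dict[str, Optional[str]]:
    """Return which single OSS should mount each array, given the live servers.

    An array is never mounted by two servers at once: it goes to the first
    live candidate in its list, or to nobody (None) if all candidates are down.
    """
    assignment: Dict[str, Optional[str]] = {}
    for array, candidates in layout.items():
        assignment[array] = next((oss for oss in candidates if oss in alive), None)
    return assignment


if __name__ == "__main__":
    print(serving_oss(LAYOUT, {"OSS1", "OSS2"}))  # {'Raid1': 'OSS1', 'Raid2': 'OSS2'}
    print(serving_oss(LAYOUT, {"OSS2"}))          # {'Raid1': 'OSS2', 'Raid2': 'OSS2'}
```

In normal operation both servers carry load, which is why this gets called active/active, yet no array ever has two servers at once; after a failure the survivor simply picks up the other array.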
But I know that in the Lustre world this scenario is called "active/active" because no hardware is idle ;-)

Jan

From laytonjb at att.net Tue Sep 16 03:47:29 2008
From: laytonjb at att.net (Jeff Layton)
Date: Tue, 16 Sep 2008 03:47:29 -0700 (PDT)
Subject: [Beowulf] stace_analyzer.pl can't work.
Message-ID: <398011.103.qm@web80707.mail.mud.yahoo.com>

As Mark alluded to, I'm not a Perl person at all :) In fact I used this analyzer as a way to learn some Perl syntax. So I'm looking for advice on how to make it better. Personally I thought about using Python since I know that much better, but I wanted to experiment with Perl.

I'm keeping some information around in arrays for the next stage of development - I just haven't gotten there yet. I'm working on a couple of new things - creating a histogram of the IO pattern and creating an IO simulator from the strace file. But it's probably a good idea to go back and fix these bugs.

I'm traveling and won't be able to get to my desktop until this weekend. Then I will see what I can do. Until then, if anyone has any suggestions, go for it. If it works, let me have a patch or tell me what to fix.

Also, if anyone has any suggestions on better ways of storing data than the current method (arrays) - I'm all ears. Any thoughts on periodically dumping the arrays to a file, or even just dumping the data to files as it parses? Might be slow, but it shouldn't then choke on large files. Could be a command-line switch.

Mark - again thanks for jumping in for me. I really appreciate it.

Jeff

From landman at scalableinformatics.com Tue Sep 16 05:13:10 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 16 Sep 2008 08:13:10 -0400
Subject: [Beowulf] stace_analyzer.pl can't work.
In-Reply-To: <398011.103.qm@web80707.mail.mud.yahoo.com>
References: <398011.103.qm@web80707.mail.mud.yahoo.com>
Message-ID: <48CFA2D6.5070808@scalableinformatics.com>

A quick patchy-patchy for 310:

--- strace_analyzer.pl 2008-09-16 07:57:34.000000000 -0400
+++ strace_analyzer_new.pl 2008-09-16 08:01:45.000000000 -0400
@@ -307,7 +307,7 @@
         $junk =~ s/[^0-9]//g;
 
         # Keep track of total number bytes read
-        $ReadBytesTotal += $junk;
+        $ReadBytesTotal += $junk if ($junk != -1);
 
         # Clean up write unit
         ($junk1, $junk2)=split(/\,/,$cmd_unit);

There may be other error return codes which are negative, so if you want to filter those as well, use "(if $junk < 0)" rather than the above.
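Mark's diagnosis and Joe's patch can be illustrated with a minimal Python sketch, written in Python rather than the tool's Perl purely for illustration. It assumes one complete `read()` call per line, as in the sample line Mark quoted, and reads the return value together with its sign. This matters because the `s/[^0-9]//g` in the patch context strips a minus sign along with everything else, so `-1` becomes `1` and `unavailable` becomes an empty string. The function name `read_totals` and the regular expression are hypothetical. Because the function streams over its input and keeps only two counters, its memory use stays flat regardless of trace length.

```python
import re
from typing import Iterable, Tuple

# Hypothetical pattern for one complete read() line such as
#   23:34:13.502857 read(7, 0xbfffc670, 32) = -1 EAGAIN (Resource temporarily unavailable)
# It captures the return value *with its sign*, before any digit-stripping.
# Assumes each call is on one line (no "<unfinished ...>"/"resumed" pairs).
READ_RE = re.compile(r"\bread\(\d+,.*\)\s+=\s+(-?\d+)")


def read_totals(lines: Iterable[str]) -> Tuple[int, int]:
    """Stream over trace lines; return (bytes read, number of failed reads).

    Only two running counters are kept, so memory use does not grow with
    the length of the trace.
    """
    total_bytes = 0
    failed_reads = 0
    for line in lines:
        match = READ_RE.search(line)
        if match is None:
            continue  # not a read() line
        ret = int(match.group(1))
        if ret < 0:
            failed_reads += 1  # e.g. -1 EAGAIN: an error code, not a byte count
        else:
            total_bytes += ret
    return total_bytes, failed_reads
```

Since a Python file object yields one line at a time, something like `with open("trace.txt") as f: print(read_totals(f))` (the file name is only an example) never holds the whole trace in memory.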
As for the rest of the code structure, writing this parser isn't all that hard, and for those with smaller memories but bigger disks (and a desire to analyze large straces), we could use the DBIx::SimplePerl module. Jeff is already putting his arrays together as hashes, and that module makes it really easy to dump a hash data structure directly into a database, say an SQLite3 database. Which, curiously, could make a bit of the code easier to deal with/write/debug.

The issue you have to worry about in dealing with huge streams of data is running out of RAM. This happens. Many "common" techniques fail when data gets very large (compared to something like RAM). We had to solve a large upload/download problem for a customer who decided to use a web server for a multi-gigabyte file upload/download form in an application. The common solution was to pull everything into RAM and massage it from there. This failed rather quickly.

I don't personally have large amounts of "free" time, but I could likely help out a bit with this. Jeff, do you want me to create something on our Mercurial server for this? Or do you have it in SVN/CVS somewhere?

Joe

From laytonjb at att.net Tue Sep 16 05:22:05 2008
From: laytonjb at att.net (Jeff Layton)
Date: Tue, 16 Sep 2008 05:22:05 -0700 (PDT)
Subject: [Beowulf] stace_analyzer.pl can't work.
Message-ID: <466658.2796.qm@web80706.mail.mud.yahoo.com>

I've never used Mercurial or any other "real" programming tool for tracking changes, but go for it! It forces me to learn it.

I like the idea of a DB, but I'm a bit worried that this will get out of hand. It's a simple tool to do a quick analysis (although I have bigger plans in mind). I haven't looked at SQLite in a few years.
Is it still an in-memory DB, or does it allow you to dump the DB to a file (or two)?

BTW - thanks for the patch. I like the second option of ignoring any return codes that are negative. Easy change.

Thanks!

Jeff

From landman at scalableinformatics.com Tue Sep 16 05:38:27 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 16 Sep 2008 08:38:27 -0400
Subject: [Beowulf] stace_analyzer.pl can't work.
In-Reply-To: <466658.2796.qm@web80706.mail.mud.yahoo.com>
References: <466658.2796.qm@web80706.mail.mud.yahoo.com>
Message-ID: <48CFA8C3.5050908@scalableinformatics.com>

Jeff Layton wrote:
>
> I've never used Mercurial or any other "real" programming tool for
> tracking changes, but go for it! It forces me to learn it.

Common ones these days are Mercurial and git. I prefer the former; it is understandable by mere mortals.

> I like the idea of a DB, but I'm a bit worried that this will get out of
> hand. It's a simple tool to do a quick analysis (although I have bigger
> plans in mind). I haven't looked at SQLite in a few years. Is it still
> an in-memory DB or does it allow you to dump the DB to a file (or two).

It's an on-disk, very simple database. Nothing even remotely complex about it. Slowly (and surely) many projects that have been using bdb (or more correctly *db) are migrating to it. We have been using it for a while for lots of little things. Works quite well, painless to set up (you don't).

>
> BTW - thanks for the patch. I like the second option of ignoring any
> return codes that are negative. Easy change.

should read ... if ($junk < 0);
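Joe's suggestion of an on-disk database can be sketched with Python's standard-library `sqlite3` module, swapped in here for the Perl `DBIx::SimplePerl` module he mentioned. The table layout and the function names `load_calls` and `bytes_by_syscall` are assumptions for illustration. Passing a file name such as `"strace_calls.db"` to `sqlite3.connect()` stores the data in a file on disk, while `":memory:"` keeps it in RAM, so SQLite supports both of the modes Jeff asked about.

```python
import sqlite3
from typing import Dict, Iterable, Tuple


def load_calls(conn: sqlite3.Connection, records: Iterable[Tuple[str, int, int]]) -> None:
    """Append (syscall, fd, return value) records to a 'calls' table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS calls (syscall TEXT, fd INTEGER, ret INTEGER)"
    )
    # executemany() accepts any iterable, so a generator over the trace file
    # can feed rows without first building a Python list of every record.
    conn.executemany("INSERT INTO calls VALUES (?, ?, ?)", records)
    conn.commit()


def bytes_by_syscall(conn: sqlite3.Connection) -> Dict[str, int]:
    """Sum non-negative return values per syscall; error returns are skipped."""
    rows = conn.execute(
        "SELECT syscall, SUM(ret) FROM calls WHERE ret >= 0 GROUP BY syscall"
    ).fetchall()
    return dict(rows)
```

A generator that yields `(syscall, fd, ret)` tuples while reading the trace can be passed straight to `load_calls`. The aggregation then happens inside SQLite, so the script itself never has to hold every record in memory.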
From patrick at myri.com Tue Sep 16 06:18:10 2008
From: patrick at myri.com (Patrick Geoffray)
Date: Tue, 16 Sep 2008 09:18:10 -0400
Subject: [Beowulf] 10gig CX4 switches
In-Reply-To: <20080916002004.GB1539@bx9.net>
References: <20080916002004.GB1539@bx9.net>
Message-ID: <48CFB212.8080007@myri.com>

Greg,

Greg Lindahl wrote:
> I see that HP has a 6-port switch for ~ $4k, too small.

Don't know if it was specific to the model we tested, but hardware flow control does not work in one direction, even when turned on.

> Arastra looks nice, except that their inexpensive 10base-CR only does
> SFP+, not X2. And CX4 SFP+ doesn't even seem to exist (lack of space
> for the connector?)

There are 3 silicon vendors for small (< 24 ports) integrated 10G crossbars: Fujitsu, Fulcrum and Broadcom. They work OK, some better than others. Do an N->1 flow-control test (N >= 6) to see which ones do not have enough internal buffering. Arastra switches are good, based on Fulcrum.

SFP+ Twinax cables are as cheap as CX4 and have a much smaller footprint on the switch; that's why everybody is using them today (and QSFP for low-cost fiber). I have seen SFP+ to CX4 adapters, but I can't seem to find a source online. You may want to ask Arastra.

Another solution would be to use SR fiber transceivers on both sides, one X2 and one SFP+.
That would be a shame if you don't need the distance, as it will be significantly more expensive than a copper SFP+/CX4 solution.

Patrick

From hearnsj at googlemail.com Tue Sep 16 07:03:11 2008
From: hearnsj at googlemail.com (John Hearns)
Date: Tue, 16 Sep 2008 15:03:11 +0100
Subject: [Beowulf] 10gig CX4 switches
In-Reply-To: <20080916002004.GB1539@bx9.net>
References: <20080916002004.GB1539@bx9.net>
Message-ID: <9f8092cc0809160703k707f0b26uaa80bf0f34921f4a@mail.gmail.com>

2008/9/16 Greg Lindahl
> I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2
> ports) and it's time to buy a 10gig switch. Has anyone done a recent
> survey of the market? I don't need any layer-3 features, just layer-2.
>

Have a look at the Quadrics switches. The smaller TG201 switch would fit the bill nicely for you - you can get it in two variants, one with 24x copper ports and one with 12x copper ports and 12x empty ports for GBICs. They're pretty cost-effective.

John Hearns

From gus at ldeo.columbia.edu Tue Sep 16 13:20:50 2008
From: gus at ldeo.columbia.edu (Gus Correa)
Date: Tue, 16 Sep 2008 16:20:50 -0400
Subject: [Beowulf] MS Cray
Message-ID: <48D01522.6030107@ldeo.columbia.edu>

Dear Beowulf and COTS fans

For those of you who haven't read the news today:

http://www.theregister.co.uk/2008/09/16/cray_baby_super/

IGIDH (I guess it doesn't help.)

Gus Correa
-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O.
Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- From prentice at ias.edu Tue Sep 16 13:35:31 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 16 Sep 2008 16:35:31 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D01522.6030107@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> Message-ID: <48D01893.4010904@ias.edu> Gus Correa wrote: > Dear Beowulf and COTS fans > > For those of you who haven't read the news today: > > http://www.theregister.co.uk/2008/09/16/cray_baby_super/ > > IGIDH (I guess it doesn't help.) > > Gus Correa > Quote from article: "It's also attempting to lure scientists and researchers with discretionary IT budgets to forget using shared, giant clusters and get their own box and tuck it in behind their desk where no one can see it to run their workloads locally. The personal supercomputer is not a new idea, but this is the first time that Cray is trying it out in the market." That will work great until the newbie scientists find that airflow into a computer tucked in "behind their desk where no one can see it" is piss poor, and that fans powerful enough to provide adequate airflow "behind the desk where no one can see it" are going to be LOUD. 
-- 
Prentice

From prentice at ias.edu Tue Sep 16 14:01:41 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Tue, 16 Sep 2008 17:01:41 -0400
Subject: [Beowulf] MS Cray
In-Reply-To: <9f8092cc0809161353x5b9ca214k234e2852a2db5b5a@mail.gmail.com>
References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <9f8092cc0809161353x5b9ca214k234e2852a2db5b5a@mail.gmail.com>
Message-ID: <48D01EB5.1000405@ias.edu>

John Hearns wrote:
>
>
> 2008/9/16 Prentice Bisbal
>
>
>     That will work great until the newbie scientists find that airflow into
>     a computer tucked in "behind their desk where no one can see it" is piss
>     poor, and that fans powerful enough to provide adequate airflow "behind
>     the desk where no one can see it" are going to be LOUD.
>
> Ahem.
>
>
> "Because the CX1 sits in an office environment, the front of the chassis
> has an optional noise cancellation add-on, which drops the whirring of
> fan noise down to the point where it is actually legal to put it in an
> office environment."
>
>
> When I worked at Streamline, we demoed a blade cluster in a
> noise-cancelling APC rack enclosure at a conference we ran. It sat
> running away at full lilt at the back of the room as people delivered
> their talks, and no-one noticed till we opened the front door.
> I guess this is something similar.
>

You got me. I saw that when I continued reading the article *after* my post. I was hoping no one else read the article to the end.

Noise-cancellation devices may help keep the noise down, but the airflow under or "behind" a desk is still a problem. Fans can only move air if there's a place for the air to come from, and a place for the air to go.
-- Prentice From john.leidel at gmail.com Tue Sep 16 14:12:55 2008 From: john.leidel at gmail.com (John Leidel) Date: Tue, 16 Sep 2008 16:12:55 -0500 Subject: [Beowulf] MS Cray In-Reply-To: <48D01522.6030107@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> Message-ID: <1221599575.4030.227.camel@e521.site> and, a selfish plug: http://insidehpc.com/2008/09/16/cray-announces-mini-supercomputer-line/ On Tue, 2008-09-16 at 16:20 -0400, Gus Correa wrote: > Dear Beowulf and COTS fans > > For those of you who haven't read the news today: > > http://www.theregister.co.uk/2008/09/16/cray_baby_super/ > > IGIDH (I guess it doesn't help.) > > Gus Correa > From jhh3851 at yahoo.com Tue Sep 16 14:16:51 2008 From: jhh3851 at yahoo.com (Joseph Han) Date: Tue, 16 Sep 2008 14:16:51 -0700 (PDT) Subject: [Beowulf] Re: 10gig CX4 switches In-Reply-To: <200809161900.m8GJ08DZ019701@bluewest.scyld.com> Message-ID: <941626.70386.qm@web55008.mail.re4.yahoo.com> Not sure if this is beyond your budget, but what about a Force10 S2410? http://www.force10networks.com/products/s2410.asp Joseph From hearnsj at googlemail.com Tue Sep 16 14:23:21 2008 From: hearnsj at googlemail.com (John Hearns) Date: Tue, 16 Sep 2008 22:23:21 +0100 Subject: [Beowulf] MS Cray In-Reply-To: <48D01EB5.1000405@ias.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <9f8092cc0809161353x5b9ca214k234e2852a2db5b5a@mail.gmail.com> <48D01EB5.1000405@ias.edu> Message-ID: <9f8092cc0809161423i58ec03c1n92e27f631338984f@mail.gmail.com> > > You got me. I saw that when I continued reading the article *after* my > post. I was hoping no one else read the article to the end. > > Noise-cancellation devices may help keep the noise down, but the air > flow under or "behind" a desk is still a problem. Fans can only move air > if there's a place for the air to come from, and a place for the air to go. > I do agree with you there. 
Many times we've seen personal supercomputers touted, only for them to fade away. Fancy enclosure or no, you have to think twice about one of these under your desk. Networks are fast these days.

I've also seen plenty of people who buy quite ordinary perforated steel racks and are convinced a cluster will sit happily in a work area, or in a room dedicated to the cluster plus graphics workstations (I can think of examples from Streamline and from my company previous to that). Such users tend to rapidly abandon that idea!

I once got an Intel twin motherboard system to take home, for evaluation and power draw measurements for a contract at CERN. I live in a two-bedroom apartment in central London. I couldn't power this thing up for fear of waking the neighbours, though I'm quite happy to share a machine room with 200 of the things.

John Hearns

From lindahl at pbm.com Tue Sep 16 14:53:11 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Tue, 16 Sep 2008 14:53:11 -0700
Subject: [Beowulf] Re: 10gig CX4 switches
In-Reply-To: <941626.70386.qm@web55008.mail.re4.yahoo.com>
References: <200809161900.m8GJ08DZ019701@bluewest.scyld.com> <941626.70386.qm@web55008.mail.re4.yahoo.com>
Message-ID: <20080916215311.GA21176@bx9.net>

On Tue, Sep 16, 2008 at 02:16:51PM -0700, Joseph Han wrote:
> Not sure if this is beyond your budget, but what about a Force10 S2410?
>
> http://www.force10networks.com/products/s2410.asp

I don't care about the absolute price as long as it's cost-effective. It seems that the S2410 is one of those products whose price you can't find on the web, which is usually a bad sign. It's also "interesting" that Force10 says that only qualified cables can be used with their switch. Mmmmmmm.
-- 
greg

From james.p.lux at jpl.nasa.gov Tue Sep 16 15:07:47 2008
From: james.p.lux at jpl.nasa.gov (Lux, James P)
Date: Tue, 16 Sep 2008 15:07:47 -0700
Subject: [Beowulf] MS Cray
In-Reply-To: <48D01893.4010904@ias.edu>
References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu>
Message-ID:

James Lux, P.E.
Task Manager, SOMD Software Defined Radios
Flight Communications Systems Section
Jet Propulsion Laboratory
4800 Oak Grove Drive, Mail Stop 161-213
Pasadena, CA, 91109
+1(818)354-2075 phone
+1(818)393-6875 fax

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Prentice Bisbal
Sent: Tuesday, September 16, 2008 1:36 PM
Cc: Beowulf
Subject: Re: [Beowulf] MS Cray

Gus Correa wrote:
> Dear Beowulf and COTS fans
>
> For those of you who haven't read the news today:
>
> http://www.theregister.co.uk/2008/09/16/cray_baby_super/
>
> IGIDH (I guess it doesn't help.)
>
> Gus Correa
>

Quote from article:

"It's also attempting to lure scientists and researchers with discretionary IT budgets to forget using shared, giant clusters and get their own box and tuck it in behind their desk where no one can see it to run their workloads locally. The personal supercomputer is not a new idea, but this is the first time that Cray is trying it out in the market."

That will work great until the newbie scientists find that airflow into a computer tucked in "behind their desk where no one can see it" is piss poor, and that fans powerful enough to provide adequate airflow "behind the desk where no one can see it" are going to be LOUD.

-------------------

Well, plus ça change, plus c'est la même chose. The same argument was made when the IBM 1130 came out, or the IBM 5150, for that matter. It doesn't have to be loud, and the thermal issue can be dealt with in a variety of ways (some easier and more practical than others).
I could probably support a kW load in my office without causing huge problems (especially if I leave the lights off), and the amount of computation you can get for a kilowatt is always growing.

There is a huge psychological advantage to having the computer physically under your management and control. You don't have folks trying to "optimize the use of a valuable institutional resource" with scheduling, etc. You might be willing to tolerate a factor-of-2 hit in performance for the ability to not have to account to anyone else for how much you're using or not using it.

Jim

From alscheinine at tuffmail.us Tue Sep 16 15:13:16 2008
From: alscheinine at tuffmail.us (Alan Louis Scheinine)
Date: Tue, 16 Sep 2008 17:13:16 -0500
Subject: [Beowulf] MS Cray
In-Reply-To: <9f8092cc0809161423i58ec03c1n92e27f631338984f@mail.gmail.com>
References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <9f8092cc0809161353x5b9ca214k234e2852a2db5b5a@mail.gmail.com> <48D01EB5.1000405@ias.edu> <9f8092cc0809161423i58ec03c1n92e27f631338984f@mail.gmail.com>
Message-ID: <48D02F7C.3040706@tuffmail.us>

It can be viewed as a seasonal machine. To save energy in winter some institutions lower the thermostat so that wearing a sweater is necessary. Lucky is the parallel programmer with a mini-Cray under the desk.

A personal Cray does not simplify life for many people. I heard someone say that the disadvantage of working as a private consultant is the time spent being one's own IT department. For many people, even in technical fields, maintaining MS Windows (not to mention Linux) is difficult or at least annoying. So even a simplified Cray with MS Windows at a company would probably not be "personal" but rather maintained by IT support. I welcome comments from other people as to whether my sense of what is typical at a company is accurate. The key added value (aside from the Cray nameplate) is the support for easy installation.
But I remember the Cray XD1 (acquired from OctigaBay). If sales volume is low, the product line might be dropped, so the specialized simplifications provided by Cray may not be useful after a few years. Best regards, Alan Scheinine -- Alan Scheinine 5010 Mancuso Lane, Apt. 621 Baton Rouge, LA 70809 Email: alscheinine at tuffmail.us Office phone: 225 578 0294 Mobile phone USA: 225 288 4176 [+1 225 288 4176] From gus at ldeo.columbia.edu Tue Sep 16 15:18:52 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 16 Sep 2008 18:18:52 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D01893.4010904@ias.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> Message-ID: <48D030CC.5020005@ldeo.columbia.edu> Hi Prentice and Beowulf fans Prentice Bisbal wrote: >Gus Correa wrote: > > >>Dear Beowulf and COTS fans >> >>For those of you who haven't read the news today: >> >>http://www.theregister.co.uk/2008/09/16/cray_baby_super/ >> >>IGIDH (I guess it doesn't help.) >> >>Gus Correa >> >> >> > >Quote from article: > >"It's also attempting to lure scientists and researchers with >discretionary IT budgets to forget using shared, giant clusters and get >their own box and tuck it in behind their desk where no one can see it >to run their workloads locally. The personal supercomputer is not a new >idea, but this is the first time that Cray is trying it out in the market." > > > I guess there are, and have been, other competitors for this niche market, although perhaps not with such marketable logos and brand names, and with slightly different scope, etc. Two recent examples: http://sicortex.com/products/sc072_pds http://www.nvidia.com/object/tesla_d870.html Well, who knows, maybe Beowulfs will dwindle, and products like these will become the HPC mainstream. Windows has such a foothold in the computer market that this may prove to be possible. Would this be the end of civilization as we know it? (Unix, Linux, COTS, reading this list, ...)
Or would it be replaced by a new state of affairs, a move towards HPC machines with proprietary design and proprietary software, after which we would perhaps be back again to an open architecture? Anyway, economic cycles may not be that cyclic. >That will work great until the newbie scientists find that airflow into >a computer tucked in "behind their desk where no one can see it" is piss >poor, and that fans powerful enough to provide adequate airflow "behind >the desk where no one can see it" are going to be LOUD. > > > To their credit, they seem to be aware of the noise problem. Quoting the article: "Because the CX1 sits in an office environment, the front of the chassis has an optional noise cancellation add-on, which drops the whirring of fan noise down to the point where it is actually legal to put it in an office environment." Otherwise, your "newbie scientist" can put his/her earbuds and pump up the volume on his Ipod, while he/she navigates through the Vista colorful 3D menus. I still think that there are savings, and perhaps some virtue, in assembling some components and replacing parts by myself, with a simple screwdriver. Or at least in being able to do so, in having that option. But maybe this is just wishful romantic thinking. Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O.
Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- From landman at scalableinformatics.com Tue Sep 16 15:39:20 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 16 Sep 2008 18:39:20 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D030CC.5020005@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> Message-ID: <48D03598.2050407@scalableinformatics.com> Gus Correa wrote: > Otherwise, your "newbie scientist" can put his/her earbuds and pump up > the volume on his Ipod, > while he/she navigates through the Vista colorful 3D menus. Owie .... I can just imagine the folks squawking about this at SC08 "Yes folks, you need a Cray supercomputer to make Vista run at acceptable performance ..." The machine seems to run w2k8. My own experience with w2k8 is that, frankly, it doesn't suck. This is the first time I have seen a windows release that I can say that about. The low end economics probably won't work out for this machine though, unless it is N times faster than some other agglomeration of Intel-like products. Adding windows will add cost, not performance in any noticeable way. The question that Cray (and every other vendor building non-commodity units) is how much better is this than a small cluster someone can build/buy on their own? Better as in faster, able to leap more tall buildings in a single bound, ... (Superman TV show reference for those not in the know). And the hard part will be justifying the additional cost. If the machine isn't 2x the performance, would it be able to justify 2x the price? Since it appears to be a somewhat well branded cluster, I am not sure that argument will be easy to make. 
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Tue Sep 16 16:01:49 2008 From: mathog at caltech.edu (David Mathog) Date: Tue, 16 Sep 2008 16:01:49 -0700 Subject: [Beowulf] Re: MS Cray Message-ID: Prentice Bisbal wrote: > Quote from article: > > "It's also attempting to lure scientists and researchers with > discretionary IT budgets to forget using shared, giant clusters and get > their own box and tuck it in behind their desk where no one can see it > to run their workloads locally. The personal supercomputer is not a new > idea, but this is the first time that Cray is trying it out in the market." > > That will work great until the newbie scientists find that airflow into > a computer tucked in "behind their desk where no one can see it" is piss > poor, and that fans powerful enough to provide adequate airflow "behind > the desk where no one can see it" are going to be LOUD. Well, they _might_ be able to muffle the airflow noise with an expensive, and most likely large, case design. The one thing they can't get away from is the heat. Anything "super", meaning much faster than an average desktop, is going to use a lot of power, and 1000W is roughly where I would imagine this sort of machine would sit, the upper limit being set by the need to leave at least 500W for incidentals like small printers, phones, displays and the like, on the same 15A circuit. 1000W is too much heat to have under one's desk, unless a dramatically lowered sperm count is the goal. It is also pushing the edge of the AC capacity for a small office. On the other hand, if Cray can make it reasonably quiet, it should be OK in a lab environment, even at 1000W.
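Mathog's office power and heat budget can be made concrete with a little arithmetic. The Python sketch below is not from the post; it assumes 120 V nominal US branch circuits, the common 80% derating for continuous loads, 3.412 BTU/h per watt, and 12,000 BTU/h per ton of air conditioning, so treat its numbers as a rough illustration only.

```python
# Rough office power and heat budget for a ~1 kW deskside machine.
# Assumptions (not from the post): 120 V nominal US branch circuits,
# the common 80% derating for continuous loads, 3.412 BTU/h per watt,
# and 12,000 BTU/h per ton of air conditioning.

VOLTS = 120
BTU_H_PER_WATT = 3.412
BTU_H_PER_TON = 12_000


def circuit_watts(amps, volts=VOLTS, derate=1.0):
    """Usable watts on one branch circuit, optionally derated."""
    return amps * volts * derate


machine_w = 1000        # Mathog's estimate for a deskside "super"
incidentals_w = 500     # printers, phones, displays on the same circuit

for amps in (15, 20):   # typical office circuit vs. common lab circuit
    nominal = circuit_watts(amps)
    derated = circuit_watts(amps, derate=0.8)
    print(f"{amps} A: {nominal:.0f} W nominal, {derated:.0f} W derated; "
          f"left after machine: {nominal - machine_w:.0f} W / "
          f"{derated - machine_w:.0f} W (need {incidentals_w} W)")

heat_btu_h = machine_w * BTU_H_PER_WATT
print(f"{machine_w} W of heat = {heat_btu_h:.0f} BTU/h, "
      f"about {heat_btu_h / BTU_H_PER_TON:.2f} tons of cooling")
```

Under these assumptions a nominal 15 A circuit leaves about 800 W after a 1000 W machine, comfortably above the 500 W of incidentals, while the 80% derating trims that margin to about 440 W; a 20 A lab circuit leaves considerably more room.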
Labs have better air flow, higher AC capacity, and better power (multiple 20A circuits are common) than do offices. Labs are better, but not THAT much better, so if everybody in the lab wants one, you might be back to needing a machine room, at which point the expensive quiet case was a waste of money. Or worse, they might not fit in racks. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From gus at ldeo.columbia.edu Tue Sep 16 17:10:59 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 16 Sep 2008 20:10:59 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D03598.2050407@scalableinformatics.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> Message-ID: <48D04B13.5060303@ldeo.columbia.edu> Hi Joe and fellow Beowulf fans Joe Landman wrote: > Gus Correa wrote: > >> Otherwise, your "newbie scientist" can put his/her earbuds and pump >> up the volume on his Ipod, >> while he/she navigates through the Vista colorful 3D menus. > > > Owie .... I can just imagine the folks squawking about this at SC08 > "Yes folks, you need a Cray supercomputer to make Vista run at > acceptable performance ..." > :) > > The machine seems to run w2k8. My own experience with w2k8 is that, > frankly, it doesn't suck. This is the first time I have seen a > windows release that I can say that about. > > The low end economics probably won't work out for this machine though, > unless it is N times faster than some other agglomeration of > Intel-like products. Adding windows will add cost, not performance in > any noticeable way. What if performance is not the main goal? 
Here is what the article has to say about it: "Microsoft's strategy - one that no supercomputer maker and no X64 chip maker can ignore - is to attack from the bottom, to find those myriad new HPC users who never learned Unix, never learned Linux, and have no desire to." There have been several long and heated discussions on this list about computer literacy and computer education for scientists and science students. Mostly centered on computer languages, not so much was said about Unix/Linux proficiency, bits of shell or scripting language skills, and the rudiments of Unix/Linux tools and programming environment. I don't intend to reopen them. However, was Microsoft listening to those discussions? > > The question that Cray (and every other vendor building non-commodity > units) is how much better is this than a small cluster someone can > build/buy on their own? Better as in faster, able to leap more tall > buildings in a single bound, ... (Superman TV show reference for those > not in the know). And the hard part will be justifying the additional > cost. If the machine isn't 2x the performance, would it be able to > justify 2x the price? Since it appears to be a somewhat well branded > cluster, I am not sure that argument will be easy to make. > > You are right about the economics, at least if we consider hardware alone. According to the article the full configuration has 64 Xeon 3.4GHZ cores, equivalent to eight cluster nodes with IB hardware. The "fully loaded" machine price is $80k, or $10k per node. Quoting from the article: "A single chassis can house a maximum of 4 TB of disk or -when using the fastest 3.4 GHz quad-core Xeons Intel has delivered - up to 768 gigaflops of computing power in a single chassis. (That's eight two-socket blades using quad-core Xeons, for a total of 64 cores). Obviously, three of these CX1s linked up yields 2.3 teraflops - a nice size for a personal super." 
"The base price of the chassis with bare bones blades and switches is $25,000. When the machine is fully loaded, the price tag comes to around $80,000 or so. Cray is selling the CX1 boxes online starting today - the first time a Cray machine has been sold online and directly - and expects to have volume shipments revved up by the end of October." *** Here is the link to the CX1 on the Cray web site: http://www.cray.com/products/CX1.aspx You need MS Explorer to customize/price it. Gus Correa From lindahl at pbm.com Tue Sep 16 18:40:15 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 16 Sep 2008 18:40:15 -0700 Subject: [Beowulf] MS Cray In-Reply-To: <48D03598.2050407@scalableinformatics.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> Message-ID: <20080917014015.GA13551@bx9.net> On Tue, Sep 16, 2008 at 06:39:20PM -0400, Joe Landman wrote: > The question that Cray (and every other vendor building non-commodity > units) is how much better is this than a small cluster someone can > build/buy on their own? My impression has been that so far, there hasn't been a huge market discovered for supercomputing appliances. Which is a shame, but there you have it. It has been tried. And if anyone does make it work, I'm sure everyone else will be all over it. Cray also has a well-known inability to sell low-priced systems, including the Y-MP/EL and XD1. Perhaps the 3rd time's the charm. -- greg From tjrc at sanger.ac.uk Tue Sep 16 23:49:12 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 17 Sep 2008 07:49:12 +0100 Subject: [Beowulf] MS Cray In-Reply-To: References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> Message-ID: On 16 Sep 2008, at 11:07 pm, Lux, James P wrote: > There is a huge psychological advantage to having the computer > physically under your management and control. 
You don't have folks > trying to "optimize the use of a valuable institutional resource" > with scheduling, etc. You might be willing to tolerate a factor of > 2 hit in performance for the ability to not have to account for > anyone else about how much you're using or not using it. And then they all expect the central systems support group to get it running for them, and to fix it when it breaks, and to generally maintain it. Suddenly you have dozens of completely different systems scattered far and wide across your site, and you're starting to get complaints that the support group are unobtainable these days - they're never at their desk any more, and don't seem to have any time to build new stuff any more. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From eugen at leitl.org Wed Sep 17 01:33:55 2008 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 17 Sep 2008 10:33:55 +0200 Subject: [Beowulf] Cray, Intel, and Microsoft birth baby supercomputer Message-ID: <20080917083355.GG25850@leitl.org> http://www.theregister.co.uk/2008/09/16/cray_baby_super/ Cray, Intel, and Microsoft birth baby supercomputer Gigaflops for mom and pop shops By Timothy Prickett Morgan Posted in Servers, 16th September 2008 18:25 GMT Supercomputer maker Cray today announced a new desk-side, low-end, bladed office supercomputer in conjunction with chip partner Intel and software partner Microsoft. The new CX1 supercomputer is the first product to come to market after Cray tapped Intel as its future strategic chip supplier, dissing long-time chip partner Advanced Micro Devices.
Cray, which has been struggling financially since clustered Linux boxes became the rage in supercomputing a decade ago, is known for creating the fastest vector and parallel supercomputers in the world, and with the CX1, it is trying to push down into a market where newbies in life sciences, digital rendering, financial services, and other fields are playing around with supers for the first time. It's also attempting to lure scientists and researchers with discretionary IT budgets to forget using shared, giant clusters and get their own box and tuck it in behind their desk where no one can see it to run their workloads locally. The personal supercomputer is not a new idea, but this is the first time that Cray is trying it out in the market. If you want to cut off the air that Linux breathes, as Microsoft certainly does, one of the choke points where you try to get your Windows tentacles wrapped around is supercomputing, or what people for some reason now call high performance computing. But to take on Linux in HPC requires a slightly different tack than what worked for Windows in the data center, and it requires something a little more subtle than the cheap software and portability across architectures that made Linux the darling of academic, government, and corporate supercomputing centers in a mere decade, supplanting Unix. Microsoft's strategy - one that no supercomputer maker and no X64 chip maker can ignore - is to attack from the bottom, to find those myriad new HPC users who never learned Unix, never learned Linux, and have no desire to. This strategy is what moved Windows from the desktop to the data center in the 1990s, and it worked so brilliantly that Windows machines account for more than two-thirds of server revenues each quarter and the lion's share of shipments. People use the software they are comfortable with, and Linux was an easy transition for Unix shops, just as moving from a Windows desktop to Windows servers is relatively simple. 
While technically speaking, the CX1 minisuper is certified to run Red Hat's Enterprise Linux 5 and can certainly run Novell's SUSE Linux Enterprise Server 10 (which is the preferred Linux on Cray's high-end, massively parallel Opteron boxes, the XT4 and XT5, and which has not been certified on the CX1), Red Hat and Novell were not invited to the CX1 launch party, while Burton Smith, now a technical fellow for parallel computing at Microsoft and formerly the chief scientist at Cray and the company that ate it in March 2000, Tera Computer, as well as Kyril Faenov, general manager of the Windows HPC business at Microsoft, were given great swaths of time during the launch to espouse the virtues of Windows HPC Server 2008. You do the math. Windows HPC Server 2008 is the latest implementation of the supercomputer-tuned variant of Windows, which includes support for the Message Passing Interface (MPI) protocol used to create supercomputer clusters as well as optimizations in Windows to make it better able to squeeze every ounce of performance out of X64 iron. Microsoft is poised to announce Windows HPC Server 2008 next week in New York at a supercomputing event hosted on Wall Street - provided any of the big banks and trading houses are still there to host the event. The CX1 minisuper is really a blade chassis with integrated Gigabit Ethernet and InfiniBand switches that can be loaded up with blades for computing, storage, and visualization - the latter being what you and I would call in the old days "being a graphics workstation." The blade chassis has eight slots and runs off normal wall power, not the 240-volt power required for big iron in data centers. The CC48 blade has a single Xeon socket and lots of memory expansion for memory-intensive HPC workloads, while the CC54 blade has two Xeon slots and a little less memory. 
The CV54-01 blade is what is called a visualization node, and it is basically a workstation on a blade, complete with a high-end nVidia graphics card and an optional GPU to boost display capabilities. The CS54-04 blade is a storage blade that takes up two slots and has four 2.5-inch SAS drives and the CS54-08 blade takes up three slots and offers eight drives. Using the integrated switches in the CX1, up to three boxes can be lashed together into a cluster, and for those shops who need to add more power to their clusters, they can use external switches to do the job. A single chassis can house a maximum of 4 TB of disk or -when using the fastest 3.4 GHz quad-core Xeons Intel has delivered - up to 768 gigaflops of computing power in a single chassis. (That's eight two-socket blades using quad-core Xeons, for a total of 64 cores). Obviously, three of these CX1s linked up yields 2.3 teraflops - a nice size for a personal super. Because the CX1 sits in an office environment, the front of the chassis has an optional noise cancellation add-on, which drops the whirring of fan noise down to the point where it is actually legal to put it in an office environment. The base price of the chassis with bare bones blades and switches is $25,000. When the machine is fully loaded, the price tag comes to around $80,000 or so. Cray is selling the CX1 boxes online starting today - the first time a Cray machine has been sold online and directly - and expects to have volume shipments revved up by the end of October. 
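As a rough cross-check of the article's figures, and of the "$10k per node" estimate made earlier in the thread, the Python sketch below recomputes peak throughput and price per blade. It assumes 4 double-precision flops per core per cycle, typical of quad-core Xeons of this generation, and eight two-socket quad-core blades per chassis; the $80,000 price is the article's approximate figure. Under that assumption, 768 GFLOPS per chassis corresponds to 3.0 GHz parts, while 3.4 GHz parts would put the peak nearer 870 GFLOPS.

```python
# Back-of-the-envelope peak and price figures for one CX1 chassis.
# Assumption: 4 double-precision flops per core per cycle (one 2-wide SSE
# add plus one 2-wide SSE multiply), typical of quad-core Xeons of this era.

FLOPS_PER_CYCLE = 4


def peak_gflops(blades, sockets_per_blade, cores_per_socket, ghz,
                flops_per_cycle=FLOPS_PER_CYCLE):
    """Theoretical peak in GFLOPS = cores x clock (GHz) x flops per cycle."""
    cores = blades * sockets_per_blade * cores_per_socket
    return cores * ghz * flops_per_cycle


chassis_30 = peak_gflops(8, 2, 4, 3.0)  # eight two-socket quad-core blades
chassis_34 = peak_gflops(8, 2, 4, 3.4)  # same chassis with 3.4 GHz parts
three_chassis = 3 * chassis_30          # three CX1s joined by their switches

price_loaded = 80_000                   # "around $80,000 or so" per the article
per_blade = price_loaded / 8
per_gflop = price_loaded / chassis_30

print(f"one chassis, 3.0 GHz: {chassis_30:.1f} GFLOPS peak")
print(f"one chassis, 3.4 GHz: {chassis_34:.1f} GFLOPS peak")
print(f"three chassis, 3.0 GHz: {three_chassis / 1000:.2f} TFLOPS peak")
print(f"${per_blade:,.0f} per blade, ${per_gflop:.0f} per peak GFLOPS")
```

These are theoretical peaks; sustained rates on real codes will be lower, so the per-GFLOPS price is best read as a lower bound.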
From gerry.creager at tamu.edu Wed Sep 17 06:22:31 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed, 17 Sep 2008 08:22:31 -0500 Subject: [Beowulf] MS Cray In-Reply-To: <48D04B13.5060303@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> Message-ID: <48D10497.7080801@tamu.edu> Gus Correa wrote: > Hi Joe and fellow Beowulf fans > > Joe Landman wrote: > >> Gus Correa wrote: >> >>> Otherwise, your "newbie scientist" can put his/her earbuds and pump >>> up the volume on his Ipod, >>> while he/she navigates through the Vista colorful 3D menus. >> >> >> Owie .... I can just imagine the folks squawking about this at SC08 >> "Yes folks, you need a Cray supercomputer to make Vista run at >> acceptable performance ..." >> > :) > >> >> The machine seems to run w2k8. My own experience with w2k8 is that, >> frankly, it doesn't suck. This is the first time I have seen a >> windows release that I can say that about. > >> >> The low end economics probably won't work out for this machine though, >> unless it is N times faster than some other agglomeration of >> Intel-like products. Adding windows will add cost, not performance in >> any noticeable way. > > What if performance is not the main goal? > Here is what the article has to say about it: > > "Microsoft's strategy - one that no supercomputer maker and no X64 chip > maker can ignore - is to attack from the bottom, to find those myriad > new HPC users who never learned Unix, never learned Linux, and have no > desire to." > > There have been several long and heated discussions on this list about > computer literacy and computer education for scientists and science > students. 
Mostly centered on computer languages, not so much was said > about Unix/Linux proficiency, > bits of shell or scripting language skills, and the rudiments of > Unix/Linux tools > and programming environment. > I don't intend to reopen them. > However, was Microsoft listening to those discussions? > >> >> The question that Cray (and every other vendor building non-commodity >> units) is how much better is this than a small cluster someone can >> build/buy on their own? Better as in faster, able to leap more tall >> buildings in a single bound, ... (Superman TV show reference for those >> not in the know). And the hard part will be justifying the additional >> cost. If the machine isn't 2x the performance, would it be able to >> justify 2x the price? Since it appears to be a somewhat well branded >> cluster, I am not sure that argument will be easy to make. >> >> > > You are right about the economics, at least if we consider hardware alone. > According to the article the full configuration has 64 Xeon 3.4GHZ cores, > equivalent to eight cluster nodes with IB hardware. > The "fully loaded" machine price is $80k, or $10k per node. > > Quoting from the article: > > "A single chassis can house a maximum of 4 TB of disk or -when using the > fastest 3.4 GHz quad-core Xeons Intel has delivered - up to 768 > gigaflops of computing power in a single chassis. (That's eight > two-socket blades using quad-core Xeons, for a total of 64 cores). > Obviously, three of these CX1s linked up yields 2.3 teraflops - a nice > size for a personal super." > > "The base price of the chassis with bare bones blades and switches is > $25,000. When the machine is fully loaded, the price tag comes to around > $80,000 or so. Cray is selling the CX1 boxes online starting today - the > first time a Cray machine has been sold online and directly - and > expects to have volume shipments revved up by the end of October." 
> > *** > > Here is the link to the CX1 on the Cray web site: > > http://www.cray.com/products/CX1.aspx > > You need MS Explorer to customize/price it. I just knew you had to be wrong, but sure enough, I can't see config options. It's a show stopper for me. If I need IE to buy the system, it's not likely to happen until A) there's an IE that runs natively on *nix, and B) it doesn't have the myriad problems associated with IE in the past. I do admit to a sinking feeling when I noted that the front page (and of course, the subsequent pages) were ASPX... I suspect Microsoft has been listening here. I also suspect this machine will do ok in the business world, but somehow I doubt they're gonna see significant headway in a lot of the scientific arenas. If you aren't computer literate, you're not likely to port a complicated model from *nix to Windows, nor are you likely to write a significant piece of code. I've a geodesist friend who DOES write solely for Windows, but that's a conscious choice by someone who was a talented computer scientist first, and a geodesist later in life. He uses Windows because, well, mainly because the folks he teaches, and writes code for, do. However, he's the exception. The CX1 looks like something I'd love next to my desk -- with Linux on it -- to accomplish testing before I take something to the big iron. It might even allow me to pre- and post-process my data for hurricane WRF runs. It's not hefty enough to let me do those runs in the timeframe I require otherwise. It's a tool, not a solution. 
gerry -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From james.p.lux at jpl.nasa.gov Wed Sep 17 06:22:58 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 06:22:58 -0700 Subject: [Beowulf] MS Cray In-Reply-To: Message-ID: On 9/16/08 11:49 PM, "Tim Cutts" wrote: On 16 Sep 2008, at 11:07 pm, Lux, James P wrote: > There is a huge psychological advantage to having the computer > physically under your management and control. You don't have folks > trying to "optimize the use of a valuable institutional resource" > with scheduling, etc. You might be willing to tolerate a factor of > 2 hit in performance for the ability to not have to account for > anyone else about how much you're using or not using it. And then they all expect the central systems support group to get it running for them, and to fix it when it breaks, and to generally maintain it. Suddenly you have dozens of completely different systems scattered far and wide across your site, and you're starting to get complaints that the support group are unobtainable these days - they're never at their desk any more, and don't seem to have any time to build new stuff any more. Tim But how is that any different than having a PC on your desk? I see the deskside supercomputer as a revisiting of the "workstation" class computer. Used to be that PCs and Apples were what sat on most peoples desks, but some had Apollo or Sun or Perq workstations, because they had applications that needed the computational horsepower (or, more likely, the high res hardware graphics support.. A CGA was pretty painful for doing PC board layout). Same sort of thing for having the old Tektronix 4014 graphics terminal, rather than hiking down to the computer center to pick up your flatbed plotter output. 
Jim From john.leidel at gmail.com Wed Sep 17 06:45:24 2008 From: john.leidel at gmail.com (John Leidel) Date: Wed, 17 Sep 2008 08:45:24 -0500 Subject: [Beowulf] MS Cray In-Reply-To: References: Message-ID: <1221659124.4030.250.camel@e521.site> I almost hate to throw this one out there, but does anyone remember the SGI deskside series? Challenge, Origin, Onyx... These were fairly popular back in the mid-to-late '90s. We had one at GFDL up until at least a year ago. [I want to score one for my house to play with] On Wed, 2008-09-17 at 06:22 -0700, Lux, James P wrote: > > > > On 9/16/08 11:49 PM, "Tim Cutts" wrote: > > > > On 16 Sep 2008, at 11:07 pm, Lux, James P wrote: > > > There is a huge psychological advantage to having the > computer > > physically under your management and control. You don't > have folks > > trying to "optimize the use of a valuable institutional > resource" > > with scheduling, etc. You might be willing to tolerate a > factor of > > 2 hit in performance for the ability to not have to account > for > > anyone else about how much you're using or not using it. > > And then they all expect the central systems support group to > get it > running for them, and to fix it when it breaks, and to > generally > maintain it. Suddenly you have dozens of completely different > systems > scattered far and wide across your site, and you're starting > to get > complaints that the support group are unobtainable these days > - > they're never at their desk any more, and don't seem to have > any time > to build new stuff any more. > > Tim > > > But how is that any different than having a PC on your desk? > > I see the deskside supercomputer as a revisiting of the > "workstation" class computer.
Used to be that PCs and Apples > were what sat on most peoples desks, but some had Apollo or > Sun or Perq workstations, because they had applications that > needed the computational horsepower (or, more likely, the high > res hardware graphics support.. A CGA was pretty painful for > doing PC board layout). > > Same sort of thing for having the old Tektronix 4014 graphics > terminal, rather than hiking down to the computer center to > pick up your flatbed plotter output. > > Jim From tjrc at sanger.ac.uk Wed Sep 17 06:51:34 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 17 Sep 2008 14:51:34 +0100 Subject: [Beowulf] MS Cray In-Reply-To: References: Message-ID: On 17 Sep 2008, at 2:22 pm, Lux, James P wrote: > But how is that any different than having a PC on your desk? > > I see the deskside supercomputer as a revisiting of the > "workstation" class computer. Used to be that PCs and Apples were > what sat on most peoples desks, but some had Apollo or Sun or Perq > workstations, because they had applications that needed the > computational horsepower (or, more likely, the high res hardware > graphics support.. A CGA was pretty painful for doing PC board > layout). > > Same sort of thing for having the old Tektronix 4014 graphics > terminal, rather than hiking down to the computer center to pick up > your flatbed plotter output. > > Jim We don't generally allow people here to buy their own PCs and Apples either. They get a standard build from us, all centrally managed by LanDESK. They also get a known type of hardware; they can't just buy what the hell they like. I have more than 800 Windows desktops to support. If they were all different and purchased ad-hoc by individual users, I would be in even worse hell than I am already.
Most people don't build Beowulf clusters out of ad-hoc piles of machines from God-knows-where. Most of us buy consistent hardware, because it's impossible to support anything else. The Tektronix graphics terminal is slightly different, because it was just that, a terminal, and consequently doesn't present such a headache. Tim From landman at scalableinformatics.com Wed Sep 17 07:01:54 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 10:01:54 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D10497.7080801@tamu.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> Message-ID: <48D10DD2.1060506@scalableinformatics.com> Gerry Creager wrote: > The CX1 looks like something I'd love next to my desk -- with Linux on > it -- to accomplish testing before I take something to the big iron. It This is something I suspect you will be able to do. The CX1 may support Linux (and it wouldn't surprise me if it had that as an option). > might even allow me to pre- and post-process my data for hurricane WRF > runs. It's not hefty enough to let me do those runs in the timeframe I > require otherwise. Heh... We like the under-desktop experience, with lots of fast disk and big pipes to the disk. Honestly, this looks like the direction for most "smaller" HPC that can run locally under your own control, with the big iron/heavy metal reserved for the large (non-prototype) jobs. > > It's a tool, not a solution. Yup. Lots of folks get lost in this, thinking that a solution == the thing they market. It's not. It is just one aspect of things.
A product is a tool. A solution is so much more than that (and usually starts with a statement of a problem ... otherwise it is a solution searching for a problem). Joe > > gerry > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From jmdavis1 at vcu.edu Wed Sep 17 07:06:08 2008 From: jmdavis1 at vcu.edu (Mike Davis) Date: Wed, 17 Sep 2008 10:06:08 -0400 Subject: [Beowulf] MS Cray In-Reply-To: References: Message-ID: <48D10ED0.5060005@vcu.edu> Lux, James P wrote: > > > > > But how is that any different than having a PC on your desk? > > I see the deskside supercomputer as a revisiting of the > "workstation" class computer. Used to be that PCs and Apples were > what sat on most peoples desks, but some had Apollo or Sun or Perq > workstations, because they had applications that needed the > computational horsepower (or, more likely, the high res hardware > graphics support.. A CGA was pretty painful for doing PC board > layout). > > Same sort of thing for having the old Tektronix 4014 graphics > terminal, rather than hiking down to the computer center to pick > up your flatbed plotter output. > > Jim > Jim, One big difference is that this machine will be sold to department chairs and Deans not as a desktop or high end workstation but as a cheap "Supercomputer" that needs no support. The PC support available in an organization may be completely unable to deal with the realities of HPC. That's when I get an urgent call about a machine that I don't know about. The clock started ticking the day that the machine arrived and that's the impossible timetable to which I will be held. In other words, even with my absolute full attention my efforts will be presented as failing to set up the machine in a timely manner.
In addition all of the other researchers who have invested in the centralized resources will complain that they are not getting the attention that they need. I think that there are times that machines such as this on a departmental or even researcher level make sense even in an organization that provides central resources. But those times are the exceptions. I have 2.5 System Admins. I have ~300 machines in two different locations as standalone servers and parts of clusters. We can get by with this level of staffing through standardization of hardware and operating systems (currently 90% linux, 9% Solaris, 1% IRIX), security standards that lock down unused ports and services, and careful testing of software (physical sciences, math, OR, bioinformatics) before it is made generally available. With budget cuts looming on the horizon, adding support for new department level systems without additional staffing would leave us unable to continue to provide adequate support for the central systems. IMHO. YMMV. -- Mike Davis Technical Director (804) 828-3885 Center for High Performance Computing jmdavis1 at vcu.edu Virginia Commonwealth University "Never tell people how to do things. Tell them what to do and they will surprise you with their ingenuity." George S. Patton From kyron at neuralbs.com Wed Sep 17 07:13:25 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 10:13:25 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D03598.2050407@scalableinformatics.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> Message-ID: <48D11085.8050702@neuralbs.com> Joe Landman wrote: > Gus Correa wrote: > >> Otherwise, your "newbie scientist" can put his/her earbuds and pump >> up the volume on his Ipod, >> while he/she navigates through the Vista colorful 3D menus. > > Owie .... 
I can just imagine the folks squawking about this at SC08 > "Yes folks, you need a Cray supercomputer to make Vista run at > acceptable performance ..." Maybe they have a "tune options for performance" option ;) > > The machine seems to run w2k8. My own experience with w2k8 is that, > frankly, it doesn't suck. This is the first time I have seen a > windows release that I can say that about. A few questions (not necessarily expecting a response): POSIX? VERBS? Kernel latency and scheduler control? These are the real barriers IMHO, without minimally supporting POSIX (threads), there is very little incentive to use the machine for development unless you're willing to accept the code will _only_ run on your "desktop". > > The low end economics probably won't work out for this machine though, > unless it is N times faster than some other agglomeration of > Intel-like products. Adding windows will add cost, not performance in > any noticeable way. > > The question that Cray (and every other vendor building non-commodity > units) is how much better is this than a small cluster someone can > build/buy on their own? Better as in faster, able to leap more tall > buildings in a single bound, ... (Superman TV show reference for those > not in the know). And the hard part will be justifying the additional > cost. If the machine isn't 2x the performance, would it be able to > justify 2x the price? Since it appears to be a somewhat well branded > cluster, I am not sure that argument will be easy to make. I just rebuilt a 32 core cluster for ~5k$ (CAD) (8*Q6600 1Gig RAM/node + gige networking). Bang for the buck? I can't wait to see the CX1's performance specs under _both_ windows and Linux.
Eric From john.leidel at gmail.com Wed Sep 17 07:21:43 2008 From: john.leidel at gmail.com (John Leidel) Date: Wed, 17 Sep 2008 09:21:43 -0500 Subject: [Beowulf] MS Cray In-Reply-To: <48D10DD2.1060506@scalableinformatics.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D10DD2.1060506@scalableinformatics.com> Message-ID: <1221661303.4030.253.camel@e521.site> On Wed, 2008-09-17 at 10:01 -0400, Joe Landman wrote: > Gerry Creager wrote: > > > The CX1 looks like something I'd love next to my desk -- with Linux on > > it -- to accomplish testing before I take something to the big iron. It > > This is something I suspect you will be able to do. The CX1 may support > Linux (and it wouldn't surprise me if it had that as an option). Indeed... it supports RedHat. Oddly enough, no mention of SLES. Cray has been running SLES on their XT login/service nodes for quite some time. I'm curious why they changed horses. > > > might even allow me to pre- and post-process my data for hurricane WRF > > runs. It's not hefty enough to let me do those runs in the timeframe I > > require otherwise. > > Heh... We like the under-desktop experience, with lots of fast disk and > big pipes to the disk. Honestly, this looks like the direction for most > of "smaller" HPC that can run locally under your own control. The big > iron/heavy metal for the large (non-prototype) jobs. > > > > > It's a tool, not a solution. > > Yup. Lots of folks get lost in this, thinking that a solution == the > thing they market. Its not. It is just one aspect of things. A > product is a tool. A solution is so much more than that (and usually > starts with a statement of a problem ... otherwise it is a solution > searching for a problem). 
> > Joe > > > > > gerry > > > > From svdavidson at charter.net Mon Sep 15 15:12:40 2008 From: svdavidson at charter.net (Shannon V. Davidson) Date: Mon, 15 Sep 2008 17:12:40 -0500 Subject: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing In-Reply-To: References: Message-ID: <48CEDDD8.7080206@charter.net> Rahul Nabar wrote: > I was experimenting with using channel bonding my twin eth ports to > get a combined bandwidth of (close to) 2 Gbps. The two relevant modes > were 4 (802.3ad) and 6 (alb=Adaptive Load Balancing). I was trying to > compare performance for both. > > Before running any sophisticated tests by netperf etc. I just tried to > copy a large file via scp and timed the two file-copies. > > Option1: > from node1 to node2. Both nodes have their twin ports bonded together > as bond0 with mode=4 (802.3ad). > > They are connected via a Dell PowerConnect 6248 switch. Configured the > switch so that I have two LAG groups combining the two ports coming > from the same node. LACP was turned on. > > Option2: > from node3 to node4. Use mode=6 (alb=Adaptive Load Balancing) No > special switch config. No LAG. No LACP. > > Result: For a 4GB file-transfer. Both modes took the same time; approx > 1min26 sec. > > These results are very mystifying to me. I was expecting mode4 > (802.3ad ) to be almost twice as fast since it is the only mode which > truly aggregates the twin channels. It ought to be the only one > effective for a peer-to-peer communication (mode 6 would only help > while talking with more than one peer) > > Any comments? Also the net file transfer speed seems way lower than > what I'd expect from a close to 2 Gbps connect; even accounting for > the protocol overheads. Do other people have some numbers for me from > their systems? > 4GB / 86 seconds = 47 MB/sec, which is about the speed of a single disk. You should go ahead and try netperf. 
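Shannon's arithmetic can be checked against nominal gigabit line rates. This is a rough sketch, assuming decimal units (4 GB = 4000 MB, 1 Gbit/s = 125 MB/s) and ignoring TCP/IP and SSH overhead; the helper names are made up for illustration.

```python
# Rough throughput check for the 4 GB scp test (decimal units, no protocol overhead).

def throughput_mb_s(size_mb: float, seconds: float) -> float:
    """Average transfer rate in MB/s."""
    return size_mb / seconds

def line_rate_mb_s(gbit_s: float) -> float:
    """Nominal line rate in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return gbit_s * 1000 / 8

observed = throughput_mb_s(4000, 86)   # 4 GB copied in 1 min 26 s
one_link = line_rate_mb_s(1)           # a single GbE port
bonded = line_rate_mb_s(2)             # two bonded GbE ports, ideal case

print(f"observed: {observed:.1f} MB/s")
print(f"fraction of one GbE link:  {observed / one_link:.0%}")
print(f"fraction of two GbE links: {observed / bonded:.0%}")
```

On these assumptions the copy used well under half of a single gigabit link, so the bonding mode is unlikely to be what limited this test. Note also that in 802.3ad mode the Linux bonding driver's transmit hashing normally keeps a single TCP connection on one slave link, so a single-stream copy would not be expected to exceed one link's rate in either mode.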
Shannon > -- > Rahul From dvadell at linuxclusters.com.ar Mon Sep 15 18:42:36 2008 From: dvadell at linuxclusters.com.ar (Diego M. Vadell) Date: Mon, 15 Sep 2008 22:42:36 -0300 Subject: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing In-Reply-To: References: Message-ID: <200809152242.37405.dvadell@linuxclusters.com.ar> On Monday 08 September 2008 21:30:03 Rahul Nabar wrote: > I was experimenting with using channel bonding my twin eth ports to > get a combined bandwidth of (close to) 2 Gbps. The two relevant modes > were 4 (802.3ad) and 6 (alb=Adaptive Load Balancing). I was trying to > compare performance for both. > > Before running any sophisticated tests by netperf etc. I just tried to > copy a large file via scp and timed the two file-copies. > Hi, A month ago I spent an awful lot of time trying to debug something like that: I was trying to tune two Linux boxes to make them perform better over a satellite link (700 ms delay). It turned out that ssh is not the right tool to test the network. From http://www.psc.edu/networking/projects/hpn-ssh/ "SCP and the underlying SSH2 protocol implementation in OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwith network links. Modifying the ssh code to allow the buffers to be defined at run time eliminates this bottleneck. We have created a patch ..." I started testing with wget on one side and Apache on the other, and everything went back to what I expected. So my advice is: test the network with something other than ssh or scp. HIH, -- Diego.
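Diego's point about fixed buffers can be put in numbers with the usual window/RTT bound: a sender that never has more than W bytes unacknowledged cannot exceed W / RTT, whatever the link speed. A minimal sketch, assuming the 700 ms figure is the round-trip time and using a hypothetical 64 KiB fixed buffer (an illustrative value, not necessarily the size OpenSSH actually used):

```python
# Window-limited throughput versus the bandwidth-delay product (BDP).

def window_limited_rate(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on one stream's rate (bytes/s) if at most
    window_bytes may be in flight per round trip."""
    return window_bytes / rtt_s

def bdp_bytes(link_bit_s: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a link of this speed busy."""
    return link_bit_s * rtt_s / 8

RTT_SAT = 0.7        # assumed: 700 ms satellite delay taken as round-trip time
WINDOW = 64 * 1024   # hypothetical fixed application buffer, 64 KiB

cap = window_limited_rate(WINDOW, RTT_SAT)
print(f"satellite cap: {cap / 1024:.0f} KiB/s with a {WINDOW // 1024} KiB window")

for mbit in (10, 100, 1000):
    need = bdp_bytes(mbit * 1e6, RTT_SAT)
    print(f"{mbit:>4} Mbit/s over {RTT_SAT * 1000:.0f} ms needs {need / 2**20:.1f} MiB in flight")
```

With these assumed numbers a fixed 64 KiB window caps a single stream at roughly 90 KiB/s over the satellite hop, far below even a 10 Mbit/s link, which fits the HPN-SSH description. A plain HTTP transfer with wget generally relies only on the kernel's TCP window handling, so it does not hit that extra application-level cap.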
From forum.san at gmail.com Sun Sep 14 05:07:22 2008 From: forum.san at gmail.com (Sangamesh B) Date: Sun, 14 Sep 2008 17:37:22 +0530 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: <48C56BFF.6000304@aei.mpg.de> References: <1220638410.4385019ctimchipman@myrealbox.com> <48C56BFF.6000304@aei.mpg.de> Message-ID: Hi Tim, Recently I benchmarked a Fortran-based scientific application on both an Intel Xeon quad-core and an AMD Opteron quad-core system, running RHEL 5. Following are the results:

                         AMD 2.3 GHz    Intel 2.33 GHz
  1. Serial              147.719 sec     73.952 sec
  2. Parallel, 4 cores    39.798 sec     32.317 sec
  3. Parallel, 8 cores    26.880 sec     30.371 sec

For AMD, I used the GCC compiler (gfortran) released by AMD for its Barcelona (AMD family 10h) processors. For Intel, I used the Intel 10 compilers. Thank you, Sangamesh Consultant-HPC On Mon, Sep 8, 2008 at 11:46 PM, Carsten Aulbert wrote: > Hi Tim, > > Tim Chipman wrote: > > Does anyone have any 'real world' experience with 'both' of these CPUs, > in terms of relative performance for 'whatever work you do' ? > > > > I realize the xeon is a faster mhz part (3.16ghz xeon vs 2.3ghz opteron) > so I'm more concerned with "relative performance per mhz" > > > > I'm involved with a cluster project, and we have 2 options from our > vendor, > > > > - 25 compute nodes, dual-quadcore intel 5460, or > > - 32 compute nodes, dual-quadcore amd 2356 > > > > From what I've been able to glean (Spec.org / SpecCPU 2006), > > > > - the intel chips have better integer performance > > - the amd chips have better FPU performance > > > > so the likely anticipated real-world performance result .. will depend on > how a given application blends / balances things. > > We have "only" the Quad-Xeon boxes with E5435 and these are quite fast, > indeed it seems that FFTW seems to run faster on Xeons but I have not > made any benchmarks for the past ~ 9 months, so I don't know about the > latest Opterons.
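To read Sangamesh's timings in scaling terms, speedup (serial time / parallel time) and parallel efficiency (speedup / cores) can be computed directly from the posted numbers. A small sketch; only the times come from the post, and the dictionary layout and labels are just for illustration:

```python
# Timings (seconds) as posted; keys are core counts.
timings = {
    "AMD Opteron 2.3 GHz (gfortran)": {1: 147.719, 4: 39.798, 8: 26.880},
    "Intel Xeon 2.33 GHz (Intel 10)": {1: 73.952, 4: 32.317, 8: 30.371},
}

def speedup(t_serial: float, t_parallel: float) -> float:
    """Classic speedup: serial time divided by parallel time."""
    return t_serial / t_parallel

def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Parallel efficiency: speedup per core (1.0 = perfect scaling)."""
    return speedup(t_serial, t_parallel) / cores

for cpu, t in timings.items():
    for cores in (4, 8):
        s = speedup(t[1], t[cores])
        e = efficiency(t[1], t[cores], cores)
        print(f"{cpu}: {cores} cores, speedup {s:.2f}, efficiency {e:.0%}")
```

On these numbers the AMD runs scale much better (about 3.7x on 4 cores, about 5.5x on 8), while the Intel runs gain little beyond 4 cores. However, the Intel serial time is about half the AMD one and the two builds used different compilers, so the comparison mixes compiler and hardware effects.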
> > I think you need to come up with a real-world scenario of what will be > run on the cluster and maybe compile a little benchmark yourself and ask > the vendor to run both (or get hands-on access to both boxes). I think that's > the only "fair" comparison that's possible. > > HTH > > Carsten From thpierce at gmail.com Tue Sep 16 10:20:06 2008 From: thpierce at gmail.com (Tom Pierce) Date: Tue, 16 Sep 2008 13:20:06 -0400 Subject: [Beowulf] 10gig CX4 switches In-Reply-To: <9f8092cc0809160703k707f0b26uaa80bf0f34921f4a@mail.gmail.com> References: <20080916002004.GB1539@bx9.net> <9f8092cc0809160703k707f0b26uaa80bf0f34921f4a@mail.gmail.com> Message-ID: <25e9e5ad0809161020n3579494bo7e1d62ca0ab623ce@mail.gmail.com> I have Cat6 cabling on the cluster. Would any of the 10 Gb switches use Cat6 cable, or is 10Gb still locked into CX4/fiber? Tom On Tue, Sep 16, 2008 at 10:03 AM, John Hearns wrote: > > > 2008/9/16 Greg Lindahl > >> I have a bunch of 1gig switches with CX4 10gig uplinks (and empty X2 >> ports) and it's time to buy a 10gig switch. Has anyone done a recent >> survey of the market? I don't need any layer-3 features, just layer-2. >> >> Have a look at the Quadrics switches. > The smaller TG201 switch would fit the bill nicely for you - you can get it > in two variants, one with 24x copper ports and one with 12x copper ports and > 12x empty ports for GBICs.
> They're pretty cost effective, > > John Hearns > > -- ----------------------- Thanks Tom From lynesh at cardiff.ac.uk Wed Sep 17 06:50:53 2008 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Wed, 17 Sep 2008 14:50:53 +0100 Subject: [Beowulf] MS Cray In-Reply-To: <1221659124.4030.250.camel@e521.site> References: <1221659124.4030.250.camel@e521.site> Message-ID: <1221659453.2833.5.camel@w1199.insrv.cf.ac.uk> On Wed, 2008-09-17 at 08:45 -0500, John Leidel wrote: > I almost hate to throw this one out there, but does anyone remember the > SGI deskside series? Challenge, Origin, Onyx.... > Having experience of all three, I suggest that it's a bit of a stretch to refer to any of those as "deskside". I'm just old enough to have had to deal with a Challenge in production. I still occasionally have to go and prod the old Origins and Onyx in Computer Science. Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From tom.elken at qlogic.com Wed Sep 17 08:39:10 2008 From: tom.elken at qlogic.com (Tom Elken) Date: Wed, 17 Sep 2008 08:39:10 -0700 Subject: [Beowulf] MS Cray In-Reply-To: <1221659453.2833.5.camel@w1199.insrv.cf.ac.uk> References: <1221659124.4030.250.camel@e521.site> <1221659453.2833.5.camel@w1199.insrv.cf.ac.uk> Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A025966F8@AVEXCH1.qlogic.org> > [mailto:beowulf-bounces at beowulf.org] On Behalf Of Huw Lynes > On Wed, 2008-09-17 at 08:45 -0500, John Leidel wrote: > > I almost hate to throw this one out there, but does anyone > remember the > > SGI deskside series? Challenge, Origin, Onyx....
> > > > Having experience of all three, I suggest that it's a bit of a stretch > to refer to any of those as "deskside". "Challenge" covered a lot of system types. Some were proper desksides, but not proper Challenges: From Wikipedia: "Other systems from Silicon Graphics that used the "Challenge" brand were the Challenge M and the Challenge S. These systems were repackaged Silicon Graphics Indigo2 and Indy workstations that were not configured with the graphics hardware that made them useful as workstations. These systems were Challenges in name only and have no architectural similarity with the multiprocessing Challenges, although they had cases with the same blue hue as proper Challenges." -Tom From james.p.lux at jpl.nasa.gov Wed Sep 17 08:51:35 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 08:51:35 -0700 Subject: [Beowulf] MS Cray In-Reply-To: References:

Message-ID: -----Original Message----- From: Tim Cutts [mailto:tjrc at sanger.ac.uk] Sent: Wednesday, September 17, 2008 6:52 AM To: Lux, James P Cc: Prentice Bisbal; Beowulf Subject: Re: [Beowulf] MS Cray On 17 Sep 2008, at 2:22 pm, Lux, James P wrote: > But how is that any different than having a PC on your desk? > > I see the deskside supercomputer as a revisiting of the "workstation" > class computer. Used to be that PCs and Apples were what sat on most > peoples desks, but some had Apollo or Sun or Perq workstations, > because they had applications that needed the computational horsepower > (or, more likely, the high res hardware graphics support.. A CGA was > pretty painful for doing PC board layout). > > Same sort of thing for having the old Tektronix 4014 graphics > terminal, rather than hiking down to the computer center to pick up > your flatbed plotter output. > > Jim We don't generally allow people here to buy their own PCs and Apples either. They get a standard build from us, all centrally managed by LanDESK. They also get a known type of hardware; they can't just buy what the hell they like. I have more than 800 Windows desktops to support. If they were all different and purchased ad-hoc by individual users, I would be in even worse hell than I am already. Most people don't build Beowulf clusters out of ad-hoc piles of machines from God-knows-where. Most of us buy consistent hardware, because it's impossible to support anything else. The Tektronix graphics terminal is slightly different, because it was just that, a terminal, and consequently doesn't present such a headache. Tim ---- Indeed, and such is the case in most large organizations. Two that I have direct familiarity with have slightly different models. One, in a Fortune 500 company, had, at any time, only 3 possible hardware configurations for the desktop (with literally 10s of thousands deployed), with the actual image rolled out every day. 
Essentially, the disk drive in the box served as a local cache. There were other configurations for software developers, but still, pretty much locked down. The server farms are run separately, by a centralized organization, as is the mainframe. A small "departmental server" (e.g. for a software development group to use for testing) would be in a server room somewhere, managed by the central org. The other, here at JPL, has roughly 10,000 computers of various ages and configurations that are managed collectively (as opposed to being sysadmined locally, e.g. in a lab). At any given time, there are a dozen or so kinds of computers (desktop/laptop/PC/Mac) available, but since the configurations keep changing and they have a 3-year recycle time, there are probably 30 or 40 configurations in the field at a given time. The software configuration is substantially more consistent, in that there's a basic "core software" load of OS and tools (Office, Mail, Calendaring), but people, in general, have admin access to their own machine and are free to install anything else (as long as it's legal). OTOH, if something you add causes problems, they're not on the hook to support it, and ultimately, their response might be to reimage the disk. They ARE pushing towards a thin client model, at least for non-specialized desktop users (e.g. if all you do is email, calendaring, documents, and consuming web services). Interestingly, the monthly cost for both organizations is about the same (a few hundred bucks a month for hardware lease + service). We also have "servers for rent" (with SA and 24/7 monitoring done by others), as well as various and sundry supercomputers. A deskside supercomputer would fit in the model here fairly well, as just another flavor of either high-performance desktop machine or small server in your lab.
Jim From landman at scalableinformatics.com Wed Sep 17 08:54:35 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 11:54:35 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D11085.8050702@neuralbs.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D11085.8050702@neuralbs.com> Message-ID: <48D1283B.6070705@scalableinformatics.com> Eric Thibodeau wrote: > Joe Landman wrote: >> Gus Correa wrote: >> >>> Otherwise, your "newbie scientist" can put his/her earbuds and pump >>> up the volume on his Ipod, >>> while he/she navigates through the Vista colorful 3D menus. >> >> Owie .... I can just imagine the folks squawking about this at SC08 >> "Yes folks, you need a Cray supercomputer to make Vista run at >> acceptable performance ..." > Maybe they have a "tune options for performance" option ;) >> >> The machine seems to run w2k8. My own experience with w2k8 is that, >> frankly, it doesn't suck. This is the first time I have seen a >> windows release that I can say that about. > A few questions (not necessarily expecting a response): > > POSIX? > VERBS? > Kernel latency and scheduler control? Don't mistake me for a w2k8 apologist. I reamed them pretty hard on the lack of a real posix infrastructure (they claim SUA, but frankly it doesn't build most of what we throw at it, so it really is a non-starter and not worth considering IMO). They need to pull Cygwin in close and tight to get a good POSIX infrastructure. It is in their best interests. Sadly, I suspect the ego driven nature of this will pretty much prevent them from doing this. Can't touch the "toxic" OSS now, can they ... IB Verbs? Well through OFED, yes. Through the windows stack? Who knows. We were playing with it on JackRabbit for a customer test/benchmark. Kernel latency? Much better/more responsive than w2k3. Scheduler control? Not sure how much you have. 
I don't like deep diving into registries ... that is a pretty sure way to kill a windows machine. > > These are the real barriers IMHO, without minimally supporting POSIX > (threads), there is very little incentive to use the machine for > development unless you're willing to accept the code will _only_ run on > your "desktop". >> >> The low end economics probably won't work out for this machine though, >> unless it is N times faster than some other agglomeration of >> Intel-like products. Adding windows will add cost, not performance in >> any noticeable way. >> >> The question that Cray (and every other vendor building non-commodity >> units) is how much better is this than a small cluster someone can >> build/buy on their own? Better as in faster, able to leap more tall >> buildings in a single bound, ... (Superman TV show reference for those >> not in the know). And the hard part will be justifying the additional >> cost. If the machine isn't 2x the performance, would it be able to >> justify 2x the price? Since it appears to be a somewhat well branded >> cluster, I am not sure that argument will be easy to make. > I just rebuilt a 32 core cluster for ~5k$ (CAD) (8*Q6600 1Gig RAM/node + > gige netwroking). Bang for the buck? I can't wait to see the CX1's > performance specs under _both_ windows and Linux. The desktop CPUs/MBs will get you best bang per buck, as long as you don't mind no ECC, and 8GB ram limits per node. For your applications, this might be fine. For others, with large memory footprint and long run times, I see people need/require ECC (as memory density increases, ECC becomes important .... darned cosmic rays/natural decays/noisy power supplies/...) 
> > Eric -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From james.p.lux at jpl.nasa.gov Wed Sep 17 09:03:43 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 09:03:43 -0700 Subject: [Beowulf] MS Cray In-Reply-To: <48D10ED0.5060005@vcu.edu> References: <48D10ED0.5060005@vcu.edu> Message-ID: Lux, James P wrote: > > > > > But how is that any different than having a PC on your desk? > > I see the deskside supercomputer as a revisiting of the > "workstation" class computer. Used to be that PCs and Apples were > what sat on most peoples desks, but some had Apollo or Sun or Perq > workstations, because they had applications that needed the > computational horsepower (or, more likely, the high res hardware > graphics support.. A CGA was pretty painful for doing PC board > layout). > > Same sort of thing for having the old Tektronix 4014 graphics > terminal, rather than hiking down to the computer center to pick > up your flatbed plotter output. > > Jim > Jim, One big difference is that this machine will be sold to department chairs and Deans not as a desktop or high end workstation but as a cheap "Supercomputer" that needs no support. The PC support available in an organization may be completely unable to deal with the realities of HPC. --> then the seller is lying about no support required, and the buyer deserves what they get. The real question is whether the box is more like a "super desktop PC" or a "supercomputer".. The former, by design, should have a fairly low admin overhead (i.e. the hardware configuration is fixed and stable, the OS ditto, relatively few applications on it).. 
The latter has a high SA overhead because, by its nature, it's used by a heterogeneous group of folks running heterogeneous applications that were all developed to stretch the limits. The question is whether the new deskside box exhibits "the realities of HPC", or is just a faster computer like the other ones. That's when I get an urgent call about a machine that I don't know about. The clock started ticking the day that the machine arrived and that's the impossible timetable to which I will be held. In other words, even with my absolute full attention my efforts will be presented as failing to set up the machine in a timely manner. In addition all of the other researchers who have invested in the centralized resources will complain that they are not getting the attention that they need. --> That's more a matter of education and managing expectations. I think that there are times that machines such as this on a departmental or even researcher level make sense even in an organization that provides central resources. But those times are the exceptions. I have 2.5 System Admins. I have ~300 machines in two different locations as standalone servers and parts of clusters. ---> Then you're grossly underfunded and/or your institution is getting a great deal because you and your staff are making a herculean effort. Typical support costs in industry run about $200-300/month per desktop (and that's for fairly vanilla installations): 300 machines x $300/month = $90K/month = $1,080K/year -> 4-6 people. We can get by with this level of staffing through standardization of hardware and operating systems (currently 90% linux, 9% Solaris, 1% IRIX), security standards that lock down unused ports and services, and careful testing of software (physical sciences, math, OR, bioinformatics) before it is made generally available.
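Jim's back-of-the-envelope staffing estimate can be written out explicitly. This sketch reuses only the $200-300 per desktop per month figure and the ~300 machines from the thread; the $200K fully loaded cost per administrator is an assumed round number for illustration, not something stated in the posts.

```python
def annual_support_budget(machines: int, usd_per_machine_month: float) -> float:
    """Yearly support spend implied by a per-machine monthly rate."""
    return machines * usd_per_machine_month * 12

def implied_admins(budget_usd: float, loaded_cost_per_admin: float) -> float:
    """Headcount that budget would pay for at a given fully loaded cost."""
    return budget_usd / loaded_cost_per_admin

LOADED_COST = 200_000  # assumed fully loaded cost per sysadmin (USD/year)
MACHINES = 300         # Mike's approximate machine count

for rate in (200, 300):
    budget = annual_support_budget(MACHINES, rate)
    staff = implied_admins(budget, LOADED_COST)
    print(f"{MACHINES} machines @ ${rate}/mo: ${budget:,.0f}/yr, ~{staff:.1f} admins")
```

Under these assumptions the range comes out at roughly $0.7M to $1.1M a year, or about 3.6 to 5.4 admins, close to Jim's 4-6 estimate and well above the 2.5 Mike reports. Per-seat rates like these often bundle hardware lease as well as labour, so the headcount is a loose illustration rather than a staffing model.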
With budget cuts looming on the horizon, adding support for new department level systems without additional staffing would leave us unable to continue to provide adequate support for the central systems. IMHO. YMMV. -->> exactly.. Your operation is on the ragged edge of resources, so your organization really can't tolerate dropping in a new and different sort of box, at least within the desktop PC support model. But for an organization that already has, say, 10K machines, and the staff corresponding to the $30-40M/yr budget (e.g. a hundred people), adding a new flavor of box isn't as disruptive. One of the horde can be detailed off to become the "new widget expert". --> So perhaps your institution isn't really the appropriate target market (yet).. I don't see this as particularly different than any other new technology introduction. When mainframes first entered the halls of academe, I'm sure the same sort of discussions arose. Heck, it's why computers like the PDP-8 were invented. Jim From jlb17 at duke.edu Wed Sep 17 09:24:16 2008 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Wed, 17 Sep 2008 12:24:16 -0400 (EDT) Subject: [Beowulf] MS Cray In-Reply-To: <48D10497.7080801@tamu.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> Message-ID: On Wed, 17 Sep 2008 at 8:22am, Gerry Creager wrote > Gus Correa wrote: >> >> Here is the link to the CX1 on the Cray web site: >> >> http://www.cray.com/products/CX1.aspx >> >> You need MS Explorer to customize/price it. > > I just knew you had to be wrong, but sure enough, I can't see config options. > It's a show stopper for me. If I need IE to buy the system, it's not likely > to happen until A) there's an IE that runs natively on *nix, and B) it > doesn't have the myriad problems associated with IE in the past. 
Hrm -- WORKSFORME using FF3 on Fedora 9. Notably, Cray is happy to sell the CX1 to you sans OS. W2K8 is a $3752 option (at least in my sample config with 8 compute blades). As was noted, the Linux they suggest is "Red Hat Linux", by which one would assume they mean RHEL (why does everybody make that mistake -- there hasn't been a "Red Hat Linux" since RH9 (shudder)). Also, as one would expect, the hardware premium is hefty. A compute blade with dual Xeon E5462s, 16GB RAM (8x2GB), and an 80GB HDD is $6656. Without even trying too hard I can get a similarly configured 1U node for $4400. So that's a 50% markup on nodes, not to mention the almost $9K for the chassis. Of course, that's before any discounts. Maybe it's my miserly ways, but I just don't see the value proposition here... -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From gus at ldeo.columbia.edu Wed Sep 17 09:32:33 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed, 17 Sep 2008 12:32:33 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <1221661303.4030.253.camel@e521.site> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D10DD2.1060506@scalableinformatics.com> <1221661303.4030.253.camel@e521.site> Message-ID: <48D13121.5030509@ldeo.columbia.edu> Dear Beowulf fans Since I posted the Cray CX1 announcement, just to be fair to other players, here are some of them: 1) SiCortex has a Linux and MIPS (72 processors) based "deskside supercomputer". They claim it to work with 300W of power. Of course, being Linux, it requires Linux literacy to use. See: http://sicortex.com/products/sc072_pds/sc072_pds_datasheet 2) NVidia is advertising its Tesla series, although this GPU-based deskside system will probably work as a "deskside co-processor" rather than as a "deskside supercomputer". 
Literacy in CUDA, not only in C and Linux, is probably required to use it effectively. GPU experts, please correct me if I am wrong. See: http://www.nvidia.com/object/tesla_8_series.html There may be more "deskside supercomputers" out there, and I apologize to anyone that may have been omitted. NEC had something called SX-8i, I think, not very long ago. If the Cray CX1 idea sticks and the machines sell, it is likely that other companies will launch similar product lines. *** Deskside supercomputers, and even bigger ones, are starting to be marketed as "plug-and-play", as something that requires little, if any, system administration and maintenance (proprietary hardware and maintenance fees are rarely mentioned), and not much computer literacy to be used. Other postings to this thread already pointed this out. All they need is an available power outlet on your office wall ( ... how about an Ethernet port?), and similar marketing arguments. They are marketed in contrast to clusters, which are pictured as complicated beasts, hard and expensive to maintain, requiring dedicated IT personnel, sucking more power, and leading to higher TCO. The logic presented to decision makers would be that, besides being user friendly, what you pay upfront for these machines you recover quickly in IT salaries and utility bills. Gus Correa John Leidel wrote: >On Wed, 2008-09-17 at 10:01 -0400, Joe Landman wrote: > > >>Gerry Creager wrote: >> >> >> >>>The CX1 looks like something I'd love next to my desk -- with Linux on >>>it -- to accomplish testing before I take something to the big iron. It >>> >>> >>This is something I suspect you will be able to do. The CX1 may support >>Linux (and it wouldn't surprise me if it had that as an option). >> >> > >Indeed... it supports RedHat. Oddly enough, no mention of SLES. Cray >has been running SLES on their XT login/service nodes for quite some >time. I'm curious why they changed horses. 
> > > >>>might even allow me to pre- and post-process my data for hurricane WRF >>>runs. It's not hefty enough to let me do those runs in the timeframe I >>>require otherwise. >>> >>> >>Heh... We like the under-desktop experience, with lots of fast disk and >>big pipes to the disk. Honestly, this looks like the direction for most >>of "smaller" HPC that can run locally under your own control. The big >>iron/heavy metal for the large (non-prototype) jobs. >> >> >> >>>It's a tool, not a solution. >>> >>> >>Yup. Lots of folks get lost in this, thinking that a solution == the >>thing they market. It's not. It is just one aspect of things. A >>product is a tool. A solution is so much more than that (and usually >>starts with a statement of a problem ... otherwise it is a solution >>searching for a problem). >> >>Joe >> >> >> >>>gerry >>> >>> >>> >> >> > >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From hearnsj at googlemail.com Wed Sep 17 09:39:52 2008 From: hearnsj at googlemail.com (John Hearns) Date: Wed, 17 Sep 2008 17:39:52 +0100 Subject: [Beowulf] MS Cray In-Reply-To: References: <48D10ED0.5060005@vcu.edu> Message-ID: <9f8092cc0809170939g3eecfe43h398c761c6493e37@mail.gmail.com> 2008/9/17 Lux, James P > > . When mainframes first entered the halls of academe, I'm sure the same > sort of discussions arose. Heck, it's why computers like the PDP-8 were > invented. > > Jim > Just let me correct you there. Surely PDP-8s were calculators or Data Processing whatchamacallits, and emphatically NOT Computer Systems. (A history lesson is called for here - I cannot remember the exact terminology which allowed PDPs to be sold to individual labs and departments)
From landman at scalableinformatics.com Wed Sep 17 09:58:27 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 12:58:27 -0400 Subject: [Beowulf] MS Cray In-Reply-To: References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> Message-ID: <48D13733.4000504@scalableinformatics.com> Joshua Baker-LePain wrote: > Also, as one would expect, the hardware premium is hefty. A compute > blade with dual Xeon E5462s, 16GB RAM (8x2GB), and an 80GB HDD is $6656. > Without even trying too hard I can get a similarly configured 1U node > for $4400. So that's a 50% markup on nodes, not to mention the almost > $9K for the chassis. Of course, that's before any discounts. > > Maybe it's my miserly ways, but I just don't see the value proposition > here... What about a bundled bumper sticker saying something like "my other computer is a Cray" ... That's gotta be worth something... right? Way way back I remember internal discussions in SGI (or SGI/Cray if you prefer) that went something like "you can't paint a box purple and charge 3x the price". Some thought you could.
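To put numbers on the premium quoted above, here is a small sketch using only the list prices from Joshua's message. Two inputs are assumptions: the "almost $9K" chassis is rounded up to $9000, and that chassis cost is spread evenly over the 8 blades of his sample configuration. The comparison also ignores whatever rack, PDU and switch costs the 1U route would add, and is before any discounts.

```python
def markup_percent(vendor_price: float, diy_price: float) -> float:
    """Premium of vendor_price over diy_price, as a percentage."""
    return (vendor_price - diy_price) / diy_price * 100.0


def per_node_cost(node_price: float, chassis_price: float, nodes: int) -> float:
    """Node price plus an equal share of the shared chassis."""
    return node_price + chassis_price / nodes


CX1_BLADE = 6656.0   # quoted CX1 blade price (USD, list)
GENERIC_1U = 4400.0  # quoted comparable 1U node (USD)
CHASSIS = 9000.0     # assumption: "almost $9K" rounded up
BLADES = 8           # blade count in the sample configuration

# Premium on the blade alone, then with the chassis amortized per blade
blade_only = markup_percent(CX1_BLADE, GENERIC_1U)
with_chassis = markup_percent(per_node_cost(CX1_BLADE, CHASSIS, BLADES), GENERIC_1U)

if __name__ == "__main__":
    print(f"blade only:   {blade_only:.1f}%")    # blade only:   51.3%
    print(f"with chassis: {with_chassis:.1f}%")  # with chassis: 76.8%
```

So the "50% markup" is really about 51% per blade, and closer to 77% once each blade carries its share of the chassis.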
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From arnoldg at ncsa.uiuc.edu Wed Sep 17 07:59:44 2008 From: arnoldg at ncsa.uiuc.edu (Galen Arnold) Date: Wed, 17 Sep 2008 09:59:44 -0500 (CDT) Subject: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing In-Reply-To: <831041969.340051221663316152.JavaMail.root@zimbra-1.ncsa.uiuc.edu> Message-ID: <1660070812.340291221663584526.JavaMail.root@zimbra-1.ncsa.uiuc.edu> Here's another approach to saturating a network link, using a client that works on a variety of OS platforms: FileZilla Site Manager -> Transfer settings -> Maximum number of connections [10] It helps if you have a number of not-small [> 10MB] files lying around in a test directory. -G Galen Arnold system engineer NCSA ----- Original Message ----- From: "Diego M. Vadell" To: beowulf at beowulf.org Cc: "Rahul Nabar" Sent: Monday, September 15, 2008 8:42:36 PM GMT -06:00 US/Canada Central Subject: Re: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing On Monday 08 September 2008 21:30:03 Rahul Nabar wrote: > I was experimenting with using channel bonding my twin eth ports to > get a combined bandwidth of (close to) 2 Gbps. The two relevant modes > were 4 (802.3ad) and 6 (alb=Adaptive Load Balancing). I was trying to > compare performance for both. > > Before running any sophisticated tests by netperf etc. I just tried to > copy a large file via scp and timed the two file-copies. > Hi, A month ago I spent an awful lot of time trying to debug something like that: I was trying to tune two Linux boxes to make them perform better over a satellite link (700ms delay). It turned out that ssh is not the right tool to test the network.
From http://www.psc.edu/networking/projects/hpn-ssh/ "SCP and the underlying SSH2 protocol implementation in OpenSSH is network performance limited by statically defined internal flow control buffers. These buffers often end up acting as a bottleneck for network throughput of SCP, especially on long and high bandwith network links. Modifying the ssh code to allow the buffers to be defined at run time eliminates this bottleneck. We have created a patch ..." I started testing with wget on one side and Apache on the other, and everything went back to what I expected. So my advice is: test the network with something other than ssh or scp. HIH, -- Diego. From landman at scalableinformatics.com Wed Sep 17 10:06:19 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 13:06:19 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D13121.5030509@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D10DD2.1060506@scalableinformatics.com> <1221661303.4030.253.camel@e521.site> <48D13121.5030509@ldeo.columbia.edu> Message-ID: <48D1390B.2040008@scalableinformatics.com> Gus Correa wrote: > Dear Beowulf fans > > Since I posted the Cray CX1 announcement, > just to be fair to other players, here are some of them: > > 1) SiCortex has a Linux and MIPS (72 processors) > based "deskside supercomputer". > They claim it to work with 300W of power. > Of course, being Linux, it requires Linux literacy to use.
> See: > > http://sicortex.com/products/sc072_pds/sc072_pds_datasheet Contrary to various public statements from some vendors, we have seen Linux literacy on the rise, and have seen a number of organizations switch all their staff to it. For them it turned out to be easier and cheaper to deploy and support. I am not at liberty to say who, though. > 2) NVidia is advertising its Tesla series, > although this GPU-based deskside system will probably work > as a "deskside co-processor" rather than as a "deskside supercomputer". > Literacy in CUDA, not only in C and Linux, is probably required to use > it effectively. > GPU experts, please correct me if I am wrong. > See: > > http://www.nvidia.com/object/tesla_8_series.html CUDA is needed to program it. You can use CUDA-enabled software without knowing how to program in CUDA. > > There may be more "deskside supercomputers" out there, > and I apologize to anyone that may have been omitted. > NEC had something called SX-8i, I think, not very long ago. > If the Cray CX1 idea sticks and the machines sell, it is likely that other > companies will launch similar product lines. The personal super has been around a while as an idea. Some versions of it sorta kinda work, but not as a market/product. > > *** > > Deskside supercomputers, and even bigger ones, > are starting to be marketed as "plug-and-play", as something that > requires little, if any, > system administration and maintenance > (proprietary hardware and maintenance fees are rarely mentioned), > and not much computer literacy to be used. The appliance model. Great if you can get it to work in the market, but many codes are quite different, so you get the choice between a 1-off accelerator (won't sell), or the very general-purpose black box (which often won't run the ISV codes without modification of the basic black box). > Other postings to this thread already pointed this out. > All they need is an available power outlet on your office wall ( ...
how > about an Ethernet port?), > and similar marketing arguments. > > They are marketed in contrast to clusters, > which are pictured as complicated beasts, hard and expensive to maintain, > requiring dedicated IT personnel, sucking more power, and leading to > higher TCO. They really aren't as terrible as this. We (and a number of others) make clusters that are pretty much plug-n-play. You configure/order it, and it gets delivered, and works after assembly (connection to wall, network, cooling). Some require more remote assembly than others, some you can just ship a bunch-o-boxes. > The logic presented to decision makers would be that, besides being user > friendly, > what you pay upfront for these machines you recover quickly in IT > salaries and utility bills. Hmmm.... some of these arguments are valid, some are not. > > Gus Correa > > John Leidel wrote: > >> On Wed, 2008-09-17 at 10:01 -0400, Joe Landman wrote: >> >> >>> Gerry Creager wrote: >>> >>> >>>> The CX1 looks like something I'd love next to my desk -- with Linux >>>> on it -- to accomplish testing before I take something to the big >>>> iron. It >>> This is something I suspect you will be able to do. The CX1 may >>> support Linux (and it wouldn't surprise me if it had that as an option). >>> >> >> Indeed... it supports RedHat. Oddly enough, no mention of SLES. Cray >> has been running SLES on their XT login/service nodes for quite some >> time. I'm curious why they changed horses. >> >> >>>> might even allow me to pre- and post-process my data for hurricane >>>> WRF runs. It's not hefty enough to let me do those runs in the >>>> timeframe I require otherwise. >>>> >>> Heh... We like the under-desktop experience, with lots of fast disk >>> and big pipes to the disk. Honestly, this looks like the direction >>> for most of "smaller" HPC that can run locally under your own >>> control. The big iron/heavy metal for the large (non-prototype) jobs. >>> >>> >>>> It's a tool, not a solution. >>>> >>> Yup. 
Lots of folks get lost in this, thinking that a solution == the >>> thing they market. It's not. It is just one aspect of things. A >>> product is a tool. A solution is so much more than that (and usually >>> starts with a statement of a problem ... otherwise it is a solution >>> searching for a problem). >>> >>> Joe >>> >>> >>>> gerry >>>> >>>> >>> >> >> -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From james.p.lux at jpl.nasa.gov Wed Sep 17 10:18:54 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 10:18:54 -0700 Subject: [Beowulf] MS Cray In-Reply-To: <9f8092cc0809170939g3eecfe43h398c761c6493e37@mail.gmail.com> References: <48D10ED0.5060005@vcu.edu> <9f8092cc0809170939g3eecfe43h398c761c6493e37@mail.gmail.com> Message-ID: From: John Hearns [mailto:hearnsj at googlemail.com] Sent: Wednesday, September 17, 2008 9:40 AM To: Lux, James P; beowulf at beowulf.org Subject: Re: [Beowulf] MS Cray . When mainframes first entered the halls of academe, I'm sure the same sort of discussions arose. Heck, it's why computers like the PDP-8 were invented. Jim Just let me correct you there. Surely PDP-8s were calculators or Data Processing whatchamacallits, and emphatically NOT Computer Systems.
(A history lesson is called for here - I cannot remember the exact terminology which allowed PDPs to be sold to individual labs and departments) --Sort of like when we buy PCs for the lab, they're "instrument controllers".. Jim From james.p.lux at jpl.nasa.gov Wed Sep 17 10:20:30 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 10:20:30 -0700 Subject: [Beowulf] MS Cray In-Reply-To: References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> Message-ID: From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joshua Baker-LePain Sent: Wednesday, September 17, 2008 9:24 AM To: Gerry Creager Cc: Beowulf Subject: Re: [Beowulf] MS Cray On Wed, 17 Sep 2008 at 8:22am, Gerry Creager wrote > Gus Correa wrote: >> >> Here is the link to the CX1 on the Cray web site: >> >> http://www.cray.com/products/CX1.aspx >> >> You need MS Explorer to customize/price it. > > I just knew you had to be wrong, but sure enough, I can't see config options. > It's a show stopper for me. If I need IE to buy the system, it's not > likely to happen until A) there's an IE that runs natively on *nix, > and B) it doesn't have the myriad problems associated with IE in the past. Hrm -- WORKSFORME using FF3 on Fedora 9. Notably, Cray is happy to sell the CX1 to you sans OS. W2K8 is a $3752 option (at least in my sample config with 8 compute blades). As was noted, the Linux they suggest is "Red Hat Linux", by which one would assume they mean RHEL (why does everybody make that mistake -- there hasn't been a "Red Hat Linux" since RH9 (shudder)). Also, as one would expect, the hardware premium is hefty. A compute blade with dual Xeon E5462s, 16GB RAM (8x2GB), and an 80GB HDD is $6656. Without even trying too hard I can get a similarly configured 1U node for $4400. 
So that's a 50% markup on nodes, not to mention the almost $9K for the chassis. Of course, that's before any discounts. Maybe it's my miserly ways, but I just don't see the value proposition here... -- Noise? Packaging? Nobody ever got fired for buying Cray? From kyron at neuralbs.com Wed Sep 17 10:21:24 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 13:21:24 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D1283B.6070705@scalableinformatics.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D11085.8050702@neuralbs.com> <48D1283B.6070705@scalableinformatics.com> Message-ID: <48D13C94.8010800@neuralbs.com> Joe Landman wrote: > Eric Thibodeau wrote: >> Joe Landman wrote: >>> Gus Correa wrote: >>> >>>> Otherwise, your "newbie scientist" can put his/her earbuds and pump >>>> up the volume on his Ipod, >>>> while he/she navigates through the Vista colorful 3D menus. >>> >>> Owie .... I can just imagine the folks squawking about this at SC08 >>> "Yes folks, you need a Cray supercomputer to make Vista run at >>> acceptable performance ..." >> Maybe they have a "tune options for performance" option ;) >>> >>> The machine seems to run w2k8. My own experience with w2k8 is that, >>> frankly, it doesn't suck. This is the first time I have seen a >>> windows release that I can say that about. >> A few questions (not necessarily expecting a response): >> >> POSIX? >> VERBS? >> Kernel latency and scheduler control? > > Don't mistake me for a w2k8 apologist. I reamed them pretty hard on > the lack of a real posix infrastructure (they claim SUA, but frankly > it doesn't build most of what we throw at it, so it really is a > non-starter and not worth considering IMO). They need to pull Cygwin > in close and tight to get a good POSIX infrastructure. It is in their > best interests. 
Sadly, I suspect the ego driven nature of this will > pretty much prevent them from doing this. Can't touch the "toxic" OSS > now, can they ... Cygwin...yerk, that emulation bloat is slow as hell and can barely run some of my basic scripts. A simple find would put the CPU at 100% usage. Now I don't blame Cygwin for this per se, but rather the way that Windows (probably) runs this as a DOS(ish) app in a somewhat polled mode. My lack of interest in that OS stopped me from digging deeper into why Cygwin is so slow. Given it's a more than mature project, I'd have expected such poor performance to have been addressed by now. > > IB Verbs? Well through OFED, yes. Through the windows stack? Who > knows. We were playing with it on JackRabbit for a customer > test/benchmark. ...and...the results? ;) > > Kernel latency? Much better/more responsive than w2k3. Scheduler > control? Not sure how much you have. I don't like deep diving into > registries ... that is a pretty sure way to kill a windows machine. Well, let's just say these are mechanisms I expect an HPC machine to have when "squeezing the last drop of performance" is mentioned. >> >> These are the real barriers IMHO, without minimally supporting POSIX >> (threads), there is very little incentive to use the machine for >> development unless you're willing to accept the code will _only_ run >> on your "desktop". >>> >>> The low end economics probably won't work out for this machine >>> though, unless it is N times faster than some other agglomeration of >>> Intel-like products. Adding windows will add cost, not performance >>> in any noticeable way. >>> >>> The question that Cray (and every other vendor building >>> non-commodity units) is how much better is this than a small cluster >>> someone can build/buy on their own? Better as in faster, able to >>> leap more tall buildings in a single bound, ... (Superman TV show >>> reference for those not in the know).
And the hard part will be >>> justifying the additional cost. If the machine isn't 2x the >>> performance, would it be able to justify 2x the price? Since it >>> appears to be a somewhat well branded cluster, I am not sure that >>> argument will be easy to make. >> I just rebuilt a 32 core cluster for ~5k$ (CAD) (8*Q6600 1Gig >> RAM/node + gige networking). Bang for the buck? I can't wait to see >> the CX1's performance specs under _both_ windows and Linux. > > The desktop CPUs/MBs will get you best bang per buck, as long as you > don't mind no ECC, and 8GB ram limits per node. For your > applications, this might be fine. For others, with large memory > footprint and long run times, I see people need/require ECC (as memory > density increases, ECC becomes important .... darned cosmic > rays/natural decays/noisy power supplies/...) Well, the nodes I built have MBs that state that they can go as high as 8Gigs, but anyone with a little experience with the 8Gig mix + 800+MHz RAM knows that very little hardware (MB) can actually do it in a stable fashion. My recent experience with "fast" RAM (800-1066MHz) is that it ends up costing you time ($$$): most MBs claim to support it, but they all seem to have impedance problems of some sort (totally unstable). And if one reads the fine print and the QVLs (Qualified Vendor Lists), you notice these really high throughputs are for low memory density of the banks (a _total_ of 2-4 Gigs max). I'd personally say this is more of an issue than the ECCism of RAM. I work on "Clustering Algorithms", and not to confuse people, I mean the k-means type which we could call "data aggregation/mining" algorithms. They are long running and applied to sizable databases (1.2Gigs) which need to be loaded onto each node. This is where having multi-core nodes comes in quite handy as there is way less time lost in data propagation and loading (the databases).
...which brings me to wonder how the I/O is managed under the CX1...is it as basic as one I/O node and GFS, or do all nodes have their own I/O paths? I mention this since I've too often seen people ignore the I/O (load times) in their performance assessments ;) From mathog at caltech.edu Wed Sep 17 10:39:30 2008 From: mathog at caltech.edu (David Mathog) Date: Wed, 17 Sep 2008 10:39:30 -0700 Subject: [Beowulf] Re: MS Cray Message-ID: "Lux, James P" wrote: > The other, here at JPL, I have heard about this contract before - and in my opinion, it is a horrible deal. The taxpayers get reamed and the vendor makes out like a bandit. (SNIP) >At any given time, there's a dozen or so kinds of computers >(desktop/laptop/PC/Mac) available, but since the configurations are >changing, and they have a 3 year recycle time, the key point being the 3 year lease. (SNIP) >Interestingly, the monthly cost for both organizations is about the >same ( a few hundred bucks a month for hardware lease+service). Let "a few hundred" = $200, and of course there are 36 months in 3 years, so JPL pays the vendor $7200 for each machine, plus "support" for this term. At the end of the lease the vendor gets the computer back, and they probably sell it for a few hundred dollars, just to sweeten the already cushy deal. The office staff may need support once in a while, but one can assume that the average JPL engineer or scientist can more than handle all their own PC software issues, and at worst would just need to swap a machine if there was a major hardware issue. In other words, the average support cost to the vendor for the technical staff is but a tiny fraction of what JPL pays them. Who's the vendor, Halliburton?
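David's lease arithmetic can be parameterized so the monthly rate can be varied. The $200 figure is his own assumption; the $150 figure for a mid-range desktop is the one Jim Lux gives in his reply. In the thread both rates bundle hardware lease and service, and neither total subtracts whatever the vendor recovers by reselling the machine afterwards, so treat this as a rough sketch only.

```python
def lease_total(monthly: float, years: int = 3) -> float:
    """Total paid over a flat-rate lease of the given length."""
    return monthly * 12 * years


if __name__ == "__main__":
    # Monthly rates mentioned in the thread (hardware lease + service, USD)
    rates = [("Mathog's assumption", 200.0), ("Lux's mid-range desktop", 150.0)]
    for label, monthly in rates:
        print(f"{label}: ${lease_total(monthly):,.0f} over 3 years")
```

At $200/month this reproduces the $7200 per machine above; at $150/month the three-year total is $5400.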
Regards David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From gus at ldeo.columbia.edu Wed Sep 17 11:14:46 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed, 17 Sep 2008 14:14:46 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D10497.7080801@tamu.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> Message-ID: <48D14916.3020506@ldeo.columbia.edu> Hi Gerry and Beowulf fans Gerry Creager wrote: > Gus Correa wrote: > >> Here is the link to the CX1 on the Cray web site: >> http://www.cray.com/products/CX1.aspx >> You need MS Explorer to customize/price it. > > > I just knew you had to be wrong, but sure enough, I can't see config > options. Thanks for your trust! > It's a show stopper for me. If I need IE to buy the system, it's not > likely to happen until A) there's an IE that runs natively on *nix, > and B) it doesn't have the myriad problems associated with IE in the > past. > BTW, the Cray web site was changed today, and now I can configure/price the CX1 from Linux/Firefox. Gus Correa > I do admit to a sinking feeling when I noted that the front page (and > of course, the subsequent pages) were ASPX... > > I suspect Microsoft has been listening here. I also suspect this > machine will do ok in the business world, but somehow I doubt they're > gonna see significant headway in a lot of the scientific arenas. If > you aren't computer literate, you're not likely to port a complicated > model from *nix to Windows, nor are you likely to write a significant > piece of code. I've a geodesist friend who DOES write solely for > Windows, but that's a conscious choice by someone who was a talented > computer scientist first, and a geodesist later in life. 
He uses > Windows because, well, mainly because the folks he teaches, and writes > code for, do. However, he's the exception. > > The CX1 looks like something I'd love next to my desk -- with Linux on > it -- to accomplish testing before I take something to the big iron. > It might even allow me to pre- and post-process my data for hurricane > WRF runs. It's not hefty enough to let me do those runs in the > timeframe I require otherwise. > > It's a tool, not a solution. > > gerry > From hahn at mcmaster.ca Wed Sep 17 11:38:16 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 17 Sep 2008 14:38:16 -0400 (EDT) Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: References: <1220638410.4385019ctimchipman@myrealbox.com> <48C56BFF.6000304@aei.mpg.de> Message-ID: > Recently I benchmarked a Fortran based Scientific application on both > Intel Xeon Quad core and AMD Opteron Quad core with RHEL 5. Following are > the results: > > AMD-2.3GHz INTEL-2.33GHz > > 1. Serial 147.719 73.952 sec > 2. Parallel 4 core 39.798 32.317 sec > 3. Parallel 8 core 26.880 30.371 sec > > For AMD, I used GCC compiler(gfortran), which is released by AMD for > its barcelona-AMD fam10 based processors. that's a tiny bit misleading - I'm certain AMD would not suggest that their spin of GCC is superior to a commercial AMD-tuned compiler (eg Pathscale, PGI) I'm not sure how to understand the numbers. it's not reasonable to expect AMD to be half the speed of Intel in serial. I'd probably assume that's due to poor optimization; the alternative is that the code's working set fits in Intel's cache(s), but not in AMD's (certainly possible). the scaling numbers appear to support a cache-based interpretation: at 4c, the program is as in-cache as it can be but still has mandatory misses that eventually show AMD's memory-bandwidth advantage... 
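Mark's reading of the benchmark is easier to see as speedup and parallel efficiency. The short sketch below does nothing more than arithmetic on the wall-clock times quoted above, so it cannot confirm the cache explanation. It does show the pattern he describes: AMD scales to roughly 3.7x on 4 cores and 5.5x on 8, while Intel, starting from half AMD's serial time, only reaches about 2.3x to 2.4x.

```python
# Wall-clock times (seconds) quoted in the thread, keyed by core count
TIMES = {
    "AMD 2.3GHz": {1: 147.719, 4: 39.798, 8: 26.880},
    "Intel 2.33GHz": {1: 73.952, 4: 32.317, 8: 30.371},
}


def speedup(t_serial: float, t_parallel: float) -> float:
    """How many times faster the parallel run is than the serial run."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, cores: int) -> float:
    """Speedup divided by core count (1.0 would be perfect scaling)."""
    return speedup(t_serial, t_parallel) / cores


if __name__ == "__main__":
    for cpu, t in TIMES.items():
        for cores in (4, 8):
            s = speedup(t[1], t[cores])
            e = efficiency(t[1], t[cores], cores)
            print(f"{cpu}: {cores} cores  speedup {s:.2f}  efficiency {e:.0%}")
```

Efficiency is worth printing alongside speedup: Intel's 8-core run uses twice the cores of its 4-core run for only a few percent less wall-clock time, which is what one would expect if the jobs were limited by memory bandwidth rather than compute.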
From james.p.lux at jpl.nasa.gov Wed Sep 17 11:39:58 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 11:39:58 -0700 Subject: [Beowulf] RE: MS Cray In-Reply-To: References: Message-ID: > -----Original Message----- > From: David Mathog [mailto:mathog at caltech.edu] > Sent: Wednesday, September 17, 2008 10:40 AM > To: beowulf at beowulf.org > Cc: Lux, James P > Subject: Re: MS Cray > > "Lux, James P" wrote: > > > The other, here at JPL, > > I have heard about this contract before - and in my opinion, > it is a horrible deal. The taxpayers get reamed and the > vendor makes out like a bandit. > actually not.. It's quite competitive with the sort of deal other folks get in industry. > (SNIP) > > >At any given time, there's a dozen or so kinds of computers > >(desktop/laptop/PC/Mac) available, but since the configurations are > >changing, and they have a 3 year recycle time, > > the key point being the 3 year lease. > > (SNIP) Well.. That's always a trade.. Buying comes out of the capital bucket, leasing comes out of the expense bucket, and they have very different treatments, accounting wise. Don't forget that JPL works on a "cost reimbursement" basis: that is, we spend money, and the gov't, via NASA, reimburses those costs, and ONLY those costs. There are literally bookshelves full of rules (the Federal Acquisition Regulations) that tell you what is allocable, accountable, and reimburseable. JPL cannot borrow money to pay for things: the US government will not pay for interest on borrowed money. Most capital investments have to be approved by an act of congress, and the amortization of that investment requires special treatment to make sure that costs are properly allocated to each project. Leasing makes it easy..Accountants cost money too, if you do march down the amortization process. Many, many commercial companies have similar sorts of issues, particularly with respect to transactions between divisions (e.g. 
see Regulation W for banking), all designed to prevent "hiding profits". > > >Interestingly, the monthly cost for both organizations is about the > >same ( a few hundred bucks a month for hardware lease+service). > > Let "a few hundred" = $200, and of course there are 36 months > in 3 years, so JPL pays the vendor $7200 for each machine, > plus "support" for this term. At the end of the lease the > vendor gets the computer back, and they probably sell it for > a few hundred dollars, just to sweeten the already cushy > deal. Actually, a mid-range desktop machine is about $150, and they actually do monthly comparisons against other vendors, so the cost is probably pretty reasonable for a large volume consumer. Don't forget that part of the cost is the (legally required) paperwork to prove that the taxpayer's not getting the shaft. (e.g. those comparisons don't happen for free, and someone has to collate the data, put it into a report, etc.) > The office staff may need support once and a while, > but one can assume that the average JPL engineer or scientist > can more than handle all their own PC software issues, and at > worst would just need to swap a machine if there was a major > hardware issue. In other words, the average support cost to > the vendor for the technical staff is but a tiny fraction of > what JPL pays them. Actually, before they went to a centralized support model, they found that the "shadow administrator" and "shadow support staff" costs were huge, and more to the point, not accurately measurable (what you can't measure, you can't manage).
And, while most JPL technical staff are certainly capable of doing their own support, there's good reason for them not to: they typically have a more-than-full-time job doing something more specialized (e.g. designing deep space communication links, or writing nav software for rovers, or designing and building rovers), and every minute they spend fooling with their computer is a minute not getting the *real job* of space exploration done. It's like working on my house: I'm capable of painting the walls myself, but I hire a professional to do it, because there are better uses for my time (and the specialist will do a better job). Support isn't trivial here, even for office staff. For a variety of reasons (not unique to JPL; any other 5000+ employee, $1B/year business will have similar ones), we have an amazingly wide array of various and sundry institutional applications to do things like timecards, keeping track of inventory, document management, etc. I'd venture that the desktop support staff spends more than 70% of their time dealing with non-OS related software issues (e.g. why is my email not getting through), and a very tiny fraction of their time responding to hardware problems or OS issues. For instance, they're rolling out a new "unified messaging system" which will integrate calendaring, based on MS Exchange Server (for whatever reason; that's irrelevant), and rather than saying "Use Outlook", which would certainly reduce support costs, they ARE supporting half a dozen email clients, web clients, a bunch of calendaring clients, not to mention a fairly heterogeneous set of hardware platforms. Let us not forget IT security. This is a non-trivial matter when you have to manage tens of thousands of desktops and comply with dozens of pages of government regulation, NASA procedural instructions, etc. The upshot is, you get a fair amount for your $130/month support subscription. To put that in context, that's less than two hours of engineer time.
> > Who's the vendor, Halliburton? No, it's a division of Lockheed Martin, but basically anyone in the business of doing what they do charges about the same amount of money. The business *is* fairly competitive (and gets recompeted every 3-5 years, I think), and they changed vendors a few cycles ago. For what it's worth, we have a similar sort of scenario when dealing with test equipment. Do you own it and keep an in-house inventory (with all the peculiar government contract stuff about cost accounting), or do you have an outside vendor provide it on lease/rent? Same sort of heterogeneous product line. Same sorts of issues with support (e.g. test and calibration of instruments). Same sort of problems with purchasing capital equipment on a cost-reimbursement job (the government discourages you from charging the full cost of your infrastructure to their job). A big disadvantage of in-house is that it leads to an inventory of ancient gear that becomes hard to maintain, and balkanized ownership (we bought that for Project X, and though Project X is long gone, the former staff of Project X still owns it). A big advantage of in-house is that you can walk across the street and pick up an oscilloscope. Jim From rgb at phy.duke.edu Wed Sep 17 12:08:16 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 17 Sep 2008 15:08:16 -0400 (EDT) Subject: [Beowulf] Re: MS Cray In-Reply-To: References: Message-ID: On Wed, 17 Sep 2008, David Mathog wrote: > Let "a few hundred" = $200, and of course there are 36 months in 3 > years, so JPL pays the vendor $7200 for each machine, plus "support" for > this term. At the end of the lease the vendor gets the computer back, > and they probably sell it for a few hundred dollars, just to sweeten the > already cushy deal.
> The office staff may need support once in a while, > but one can assume that the average JPL engineer or scientist can more > than handle all their own PC software issues, and at worst would just > need to swap a machine if there was a major hardware issue. In other > words, the average support cost to the vendor for the technical staff is > but a tiny fraction of what JPL pays them. > > Who's the vendor, Halliburton? Brilliantly put. And yet people buy in! Why? In a typical small business, they may well have no real systems administrators. "Administration" may be done by an office manager, a secretary, a staff person with a bit of computing in their blood or better than average common sense. In such an environment, the (say) $4200 surplus times 30 or 50 machines may end up being cheaper than hiring a full time real sysadmin, although this is an extreme example where one is right on the margin with these particular numbers even for that on the 50 end of things. Add to that, though, that Windows admin isn't terribly scalable and things break a lot, so one admin cannot handle Linux-like numbers of boxes, so it is still not quite crazy. The real question is why an admin-rich environment with lots of full time admins would ever buy into such a deal. If you've got a full time admin ANYWAY, paying $150/month for support on top of this (beyond the cost of the hardware) is just insane. rgb > > Regards > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C.
27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From kyron at neuralbs.com Wed Sep 17 12:18:06 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 15:18:06 -0400 Subject: [Beowulf] MS Cray In-Reply-To: <48D14916.3020506@ldeo.columbia.edu> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D14916.3020506@ldeo.columbia.edu> Message-ID: <48D157EE.9050308@neuralbs.com> Gus Correa wrote: > Hi Gerry and Beowulf fans > > Gerry Creager wrote: > >> Gus Correa wrote: >> >>> Here is the link to the CX1 on the Cray web site: >>> http://www.cray.com/products/CX1.aspx >>> You need MS Explorer to customize/price it. >> >> >> I just knew you had to be wrong, but sure enough, I can't see config >> options. > > Thanks for your trust! > >> It's a show stopper for me. If I need IE to buy the system, it's not >> likely to happen until A) there's an IE that runs natively on *nix, >> and B) it doesn't have the myriad problems associated with IE in the >> past. >> > BTW, the Cray web site was changed today, > and now I can configure/price the CX1 from Linux/Firefox. I think I heard the "Oh crap!" from Cray from here when one of their employees must have noticed the remarks on the BW ml ;)...this might also explain why I was hitting errors on the page these past 2 days...nice QA ;) > > Gus Correa > >> I do admit to a sinking feeling when I noted that the front page (and >> of course, the subsequent pages) were ASPX... >> >> I suspect Microsoft has been listening here. I also suspect this >> machine will do ok in the business world, but somehow I doubt they're >> gonna see significant headway in a lot of the scientific arenas. 
If >> you aren't computer literate, you're not likely to port a complicated >> model from *nix to Windows, nor are you likely to write a significant >> piece of code. I've a geodesist friend who DOES write solely for >> Windows, but that's a conscious choice by someone who was a talented >> computer scientist first, and a geodesist later in life. He uses >> Windows because, well, mainly because the folks he teaches, and >> writes code for, do. However, he's the exception. >> >> The CX1 looks like something I'd love next to my desk -- with Linux >> on it -- to accomplish testing before I take something to the big >> iron. It might even allow me to pre- and post-process my data for >> hurricane WRF runs. It's not hefty enough to let me do those runs in >> the timeframe I require otherwise. >> >> It's a tool, not a solution. >> >> gerry >> From kyron at neuralbs.com Wed Sep 17 12:43:36 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 15:43:36 -0400 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: References: <1220638410.4385019ctimchipman@myrealbox.com> <48C56BFF.6000304@aei.mpg.de> Message-ID: <48D15DE8.2090103@neuralbs.com> Sangamesh B wrote: > Hi Tim, > > Recently I benchmarked a Fortran based Scientific application on > both Intel Xeon Quad core and AMD Opteron Quad core with RHEL 5. > Following are the results:
>
>                        AMD-2.3GHz     INTEL-2.33GHz
> 1. Serial             147.719 sec     73.952 sec
> 2. Parallel 4 core     39.798 sec     32.317 sec
> 3. Parallel 8 core     26.880 sec     30.371 sec
>
I'll go ahead and assume your "Fortran based Sci app" is memory-bound, and this might be why the AMD-based parallel version scales a little better, given it's using HT (HyperTransport) as its communication fabric (non-shared memory controller).
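The scaling claim can be checked directly from the posted timings. A minimal Python sketch (the dictionary and function names are illustrative, not from the thread) computing speedup T(1)/T(p) and parallel efficiency speedup/p. Note that the two platforms were built with different compilers (gfortran vs. Intel 10), so this compares hardware plus toolchain, not CPUs alone.

```python
"""Parallel speedup and efficiency from the posted benchmark timings."""

# Wall-clock times in seconds, keyed by core count, copied from the thread.
TIMINGS = {
    "AMD 2.3GHz (gfortran)": {1: 147.719, 4: 39.798, 8: 26.880},
    "Intel 2.33GHz (Intel 10)": {1: 73.952, 4: 32.317, 8: 30.371},
}


def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core run.

    speedup    = T(1) / T(p)
    efficiency = speedup / p   (1.0 would be perfect linear scaling)
    """
    serial = times[1]
    result = {}
    for cores, seconds in sorted(times.items()):
        if cores == 1:
            continue
        speedup = serial / seconds
        result[cores] = (speedup, speedup / cores)
    return result


if __name__ == "__main__":
    for platform, times in TIMINGS.items():
        for cores, (speedup, eff) in scaling(times).items():
            print(f"{platform}: {cores} cores -> {speedup:.2f}x speedup, "
                  f"{eff:.0%} efficiency")
```

On these numbers the AMD runs reach roughly 3.7x on 4 cores and 5.5x on 8 cores, while the Intel runs reach roughly 2.3x and 2.4x; because the Intel serial run is about twice as fast, the 8-core wall times end up close (26.9 s vs. 30.4 s).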
> > > For AMD, I used GCC compiler (gfortran), which is released by AMD > for its barcelona-AMD fam10 based processors. > > For Intel, Intel 10 compilers. Which compile flags were used? Also, note that I've had issues with icc generating really fast but inaccurate code (its fp model is not IEEE *by default*; I am sure _everyone_ knows this and I am stating the obvious here). The figures are interesting, but you could get a bit more information by using the time utility (the one from http://www.gnu.org/directory/time.html) with the -v option. This way we can get a hint of memory use, page faults and all sorts of generally neat stuff (without getting into profiling). > > Thank you, > Sangamesh > Consultant-HPC > > > > On Mon, Sep 8, 2008 at 11:46 PM, Carsten Aulbert > > wrote: > > Hi Tim, > > Tim Chipman wrote: > > Does anyone have any 'real world' experience with 'both' of > these CPUs, in terms of relative performance for 'whatever work > you do' ? > > > > I realize the xeon is a faster mhz part (3.16ghz xeon vs 2.3ghz > opteron) so I'm more concerned with "relative performance per mhz" > > > > I'm involved with a cluster project, and we have 2 options from > our vendor, > > > > - 25 compute nodes, dual-quadcore intel 5460, or > > - 32 compute nodes, dual-quadcore amd 2356 > > > > From what I've been able to glean (Spec.org / SpecCPU 2006), > > > > - the intel chips have better integer performance > > - the amd chips have better FPU performance > > > > so the likely anticipated real-world performance result .. will > depend on how a given application blends / balances things. > > We have "only" the Quad-Xeon boxes with E5435 and these are quite > fast, > indeed it seems that FFTW seems to run faster on Xeons but I have not > made any benchmarks for the past ~ 9 months, so I don't know about the > latest Opterons.
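As a rough cross-check of what `time -v` would show, here is a minimal sketch assuming a Unix-like system and Python's standard `resource` and `subprocess` modules; `run_with_rusage` and the script name in the comment are hypothetical. GNU `time -v` reports kernel resource-usage counters (maximum resident set size, major/minor page faults, context switches and the like) rather than hardware cache-miss counts, which need a profiler such as Linux `perf`.

```python
import resource
import subprocess
import sys
import time


def run_with_rusage(cmd):
    """Run cmd (a list of strings) and report counters similar to `time -v`.

    Hypothetical helper. RUSAGE_CHILDREN accumulates over all children this
    process has waited for, so most counters are differenced around the call.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "exit_status": completed.returncode,
        "wall_s": wall,
        "user_s": after.ru_utime - before.ru_utime,
        "sys_s": after.ru_stime - before.ru_stime,
        # ru_maxrss is a running maximum over children, not a delta:
        # kilobytes on Linux, bytes on macOS.
        "max_rss": after.ru_maxrss,
        "major_page_faults": after.ru_majflt - before.ru_majflt,
        "minor_page_faults": after.ru_minflt - before.ru_minflt,
        "voluntary_ctx_switches": after.ru_nvcsw - before.ru_nvcsw,
        "involuntary_ctx_switches": after.ru_nivcsw - before.ru_nivcsw,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python rusage_run.py ./my_app input.dat
    for name, value in run_with_rusage(sys.argv[1:]).items():
        print(f"{name:>26}: {value}")
```

For real measurements the GNU tool itself (`/usr/bin/time -v ./app`) is the simpler choice; the sketch only shows where those numbers come from.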
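One common way a non-IEEE default changes results is by letting the compiler reassociate floating-point sums, for example to vectorize a reduction loop. A minimal Python illustration, since Python evaluates floating-point additions exactly in the order written: regrouping the same terms changes the rounded result, and `four_lane_sum`, a hypothetical model of a 4-wide vectorized reduction (not what any particular compiler emits), can differ in the last bits from the left-to-right sum, with `math.fsum` as an accurately rounded reference.

```python
import math


def sequential_sum(values):
    """Left-to-right summation: the order the source code specifies."""
    total = 0.0
    for v in values:
        total += v
    return total


def four_lane_sum(values):
    """Hypothetical model of a 4-wide vectorized reduction: four interleaved
    partial sums combined at the end. Equal in exact arithmetic, but not
    necessarily bit-for-bit equal in floating point."""
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 4] += v
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])


if __name__ == "__main__":
    # Floating-point addition is not associative.
    print((0.1 + 0.2) + 0.3)  # 0.6000000000000001
    print(0.1 + (0.2 + 0.3))  # 0.6

    data = [1.0 / (k * k) for k in range(1, 100_001)]
    print(repr(sequential_sum(data)), repr(four_lane_sum(data)),
          repr(math.fsum(data)))
```

This is why comparing an optimized build against an IEEE-strict build of the same code is a sensible sanity check before trusting benchmark numbers.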
> > I think you need to come up with a real-world scenario of what will be > run on the cluster and maybe compile a little benchmark yourself > and ask > the vendor to run both (or get hands-on access to both boxes). I think that's > the only "fair" comparison that's possible. > > HTH > > Carsten From landman at scalableinformatics.com Wed Sep 17 12:54:29 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 15:54:29 -0400 Subject: [Beowulf] Re: MS Cray In-Reply-To: References: Message-ID: <48D16075.8090404@scalableinformatics.com> Robert G. Brown wrote: > The real question is why an admin-rich environment with lots of full > time admins would ever buy into such a deal. If you've got a full time > admin ANYWAY, paying $150/month for support on top of this (beyond the > cost of the hardware) is just insane. Have you ever administered a lab full of these units? You need as much help as you can get to administer the Windows machines. Sadly, while claims of there being more Windows admins are true (that's not the sad part), you need (far) more of them to administer fewer Windows machines than the number of admins needed for more Linux machines (that is the sad part). We have seen 2 full-time admins handle 4000+ Linux machines with time to develop software to make their lives easier (Incyte), as compared to seeing 10 Windows admins struggle to keep 100 machines each up to date.
The cluster version of w2k* should alleviate this, or at least make it tolerable, for clusters. Desktops are another matter. Part of the reason that it is locked down so tightly in large organizations is that when it fails (not "if") they want to reduce the degrees of freedom of the failure modes. Keep it simple, in their own way. But that is straying a bit. Large organizations often are not admin-rich; they tend to try to cut costs. Admins == costs. Admins need as much help as they can possibly get. If, for $150/machine in a large organization, you can take away some level of their pain, this might be worth more than the cost of the additional healthcare coverage, heartburn medicine, and upper/lower GI series needed ... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From james.p.lux at jpl.nasa.gov Wed Sep 17 13:12:33 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 13:12:33 -0700 Subject: [Beowulf] Re: MS Cray In-Reply-To: References: Message-ID: > -----Original Message----- > From: Robert G. Brown [mailto:rgb at phy.duke.edu] > Sent: Wednesday, September 17, 2008 12:08 PM > To: David Mathog > Cc: beowulf at beowulf.org; Lux, James P > Subject: Re: [Beowulf] Re: MS Cray > > On Wed, 17 Sep 2008, David Mathog wrote: > > > Let "a few hundred" = $200, and of course there are 36 months in 3 > > years, so JPL pays the vendor $7200 for each machine, plus > "support" > > for this term. At the end of the lease the vendor gets the > computer > > back, and they probably sell it for a few hundred dollars, just to > > sweeten the already cushy deal.
> > The office staff may need support > > once in a while, but one can assume that the average JPL > engineer or > > scientist can more than handle all their own PC software > issues, and > > at worst would just need to swap a machine if there was a major > > hardware issue. In other words, the average support cost to the > > vendor for the technical staff is but a tiny fraction of > what JPL pays them. > > > > Who's the vendor, Halliburton? > > Brilliantly put. And yet people buy in! Why? > > In a typical small business, they may well have no real > systems administrators. "Administration" may be done by an > office manager, a secretary, a staff person with a bit of > computing in their blood or better than average common sense. > In such an environment, the (say) $4200 surplus times 30 or > 50 machines may end up being cheaper than hiring a full time > real sysadmin, although this is an extreme example where one > is right on the margin with these particular numbers even for > that on the 50 end of things. Add to that, though, that > Windows admin isn't terribly scalable and things break a > lot, so one admin cannot handle Linux-like numbers of boxes, > so it is still not quite crazy. > > The real question is why an admin-rich environment with lots > of full time admins would ever buy into such a deal. If > you've got a full time admin ANYWAY, paying $150/month for > support on top of this (beyond the cost of the hardware) is > just insane. > In the model here at JPL, you wouldn't have a full-time SA anyway. You'd get the services of a member of the institutional SA army for your monthly fee. If you have a "computing project" (say, building a cluster), and a full-time SA is part of the project, then you'd not buy the institutional services. Your private SA is then responsible for all the institutional requirements (which can be substantial).
Such computers are "unsubscribed" computers, and you have to go through a (fairly simple) justification process to explain how it's not actually a desktop. (The cost to provide "desktop" support within an organization has been documented as being higher than buying it from the outside vendor.) The institutional contract is very much oriented towards "desktop" services, not towards "software development support". We have hundreds of computers that are not subscribed, and which have dedicated SA services provided by in-house staff. I think that rgb is right, though, that there are organizations of certain sizes and structures for which things are different. It's tied to the "granularity" of buying people. For very small organizations, it's the "office mgr as admin" model, and they might call a consultant in for more complex things (I used to earn my living doing just this: I'd do the server configs, cabling, etc., set up accounts, build scripts). I think the threshold for this is when SA duties are down in the few hours/week range. Then there are very large organizations (like JPL, or Fortune 500 companies) for which standardizing desktops and centralized support make sense. In between are organizations that have enough work to justify about one full-time person (or maybe half a person) for their SA job, moving up to a few FTE. You likely have some SA reserve capacity, so the incremental cost of adding one computer is small (as in almost zero), and in this situation the choice between the delta pay for an SA (zero) and $150/month for a service contract is an easy one to make. Overall institutional structure has a lot to do with it. Academia, in general, has a strong departmental structure, and individual departments tend to be run as independent businesses. As a result, they're sort of working like small/medium-sized businesses, with the same zero delta $ for SA vs. monthly fee.
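The "granularity" argument can be made concrete with a back-of-the-envelope model: a per-desktop subscription grows linearly with the number of machines, while in-house admins come in whole people. A minimal sketch; only the $150/month fee comes from the thread, and the admin cost and machines-per-admin capacity in the example are hypothetical placeholders.

```python
"""Back-of-the-envelope: per-desktop subscription vs. in-house admins.

Only MONTHLY_FEE comes from the thread; salary and capacity are hypothetical.
"""
import math

MONTHLY_FEE = 150.0  # per-desktop support subscription discussed above


def subscription_cost(machines, fee=MONTHLY_FEE):
    """Yearly cost when every desktop pays the monthly fee."""
    return machines * fee * 12


def inhouse_cost(machines, admin_cost_per_year, machines_per_admin):
    """Yearly cost when you hire whole admins (people come in integer units)."""
    admins = math.ceil(machines / machines_per_admin)
    return admins * admin_cost_per_year


def break_even_machines(admin_cost_per_year, fee=MONTHLY_FEE):
    """Fleet size at which one admin costs the same as the subscriptions."""
    return admin_cost_per_year / (fee * 12)


if __name__ == "__main__":
    # Hypothetical inputs: a $100k/year loaded admin who can look after 100 desktops.
    salary, capacity = 100_000.0, 100
    print(f"break-even: {break_even_machines(salary):.1f} desktops per admin")
    for n in (10, 50, 100, 150):
        print(f"{n:4d} desktops: subscription ${subscription_cost(n):>9,.0f}"
              f"  in-house ${inhouse_cost(n, salary, capacity):>9,.0f}")
```

The step in `inhouse_cost` is the granularity effect: while an admin you already employ has spare capacity, one more desktop costs roughly nothing in-house, which is the zero-delta case described above.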
Some departments are full of technically skilled folks, some of whom have a very low incremental hourly cost. If you have someone on salary, "can you spend a half hour looking at my computer" is free (until that poor SA gets overloaded and burns out, or graduates). And small entrepreneurial organizations are incorrigible optimists (anyone in academic research has to be an optimist; the bold grant proposal typically wins). They will willingly accept the risk that something that is expensive to fix won't crop up. Very small organizations take the risk because they can't afford to do otherwise. More than one small company has gone out of business because their computer died, taking all the business records with it. Very large organizations are conservative. To a certain extent, the monthly fee is sort of like insurance: you can gamble that no "big problem" crops up that is beyond your in-house SA capability, and that you can get by with minimal SA work. OTOH, you can pay the monthly fee, and when a disaster occurs (a virus infects all your computers, or there's a regulatory compliance edict that requires you to do something to all 50 of your computers), the service provider sucks it up and does it (betting on the odds that not everyone has a problem at the same time). Jim From james.p.lux at jpl.nasa.gov Wed Sep 17 13:14:21 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 13:14:21 -0700 Subject: [Beowulf] MS Cray In-Reply-To: <48D157EE.9050308@neuralbs.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D14916.3020506@ldeo.columbia.edu> <48D157EE.9050308@neuralbs.com> Message-ID: > >> > >> I suspect Microsoft has been listening here.
I also suspect this > >> machine will do ok in the business world, but somehow I > doubt they're > >> gonna see significant headway in a lot of the scientific > arenas. Of course MS is on the list. Why not? Look back through the archives when CCS was being discussed. And if MS wants to develop products to address some specific market niche, more power to them. From lindahl at pbm.com Wed Sep 17 13:25:43 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Wed, 17 Sep 2008 13:25:43 -0700 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: <48D15DE8.2090103@neuralbs.com> References: <1220638410.4385019ctimchipman@myrealbox.com> <48C56BFF.6000304@aei.mpg.de> <48D15DE8.2090103@neuralbs.com> Message-ID: <20080917202543.GF24126@bx9.net> On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote: > Also, note that I've had issues with icc > generating really fast but inaccurate code (fp model is not IEEE *by > default*, I am sure _everyone_ knows this and I am stating the obvious > here). All modern, high-performance compilers default that way. It's certainly the case that sometimes it goes more horribly wrong than necessary, but I wouldn't ding icc for this default. Compare results with IEEE mode. -- greg From kyron at neuralbs.com Wed Sep 17 13:27:54 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 16:27:54 -0400 Subject: [Beowulf] MS Cray In-Reply-To: References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D14916.3020506@ldeo.columbia.edu> <48D157EE.9050308@neuralbs.com> Message-ID: <48D1684A.5080400@neuralbs.com> Lux, James P wrote: >>>> I suspect Microsoft has been listening here. 
I also suspect this >>>> machine will do ok in the business world, but somehow I >>>> >> doubt they're >> >>>> gonna see significant headway in a lot of the scientific >>>> >> arenas. >> > > Of course MS is on the list. Why not? Look back through the archives when CCS was being discussed. And if MS wants to develop products to address some specific market niche, more power to them. > Not that this was one of my comments, but the MS dude never hid; he actually posted explicit questions regarding how MS should approach the clustering community ;) On that note, here are my questions about the MS implementation of clustering (and how the CX1 will actually be usable, or how one will be able to develop _for_ execution on one of the MS-based CX1s): Currently, most of our users are under Linux on their WS so that they can develop something that will potentially run on a big cluster. This implies the users are under a POSIX-compliant OS with good MPI/OpenMP/threading support, which boils down to Linux or OS X (those Power Macs are impressive and *silent*). Where I have reservations about the MS solution is this: how will MS users be able to develop parallel code locally on their WS without needing to upgrade/change their hardware/OS to be compatible with an MS-based cluster/CX1? It's not made clear whether the "clustering tools" are tightly integrated into the CX1 platform or if, as with Linux or OS X, it's a simple case of installing MPI-ish libs (and a few others). With the pricing scheme, I can't imagine _every_ dev getting their own CX1 to play on, so I believe adoption of the platform requires ease of installation (i.e. all tools should be available across Windows 2000/XP/Vista) and shouldn't even be version-specific (IMHO)...but that's me being used to Linux heh! ;) Eric -------------- next part -------------- An HTML attachment was scrubbed...
URL: From james.p.lux at jpl.nasa.gov Wed Sep 17 13:27:55 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 13:27:55 -0700 Subject: [Beowulf] Re: MS Cray In-Reply-To: <48D16075.8090404@scalableinformatics.com> References: <48D16075.8090404@scalableinformatics.com> Message-ID: > -----Original Message----- > From: Joe Landman [mailto:landman at scalableinformatics.com] > Sent: Wednesday, September 17, 2008 12:54 PM > To: Robert G. Brown > Cc: David Mathog; beowulf at beowulf.org; Lux, James P > Subject: Re: [Beowulf] Re: MS Cray > > > > Robert G. Brown wrote: > > > The real question is why an admin-rich environment with > lots of full > > time admins would ever buy into such a deal. If you've got a full > > time admin ANYWAY, paying $150/month for support on top of this > > (beyond the cost of the hardware is just insane. > > Have you ever administered a lab full of these units? You > need as much help as you can get to administer the windows > machines. Sadly, while claims of there being more windows > admins are true (thats not the sad > part) you need (far) more to administer fewer windows > machines than the fewer admins needed for more Linux machines > (that is the sad part). > > We have seen 2 full time admins handle 4000+ Linux machines > with time to develop software to make their lives easier > (Incyte), as compared to seeing 10 windows admins struggle to > keep 100 machines each up to date. I think part of the problem in the Windows world is the incredible diversity of applications (by which I include websites with significant client side processing) that wind up being run on them. Rich growth medium, lots of spontaneous mutations. When you get to large desktop rollouts, Windows can have fairly low admin overhead, but it's done by restricting flexibility (e.g. SMS, boot from the network, etc.) to reduce the nutritional value of the growth medium. 
If everyone boots the same image from the net, applying a patch to 10,000 computers is trivial. While such an environment would probably make everyone on this list exceedingly unhappy (I could guarantee there's no compiler of any kind in it; you might get a JRE, and edit your source code in MSWord), it would (and does) serve a huge number of folks in the business world perfectly well. Windows in a development-intensive HPC environment is going to be admin-expensive. Jim From mathog at caltech.edu Wed Sep 17 13:33:25 2008 From: mathog at caltech.edu (David Mathog) Date: Wed, 17 Sep 2008 13:33:25 -0700 Subject: [Beowulf] RE: MS Cray Message-ID: "Lux, James P" wrote: > Well.. That's always a trade.. Buying comes out of the capital bucket, > leasing comes out of the expense bucket, and they have very different > treatments, accounting-wise. Don't forget that JPL works on a "cost > reimbursement" basis: that is, we spend money, and the gov't, via > NASA, reimburses those costs, and ONLY those costs. There are > literally bookshelves full of rules (the Federal Acquisition > Regulations) that tell you what is allocable, accountable, and > reimbursable. Translation: "bureaucracy is expensive". > the US government will not pay for interest on borrowed money. Don't tell the folks holding all of those US bonds! I'm pretty sure you meant that it will not allow its subsidiaries to borrow money separately. > Most capital investments have to be approved by an act of > congress, and the amortization of that investment requires special > treatment to make sure that costs are properly allocated to each > project. Leasing makes it easy. Accountants cost money too, if you do > march down the amortization process. Many, many commercial > companies have similar sorts of issues, particularly with respect > to transactions between divisions (e.g. see Regulation W for > banking), all designed to prevent "hiding profits".
That leads to so much overhead that we end up with the $700 hammer (for military work, which has even more of it) and, at JPL, hardware prices well above what an individual would pay to purchase the same item on the open market (without benefit of a volume discount). There should be a happy medium between using regulations to squeeze out waste/fraud and drowning in red tape. To me this program seems rather closer to the latter than the former. > Actually, before they went to a centralized support model, they found > that the "shadow administrator" and "shadow support staff" costs were > huge, and more to the point, not accurately measurable (what you > can't measure, you can't manage). Conversely, it may cost more to measure and manage than it saves. > And, while most JPL technical staff are certainly capable of doing > their own support, there's good reason for them not to Agreed. However, there is a simple way to enforce that: deny end users admin access to JPL-supplied PCs. (Well, it might not be so simple to keep the technically adept out of their machines.) > Support isn't trivial here, even for office staff. For a variety of reasons (and not unique to JPL; any other 5000+ employee, $1B/year business will have similar ones), we have an amazingly wide array of various and sundry institutional applications to do things like timecards, keeping track of inventory, document management, etc. I'd venture that the desktop support staff spends more than 70% of their time dealing with non-OS-related software issues (e.g. why is my email not getting through), and a very tiny fraction of their time responding to hardware problems or OS issues. Point 1, I believe you that support isn't spending much time fixing hardware, which is why I think this contract is too expensive. Point 2, if the core software (presumably mostly web-based at this juncture) is so problematic, that would suggest it is the place to go for cost savings. > Let us not forget IT security.
This is a non-trivial matter when you have to manage tens of thousands of desktops and comply with dozens of pages of government regulation, NASA procedural instructions, etc. > > The upshot is, you get a fair amount for your $130/month support subscription. To put that in context, that's less than two hours of engineer time. So $4680 for three years for purchase and support of what type of machine exactly? Does the XXX/month include the support of the centralized business software, or just the end user's machine? It was my impression that it was the latter, which is why I thought this manner of supplying computers overly expensive. Surely network support is covered in some sort of overhead and isn't allocated on a machine-by-machine basis. > For what it's worth, we have a similar sort of scenario when dealing > with test equipment. Do you own it, and have in-house inventory, > (with all the peculiar government contract stuff about cost > accounting), or do you have an outside vendor provide it > on lease/rent. That's a tough one. I would assume that things like oscilloscopes and voltmeters would be owned and kept in a pool for checkout. These have a long service life and aren't all that expensive. For specialized and expensive equipment, it is too complex to generalize, even before the government purchase rules are thrown into the mix. These sorts of tools often need expensive service contracts if purchased (to cover the replacement of failed parts which are not generally available), and whether buying or leasing is more cost-effective often comes down to six of one and half a dozen of the other. It's a different problem, though. Most test equipment is not a commodity. Computers are.
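The lease figures traded in this thread are simple products of a monthly rate and a 36-month term. A small sketch (helper names are illustrative) makes the arithmetic explicit, including the hourly engineer rate implied by "$130/month is less than two hours of engineer time".

```python
"""Lease/subscription arithmetic from the thread (illustrative helper names)."""

TERM_MONTHS = 36  # a 3-year lease


def term_cost(monthly_rate, months=TERM_MONTHS):
    """Total paid over the lease term at a flat monthly rate."""
    return monthly_rate * months


def implied_min_hourly_rate(monthly_rate, hours=2):
    """If monthly_rate buys less than `hours` of engineer time, the
    engineer's loaded hourly rate must exceed monthly_rate / hours."""
    return monthly_rate / hours


if __name__ == "__main__":
    for rate in (200, 150, 130):
        print(f"${rate}/month over {TERM_MONTHS} months = ${term_cost(rate):,}")
    print(f"engineer time > ${implied_min_hourly_rate(130):.0f}/hour")
```

Running it reproduces the $7,200 (at $200/month) and $4,680 (at $130/month) totals quoted above, and shows that the "two hours" comparison implies a loaded engineer rate above $65/hour.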
Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Wed Sep 17 13:35:22 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 17 Sep 2008 16:35:22 -0400 Subject: [Beowulf] Re: MS Cray In-Reply-To: References: <48D16075.8090404@scalableinformatics.com> Message-ID: <48D16A0A.3080808@scalableinformatics.com> Lux, James P wrote: > When you get to large desktop rollouts, Windows can have fairly low > admin overhead, but it's done by restricting flexibility (e.g. SMS, > boot from the network, etc.) to reduce the nutritional value of the Sadly, I haven't seen this at the large customer rollouts I have been involved in. It is anything but trivial, in any sense of the word. Some of the organizations I have seen have now completely thrown their hands in the air over it and handed admin rights on the laptops back to the users. They tell them what they can and cannot break, and then let them install/manage their own software (apart from corporate software pushes). Remarkably, this appears to be reducing cost/time/headache on admin. As this is a Fortune 100 company of which I speak, I am sure they are not alone. When someone's workstation gets hosed, they re-image from a known start. Off they go again. Their data is their responsibility. It is interesting to see how behavior changes then. [...] > Windows in a development intensive, HPC environment, is going to be > admin expensive. Hopefully future Windows versions will look a great deal more like w2k8, so the pain will be lower. If I can get a reasonably priced license for w2k8, I could easily see running it on my laptop (over XP).
Joe > > Jim > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From kyron at neuralbs.com Wed Sep 17 13:46:11 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 16:46:11 -0400 Subject: [Beowulf] Q: AMD Opteron (Barcelona) 2356 vs Intel Xeon 5460 In-Reply-To: <20080917202543.GF24126@bx9.net> References: <1220638410.4385019ctimchipman@myrealbox.com> <48C56BFF.6000304@aei.mpg.de> <48D15DE8.2090103@neuralbs.com> <20080917202543.GF24126@bx9.net> Message-ID: <48D16C93.5010002@neuralbs.com> Greg Lindahl wrote: > On Wed, Sep 17, 2008 at 03:43:36PM -0400, Eric Thibodeau wrote: > > >> Also, note that I've had issues with icc >> generating really fast but inaccurate code (fp model is not IEEE *by >> default*, I am sure _everyone_ knows this and I am stating the obvious >> here). >> > > All modern, high-performance compilers default that way. It's certainly > the case that sometimes it goes more horribly wrong than necessary, but > I wouldn't ding icc for this default. Compare results with IEEE mode. > > -- greg > Guess gcc isn't part of that gang. I haven't had the chance to play with the commercial compilers other than icc. ;) -------------- next part -------------- An HTML attachment was scrubbed...
From kyron at neuralbs.com Wed Sep 17 13:52:58 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 16:52:58 -0400 Subject: [Beowulf] Re: MS Cray In-Reply-To: References: <48D16075.8090404@scalableinformatics.com> Message-ID: <48D16E2A.5030500@neuralbs.com> Lux, James P wrote: >> -----Original Message----- >> From: Joe Landman [mailto:landman at scalableinformatics.com] >> Sent: Wednesday, September 17, 2008 12:54 PM >> To: Robert G. Brown >> Cc: David Mathog; beowulf at beowulf.org; Lux, James P >> Subject: Re: [Beowulf] Re: MS Cray >> >> >> >> Robert G. Brown wrote: >> >> >>> The real question is why an admin-rich environment with >>> >> lots of full >> >>> time admins would ever buy into such a deal. If you've got a full >>> time admin ANYWAY, paying $150/month for support on top of this >>> (beyond the cost of the hardware is just insane. >>> >> Have you ever administered a lab full of these units? You >> need as much help as you can get to administer the windows >> machines. Sadly, while claims of there being more windows >> admins are true (thats not the sad >> part) you need (far) more to administer fewer windows >> machines than the fewer admins needed for more Linux machines >> (that is the sad part). >> >> We have seen 2 full time admins handle 4000+ Linux machines >> with time to develop software to make their lives easier >> (Incyte), as compared to seeing 10 windows admins struggle to >> keep 100 machines each up to date. >> > > I think part of the problem in the Windows world is the incredible diversity of applications (by which I include websites with significant client side processing) that wind up being run on them. Rich growth medium, lots of spontaneous mutations. > > When you get to large desktop rollouts, Windows can have fairly low admin overhead, but it's done by restricting flexibility (e.g. SMS, boot from the network, etc.) to reduce the nutritional value of the growth medium.
If everyone boots the same image from the net, applying a patch to 10,000 computers is trivial. While such an environment would probably make everyone on this list exceedingly unhappy (I could guarantee there's no compiler of any kind in it..you might get a JRE, and edit your source code in MSWord), it would (and does) serve a huge number of folks in the business world perfectly well. > Nonsense; following that line of thought, all that is required is to maintain a "dev" desktop image to be served through the network. I do it with Linux with quite some ease, and I see no reason why this wouldn't be the case under Windows (meaning that once you're able to boot a network image under Windows, the mechanics of it should be flexible enough to point to differing images). > Windows in a development intensive, HPC environment, is going to be admin expensive. > ...thinking about all those admins that never wanted to learn Linux, they _will_ have to learn something new this time! Eric From sp at numascale.com Wed Sep 17 12:39:34 2008 From: sp at numascale.com (Steffen Persvold) Date: Wed, 17 Sep 2008 21:39:34 +0200 Subject: [Beowulf] MS Cray In-Reply-To: <48D157EE.9050308@neuralbs.com> References: <48D01522.6030107@ldeo.columbia.edu> <48D01893.4010904@ias.edu> <48D030CC.5020005@ldeo.columbia.edu> <48D03598.2050407@scalableinformatics.com> <48D04B13.5060303@ldeo.columbia.edu> <48D10497.7080801@tamu.edu> <48D14916.3020506@ldeo.columbia.edu> <48D157EE.9050308@neuralbs.com> Message-ID: <48D15CF6.1020707@numascale.com> Eric Thibodeau wrote: > Gus Correa wrote: >> BTW, the Cray web site was changed today, >> and now I can configure/price the CX1 from Linux/Firefox. > I think I heard the "Oh crap!"
from Cray from here when one of their > employees must have noticed the remarks on the BW ml ;)...this might > also explain why I was hitting errors on the page these past 2 > days...nice QA ;) The Cray website is currently down for all practical purposes. Even with IE7 the site just rarely loads correctly; 90% of the time one of the pane buttons on the site (Products, Support etc.) is missing the bitmap file etc... Other times you just get a "HTTP Error 503 - Service unavailable" error. I guess that's what you get with a M$ web server, ey ? cheers, --SP From ajt at rri.sari.ac.uk Wed Sep 17 14:00:11 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Wed, 17 Sep 2008 22:00:11 +0100 Subject: [Beowulf] MS Cray In-Reply-To: References:

Message-ID: <48D16FDB.3040900@rri.sari.ac.uk> Tim Cutts wrote: > > On 17 Sep 2008, at 2:22 pm, Lux, James P wrote: > >> But how is that any different than having a PC on your desk? >> >> I see the deskside supercomputer as a revisiting of the "workstation" >> class computer. Used to be that PCs and Apples were what sat on most >> peoples desks, but some had Apollo or Sun or Perq workstations, >> because they had applications that needed the computational horsepower >> (or, more likely, the high res hardware graphics support.. A CGA was >> pretty painful for doing PC board layout). >> >> Same sort of thing for having the old Tektronix 4014 graphics >> terminal, rather than hiking down to the computer center to pick up >> your flatbed plotter output. Hello, Jim and Tim. I agree with you, Jim - I used to do a lot of image analysis, and I had a 'Torch' VME 68030 workstation on my desk, followed by a Sun SPARC10... > We don't generally allow people here to buy their own PCs and Apples > either. They get a standard build from us, all centrally managed by > LanDESK. They also get a known type of hardware; they can't just buy > what the hell they like. I have more than 800 Windows desktops to > support. If they were all different and purchased ad-hoc by individual > users, I would be in even worse hell than I am already. Poacher turned gamekeeper, Tim? If you want a 'standard' locked-down desktop go to PC hell. If you don't, then support it yourself - What's the problem? > Most people don't build Beowulf clusters out of ad-hoc piles of machines > from God-knows-where. Most of us buy consistent hardware, because it's > impossible to support anything else. No, some of us are still part of the rebel alliance - building Beowulf clusters out of anything we can beg, borrow or steal to do the science that we are interested in. 
Were those DEC Alphas you had 'consistent' hardware, or did you just buy a new generation of IBM blade servers to replace them after you decided to buy 'consistent' hardware? I've got all sorts of kit ranging from Athlon XP 2400+ to Opteron 2212s in our Beowulf cluster and they are all doing a worthwhile job. I doubt that my situation is unique. As the saying goes, "It's not having what you want that matters, but wanting what you have..." ;-) > The Tektronix graphics terminal is slightly different, because it was > just that, a terminal, and consequently doesn't present such a headache. You know what really makes my blood boil is being criticised for using 'non-standard' PCs, Macs... whatever because "it can't be supported", when I have never wanted or needed support! I'm all in favour of people who DO want a 'managed' desktop having a standard-issue PC running the evil empire's wares, but ONLY if that's what the end user wants. Yes, I've done PC support and I do have experience of the issues involved. So, back to the point, I'm very much in favour of the 'democratisation' of HPC as manifested by DIY Beowulf clusters and affordable desk-side horse-power. HPC on BIG-IRON is not under the control of scientists who want to use it. That's why they use 'discretionary' budgets to buy the sort of kit Orion tried to sell, Apple do sell and Cray are pushing onto the 'personal' HPC market. Indeed, Moshe Bar was so convinced that this was the way to go that he ended the openMosix project [tears]. Hmm... back to earth now - needed to get that off my chest :-) Tony. -- Dr.
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:ajt at rri.sari.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From kyron at neuralbs.com Wed Sep 17 14:05:57 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Wed, 17 Sep 2008 17:05:57 -0400 Subject: [Beowulf] ethernet bonding performance comparison "802.3ad" vs Adaptive Load Balancing In-Reply-To: References: Message-ID: <48D17135.80400@neuralbs.com> Rahul Nabar wrote: > I was experimenting with channel bonding on my twin eth ports to > get a combined bandwidth of (close to) 2 Gbps. The two relevant modes > were 4 (802.3ad) and 6 (alb=Adaptive Load Balancing). I was trying to > compare performance for both. > > Before running any sophisticated tests with netperf etc. I just tried to > copy a large file via scp and timed the two file-copies. > > Option1: > from node1 to node2. Both nodes have their twin ports bonded together > as bond0 with mode=4 (802.3ad). > > They are connected via a Dell PowerConnect 6248 switch. Configured the > switch so that I have two LAG groups combining the two ports coming > from the same node. LACP was turned on. > > Option2: > from node3 to node4. Use mode=6 (alb=Adaptive Load Balancing) No > special switch config. No LAG. No LACP. > > Result: For a 4GB file-transfer. Both modes took the same time; approx > 1min26 sec. > > These results are very mystifying to me. I was expecting mode4 > (802.3ad ) to be almost twice as fast since it is the only mode which > truly aggregates the twin channels. It ought to be the only one > effective for a peer-to-peer communication (mode 6 would only help > while talking with more than one peer) > > Any comments? Also the net file transfer speed seems way lower than > what I'd expect from a close to 2 Gbps connect; even accounting for > the protocol overheads.
Do other people have some numbers for me from > their systems? > Well, apart from the fact that ssh is compressed, as Digo pointed out, and that 47 MB/sec is probably your HDD's transfer capacity, as Shannon pointed out, also keep in mind your bus's capacity ( http://en.wikipedia.org/wiki/List_of_device_bandwidths is a nice list). So, unless you've got both NICs on PCI-E (or independent PCI channels, which I've only heard of in high-end Compaq servers with hotswap PCI interfaces), you're saturating your bus. Eric From ajt at rri.sari.ac.uk Wed Sep 17 14:09:48 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Wed, 17 Sep 2008 22:09:48 +0100 Subject: [Beowulf] MS Cray In-Reply-To: <9f8092cc0809170939g3eecfe43h398c761c6493e37@mail.gmail.com> References: <48D10ED0.5060005@vcu.edu> <9f8092cc0809170939g3eecfe43h398c761c6493e37@mail.gmail.com> Message-ID: <48D1721C.5080503@rri.sari.ac.uk> John Hearns wrote: > [...] > Just let me correct you there. Surely PDP-8s were calculators or Data > Processing whatchamacallits, > and emphatically NOT Computer Systems. > (A history lesson is called for here - I cannot remember the exact > terminology which allowed PDPs to be sold to individual labs and > departments) Hello, John. It's interesting to read about the history of PDPs in this context: http://en.wikipedia.org/wiki/Programmed_Data_Processor I'm a veteran of Version 7 Unix on the pdp11/23 and 11/34 myself, and I had an 11/23 in my front room until I bought a PC to replace it :-) Tony. -- Dr.
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:ajt at rri.sari.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From james.p.lux at jpl.nasa.gov Wed Sep 17 14:12:52 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Wed, 17 Sep 2008 14:12:52 -0700 Subject: [Beowulf] RE: MS Cray In-Reply-To: References: Message-ID: