From Michael.Frese at NumerEx-LLC.com Wed Jun 4 15:52:14 2008 From: Michael.Frese at NumerEx-LLC.com (Michael H. Frese) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] OFED/IB for FC8 Message-ID: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> Following Jeff Layton's post to this list [Cheap SDR IB] on January 28, we purchased 8 Infinihost LX's and an 8 port switch, and began trying to get the OpenFabrics (OFED) release of MVAPICH for Fedora Core 6 to run on our new machines. We develop and run a multiphysics code in a relatively fine grain parallel mode where latency dominates the performance scaling, so it seemed like a good thing to try. This is our first exposure to InfiniBand, though we have considerable experience with MPI, both in-memory and over GigE, including using netpipe to measure latency and bandwidth. Those machines have AMD Athlon X2 6000+'s on Asus M2N-SLI Deluxe motherboards with an open PCI Express slot that will handle x4. The main issue is that we are presently running Fedora Core 8 and the 2.6.21 SMP kernel, but there is no OFED release for FC8 yet. Is anyone else working on this? Has anyone succeeded at getting it to work? We started with OFED version 1.2.5 from http://www.openfabrics.org/downloads/OFED/ofed-1.2.5/OFED-1.2.5-RPMS/ We downloaded all the rpms from redhat-release-4AS-6.1 version. In particular the kernel rpms are kernel-ib-devel-1.2-2.6.9_55.ELsmp and kernel-ib-1.2-2.6.9_55.ELsmp. We used the 1.2.5 version because there don't seem to be any rpms for the 1.3 version. All the OFED rpm's for FC6 installed on FC8 without difficulty, except for opensm-3.0.3-0.ppc64.rpm It didn't say "missing dependencies ..." It just got stuck. We had to kill the 'rpm -ivh', remove the lock file and rebuild the rpm database. After that, # lsmod | grep ib shows about 15 IB related kernel mods. Even so, at this point, some of the IB stuff works. 
We can run ibnetdiscover and see the HCA's on the two machines that
have the rpm's installed, and the switch, too. We could use that to
make a topology file, but we don't know where to put it, or even if
we should put it somewhere. We can run ibchecknet, and though it
finds 4 nodes, it says they are all bad. It also reports "lid 0
address resolution: FAILED". We have not succeeded in getting ibping
to work, and aren't really sure how to specify the remote address
for it.

We found

/usr/share/doc/ofed-docs-1.2/README.txt
/usr/share/doc/ofed-docs-1.2/OFED_Installation_Guide.txt

and, as described there, did

# /etc/init.d/openibd start
Loading QLogic InfiniPath driver: [FAILED]
Loading HCA driver and Access Layer: [ OK ]
Setting up InfiniBand network interfaces:
Failed to configure IPoIB connected mode for ib0
Bringing up interface ib0: [FAILED]
Setting up service network . . . [ done ]
Loading ib_sdp [FAILED]
Loading ib_vnic [FAILED]
Module ib_vnic not loaded.
Bringing up VNIC interfaces [FAILED]

That mostly looks bad.

Does anyone have any suggestions?

We are willing to try a build from source, but we are unsure of what
challenges might lie down that path.

We'd rather not fall back to FC6, but we may have to do that.

Thanks for your help.

Mike Frese

From lindahl at pbm.com Wed Jun 4 17:15:49 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] OFED/IB for FC8
In-Reply-To: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com>
References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com>
Message-ID: <20080605001549.GE27430@bx9.net>

> All the OFED rpm's for FC6 installed on FC8 without difficulty,
> except for opensm-3.0.3-0.ppc64.rpm

This is the cause of most of your subsequent problems.
Without an SM running somewhere on your network, the links don't come fully up. There are mailing lists devoted to OFED that you could ask on. Building the software from scratch is probably the most straight-forward way to get something that works. -- greg From john.hearns at streamline-computing.com Wed Jun 4 22:36:44 2008 From: john.hearns at streamline-computing.com (John Hearns) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] OFED/IB for FC8 In-Reply-To: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> Message-ID: <1212644214.7657.6.camel@Vigor13> On Wed, 2008-06-04 at 16:52 -0600, Michael H. Frese wrote: > Does anyone have any suggestions? > > We are willing to try a build from source, but we are unsure of what > challenges might lie down that path. > I agree with Greg. Build it from source - that way you will have the latest version (1.3), you will learn about the software stack whilst doing it and you will know which switches were used during the configuration process. Depending solely on distribution supplied RPMs for any HPC type software is a bad move IMHO - as you've just seen it might not have a feature you need, or might not install exactly the way you want it. And you'll always have the version supplied by the distribution, and won't be able to update if you hit a bug or need a new feature. Remember, we're in the era of open source. That's why you chose to use Linux - you have control. John Hearns From mfatica at gmail.com Wed Jun 4 16:27:40 2008 From: mfatica at gmail.com (Massimiliano Fatica) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] OFED/IB for FC8 In-Reply-To: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> Message-ID: <8e6393ac0806041627k1a0f58c0r319a603154d33068@mail.gmail.com> We have the same switch. I was able to get it to work with the latest OFED 1.3 ( available from the Mellanox web site). 
They have rpms for RHEL4 and RHEL5. Massimiliano On Wed, Jun 4, 2008 at 3:52 PM, Michael H. Frese < Michael.Frese@numerex-llc.com> wrote: > Following Jeff Layton's post to this list [Cheap SDR IB] on January 28, > we purchased 8 Infinihost LX's and an 8 port switch, and began trying to > get > the OpenFabrics (OFED) release of MVAPICH for Fedora Core 6 to run on our > new > machines. We develop and run a multiphysics code in a relatively fine > grain parallel mode > where latency dominates the performance scaling, so it seemed like a good > thing to try. > > This is our first exposure to InfiniBand, though we have considerable > experience with MPI, both in-memory and over GigE, including using netpipe > to > measure latency and bandwidth. > > Those machines have AMD Athlon X2 6000+'s on Asus M2N-SLI Deluxe > motherboards > with an open PCI Express slot that will handle x4. > > The main issue is that we are presently running Fedora Core 8 and the > 2.6.21 > SMP kernel, but there is no OFED release for FC8 yet. Is anyone else > working > on this? Has anyone succeeded at getting it to work? > > We started with OFED version 1.2.5 from > http://www.openfabrics.org/downloads/OFED/ofed-1.2.5/OFED-1.2.5-RPMS/ > We downloaded all the rpms from redhat-release-4AS-6.1 version. > In particular the kernel rpms are kernel-ib-devel-1.2-2.6.9_55.ELsmp and > kernel-ib-1.2-2.6.9_55.ELsmp. > > We used the 1.2.5 version because there don't seem to be any rpms for the > 1.3 version. > > All the OFED rpm's for FC6 installed on FC8 without difficulty, except for > opensm-3.0.3-0.ppc64.rpm > It didn't say "missing dependencies ..." It just got stuck. We had to kill > the 'rpm -ivh', remove the lock file > and rebuild the rpm database. After that, > > # lsmod | grep ib > > shows about 15 IB related kernel mods. > > Even so, at this point, some of the IB stuff works. We can run > ibnetdiscover and see the HCA's on the > two machines that have the rpm's installed, and the switch, too. 
We could > use > that to make a topology file, but we don't know where to put it, or even if > we > should put it somewhere. We can run ibchecknet, and though it finds 4 > nodes, > it says they are all bad. It also reports "lid 0 address resolution: > FAILED". We have not succeeded in getting ibping to work, and aren't > really > sure what how to specify the remote address for it. > > We found > > /usr/share/doc/ofed-docs-1.2/README.txt > /usr/share/doc/ofed-docs-1.2/OFED_Installation_Guide.txt > > and, as described there, did > > # /etc/init.d/openibd start > Loading QLogic InfiniPath driver: [FAILED] > Loading HCA driver and Access Layer: [ OK ] > Setting up InfiniBand network interfaces: > Failed to configure IPoIB connected mode for ib0 > Bringing up interface ib0: [FAILED] > Setting up service network . . . [ done ] > Loading ib_sdp [FAILED] > Loading ib_vnic [FAILED] > Module ib_vnic not loaded. > Bringing up VNIC interfaces [FAILED] > > That mostly looks bad. > > Does anyone have any suggestions? > > We are willing to try a build from source, but we are unsure of what > challenges might lie down that path. > > We'd rather not fall back to FC6, but we may have to do that. > > Thanks for your help. > > > Mike Frese > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080604/7b66ecc1/attachment.html

From rainer at lfbs.RWTH-Aachen.DE Thu Jun 5 02:38:24 2008
From: rainer at lfbs.RWTH-Aachen.DE (Rainer Finocchiaro)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] OFED/IB for FC8
In-Reply-To: <20080605001549.GE27430@bx9.net>
References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> <20080605001549.GE27430@bx9.net>
Message-ID: <4847B410.1070202@lfbs.rwth-aachen.de>

Hi Michael,

Greg Lindahl wrote:
>> All the OFED rpm's for FC6 installed on FC8 without difficulty,
>> except for opensm-3.0.3-0.ppc64.rpm
>
> This is the cause of most of your subsequent problems. Without an SM
> running somewhere on your network, the links don't come fully up.
>
> There are mailing lists devoted to OFED that you could ask
> on. Building the software from scratch is probably the most
> straight-forward way to get something that works.
>
> -- greg

I completely agree with Greg. I will add some comments, as I have just
installed the distribution (the hard way: under Debian).

Following your link, I reach a download directory offering only
ppc64 RPMs; in fact all precompiled RPMs for OFED-1.2.5 are for PowerPC
and not for x86. You could try OFED-1.2, where all the precompiled RPMs
are for x86_64, which should be suitable for your processor, depending
on the type of distribution you actually installed (32-bit vs. 64-bit).

Much better is to download the more up-to-date OFED-1.3 sources. The
package includes an install script, which builds and installs the RPMs
for you. So you don't have to "fear" installing something which is not
controlled by your package management system (RPM).

Regards,
Rainer

From kus at free.net Thu Jun 5 08:42:22 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
Message-ID:

How is it possible to detect whether a particular AMD Barcelona CPU
has, or doesn't have, the known hardware error problem?
To be more exact: is Rev. B2 of the Opteron 2350 the CPU stepping with
the error or without it?

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Inst. of Organic Chemistry
Moscow

From hahn at mcmaster.ca Thu Jun 5 08:57:28 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To:
References:
Message-ID:

> To be more exact: is Rev. B2 of the Opteron 2350 the CPU stepping with
> the error or without it?

AMD, like Intel, does a reasonable job of disclosing such info:

http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF

the well-known problem is erratum 298, I think, and fixed in B3.

From kus at free.net Thu Jun 5 09:39:02 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To:
Message-ID:

In message from Mark Hahn (Thu, 5 Jun 2008 11:57:28 -0400 (EDT)):
>> To be more exact: is Rev. B2 of the Opteron 2350 the CPU stepping
>> with the error or without it?
>
>AMD, like Intel, does a reasonable job of disclosing such info:
>
>http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF
>
>the well-known problem is erratum 298, I think, and fixed in B3.

Yes, this AMD errata document says that in the B3 revision the error
"will be fixed". I heard that new CPUs without the TLB+L3 error are
shipping now, but are these CPUs really B3, or do they perhaps have
some newer revision?

Mikhail

From hahn at mcmaster.ca Thu Jun 5 10:30:57 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To:
References:
Message-ID:

>> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF
>>
>> the well-known problem is erratum 298, I think, and fixed in B3.
>
> Yes, this AMD errata document says that in the B3 revision the error
> "will be fixed".
I believe the absence of 'x' in the B3 column of the table on p 15
means that it _is_ fixed in B3.

From kus at free.net Thu Jun 5 10:48:32 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To:
Message-ID:

In message from Mark Hahn (Thu, 5 Jun 2008 13:30:57 -0400 (EDT)):
>>> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF
>>>
>>> the well-known problem is erratum 298, I think, and fixed in B3.
>>
>> Yes, this AMD errata document says that in B3 revision the error
>> "will be fixed".
>
>I believe the absence of 'x' in the B3 column of the table on p 15
>means that it _is_ fixed in B3.

I have just received some preliminary data about Gaussian-03 run
problems with B2 and about the absence of these problems with B3.

Yours
Mikhail

From kus at free.net Thu Jun 5 11:09:58 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To:
Message-ID:

In message from Mark Hahn (Thu, 5 Jun 2008 13:55:01 -0400 (EDT)):
>>> I believe the absence of 'x' in the B3 column of the table on p 15
>>> means that it _is_ fixed in B3.
>>
>> I have just received some preliminary data about Gaussian-03 run
>> problems with B2 and about the absence of these problems with B3.
>
>I'm mystified by this: B2 was broken, so using it without the bios
>workaround is just a mistake or masochism. the workaround _did_
>apparently have performance implications, but that's why B3 exists...
>
>do you mean you know of G03 problems on B2 systems which are operating
>_with_ the workaround?

I don't know exactly, but I think the crash happened without the
workaround, because I was not informed that there were any kernel
patches or BIOS changes. This was interesting for me also, because I
have no information on how this hardware problem shows up in "real
life".
Mikhail From kus at free.net Thu Jun 5 11:22:36 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: <588c11220806051116i37ff7aa1oec16a85a24009592@mail.gmail.com> Message-ID: In message from "Jason Clinton" (Thu, 5 Jun 2008 13:16:33 -0500): >On Thu, Jun 5, 2008 at 1:09 PM, Mikhail Kuzminsky >wrote: > >> In message from Mark Hahn (Thu, 5 Jun 2008 >>13:55:01 >> -0400 (EDT)): >> >>> I'm mystified by this: B2 was broken, so using it without the bios >>> workaround is just a mistake or masochism. the workaround _did_ >>>apparently >>> have performance implications, but that's why B3 exists... >>> >>> do you mean you know of G03 problems on B2 systems which are >>>operating >>> _with_ the workaround? >>> >> >> I don't know exactly, but I think the crash was under absence of >> workaround, because I was not informed that there was some kernel >>patches or >> BIOS changes. This was interesting for me also, because I have no >> information how this hardware problem may be affected in the "real >>life". >> Mikhail >> > >The B2 BIOS work-around is to disable the L3 cache which gives you a >10-20% >performance hit with no reduction in power consumption. > >The kernel patch is very extensive and, last I heard, under NDA. AMD >has >said publicly that the patch gives you a 1-2% performance hit. This URL is old, but may give some information: https://www.x86-64.org/pipermail/discuss/2007-December/010260.html Mikhail From lindahl at pbm.com Thu Jun 5 11:30:20 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: References: Message-ID: <20080605183020.GA11661@bx9.net> On Thu, Jun 05, 2008 at 10:09:58PM +0400, Mikhail Kuzminsky wrote: > This was interesting for me also, because I > have no information how this hardware problem may be affected in the > "real life". 
I have 4 chips with the bug, in 2 servers. I see about 1 lockup per
month with my workload, which doesn't include any VMs. (VMs are
reputed to trigger the bug quickly.)

I found a webpage with the details, and indeed this is what I see:

| The system may experience a machine check event reporting an L3
| protocol error has occurred. In this case, the MC4 status register
| (MSR 0000_0410) will be equal to B2000000_000B0C0F or
| BA000000_000B0C0F. The MC4 address register (MSR 0000_0412) will be
| equal to 26h.

-- greg

From hahn at mcmaster.ca Thu Jun 5 11:43:05 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To: <588c11220806051116i37ff7aa1oec16a85a24009592@mail.gmail.com>
References: <588c11220806051116i37ff7aa1oec16a85a24009592@mail.gmail.com>
Message-ID:

> The kernel patch is very extensive and, last I heard, under NDA. AMD has

the kernel patch was publicly distributed in dec 07. it appears to add
some kernel logic to avoid the specific L3 TLB states which don't
behave correctly. the bios-level workaround is different, and appears
to disable the L3 TLB - I don't know whether that actually disables the
L3 itself...

while extremely unfortunate - so much that it clearly threatens the
viability of the company, I think AMD responded reasonably.

From csamuel at vpac.org Fri Jun 6 04:31:45 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] Barcelona hardware error: how to detect
In-Reply-To: <971758047.108911212751290783.JavaMail.root@zimbra.vpac.org>
Message-ID: <937389196.108931212751905042.JavaMail.root@zimbra.vpac.org>

----- "Mark Hahn" wrote:

> the kernel patch was publicly distributed in dec 07.
> it appears to add some kernel logic to avoid the specific
> L3 TLB states which don't behave correctly.
the bios-level > workaround is different, and appears to disable the L3 TLB - > I don't know whether that actually disables the L3 itself... I believe the patch re-enables the L3 cache and then works around the problem in software. When we were running B2 Barcelonas with this patch we didn't hit the errata and didn't see the performance penalty we would have expected if the L3 was disabled. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Fri Jun 6 04:33:38 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: Message-ID: <741026114.108961212752018697.JavaMail.root@zimbra.vpac.org> ----- "Mikhail Kuzminsky" wrote: > Yes, this AMD errata document says that in B3 revision the error "will > be fixed". I heard that new CPUs w/o TLB+L3 error are shipped now, > but are this CPUs really B3 or may be have some more new release ? They certainly do exist, we've got 94 nodes of them here and no longer require the kernel patch to work around the errata. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From gerry.creager at tamu.edu Fri Jun 6 08:39:47 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments Message-ID: <48495A43.4060809@tamu.edu> We recently purchased a set of hardware for a cluster from a hardware vendor. We've encountered a couple of interesting issues with bringing the thing up that I'd like to get group comments on. 
Note that the RFP and negotiations specified this system was for a cluster installation, so there would be no misunderstanding... 1. We specified "No OS" in the purchase so that we could install CentOS as our base. We got a set of systems with a stub OS, and an EULA for the diagnostics embedded on the disk. After clicking thru the EULA, it tells us we have no OS on the disk, but does not fail to PXE. 2. BIOS had a couple of interesting defaults, including warn on keyboard error (Keyboard? Not intentionally. This is a compute node, and should never require a keyboard. Ever.) We also find the BIOS is set to boot from hard disk THEN PXE. But due to item 1, above, we never can fail over to PXE unless we load up a keyboard and monitor, and hit F12 to drop to PXE. In discussions with our sales rep, I'm told that we'd have had to pay extra to get a real bare hard disk, and that, for a fee, they'd have been willing to custom-configure the BIOS. OK, with the BIOS this isn't too unreasonable: They have a standard BIOS for all systems and if you want something special, paying for it's the norm... But, still, this is a CLUSTER installation we were quoted, not a desktop. Also, I'm now told that "almost every customer" ordered their cluster configuration service at several kilobucks per rack. Since the team I'm working with has some degree of experience in configuring and installing hardware and software on computational clusters, now measured in at least 10 separate cluster installations, this seemed like an unnecessary expense. However, we're finding vendor gotchas that are annoying at the least, and sometimes cause significant work-around time/effort. Finally, our sales guy yesterday was somewhat baffled as to why we'd ordered without OS, and further why we were using Linux over Windows for HPC. 
Not trying to revive the recent rant-fest about Windows HPC capabilities, can anyone cite real HPC applications generally run on significant clusters (I'll accept Cornell's work, although I remain personally convinced that the bulk of their Windows HPC work has been dedicated to maintaining grant funding rather than doing real work)? No, I won't identify the vendor. -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From dag at sonsorol.org Fri Jun 6 09:15:24 2008 From: dag at sonsorol.org (Chris Dagdigian) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48495A43.4060809@tamu.edu> References: <48495A43.4060809@tamu.edu> Message-ID: <13F1DDCE-2FE6-4916-B1A5-995237EA80B6@sonsorol.org> Bad job hiding the (obvious) vendor ;) I'm riding the bus back home to Boston after a cluster building gig and your experience exactly matches what I encountered when I walked into the datacenter to start work on a pile of dell 1950 servers. I'll do you one better - 4 nodes out of our "homogenous" cluster had reversed drive cabling which broke our imaging system as we had specific data to place on 2 drives of differing capacity. Regards, Chris /* Sent via phone - apologies for typos & terseness */ On Jun 6, 2008, at 11:39 AM, Gerry Creager wrote: > We recently purchased a set of hardware for a cluster from a > hardware vendor. We've encountered a couple of interesting issues > with bringing the thing up that I'd like to get group comments on. > Note that the RFP and negotiations specified this system was for a > cluster installation, so there would be no misunderstanding... > > 1. We specified "No OS" in the purchase so that we could install > CentOS as our base. We got a set of systems with a stub OS, and an > EULA for the diagnostics embedded on the disk. 
After clicking thru > the EULA, it tells us we have no OS on the disk, but does not fail > to PXE. > > 2. BIOS had a couple of interesting defaults, including warn on > keyboard error (Keyboard? Not intentionally. This is a compute > node, and should never require a keyboard. Ever.) We also find the > BIOS is set to boot from hard disk THEN PXE. But due to item 1, > above, we never can fail over to PXE unless we load up a keyboard > and monitor, and hit F12 to drop to PXE. > > In discussions with our sales rep, I'm told that we'd have had to > pay extra to get a real bare hard disk, and that, for a fee, they'd > have been willing to custom-configure the BIOS. OK, with the BIOS > this isn't too unreasonable: They have a standard BIOS for all > systems and if you want something special, paying for it's the > norm... But, still, this is a CLUSTER installation we were quoted, > not a desktop. > > Also, I'm now told that "almost every customer" ordered their > cluster configuration service at several kilobucks per rack. Since > the team I'm working with has some degree of experience in > configuring and installing hardware and software on computational > clusters, now measured in at least 10 separate cluster > installations, this seemed like an unnecessary expense. However, > we're finding vendor gotchas that are annoying at the least, and > sometimes cause significant work-around time/effort. > > Finally, our sales guy yesterday was somewhat baffled as to why we'd > ordered without OS, and further why we were using Linux over Windows > for HPC. Not trying to revive the recent rant-fest about Windows > HPC capabilities, can anyone cite real HPC applications generally > run on significant clusters (I'll accept Cornell's work, although I > remain personally convinced that the bulk of their Windows HPC work > has been dedicated to maintaining grant funding rather than doing > real work)? > > No, I won't identify the vendor. 
> --
> Gerry Creager -- gerry.creager@tamu.edu
> Texas Mesonet -- AATLT, Texas A&M University
> Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983
> Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From john.hearns at streamline-computing.com Fri Jun 6 09:43:32 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] A couple of interesting comments
In-Reply-To: <48495A43.4060809@tamu.edu>
References: <48495A43.4060809@tamu.edu>
Message-ID: <1212770622.9679.5.camel@Vigor13>

On Fri, 2008-06-06 at 10:39 -0500, Gerry Creager wrote:

> 1. We specified "No OS" in the purchase so that we could install CentOS
> as our base. We got a set of systems with a stub OS, and an EULA for
> the diagnostics embedded on the disk. After clicking thru the EULA, it
> tells us we have no OS on the disk, but does not fail to PXE.

That sounds normal to me - that's the state we get servers in.

> 2. BIOS had a couple of interesting defaults, including warn on
> keyboard error (Keyboard? Not intentionally. This is a compute node,
> and should never require a keyboard. Ever.) We also find the BIOS is
> set to boot from hard disk THEN PXE. But due to item 1, above, we never
> can fail over to PXE unless we load up a keyboard and monitor, and hit
> F12 to drop to PXE.

The "warn on keyboard error" is a bit of a shocker - I haven't seen
that one for ages.
But the rest sounds normal, and yes, getting the keyboard and monitor
out is normal. We get technicians to set the BIOSes on all compute
nodes prior to delivery.

I just don't see why, though, the BIOSes from these major vendors are
ALWAYS hard disk first then PXE. A default setting t'other way around
would make much more sense.
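
For nodes whose baseboard management controller is reachable over IPMI (an assumption; nothing in the thread says whether these machines have one configured), the keyboard-and-monitor trip described above can sometimes be avoided by asking the BMC for a PXE boot remotely. The following is only a rough sketch in Python that builds, and optionally runs, `ipmitool chassis bootdev pxe`. The host names, the user name and the two helper functions are hypothetical, and whether `options=persistent` actually survives later reboots depends on the firmware.

```python
import subprocess
from typing import Dict, Iterable, List


def pxe_boot_command(bmc_host: str, user: str, persistent: bool = True) -> List[str]:
    """Build the ipmitool command that asks one node's BMC to boot from PXE."""
    cmd = [
        "ipmitool", "-I", "lanplus",
        "-H", bmc_host,          # hypothetical BMC host name or address
        "-U", user,              # hypothetical BMC user
        "-E",                    # password is read from the IPMI_PASSWORD env variable
        "chassis", "bootdev", "pxe",
    ]
    if persistent:
        # Ask for the setting to persist; honoured or not depending on firmware.
        cmd.append("options=persistent")
    return cmd


def request_pxe_boot(bmc_hosts: Iterable[str], user: str, dry_run: bool = True) -> Dict[str, int]:
    """Print (dry run) or execute the command for each node; return exit codes."""
    results = {}
    for host in bmc_hosts:
        cmd = pxe_boot_command(host, user)
        if dry_run:
            print(" ".join(cmd))
            results[host] = 0
        else:
            results[host] = subprocess.run(cmd).returncode
    return results


if __name__ == "__main__":
    # Dry run over two made-up BMC names; nothing is sent to any hardware.
    request_pxe_boot(["node01-bmc", "node02-bmc"], user="admin")
```

Keeping the default at `dry_run=True` means the commands are only printed until someone has checked them against a single node; this does not change the BIOS boot order itself, it only requests a boot device through the BMC.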

From landman at scalableinformatics.com Fri Jun 6 09:45:27 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu Mar 18 01:07:15 2010
Subject: [Beowulf] A couple of interesting comments
In-Reply-To: <48495A43.4060809@tamu.edu>
References: <48495A43.4060809@tamu.edu>
Message-ID: <484969A7.4030008@scalableinformatics.com>

Hi Gerry

A first point, before going anywhere else ... you get what you pay
for ... most of the time.

The vast majority of rack-n-stack vendors do what you describe. They
know one thing, any deviation from that leaves them blinking with open
mouths ... they deliver what they have a level of comfort delivering.

Gerry Creager wrote:
> We recently purchased a set of hardware for a cluster from a hardware
> vendor. We've encountered a couple of interesting issues with bringing
> the thing up that I'd like to get group comments on. Note that the RFP
> and negotiations specified this system was for a cluster installation,
> so there would be no misunderstanding...
>
> 1. We specified "No OS" in the purchase so that we could install CentOS
> as our base. We got a set of systems with a stub OS, and an EULA for
> the diagnostics embedded on the disk. After clicking thru the EULA, it
> tells us we have no OS on the disk, but does not fail to PXE.

I would say it is likely due to the fact that altering their base
cluster construction model is a problem (e.g. costs them money).

FWIW: We boot our nodes diskless during testing, and diskful if this
is the required state of the cluster. Nothing like actually testing
the hardware you are going to deliver in the way your customers are
going to use it. This said, it seems ... unlikely ... that this was
their purpose.

>
> 2. BIOS had a couple of interesting defaults, including warn on
> keyboard error (Keyboard? Not intentionally. This is a compute node,
> and should never require a keyboard. Ever.) We also find the BIOS is
> set to boot from hard disk THEN PXE.
But due to item 1, above, we never > can fail over to PXE unless we load up a keyboard and monitor, and hit > F12 to drop to PXE. Egad. We (by hand) reconfigure the bios specifically so that there are no issues like this. > In discussions with our sales rep, I'm told that we'd have had to pay > extra to get a real bare hard disk, and that, for a fee, they'd have Heh.... if you want nothing what is it you have to pay? :) > been willing to custom-configure the BIOS. OK, with the BIOS this isn't > too unreasonable: They have a standard BIOS for all systems and if you > want something special, paying for it's the norm... But, still, this is > a CLUSTER installation we were quoted, not a desktop. Agreed. > > Also, I'm now told that "almost every customer" ordered their cluster > configuration service at several kilobucks per rack. Since the team I'm This is standard, it costs money to rack and stack. If you don't want it, you don't have to get it. > working with has some degree of experience in configuring and installing > hardware and software on computational clusters, now measured in at > least 10 separate cluster installations, this seemed like an unnecessary > expense. However, we're finding vendor gotchas that are annoying at the Yeah, in this case, it is unnecessary. If your team has the expertise, you don't need to pay for it. > least, and sometimes cause significant work-around time/effort. For the smaller companies that do cluster setup/installs, the idea is not to mess the customer up. > > Finally, our sales guy yesterday was somewhat baffled as to why we'd > ordered without OS, and further why we were using Linux over Windows for > HPC. Not trying to revive the recent rant-fest about Windows HPC You do understand how hard (e.g. how much money is flowing from) Microsoft is pushing their solution. Money talks. 
> capabilities, can anyone cite real HPC applications generally run on > significant clusters (I'll accept Cornell's work, although I remain > personally convinced that the bulk of their Windows HPC work has been > dedicated to maintaining grant funding rather than doing real work)? > > No, I won't identify the vendor. :) -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From john.hearns at streamline-computing.com Fri Jun 6 09:55:05 2008 From: john.hearns at streamline-computing.com (John Hearns) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48495A43.4060809@tamu.edu> References: <48495A43.4060809@tamu.edu> Message-ID: <1212771315.9679.15.camel@Vigor13> On Fri, 2008-06-06 at 10:39 -0500, Gerry Creager wrote: > Also, I'm now told that "almost every customer" ordered their cluster > configuration service at several kilobucks per rack. Since the team I'm > working with has some degree of experience in configuring and installing > hardware and software on computational clusters, now measured in at > least 10 separate cluster installations, this seemed like an unnecessary > expense. However, we're finding vendor gotchas that are annoying at the > least, and sometimes cause significant work-around time/effort. Somebody has to pay for all those technicians to set your BIOSes. Seriously, we almost always do turn-key clusters for customers. We do what we term "hardware only" deals - but you wouldn't recognise them as such. The project I'm thinking of consisted of supplying many Intel twin servers, and us setting those BIOSes prior to delivery, racking, labelling and cabling all servers and switches.
Providing a loan cluster head node and installing our cluster distribution on there for a week long soak test prior to the customer accepting it and then reinstalling with their own OS. Quite, quite far from leaving a pile of boxes on the loading dock. I don't want to go into this one too deeply, but when we do true hardware-only deals (I'm thinking more of network switches here) you end up supporting things out of goodwill anyway. From bill at cse.ucdavis.edu Fri Jun 6 10:00:29 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48495A43.4060809@tamu.edu> References: <48495A43.4060809@tamu.edu> Message-ID: <48496D2D.5080801@cse.ucdavis.edu> Gerry Creager wrote: > 1. We specified "No OS" in the purchase so that we could install CentOS > as our base. We got a set of systems with a stub OS, and an EULA for > the diagnostics embedded on the disk. After clicking thru the EULA, it > tells us we have no OS on the disk, but does not fail to PXE. If you want to avoid hooking up a KVM to each node and rebooting it once or twice I'd suggest putting "Nodes must PXE boot by default" in your specifications. > 2. BIOS had a couple of interesting defaults, including warn on > keyboard error (Keyboard? Not intentionally. This is a compute node, > and should never require a keyboard. Ever.) We also find the BIOS is > set to boot from hard disk THEN PXE. But due to item 1, above, we never > can fail over to PXE unless we load up a keyboard and monitor, and hit > F12 to drop to PXE. Very strange standard for a server, let alone a cluster node. > In discussions with our sales rep, I'm told that we'd have had to pay > extra to get a real bare hard disk, and that, for a fee, they'd have > been willing to custom-configure the BIOS. OK, with the BIOS this isn't > too unreasonable: They have a standard BIOS for all systems and if you > want something special, paying for it's the norm... 
But, still, this is > a CLUSTER installation we were quoted, not a desktop. This whole thing sounds strangely like the vendor has already been picked. Certainly changing any default in the pipeline can cost money; even deleting a floppy, cd/dvd etc can cost money if the machine ships to the integration center with it installed. With that said, when someone charges an unreasonable amount for said customizations they lose the bid and someone else wins. > Also, I'm now told that "almost every customer" ordered their cluster > configuration service at several kilobucks per rack. Since the team I'm Not sure of the relevance here. Sounds like the upsell and padding that sales folks love; it is their job to sell equipment, preferably high margin at that. Seems way high for a BIOS reset, less so if it includes a cabling harness for power, console, rails premounted, and network. Again if it's a bid process.... > working with has some degree of experience in configuring and installing > hardware and software on computational clusters, now measured in at > least 10 separate cluster installations, this seemed like an unnecessary > expense. However, we're finding vendor gotchas that are annoying at the > least, and sometimes cause significant work-around time/effort. Well, there are two choices: either deal with the gotchas, or make them part of the specifications. All vendors have their differences, defaults, and cost structures. Do you want a cluster that could conceivably allow users to start submitting jobs within a day? Or do you want to play BIOS games, testing, and integration that might take a week or two? Every time I order a cluster (well over 10 now) I get vendor queries of the form "Sounds like X might mean you need Y which costs $Z". I'm always very clear: it's in the spec, and not meeting the spec will mean the bid isn't considered. Definitely seems like some high margin items end up included... without the margin.
> Finally, our sales guy yesterday was somewhat baffled as to why we'd > ordered without OS, and further why we were using Linux over Windows for > HPC. Heh, some sales folks seem to think they have a right to exert pressure on cluster design; not sure why you're even entertaining that one. If you want to be particularly friendly I'd just point at top500.org and note that Linux is the standard and not the exception for beowulf clusters. > No, I won't identify the vendor. How about the number of letters in their name ;-). In general I find that the big vendors build in large profits (i.e. negotiating down to 50% of list price is not unusual), and their preferred cluster defaults often mean higher costs instead of lower, despite the typically higher volume purchases, identical compute nodes, don't need a dvd, don't need an OS, don't (typically) need a redundant power supply for compute nodes, etc. The smaller cluster specific shops default (usually) to mostly reasonable cluster configurations, and seem to default to smaller margins. In my experience, writing a spec that welcomes both ends up with the best deals. Even something trivial like specifying 14 or 15 disks in an array (often the max for an external array) instead of 16 (common for direct attached) can be the difference that allows a competitive bid from a big vendor. Sometimes Intel or AMD intercedes to get a design win and sometimes a big vendor decides to get more competitive. Of course these specifications directly affect costs and lead to endless discussions on this list. KVM over IP? Serial console? Any console access at all? IPMI or just switched PDUs? But in my experience a requirement like "must boot from PXE" is not a big deal, and not worth several kilobucks. From perry at piermont.com Fri Jun 6 10:45:41 2008 From: perry at piermont.com (Perry E.
Metzger) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48496D2D.5080801@cse.ucdavis.edu> (Bill Broadley's message of "Fri\, 06 Jun 2008 10\:00\:29 -0700") References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> Message-ID: <87hcc6h2yi.fsf@snark.cb.piermont.com> Bill Broadley writes: >> 2. BIOS had a couple of interesting defaults, including warn on >> keyboard error (Keyboard? Not intentionally. This is a compute >> node, and should never require a keyboard. Ever.) We also find the >> BIOS is set to boot from hard disk THEN PXE. But due to item 1, >> above, we never can fail over to PXE unless we load up a keyboard >> and monitor, and hit F12 to drop to PXE. > > Very strange standard for a server, let alone a cluster node. I would be less disturbed about such things if it was trivial to alter the BIOS settings in a semi-automated way -- say by booting some standalone program, or loading a file from a USB thumb drive. Then you could just go up to each box with a USB thumb drive, turn it on, and have it fix itself in a consistent way. However, the fact that you can't generally automate fixing BIOS settings makes all of this far more annoying. Anyone have any cool tricks for how to consistently set the BIOS on large numbers of boxes without requiring steps that humans can screw up easily? -- Perry E. Metzger perry@piermont.com From tjrc at sanger.ac.uk Fri Jun 6 11:35:25 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <87hcc6h2yi.fsf@snark.cb.piermont.com> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> Message-ID: <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> On 6 Jun 2008, at 6:45 pm, Perry E. Metzger wrote: > > Bill Broadley writes: >>> 2. BIOS had a couple of interesting defaults, including warn on >>> keyboard error (Keyboard? 
Not intentionally. This is a compute >>> node, and should never require a keyboard. Ever.) We also find the >>> BIOS is set to boot from hard disk THEN PXE. But due to item 1, >>> above, we never can fail over to PXE unless we load up a keyboard >>> and monitor, and hit F12 to drop to PXE. >> >> Very strange standard for a server, let alone a cluster node. > > I would be less disturbed about such things if it was trivial to alter > the BIOS settings in a semi-automated way -- say by booting some > standalone program, or loading a file from a USB thumb drive. Then you > could just go up to each box with a USB thumb drive, turn it on, and > have it fix itself in a consistent way. However, the fact that you > can't generally automate fixing BIOS settings makes all of this far > more annoying. > > Anyone have any cool tricks for how to consistently set the BIOS on > large numbers of boxes without requiring steps that humans can screw > up easily? Nope. :-) This is, in my view, one of the major disadvantages of PC clusters. The crappy old BIOS that we're stuck with. Here, we mostly get around this problem by using blade servers rather than pizza boxes. Or at least using pizza boxes which have some form of command line access to a lights-out management processor that allows us to set the boot order, such as those on HP ProLiants and Sun X**** servers. So with c-Class blades from HP, for example, I don't really have a problem - once the chassis is configured, I make them all PXE boot by ssh'ing into the Onboard administrator and typing: set server boot first pxe all poweron server all Bingo, all 16 machines PXE boot at about 1 second intervals. Job's a good'un. As Joe says, you get what you pay for. I don't think I've *ever* had to futz around with BIOS settings on any recent bladeserver (I used to have to on our old RLX bladeservers, which periodically got confused and lost all the CMOS settings, which required manual fixing in the BIOS). 
But with the IBM and HP stuff we use now, it's very rare indeed. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From matt at technoronin.com Fri Jun 6 15:55:44 2008 From: matt at technoronin.com (Matt Lawrence) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48495A43.4060809@tamu.edu> References: <48495A43.4060809@tamu.edu> Message-ID: On Fri, 6 Jun 2008, Gerry Creager wrote: > We recently purchased a set of hardware for a cluster from a hardware vendor. > We've encountered a couple of interesting issues with bringing the thing up > that I'd like to get group comments on. Note that the RFP and negotiations > specified this system was for a cluster installation, so there would be no > misunderstanding... > > 1. We specified "No OS" in the purchase so that we could install CentOS as > our base. We got a set of systems with a stub OS, and an EULA for the > diagnostics embedded on the disk. After clicking thru the EULA, it tells us > we have no OS on the disk, but does not fail to PXE. And standing on the perforated floor tiles in front of the system performing this task all day long has left me with sore and very cold feet. The good news is that the cooling in there is working much better. Our student worker is on vacation sailing in the Caribbean, so I get the abuse.... -- Matt It's not what I know that counts. It's what I can remember in time to use.
From gerry.creager at tamu.edu Fri Jun 6 17:34:53 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> Message-ID: <4849D7AD.30605@tamu.edu> Tim Cutts wrote: > > On 6 Jun 2008, at 6:45 pm, Perry E. Metzger wrote: > >> >> Bill Broadley writes: >>>> 2. BIOS had a couple of interesting defaults, including warn on >>>> keyboard error (Keyboard? Not intentionally. This is a compute >>>> node, and should never require a keyboard. Ever.) We also find the >>>> BIOS is set to boot from hard disk THEN PXE. But due to item 1, >>>> above, we never can fail over to PXE unless we load up a keyboard >>>> and monitor, and hit F12 to drop to PXE. >>> >>> Very strange standard for a server, let alone a cluster node. >> >> I would be less disturbed about such things if it was trivial to alter >> the BIOS settings in a semi-automated way -- say by booting some >> standalone program, or loading a file from a USB thumb drive. Then you >> could just go up to each box with a USB thumb drive, turn it on, and >> have it fix itself in a consistent way. However, the fact that you >> can't generally automate fixing BIOS settings makes all of this far >> more annoying. >> >> Anyone have any cool tricks for how to consistently set the BIOS on >> large numbers of boxes without requiring steps that humans can screw >> up easily? > > Nope. :-) This is, in my view, one of the major disadvantages of PC > clusters. The crappy old BIOS that we're stuck with. > > Here, we mostly get around this problem by using blade servers rather > than pizza boxes. 
Or at least using pizza boxes which have some form of > command line access to a lights-out management processor that allows us > to set the boot order, such as those on HP ProLiants and Sun X**** servers. > > So with c-Class blades from HP, for example, I don't really have a > problem - once the chassis is configured, I make them all PXE boot by > ssh'ing into the Onboard administrator and typing: > > set server boot first pxe all > poweron server all > > Bingo, all 16 machines PXE boot at about 1 second intervals. Job's a > good'un. As Joe says, you get what you pay for. I don't think I've > *ever* had to futz around with BIOS settings on any recent bladeserver > (I used to have to on our old RLX bladeservers, which periodically got > confused and lost all the CMOS settings, which required manual fixing in > the BIOS). But the IBM and HP stuff we use now, it's very rare indeed. Yeah.... Part of the problem. The last several clusters I've worked on, we didn't have to futz with the BIOS, either. HOWEVER, it's been pointed out to me that "You get what you pay for" and part of what you pay for is the competent folks making sure such futzing isn't required. gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From tjrc at sanger.ac.uk Sat Jun 7 00:14:37 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <4849D7AD.30605@tamu.edu> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> <4849D7AD.30605@tamu.edu> Message-ID: <70D27FFA-DAD1-4493-9D51-72E149FCA5EF@sanger.ac.uk> On 7 Jun 2008, at 1:34 am, Gerry Creager wrote: > Yeah.... Part of the problem. 
The last several clusters I've worked > on, we didn't have to futz with the BIOS, either. HOWEVER, it's > been pointed out to me that "You get what you pay for" and part of > what you pay for is the competent folks making sure such futzing > isn't required. Well, there is that. Or at least, paying for people to do the dreary futzing for you. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From gerry.creager at tamu.edu Sat Jun 7 07:49:42 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <70D27FFA-DAD1-4493-9D51-72E149FCA5EF@sanger.ac.uk> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> <4849D7AD.30605@tamu.edu> <70D27FFA-DAD1-4493-9D51-72E149FCA5EF@sanger.ac.uk> Message-ID: <484AA006.9070009@tamu.edu> And done here. Tim Cutts wrote: > > On 7 Jun 2008, at 1:34 am, Gerry Creager wrote: > >> Yeah.... Part of the problem. The last several clusters I've worked >> on, we didn't have to futz with the BIOS, either. HOWEVER, it's been >> pointed out to me that "You get what you pay for" and part of what you >> pay for is the competent folks making sure such futzing isn't required. > > Well, there is that. Or at least, paying for people to do the dreary > futzing for you. 
> > Tim > > -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From gerry.creager at tamu.edu Sat Jun 7 07:50:44 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <484AA006.9070009@tamu.edu> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> <4849D7AD.30605@tamu.edu> <70D27FFA-DAD1-4493-9D51-72E149FCA5EF@sanger.ac.uk> <484AA006.9070009@tamu.edu> Message-ID: <484AA044.1040107@tamu.edu> Sorry about that. Wrong message when I hit "reply all". Time for more coffee. gerry Gerry Creager wrote: > And done here. > > Tim Cutts wrote: >> >> On 7 Jun 2008, at 1:34 am, Gerry Creager wrote: >> >>> Yeah.... Part of the problem. The last several clusters I've worked >>> on, we didn't have to futz with the BIOS, either. HOWEVER, it's been >>> pointed out to me that "You get what you pay for" and part of what >>> you pay for is the competent folks making sure such futzing isn't >>> required. >> >> Well, there is that. Or at least, paying for people to do the dreary >> futzing for you. >> >> Tim >> >> > -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From csamuel at vpac.org Sun Jun 8 17:09:15 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <87hcc6h2yi.fsf@snark.cb.piermont.com> Message-ID: <1110348284.110351212970155947.JavaMail.root@zimbra.vpac.org> ----- "Perry E. 
Metzger" wrote: > I would be less disturbed about such things if it was > trivial to alter the BIOS settings in a semi-automated > way -- say by booting some standalone program, or loading > a file from a USB thumb drive. Our most recent vendor went to the motherboard manufacturer and said "please can you cut us a BIOS with these default settings" and they did so. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From tjrc at sanger.ac.uk Sun Jun 8 21:05:02 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <1110348284.110351212970155947.JavaMail.root@zimbra.vpac.org> References: <1110348284.110351212970155947.JavaMail.root@zimbra.vpac.org> Message-ID: <69E48321-05BE-4DA1-B5CC-A92A5DF24F56@sanger.ac.uk> On 9 Jun 2008, at 1:09 am, Chris Samuel wrote: > > ----- "Perry E. Metzger" wrote: > >> I would be less disturbed about such things if it was >> trivial to alter the BIOS settings in a semi-automated >> way -- say by booting some standalone program, or loading >> a file from a USB thumb drive. > > Our most recent vendor went to the motherboard manufacturer > and said "please can you cut us a BIOS with these default > settings" and they did so. If you don't mind us asking, roughly how much extra did *that* cost? Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From csamuel at vpac.org Mon Jun 9 00:28:46 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <1237440751.110541212996492356.JavaMail.root@zimbra.vpac.org> Message-ID: <992702936.110561212996526583.JavaMail.root@zimbra.vpac.org> ----- "Tim Cutts" wrote: > On 9 Jun 2008, at 1:09 am, Chris Samuel wrote: > > > Our most recent vendor went to the motherboard manufacturer > > and said "please can you cut us a BIOS with these default > > settings" and they did so. > > If you don't mind us asking, roughly how much extra did *that* cost? Nothing. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From tjrc at sanger.ac.uk Mon Jun 9 01:43:51 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <992702936.110561212996526583.JavaMail.root@zimbra.vpac.org> References: <992702936.110561212996526583.JavaMail.root@zimbra.vpac.org> Message-ID: <484CED47.5050804@sanger.ac.uk> Chris Samuel wrote: > ----- "Tim Cutts" wrote: > > >> On 9 Jun 2008, at 1:09 am, Chris Samuel wrote: >> >> >>> Our most recent vendor went to the motherboard manufacturer >>> and said "please can you cut us a BIOS with these default >>> settings" and they did so. >>> >> If you don't mind us asking, roughly how much extra did *that* cost? >> > > Nothing. > Wow. How many nodes were you buying? And are we allowed to know who the vendor was? Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From csamuel at vpac.org Mon Jun 9 03:30:44 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <222401034.110641213007329196.JavaMail.root@zimbra.vpac.org> Message-ID: <767212096.110661213007444346.JavaMail.root@zimbra.vpac.org> ----- "Tim Cutts" wrote: > Wow. How many nodes were you buying? 95 nodes, each with two Barcelonas, so 760 cores all up. 32GB RAM (4GB/core) and 4x300GB SATA drives (RAID-0) per node. > And are we allowed to know who the vendor was? It's all public, so no reason why not. It was a local Melbourne mob called Xenon Systems, they sell SuperMicro based systems. Kudos to both of them for their support. cheers! Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From apittman at concurrent-thinking.com Mon Jun 9 03:49:07 2008 From: apittman at concurrent-thinking.com (Ashley Pittman) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <48495A43.4060809@tamu.edu> References: <48495A43.4060809@tamu.edu> Message-ID: <1213008547.8064.9.camel@bruce.priv.wark.uk.streamline-computing.com> On Fri, 2008-06-06 at 10:39 -0500, Gerry Creager wrote: > > 2. BIOS had a couple of interesting defaults, including warn on > keyboard error (Keyboard? Not intentionally. This is a compute > node, > and should never require a keyboard. Ever.) We also find the BIOS > is > set to boot from hard disk THEN PXE. But due to item 1, above, we > never > can fail over to PXE unless we load up a keyboard and monitor, and > hit > F12 to drop to PXE. I can think of at least one cluster where the opposite has been true and PXE boot has been the default. 
The problem with this is that if the head node PXE boots on the customer's network and gets automatically re-installed as a Windows workstation, everybody gets egg on their face. Yes, even "modern" BIOSes are bad, but localboot first is a sensible default. Ashley Pittman. From matt at technoronin.com Mon Jun 9 06:11:53 2008 From: matt at technoronin.com (Matt Lawrence) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <1213008547.8064.9.camel@bruce.priv.wark.uk.streamline-computing.com> References: <48495A43.4060809@tamu.edu> <1213008547.8064.9.camel@bruce.priv.wark.uk.streamline-computing.com> Message-ID: On Mon, 9 Jun 2008, Ashley Pittman wrote: > I can think of at least one cluster where the opposite has been true and > PXE boot has been the default. The problem with this is that if the head > node PXE boots on the customer's network and gets automatically > re-installed as a Windows workstation, everybody gets egg on their face. > Yes, even "modern" BIOSes are bad, but localboot first is a sensible > default. I will have to disagree. Changing the BIOS settings in a single head node is preferable to having to connect to 126 compute nodes and change their BIOS settings. -- Matt It's not what I know that counts. It's what I can remember in time to use. From prentice at ias.edu Mon Jun 9 08:41:29 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits Message-ID: <484D4F29.9090704@ias.edu> This topic is slightly off topic, since it's not a beowulf specific problem, but it is HPC-related: I have several fat servers with 4 cores and 32 GB of RAM, for jobs that aren't very parallel and need large amounts of RAM. They are not clustered in any way. At the moment, users ssh into these systems to run large jobs. Eventually, I will have these nodes managed by a queuing system. The problem: Every couple of days, one of these systems becomes unresponsive due to OOM errors.
If we wait long enough, the offending job will complete, and everything will return to normal. Since these are multi-user shared resources, I don't have the luxury of waiting for the systems to clear themselves up, and I often have to hit the power button. I would like to impose some CPU and memory limits on users that are hard limits that can't be changed/overridden by the users. What is the best way to do this? All I know is environment variables or shell commands done as the user (ulimit, for example). -- Prentice From dnlombar at ichips.intel.com Mon Jun 9 09:12:32 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits In-Reply-To: <484D4F29.9090704@ias.edu> References: <484D4F29.9090704@ias.edu> Message-ID: <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: > > I would like to impose some CPU and memory limits on users that are hard > limits that can't be changed/overridden by the users. What is the best > way to do this? All I know is environment variables or shell commands > done as the user (ulimit, for example). pam_limits and /etc/security/limits.conf -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From perry at piermont.com Mon Jun 9 09:53:41 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits In-Reply-To: <484D4F29.9090704@ias.edu> (Prentice Bisbal's message of "Mon\, 09 Jun 2008 11\:41\:29 -0400") References: <484D4F29.9090704@ias.edu> Message-ID: <87od6av9be.fsf@snark.cb.piermont.com> Prentice Bisbal writes: > I would like to impose some CPU and memory limits on users that are hard > limits that can't be changed/overridden by the users. What is the best > way to do this? All I know is environment variables or shell commands > done as the user (ulimit, for example). 
ulimit is not quite "a command done by the user". The user manipulates their ulimits with the shell ulimit command, but the limits are in fact maintained by the kernel, and can be set by the administrator at maximum (hard) levels that the user can lower but cannot raise. Read the man page for getrlimit/setrlimit for details on this. ulimits are inherited by a process from its parent, so if the process used at login (like sshd) sets them appropriately, the limits are inherited by the whole session. The administrator can set default ulimit ceilings in various login configuration files -- the file that is used depends on the specific OS you are running. If you say what OS and/or distro you are using, I can be of more specific help. Perry From perry at piermont.com Mon Jun 9 09:55:51 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits In-Reply-To: <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> (David N. Lombard's message of "Mon\, 9 Jun 2008 09\:12\:32 -0700") References: <484D4F29.9090704@ias.edu> <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> Message-ID: <87k5gyv97s.fsf@snark.cb.piermont.com> "Lombard, David N" writes: > On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: >> >> I would like to impose some CPU and memory limits on users that are hard >> limits that can't be changed/overridden by the users. What is the best >> way to do this? All I know is environment variables or shell commands >> done as the user (ulimit, for example). > > pam_limits and /etc/security/limits.conf You're making assumptions about what OS he's running. He didn't say which flavor of Unix this is. We can only assume it is some POSIX OS because he mentions the ulimit command. Indeed, not even all Linuxes use that file, though many do.
Perry From prentice at ias.edu Mon Jun 9 10:38:08 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits In-Reply-To: <87k5gyv97s.fsf@snark.cb.piermont.com> References: <484D4F29.9090704@ias.edu> <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> <87k5gyv97s.fsf@snark.cb.piermont.com> Message-ID: <484D6A80.20803@ias.edu> Perry E. Metzger wrote: > "Lombard, David N" writes: >> On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: >>> I would like to impose some CPU and memory limits on users that are hard >>> limits that can't be changed/overridden by the users. What is the best >>> way to do this? All I know is environment variables or shell commands >>> done as the user (ulimit, for example). >> pam_limits and /etc/security/limits.conf > > You're making assumptions about what OS he's running. He didn't say > which flavor of Unix this is. We can only assume it is some POSIX OS > because he mentions the ulimit command. Indeed, not even all Linuxes > use that file, though many do. > Yeah, my mistake - I forgot to include that important piece of data. My apologies. I'm running PU_IAS Linux 5.1. PU_IAS is a rebuild of RHEL, so anything that applies to RHEL applies to PU_IAS. http://plug.princeton.edu/linux/ I think David was assuming I was running Linux, and he was correct. thanks for your help. I have to go read some man pages now. Prentice From kus at free.net Mon Jun 9 15:01:40 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] size of swap partition Message-ID: A lot of time ago it was formulated simple rule for swap partition size (equal to main memory size). Currently we all have relative large RAM on the nodes (typically, I beleive, it is 2 or more GB per core; we have 16 GB per dual-socket quad-core Opteron node). What is typical modern swap size today? 
I understand that it depends on the applications ;-) We, in particular, have practically no jobs which run "out-of-RAM". For single-core dual-socket Opteron nodes w/4GB RAM per node and a "molecular modelling workload" we used a 4 GB swap partition. But what are the recommendations of modern practice? Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Inst. of Organic Chemistry Moscow From gerry.creager at tamu.edu Mon Jun 9 15:51:34 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] size of swap partition In-Reply-To: References: Message-ID: <484DB3F6.7010904@tamu.edu> Misha, We have the potential to have to swap whole jobs out of memory on a complete node. As a result, I recommend 1.5-2.0 times memory in swap if this is a consideration. I do know there's likely to be a bit of discussion as this varies widely from site to site and based on requirements. gerry Mikhail Kuzminsky wrote: > A lot of time ago it was formulated simple rule for swap partition size > (equal to main memory size). > > Currently we all have relative large RAM on the nodes (typically, I > beleive, it is 2 or more GB per core; we have 16 GB per dual-socket > quad-core Opteron node). What is typical modern swap size today? > > I understand that it depends from applications ;-) We, in particular, > practically don't have jobs which run "out-of-RAM". For single core > dual-socket Opteron nodes w/4GB RAM per node and "molecular modelling > workload" we used 4 GB swap partition. > > But what are the reccomendations of modern praxis ? > > Mikhail Kuzminksy > Computer Assistance to Chemical Research Center > Zelinsky Inst.
of Organic Chemistry > Moscow _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From kyron at neuralbs.com Mon Jun 9 18:28:28 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] size of swap partition In-Reply-To: <484DB3F6.7010904@tamu.edu> References: <484DB3F6.7010904@tamu.edu> Message-ID: <484DD8BC.10208@neuralbs.com> Mikhail, Somewhat like Gerry said, ballpark figures have always been an arbitrary 1.5*RAM. This is completely ridiculous nowadays and should depend entirely on the applications you run. Typically, you should never swap out memory on a running application. I recommend you perform some metrics collection, doesn't have to be perfect and super-fine-grained. Something like Ganglia should be sufficient to give you an idea of how much swap you need, if ever you actually hit it...but don't! Eric PS: this is a redundant topic on the list ...do a little searching and you'll hit it ;) Gerry Creager wrote: > Misha, > > We have the potential to have to swap whole jobs out of memory on a > complete node. As a result, I recommend 1.5-2.0 times memory in swap > if this is a consideration. I do know there's likely to be a bit of > discussion as this varies widely from site to site and based on > requirements. > > gerry > > Mikhail Kuzminsky wrote: >> A lot of time ago it was formulated simple rule for swap partition size >> (equal to main memory size). >> >> Currently we all have relative large RAM on the nodes (typically, I >> beleive, it is 2 or more GB per core; we have 16 GB per dual-socket >> quad-core Opteron node). 
What is typical modern swap size today? >> >> I understand that it depends from applications ;-) We, in particular, >> practically don't have jobs which run "out-of-RAM". For single core >> dual-socket Opteron nodes w/4GB RAM per node and "molecular modelling >> workload" we used 4 GB swap partition. >> >> But what are the reccomendations of modern praxis ? >> >> Mikhail Kuzminksy >> Computer Assistance to Chemical Research Center >> Zelinsky Inst. of Organic Chemistry >> Moscow _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > From csamuel at vpac.org Mon Jun 9 20:56:28 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] User resource limits In-Reply-To: <565153331.116881213069858428.JavaMail.root@zimbra.vpac.org> Message-ID: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> ----- "Prentice Bisbal" wrote: > I think David was assuming I was running Linux, and he was correct. > thanks for your help. I have to go read some man pages now. Be very aware that there are two different ulimits that affect memory allocations, *depending on the size of the allocation that is asked for*, if you have glibc 2.3 or newer (so most distros still in use). For allocations < 128KB the standard memory limit is applied as it uses brk(), but for allocations greater than that it uses mmap(). Unfortunately the kernel implementation of mmap() doesn't check the maximum memory size (RLIMIT_RSS) or maximum data size (RLIMIT_DATA) limits which were being set, but only the maximum virtual RAM size (RLIMIT_AS) - this is documented in the setrlimit(2) man page. :-( -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. 
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From hahn at mcmaster.ca Mon Jun 9 21:58:12 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] size of swap partition In-Reply-To: <484DB3F6.7010904@tamu.edu> References: <484DB3F6.7010904@tamu.edu> Message-ID: > We have the potential to have to swap whole jobs out of memory on a complete > node. that was our intent as well. among other things, this scheme enables running the cluster "split-personality" - mostly shorter/smaller even interactive jobs during the day, with big/long jobs running at night. unfortunately, you need a smart scheduler to do this, and ours is dumb. >> beleive, it is 2 or more GB per core; we have 16 GB per dual-socket >> quad-core Opteron node). What is typical modern swap size today? are you willing to use a node which is actually occupying 16 GB of swap? it is possible to tune how the kernel responds to memory crunches - for instance, you can always avoid OOM with the vm.overcommit_memory=2 sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap to get the desired limits.) in this mode, the kernel tracks how much VM it actually needs (worst-case, reflected in Committed_AS in /proc/meminfo) and compares that to a commit limit that reflects ram and swap. if you don't use overcommit_memory=2, you are basically borrowing VM space in hopes of not needing it. that can still be reasonable, considering how often processes have a lot of shared VM, and how many processes allocate but never touch lots of pages. but you have to ask yourself: would I like a system that was actually _using_ 16 GB of swap? if you have 16x disks, perhaps, but 16G will suck if you only have 1 disk. at least for overcommit_memory != 2, I don't see the point of configuring a lot of swap, since the only time you'd use it is if you were thrashing. sort of a "quality of life" argument. 
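To make the arithmetic behind the commit limit concrete, here is a minimal sketch, assuming the formula given in the kernel's overcommit-accounting documentation for vm.overcommit_memory=2 (swap plus overcommit_ratio percent of RAM, with huge pages ignored). The function names are invented for this illustration, and 50 is the kernel's default overcommit_ratio.

```python
GB_IN_KB = 1024 * 1024


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio=50):
    """Approximate CommitLimit in strict mode: swap + ram * ratio / 100."""
    return swap_kb + ram_kb * overcommit_ratio // 100


def swap_needed_kb(ram_kb, wanted_limit_kb, overcommit_ratio=50):
    """Swap required so the commit limit reaches wanted_limit_kb."""
    return max(0, wanted_limit_kb - ram_kb * overcommit_ratio // 100)


# A 16 GB node with 16 GB of swap and the default ratio of 50.
limit = commit_limit_kb(16 * GB_IN_KB, 16 * GB_IN_KB)
print(limit // GB_IN_KB)  # prints 24

# Swap needed for the same node to allow 16 GB of committed VM.
print(swap_needed_kb(16 * GB_IN_KB, 16 * GB_IN_KB) // GB_IN_KB)  # prints 8
```

Raising vm.overcommit_ratio to 100 on the same node would give a 16 GB commit limit with no swap at all, which is why the ratio and the swap size have to be chosen together.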
>> But what are the reccomendations of modern praxis ? it depends a lot on the size variance of your jobs, as well as their real/virtual ratio. the kernel only enforces RLIMIT_AS (vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did RLIMIT_RSS or not. if you use overcommit_memory=2, your desired max VM size determines the amount of swap. otherwise, go with something modest - memory size or so. but given that the smallest reasonable single disk these days is probably about 320GB, it's hard to justify being _too_ tight. From jclinton at advancedclustering.com Thu Jun 5 08:38:42 2008 From: jclinton at advancedclustering.com (Jason Clinton) Date: Thu Mar 18 01:07:15 2010 Subject: [Beowulf] OFED/IB for FC8 In-Reply-To: <4847B410.1070202@lfbs.rwth-aachen.de> References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> <20080605001549.GE27430@bx9.net> <4847B410.1070202@lfbs.rwth-aachen.de> Message-ID: <588c11220806050838t5d2ede3fh3e2e3880869ba4df@mail.gmail.com> On Thu, Jun 5, 2008 at 4:38 AM, Rainer Finocchiaro < rainer@lfbs.rwth-aachen.de> wrote: > Hi Michael, > > Greg Lindahl schrieb: > >> All the OFED rpm's for FC6 installed on FC8 without difficulty, except for >>> opensm-3.0.3-0.ppc64.rpm >>> >> >> This is the cause of most of your subsequent problems. Without an SM >> running somewhere on your network, the links don't come fully up. >> ... > > > ... > Following your link, I reach a download directory offering only ppc64-RPMs; > in fact all precompiled RPMs for OFED-1.2.5 are for Power PC and not for > x86. > > .. > Much better is to download more up-to-date OFED-1.3 sources. The package > includes an install script, which builds and installs the RPMs for you. So > you don't have to "fear" to install something which is not controlled by > your package management system (RPM). As a side note, you've probably gotten yourself in to an unrecoverable state with RPM having already installed all those PPC RPM's on your Fedora 8 x86_64 systems. 
The easiest thing to do is probably reinstall but if you want, you can try removing them all with something like this: cd /path/to/downloaded/RPMS ls | grep -oP .+?\(\?=.x86_64\\.rpm\) | xargs rpm -e The command will extract the names of the RPM's known by RPM that you installed and then ask RPM to remove them. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080605/7bd7a474/attachment.html From jclinton at advancedclustering.com Thu Jun 5 08:41:14 2008 From: jclinton at advancedclustering.com (Jason Clinton) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] OFED/IB for FC8 In-Reply-To: <588c11220806050838t5d2ede3fh3e2e3880869ba4df@mail.gmail.com> References: <6.2.5.6.2.20080604150239.047ad270@NumerEx-LLC.com> <20080605001549.GE27430@bx9.net> <4847B410.1070202@lfbs.rwth-aachen.de> <588c11220806050838t5d2ede3fh3e2e3880869ba4df@mail.gmail.com> Message-ID: <588c11220806050841r743b0246rea9f940e5c8c7753@mail.gmail.com> On Thu, Jun 5, 2008 at 10:38 AM, Jason Clinton < jclinton@advancedclustering.com> wrote: > ls | grep -oP .+?\(\?=.x86_64\\.rpm\) | xargs rpm -e > Of course, replace "x86_64" with "ppc64" if indeed that is what you installed. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080605/e3be0b3d/attachment.html From jclinton at advancedclustering.com Thu Jun 5 10:46:54 2008 From: jclinton at advancedclustering.com (Jason Clinton) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: References: Message-ID: <588c11220806051046t3ad48a02q449a3ede0e299884@mail.gmail.com> On Thu, Jun 5, 2008 at 11:39 AM, Mikhail Kuzminsky wrote: > In message from Mark Hahn (Thu, 5 Jun 2008 11:57:28 > -0400 (EDT)): > >> To be more exact, Rev. B2 of Opteron 2350 - is it for CPU stepping w/error >>> or w/o error ? 
>>> >> >> AMD, like Intel, does a reasonable job of disclosing such info: >> >> >> http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/41322.PDF >> >> the well-known problem is erattum 298, I think, and fixed in B3. >> > > Yes, this AMD errata document says that in B3 revision the error "will be > fixed". I heard that new CPUs w/o TLB+L3 error are shipped now, > but are this CPUs really B3 or may be have some more new release ? Yes, what are currently shipping from AMD are B3 revision processors. The TLB-look-aside problem is fixed. There are other less-critical problems with B3, however. Specifically, power-related compatibility issues with various motherboards due to (according to the motherboard manufacturers) AMD changing the TDP late in the release process. I can't give any specific names or models that we know have problems, however. I can say that everyone involved is working on a resolution--usually through PCB revisions of the motherboards. A number of 1U power supplies that have previously worked with all Intel and AMD solutions are now insufficient, as well, due to 12V limitations. B3 pulls a *lot* of power. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080605/fe201ac3/attachment.html From jclinton at advancedclustering.com Thu Jun 5 11:16:33 2008 From: jclinton at advancedclustering.com (Jason Clinton) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: References: Message-ID: <588c11220806051116i37ff7aa1oec16a85a24009592@mail.gmail.com> On Thu, Jun 5, 2008 at 1:09 PM, Mikhail Kuzminsky wrote: > In message from Mark Hahn (Thu, 5 Jun 2008 13:55:01 > -0400 (EDT)): > >> I'm mystified by this: B2 was broken, so using it without the bios >> workaround is just a mistake or masochism. the workaround _did_ apparently >> have performance implications, but that's why B3 exists... 
>> >> do you mean you know of G03 problems on B2 systems which are operating >> _with_ the workaround? >> > > I don't know exactly, but I think the crash was under absence of > workaround, because I was not informed that there was some kernel patches or > BIOS changes. This was interesting for me also, because I have no > information how this hardware problem may be affected in the "real life". > Mikhail > The B2 BIOS work-around is to disable the L3 cache which gives you a 10-20% performance hit with no reduction in power consumption. The kernel patch is very extensive and, last I heard, under NDA. AMD has said publicly that the patch gives you a 1-2% performance hit. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080605/34e021ab/attachment.html From malallen at indiana.edu Fri Jun 6 11:10:43 2008 From: malallen at indiana.edu (Matt Allen) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <87hcc6h2yi.fsf@snark.cb.piermont.com> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> Message-ID: <17D5930B-CC49-49CF-A33C-6E7B01401FC1@indiana.edu> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > cool tricks to consistently set the BIOS We had a cluster of systems that supported configuring the BIOS from an image on a bootable floppy. I bought 96 3.5" floppy disks, put one in each node, and then used parallel scp to dd the desired image to each node's floppy from an NFS mount. Then I power-cycled them simultaneously, and listened to the sound of 96 floppy disks being read at the same time (more or less). I'm sure I'll never hear that sound again in my life. 
I'm not sure how relevant or cool that was (and it did take a few minutes to eject all those disks afterwards), but it took less time than rebooting each node, for sure, and I had a desk full of spare floppy disks for two or three years after that. Matt - -- 812.855.7318 voice Research Technologies - High-Performance Systems hps-admin@iu.edu - http://rtinfo.uits.indiana.edu/hps/ On Jun 6, 2008, at 1:45 PM, Perry E. Metzger wrote: > > Bill Broadley writes: >>> 2. BIOS had a couple of interesting defaults, including warn on >>> keyboard error (Keyboard? Not intentionally. This is a compute >>> node, and should never require a keyboard. Ever.) We also find the >>> BIOS is set to boot from hard disk THEN PXE. But due to item 1, >>> above, we never can fail over to PXE unless we load up a keyboard >>> and monitor, and hit F12 to drop to PXE. >> >> Very strange standard for a server, let alone a cluster node. > > I would be less disturbed about such things if it was trivial to alter > the BIOS settings in a semi-automated way -- say by booting some > standalone program, or loading a file from a USB thumb drive. Then you > could just go up to each box with a USB thumb drive, turn it on, and > have it fix itself in a consistent way. However, the fact that you > can't generally automate fixing BIOS settings makes all of this far > more annoying. > > Anyone have any cool tricks for how to consistently set the BIOS on > large numbers of boxes without requiring steps that humans can screw > up easily? > > -- > Perry E. 
Metzger perry@piermont.com > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.8 (Darwin) iEYEARECAAYFAkhJfaMACgkQsHrhTcWK+IZ2GwCeOYae5FD3OrApTAJ3U2hPXfip BtEAnA9Ub3kkoKbFtNOcJgl7vHAi3KO2 =qlG4 -----END PGP SIGNATURE----- From bari at onelabs.com Fri Jun 6 12:14:28 2008 From: bari at onelabs.com (bari) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> <9FD5E5B6-BCAF-4C00-A02B-A3678E46C77D@sanger.ac.uk> Message-ID: <48498C94.1010608@onelabs.com> Tim Cutts wrote: > > Nope. :-) This is, in my view, one of the major disadvantages of PC > clusters. The crappy old BIOS that we're stuck with. > Just out of curiosity beside the clusters at LANL and Sandia who here uses coreboot (LinuxBIOS) for BIOS? http://www.coreboot.org If not, why not? Lack of vendor support? -Bari From spambox at emboss.co.nz Sat Jun 7 12:03:07 2008 From: spambox at emboss.co.nz (Michael Brown) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <87hcc6h2yi.fsf@snark.cb.piermont.com> References: <48495A43.4060809@tamu.edu> <48496D2D.5080801@cse.ucdavis.edu> <87hcc6h2yi.fsf@snark.cb.piermont.com> Message-ID: <5ECDF0CF16C448CCA570A7C5DCBDBA71@Forethought> Perry E. Metzger wrote: > Anyone have any cool tricks for how to consistently set the BIOS on > large numbers of boxes without requiring steps that humans can screw > up easily? Get a USB stick that boots into Linux. Set up one machine the way you want, then boot it up using the USB stick. 
Do: dd if=/dev/nvram of=cmos.bin For each of the other machines, boot them using the stick and do: dd if=cmos.bin of=/dev/nvram From johnh at streamline-computing.com Mon Jun 9 06:55:17 2008 From: johnh at streamline-computing.com (johnh@streamline-computing.com) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <1213008547.8064.9.camel@bruce.priv.wark.uk.streamline-computing.com> References: <1213008547.8064.9.camel@bruce.priv.wark.uk.streamline-computing.com> Message-ID: <6c91fc1fbdbca86c512bb52ae3cfab4f@87.127.209.200> > On Fri, 2008-06-06 at 10:39 -0500, Gerry Creager wrote: >> > > I can think of at least one cluster where the opposite has been true and > PXE boot has been the default. The problem with this is if the head > node PXE boots on the customers network and gets automatically > re-installed as a windows workstation everybody gets egg on their face. > Yes even "modern" BIOSes are bad but localboot first is a sensible > default. Our clusters are set such that all compute nodes PXE boot first, then localboot. All head nodes should have the BIOS set to localboot first. From maurice at harddata.com Mon Jun 9 13:03:58 2008 From: maurice at harddata.com (Maurice Hilarius) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <200806091657.m59Gum9n021891@bluewest.scyld.com> References: <200806091657.m59Gum9n021891@bluewest.scyld.com> Message-ID: <484D8CAE.8000008@harddata.com> Chris Samuel wrote: > > Our most recent vendor went to the motherboard manufacturer > and said "please can you cut us a BIOS with these default > settings" and they did so. > > cheers, > Chris Some manufacturers do, some do not. Asus, for example, do, for their OEM customers. OTOH, one may buy the BIOS customization software, from AMI, for example, for a support/licensing fee of about $10,000 per year. -- With our best regards, //Maurice W. Hilarius Telephone: 01-780-456-9771/ /Hard Data Ltd.
FAX: 01-780-456-9772/ /11060 - 166 Avenue email:maurice@harddata.com/ /Edmonton, AB, Canada http://www.harddata.com// / T5X 1Y3/ / -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080609/858c4f5e/attachment.html From mark.kosmowski at gmail.com Tue Jun 10 06:44:29 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] size of swap partition Message-ID: > Message: 5 > Date: Tue, 10 Jun 2008 00:58:12 -0400 (EDT) > From: Mark Hahn > Subject: Re: [Beowulf] size of swap partition > To: Gerry Creager > Cc: Mikhail Kuzminsky , beowulf@beowulf.org > Message-ID: > > Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed > > > We have the potential to have to swap whole jobs out of memory on a complete > > node. > > that was our intent as well. among other things, this scheme enables > running the cluster "split-personality" - mostly shorter/smaller even > interactive jobs during the day, with big/long jobs running at night. > unfortunately, you need a smart scheduler to do this, and ours is dumb. > > >> beleive, it is 2 or more GB per core; we have 16 GB per dual-socket > >> quad-core Opteron node). What is typical modern swap size today? > > are you willing to use a node which is actually occupying 16 GB of swap? > > it is possible to tune how the kernel responds to memory crunches - > for instance, you can always avoid OOM with the vm.overcommit_memory=2 > sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap > to get the desired limits.) in this mode, the kernel tracks how much VM > it actually needs (worst-case, reflected in Committed_AS in /proc/meminfo) > and compares that to a commit limit that reflects ram and swap. > > if you don't use overcommit_memory=2, you are basically borrowing VM > space in hopes of not needing it. 
that can still be reasonable, considering > how often processes have a lot of shared VM, and how many processes > allocate but never touch lots of pages. but you have to ask yourself: > would I like a system that was actually _using_ 16 GB of swap? if you > have 16x disks, perhaps, but 16G will suck if you only have 1 disk. > at least for overcommit_memory != 2, I don't see the point of configuring > a lot of swap, since the only time you'd use it is if you were thrashing. > sort of a "quality of life" argument. > > >> But what are the reccomendations of modern praxis ? > > it depends a lot on the size variance of your jobs, as well as > their real/virtual ratio. the kernel only enforces RLIMIT_AS > (vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did > RLIMIT_RSS or not. > > if you use overcommit_memory=2, your desired max VM size determines > the amount of swap. otherwise, go with something modest - memory size > or so. but given that the smallest reasonable single disk these days > is probably about 320GB, it's hard to justify being _too_ tight. Is anyone using those Gigabyte i-RAM type devices for swap? Or is RAM cheaper? What about using these devices as swap to "add RAM" to older equipment that is at the maximum mobo supported RAM limit? From walid.shaari at gmail.com Tue Jun 10 09:27:43 2008 From: walid.shaari at gmail.com (Walid) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] size of swap partition In-Reply-To: References: Message-ID: Hi, For an 8GB dual-socket quad-core node, when --recommended is chosen in the kickstart file instead of specifying a size, RHEL5 allocates 1GB of swap.
our developers say that they should not swap as this will cause an overhead, and they try to avoid it as much as possible regards Walid On 10/06/2008, Mark Kosmowski wrote: >> Message: 5 >> Date: Tue, 10 Jun 2008 00:58:12 -0400 (EDT) >> From: Mark Hahn >> Subject: Re: [Beowulf] size of swap partition >> To: Gerry Creager >> Cc: Mikhail Kuzminsky , beowulf@beowulf.org >> Message-ID: >> >> Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed >> >> > We have the potential to have to swap whole jobs out of memory on a >> > complete >> > node. >> >> that was our intent as well. among other things, this scheme enables >> running the cluster "split-personality" - mostly shorter/smaller even >> interactive jobs during the day, with big/long jobs running at night. >> unfortunately, you need a smart scheduler to do this, and ours is dumb. >> >> >> beleive, it is 2 or more GB per core; we have 16 GB per dual-socket >> >> quad-core Opteron node). What is typical modern swap size today? >> >> are you willing to use a node which is actually occupying 16 GB of swap? >> >> it is possible to tune how the kernel responds to memory crunches - >> for instance, you can always avoid OOM with the vm.overcommit_memory=2 >> sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap >> to get the desired limits.) in this mode, the kernel tracks how much VM >> it actually needs (worst-case, reflected in Committed_AS in /proc/meminfo) >> and compares that to a commit limit that reflects ram and swap. >> >> if you don't use overcommit_memory=2, you are basically borrowing VM >> space in hopes of not needing it. that can still be reasonable, >> considering >> how often processes have a lot of shared VM, and how many processes >> allocate but never touch lots of pages. but you have to ask yourself: >> would I like a system that was actually _using_ 16 GB of swap? if you >> have 16x disks, perhaps, but 16G will suck if you only have 1 disk. 
>> at least for overcommit_memory != 2, I don't see the point of configuring >> a lot of swap, since the only time you'd use it is if you were thrashing. >> sort of a "quality of life" argument. >> >> >> But what are the reccomendations of modern praxis ? >> >> it depends a lot on the size variance of your jobs, as well as >> their real/virtual ratio. the kernel only enforces RLIMIT_AS >> (vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did >> RLIMIT_RSS or not. >> >> if you use overcommit_memory=2, your desired max VM size determines >> the amount of swap. otherwise, go with something modest - memory size >> or so. but given that the smallest reasonable single disk these days >> is probably about 320GB, it's hard to justify being _too_ tight. > > Is anyone using those Gigabyte i-RAM type devices from swap? Or is > RAM cheaper? What about using these devices as swap to "add RAM" to > older equipment that is at the maximum mobo supported RAM limit? > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From kus at free.net Tue Jun 10 10:35:46 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] size of swap partition In-Reply-To: Message-ID: In message from Mark Hahn (Tue, 10 Jun 2008 00:58:12 -0400 (EDT)): ... >for instance, you can always avoid OOM with the vm.overcommit_memory=2 >sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap >to get the desired limits.) in this mode, the kernel tracks how much >VM >it actually needs (worst-case, reflected in Committed_AS in >/proc/meminfo) >and compares that to a commit limit that reflects ram and swap. > >if you don't use overcommit_memory=2, you are basically borrowing VM >space in hopes of not needing it. 
that can still be reasonable, >considering >how often processes have a lot of shared VM, and how many processes >allocate but never touch lots of pages. but you have to ask yourself: >would I like a system that was actually _using_ 16 GB of swap? if you >have 16x disks, perhaps, but 16G will suck if you only have 1 disk. >at least for overcommit_memory != 2, I don't see the point of >configuring >a lot of swap, since the only time you'd use it is if you were >thrashing. >sort of a "quality of life" argument. > >>> But what are the reccomendations of modern praxis ? > >it depends a lot on the size variance of your jobs, as well as their >real/virtual ratio. the kernel only enforces RLIMIT_AS >(vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did >RLIMIT_RSS or not. > >if you use overcommit_memory=2, your desired max VM size determines >the amount of swap. otherwise, go with something modest - memory size >or so. but given that the smallest reasonable single disk these days >is probably about 320GB, it's hard to justify being _too_ tight. :-) The disks we use in nodes are SATA WD/10K RPM w/70 GB :-)) We didn't set overcommit_memory=2, but we do use a strongly restricted scheduling policy for SGE batch jobs, which run only a few applications. We have only batch jobs (no interactive), and moreover practically only *long batch jobs*. As a result the total VM requested per node is equal to (or lower than) RAM. There is practically zero swap activity. The only exceptions are (seldom executed) small test jobs, non-parallelized, mainly for checking input data. They use a small amount of RAM. So it looks to me that I may set the swap size even lower than 1.5*RAM (I think RAM+4G = 20G will be enough). In message from Walid (Tue, 10 Jun 2008 19:27:43 +0300): >Hi, >For an 8GB dual socket quad core node, choosing in the kick start >file --recommended instead of specifying size RHEL5 allocates 1GB of >memory.
our developers say that they should not swap as this will >cause an overhead, and they try to avoid it as much as possible OpenSuSE 10.3 recommends a swap size of only 2 GB, but I don't know whether the SuSE installation software makes any estimate of the server's RAM. Yours Mikhail From csamuel at vpac.org Tue Jun 10 18:33:28 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <6c91fc1fbdbca86c512bb52ae3cfab4f@87.127.209.200> Message-ID: <1674199748.126391213148008004.JavaMail.root@zimbra.vpac.org> ----- johnh@streamline-computing.com wrote: > All head nodes should have the BIOS set to localboot first. We set the interface on the internal cluster network to PXE and the external to not. Mind you, we control the external network too, so even if it did try it shouldn't do anything. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Tue Jun 10 18:43:01 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: <436509742.126541213148491810.JavaMail.root@zimbra.vpac.org> Message-ID: <457290452.126561213148581328.JavaMail.root@zimbra.vpac.org> ----- "Jason Clinton" wrote: > The kernel patch is very extensive and, last I heard, under NDA. AMD post the patches publicly to the x86-64 discuss list. The most recent ones covered 2.6.24 and 2.6.25 and were sent out in April. https://www.x86-64.org/pipermail/discuss/2008-April/010398.html -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O.
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From Dan.Kidger at quadrics.com Thu Jun 12 04:38:36 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] A couple of interesting comments In-Reply-To: <1674199748.126391213148008004.JavaMail.root@zimbra.vpac.org> References: <6c91fc1fbdbca86c512bb52ae3cfab4f@87.127.209.200> <1674199748.126391213148008004.JavaMail.root@zimbra.vpac.org> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A04884916EC@quadbrsex1.quadrics.com> Chris Samuel wrote: >----- johnh@streamline-computing.com wrote: >> All head nodes should have the BIOS set to localboot first. > >We set the interface on the internal cluster network to >PXE and the external to not. I agree. but note that if you use ROCKS, it insists on the other way round: It wants to always reinstall a node on power up *unless* it has a working bootable partition, and it deliberately trashes the boot sector on a clean boot - only replacing it if the node is shut down cleanly. The key point is that any machine that has state that should be kept (like a head node) should *never* PXE boot by default - possibly not even if the primary HDD won't boot properly - only PXE boot on human intervention. PXE booting is concept of trust - that you trust the machine upstream of you to have full control of you, to delete and reinstall whatever it wants. Daniel ------------------------------------------------------------- Dr. Daniel Kidger, Quadrics Ltd. 
daniel.kidger@quadrics.com     One Bridewell St.,
Mobile: +44 (0)779 209 1851    Bristol, BS1 2AA, UK
Office: +44 (0)117 915 5519
----------------------- www.quadrics.com --------------------

From walid.shaari at gmail.com Thu Jun 12 06:32:38 2008
From: walid.shaari at gmail.com (Walid)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] RHEL5 network throughput/scalability
Message-ID:

Hi All,

I have an issue with a new cluster setup where the nodes run RHEL 5.1 (with the latest 5.2 kernel). When I write NFS data, the nodes scale linearly until they reach the 10th node: the bandwidth and throughput seen from the NFS server on the other side of the nodes show a linear increase from around 100+ MByte/sec up to 1 GByte/sec. However, when we add one more node, the bandwidth/throughput becomes erratic and inconsistent, and drops to around 500-700 MByte/sec. If I try the same setup with RHEL4U6, I do not get the same behaviour; it sustains the bandwidth at 1 GByte/sec.

The setup is as follows: 48 nodes share a 48-port access switch that is uplinked over a 10G link to a Cisco 6509 switch, which is linked to a clustered NFS file system that consists of eight heads, each linked to the 6509 by a 10G link.

The above was a write test, so I thought maybe TCP congestion control had kicked in, or that there was a sliding-window problem. However, when I do a read test it gets worse: the scalability is now reduced to 5 nodes. One node is able to read around 100 MBps, two will read double that, and so on until you add the fifth node, where the bandwidth drops from around 500+ MBps to around 300. Again, with RHEL4 the behaviour is different.

Any pointers?

TIA

Walid
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/4e8f86fa/attachment.html From garantes at iq.usp.br Tue Jun 10 06:53:15 2008 From: garantes at iq.usp.br (Guilherme Menegon Arantes) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] size of swap partition In-Reply-To: <200806101313.m5ADC2t3014136@bluewest.scyld.com> References: <200806101313.m5ADC2t3014136@bluewest.scyld.com> Message-ID: <20080610135315.GA4894@dinamobile> On Tue, Jun 10, 2008 at 06:13:00AM -0700, beowulf-request@beowulf.org wrote: > > Date: Tue, 10 Jun 2008 00:58:12 -0400 (EDT) > From: Mark Hahn > Subject: Re: [Beowulf] size of swap partition > To: Gerry Creager > Cc: Mikhail Kuzminsky , beowulf@beowulf.org > > it is possible to tune how the kernel responds to memory crunches - > for instance, you can always avoid OOM with the vm.overcommit_memory=2 > sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap > to get the desired limits.) in this mode, the kernel tracks how much VM > it actually needs (worst-case, reflected in Committed_AS in /proc/meminfo) > and compares that to a commit limit that reflects ram and swap. > > ... > > their real/virtual ratio. the kernel only enforces RLIMIT_AS > (vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did > RLIMIT_RSS or not. And that brings me to another related question: Where I can get more information about VM usage/tunning options (such as vm.overcommit_ratio, RLIMIT_AS, etc) and VM metrics (such as vmstat) for the current linux kernels (>= 2.6.18)? I have looked on /usr/src/linux/Documentation/vm but anywhere else with more accessible/digested info? 
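As a rough illustration of the commit accounting described in the quoted text above (strict mode, vm.overcommit_memory=2), here is a minimal Python sketch. It assumes the commit limit is approximately swap plus RAM scaled by vm.overcommit_ratio, ignoring huge pages, and it compares that limit with Committed_AS. The sample values and the helper names (parse_meminfo, commit_limit_kb, commit_headroom_kb) are made up for the example; they are not part of any kernel interface.

```python
"""Sketch of the strict-overcommit arithmetic (vm.overcommit_memory=2)."""

# Hypothetical sample in /proc/meminfo format; the numbers are made up.
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
SwapTotal:       8000000 kB
Committed_AS:    3000000 kB
"""


def parse_meminfo(text):
    """Return {field: integer value} for each 'Name: value [kB]' line."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio):
    """Approximate commit limit: swap + RAM * ratio / 100 (huge pages ignored)."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


def commit_headroom_kb(fields, overcommit_ratio):
    """Address space that could still be committed before allocations fail."""
    limit = commit_limit_kb(fields["MemTotal"], fields["SwapTotal"], overcommit_ratio)
    return limit - fields["Committed_AS"]


if __name__ == "__main__":
    # On a Linux node, replace SAMPLE_MEMINFO with open("/proc/meminfo").read().
    info = parse_meminfo(SAMPLE_MEMINFO)
    for ratio in (50, 100):
        print(ratio,
              commit_limit_kb(info["MemTotal"], info["SwapTotal"], ratio),
              commit_headroom_kb(info, ratio))
```

On a real node the kernel's own CommitLimit line in /proc/meminfo is the authoritative figure; the sketch only shows how swap size and vm.overcommit_ratio move it.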
Regards, Guilherme -- Guilherme Menegon Arantes, PhD S?o Paulo, Brasil ______________________________________________________ From raq at cttc.upc.edu Tue Jun 10 10:33:30 2008 From: raq at cttc.upc.edu (Ramiro Alba Queipo) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches Message-ID: <1213119210.8051.143.camel@mundo> Hello everybody: We are about to build an HPC cluster with infiniband network starting from 22 dual socket nodes with AMD QUAD core processors and in a year or so we will be having about 120 nodes. We will be using infiniband both for calculation as for storage. The question is that we need a modular solution and we are having 3 candidates: a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models)-> seems very good and well supported, but very expensive. b) Qlogic SilverStorm 9120 (144 ports) -> no price and support information yet c) Flextronics 10U 144 Port Modular-> very good at price but little support => risky option?. I am in a mess. What is your opinion about this matter? Are you using any of this products. Regards -- Aquest missatge ha estat analitzat per MailScanner a la cerca de virus i d'altres continguts perillosos, i es considera que està net. For all your IT requirements visit: http://www.transtec.co.uk From landman at scalableinformatics.com Thu Jun 12 07:36:32 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <1213119210.8051.143.camel@mundo> References: <1213119210.8051.143.camel@mundo> Message-ID: <48513470.30308@scalableinformatics.com> Ramiro Alba Queipo wrote: > Hello everybody: > > We are about to build an HPC cluster with infiniband network starting > from 22 dual socket nodes with AMD QUAD core processors and in a year or > so we will be having about 120 nodes. We will be using infiniband both > for calculation as for storage. 
Hi Ramiro: You may experience some contention issues in this case if your code is very latency sensitive, and you do lots of IO. > The question is that we need a modular solution and we are having 3 > candidates: > > a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models)-> > seems very good and well supported, but very expensive. > > b) Qlogic SilverStorm 9120 (144 ports) -> no price and support > information yet > > c) Flextronics 10U 144 Port Modular-> very good at price but little > support => risky option?. The Flextronics units are Mellanox IP/chips inside (as are, I believe, many/most of the others). That is, the risk is low from a "will it work" view. Flextronics is an ODM, so they may not provide the levels of support around the system that you might get with Voltaire et al. Do you want/need a 1:1 architecture (e.g. all ports are the same number of switch hops from each other), or are you able/willing to look into oversubscribed links? Part of this has to do with your traffic patterns, your code requirements on latency, and your storage bandwidth. The Voltaire units are good, we have used them in units for customers. No complaints. Flextronics should be fine, as should Qlogic. We have customers with all of these. Rarely hear of complaints on IB switches. > > I am in a mess. What is your opinion about this matter? Are you using > any of this products. 
> > Regards > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From djholm at fnal.gov Thu Jun 12 08:08:21 2008 From: djholm at fnal.gov (Don Holmgren) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <1213119210.8051.143.camel@mundo> References: <1213119210.8051.143.camel@mundo> Message-ID: Ramiro - You might want to also consider buying just a single 24-port switch for your 22 nodes, and then when you expand either replace with a larger switch, or build a distributed switch fabric with a number of leaf switches connecting into a central spine switch (or switches). By the time you expand to the larger cluster, switches based on the announced 36-port Mellanox crossbar silicon will be available and perhaps per port prices will have dropped sufficiently to justify the purchase delay and the disruption at the time of expansion. If your applications can tolerate some oversubscription (less than a 1:1 ratio of leaf-to-spine uplinks to leaf-to-node connections), a distributed switch fabric (leaf and spine) has the advantage of shorter (and cheaper) cables between the leaf switches and your nodes, and relatively fewer longer cables from the leaves back to the spine, compared with a single central switch. We have many Flextronics switches - SDR and DDR, 24-port and 144-port - on a pair of large clusters (520 nodes, and 600 nodes) built in 2005 and 2006. No complaints. But, we have been self-supporting, and I would guess you would have very different support structures with Voltaire or Qlogic. With the Flextronics switches you will definitely be using the OFED stack, and you will have to run a subnet manager on one of your nodes (dedicated is probably best). 
You could optionally buy an embedded subnet manager on the Voltaire or Qlogic switches, depending upon model, though I believe for a large fabric an external subnet manager is still recommended. Don Holmgren Fermilab On Tue, 10 Jun 2008, Ramiro Alba Queipo wrote: > Hello everybody: > > We are about to build an HPC cluster with infiniband network starting > from 22 dual socket nodes with AMD QUAD core processors and in a year or > so we will be having about 120 nodes. We will be using infiniband both > for calculation as for storage. > The question is that we need a modular solution and we are having 3 > candidates: > > a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models)-> > seems very good and well supported, but very expensive. > > b) Qlogic SilverStorm 9120 (144 ports) -> no price and support > information yet > > c) Flextronics 10U 144 Port Modular-> very good at price but little > support => risky option?. > > I am in a mess. What is your opinion about this matter? Are you using > any of this products. > > Regards From andrew at moonet.co.uk Thu Jun 12 08:51:11 2008 From: andrew at moonet.co.uk (andrew holway) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: References: <1213119210.8051.143.camel@mundo> Message-ID: +1 for the 24 port flextronics switches. They are very cost effective for half bisectional networks upto 32 ports. It starts to get messy after that. I wonder how long we will be waiting for switches based on the 36p asic? On Thu, Jun 12, 2008 at 4:08 PM, Don Holmgren wrote: > > Ramiro - > > You might want to also consider buying just a single 24-port switch for your > 22 nodes, and then when you expand either replace with a larger switch, or > build a distributed switch fabric with a number of leaf switches connecting > into a central spine switch (or switches). 
By the time you expand to the > larger cluster, switches based on the announced 36-port Mellanox crossbar > silicon will be available and perhaps per port prices will have dropped > sufficiently to justify the purchase delay and the disruption at the time of > expansion. > > If your applications can tolerate some oversubscription (less than a 1:1 > ratio of leaf-to-spine uplinks to leaf-to-node connections), a distributed > switch fabric (leaf and spine) has the advantage of shorter (and cheaper) > cables between the leaf switches and your nodes, and relatively fewer longer > cables from the leaves back to the spine, compared with a single central > switch. > > We have many Flextronics switches - SDR and DDR, 24-port and 144-port - on a > pair of large clusters (520 nodes, and 600 nodes) built in 2005 and 2006. No > complaints. But, we have been self-supporting, and I would guess you would > have very different support structures with Voltaire or Qlogic. With the > Flextronics > switches you will definitely be using the OFED stack, and you will have to > run > a subnet manager on one of your nodes (dedicated is probably best). You > could > optionally buy an embedded subnet manager on the Voltaire or Qlogic > switches, > depending upon model, though I believe for a large fabric an external subnet > manager is still recommended. > > Don Holmgren > Fermilab > > > > > On Tue, 10 Jun 2008, Ramiro Alba Queipo wrote: > >> Hello everybody: >> >> We are about to build an HPC cluster with infiniband network starting >> from 22 dual socket nodes with AMD QUAD core processors and in a year or >> so we will be having about 120 nodes. We will be using infiniband both >> for calculation as for storage. >> The question is that we need a modular solution and we are having 3 >> candidates: >> >> a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models)-> >> seems very good and well supported, but very expensive. 
>>
>> b) Qlogic SilverStorm 9120 (144 ports) -> no price and support
>> information yet
>>
>> c) Flextronics 10U 144 Port Modular-> very good at price but little
>> support => risky option?.
>>
>> I am in a mess. What is your opinion about this matter? Are you using
>> any of this products.
>>
>> Regards
>
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From richard.walsh at comcast.net Thu Jun 12 09:12:34 2008
From: richard.walsh at comcast.net (richard.walsh@comcast.net)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Is PowerXCell eDP fully IEEE 754 compliant ... ?? ... the old Cell is/was ...
Message-ID: <061220081612.24009.48514AF20003BC1C00005DC92215567074089C040E99D20B9D0E080C079D@comcast.net>

All,

I have not been able to get an exact answer to this question. The older chip, while much slower in double precision, was fully IEEE compliant, I am fairly sure. I believe that IBM has improved the compliance of single precision in the PowerXCell (although it is still not fully compliant), but its double precision has fallen back to the single-precision level of compliance to allow for the performance boost. This leaves it at close to parity in compliance when compared to the competition (ATI and NVIDIA) in the volume-economics DLP accelerator arena.

Could someone who is certain of the answer to this question clarify?

Best Regards,

rbw

--

"Making predictions is hard, especially about the future."

Niels Bohr

--

Richard Walsh
Thrashing River Consulting--
5605 Alameda St.
Shoreview, MN 55126

Phone #: 612-382-4620
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/4f5f0370/attachment.html From Shainer at mellanox.com Thu Jun 12 10:01:11 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F0129E5FB@mtiexch01.mti.com> > +1 for the 24 port flextronics switches. They are very cost effective > for half bisectional networks upto 32 ports. It starts to get > messy after that. > > I wonder how long we will be waiting for switches based on > the 36p asic? > Mellanox announced the availability of the switch asic this week, and can provide switch evaluation kits (36 port box and adapters with IB QDR capability) now. My estimation is that the production switches will be out Q3. Gilad. From andrew at moonet.co.uk Thu Jun 12 10:21:43 2008 From: andrew at moonet.co.uk (andrew holway) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E5FB@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F0129E5FB@mtiexch01.mti.com> Message-ID: > Mellanox announced the availability of the switch asic this week, and > can provide switch evaluation kits (36 port box and adapters with IB QDR > capability) now. My estimation is that the production switches will be > out Q3. Which vendor? From bill at cse.ucdavis.edu Thu Jun 12 11:04:19 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Barcelona hardware error: how to detect In-Reply-To: <588c11220806051046t3ad48a02q449a3ede0e299884@mail.gmail.com> References: <588c11220806051046t3ad48a02q449a3ede0e299884@mail.gmail.com> Message-ID: <48516523.9010406@cse.ucdavis.edu> > Yes, what are currently shipping from AMD are B3 revision processors. The > TLB-look-aside problem is fixed. > > There are other less-critical problems with B3, however. 
Specifically, > power-related compatibility issues with various motherboards due to > (according to the motherboard manufacturers) AMD changing the TDP late in > the release process. I can't give any specific names or models that we know > have problems, however. I can say that everyone involved is working on a > resolution--usually through PCB revisions of the motherboards. A number of > 1U power supplies that have previously worked with all Intel and AMD > solutions are now insufficient, as well, due to 12V limitations. B3 pulls a > *lot* of power. I've heard reports of b3 pulling more power than b2, not sure if that's just the higher clock speed, or a b3 related change. Has anyone put a dual socket B3 system on a kill-a-watt and tested it under load? From bernard at vanhpc.org Thu Jun 12 12:45:02 2008 From: bernard at vanhpc.org (Bernard Li) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Roadrunner picture Message-ID: Hi all: I am sure most people have seen the following picture for Roadrunner circulating the Net: http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch However, they don't look likes blades to me, more like 2U IBM x series servers. Perhaps those are the I/O nodes? Cheers, Bernard From prentice at ias.edu Thu Jun 12 13:03:07 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] User resource limits In-Reply-To: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> References: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> Message-ID: <485180FB.2090402@ias.edu> Chris Samuel wrote: > > Unfortunately the kernel implementation of mmap() doesn't check > the maximum memory size (RLIMIT_RSS) or maximum data size (RLIMIT_DATA) > limits which were being set, but only the maximum virtual RAM size > (RLIMIT_AS) - this is documented in the setrlimit(2) man page. > > :-( > Yeah... I was just reading the setrlimit man page. 
Does that mean that the only way I can limit RAM usage is with RLIMIT_AS? (Or "as" in limits.conf parlance.) I would have to limit AS < RAM to keep a user from using all RAM. Since AS includes virtual memory, and VM = RAM + swap, wouldn't I be limiting users a little more than I'd hoped?

--
Prentice

From prentice at ias.edu Thu Jun 12 13:07:18 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Roadrunner picture
In-Reply-To:
References:
Message-ID: <485181F6.8030904@ias.edu>

Bernard Li wrote:
> Hi all:
>
> I am sure most people have seen the following picture for Roadrunner
> circulating the Net:
>
> http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch
>
> However, they don't look likes blades to me, more like 2U IBM x series
> servers. Perhaps those are the I/O nodes?
>

Perhaps that is a poorly chosen file photo, and not really a photo of Roadrunner? Your I/O node theory is plausible, too.

Is IBM getting away from the BlueGene architecture?

From john.leidel at gmail.com Thu Jun 12 13:09:32 2008
From: john.leidel at gmail.com (John Leidel)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Roadrunner picture
In-Reply-To:
References:
Message-ID: <1213301372.5092.0.camel@e521.site>

Also at ComputerWorld:

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9085021&intsrc=news_ts_head

On Thu, 2008-06-12 at 12:45 -0700, Bernard Li wrote:
> Hi all:
>
> I am sure most people have seen the following picture for Roadrunner
> circulating the Net:
>
> http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch
>
> However, they don't look likes blades to me, more like 2U IBM x series
> servers. Perhaps those are the I/O nodes?
>
> Cheers,
>
> Bernard

From peter.st.john at gmail.com Thu Jun 12 13:09:43 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Roadrunner picture
In-Reply-To:
References:
Message-ID:

Bernard,

I'm looking forward to hearing from our resident experts, but meanwhile: http://en.wikipedia.org/wiki/IBM_Roadrunner explains the architecture some. The buzzword is "triblade", which is 3 blades (with an extension) employing two types of processors (AMD Opteron and IBM Cell) in a hybrid subsystem. I have no idea what a single Triblade looks like. The overall machine is then composed of zillions of triblades.

Wow, imagine a Beowulf of those (jk :-)

Peter (designing a Beowulf of abaci to fit his current budget)

On Thu, Jun 12, 2008 at 3:45 PM, Bernard Li wrote:

> Hi all:
>
> I am sure most people have seen the following picture for Roadrunner
> circulating the Net:
>
> http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch
>
> However, they don't look likes blades to me, more like 2U IBM x series
> servers. Perhaps those are the I/O nodes?
>
> Cheers,
>
> Bernard
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/19388c7f/attachment.html

From prentice at ias.edu Thu Jun 12 13:58:54 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Roadrunner picture
In-Reply-To:
References:
Message-ID: <48518E0E.8080604@ias.edu>

Bernard Li wrote:
> Hi all:
>
> I am sure most people have seen the following picture for Roadrunner
> circulating the Net:
>
> http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch
>
> However, they don't look likes blades to me, more like 2U IBM x series
> servers. Perhaps those are the I/O nodes?
>

This might be what you're seeing:

"Each CU also has access to the Panasas file system through twelve System x3755 machines."

- http://en.wikipedia.org/wiki/IBM_Roadrunner

From richard.walsh at comcast.net Thu Jun 12 14:05:09 2008
From: richard.walsh at comcast.net (richard.walsh@comcast.net)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Roadrunner picture
Message-ID: <061220082105.9640.48518F850002D8B0000025A82215586394089C040E99D20B9D0E080C079D@comcast.net>

Skipped content of type multipart/alternative
-------------- next part --------------
An embedded message was scrubbed...
From: "Peter St. John"
Subject: Re: [Beowulf] Roadrunner picture
Date: Thu, 12 Jun 2008 20:16:19 +0000
Size: 762
Url: http://www.scyld.com/pipermail/beowulf/attachments/20080612/d2cdc94e/attachment.mht

From jan.heichler at gmx.net Thu Jun 12 14:27:56 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] MVAPICH2 and osu_latency
Message-ID: <6410235104.20080612232756@gmx.net>

Dear all!

I found this

http://mvapich.cse.ohio-state.edu/performance/mvapich2/opteron/MVAPICH2-opteron-gen2-DDR.shtml

as reference value for MPI-latency of Infiniband.
I am trying to reproduce those numbers at the moment, but I'm stuck with

# OSU MPI Latency Test v3.0
# Size        Latency (us)
0             3.07
1             3.17
2             3.16
4             3.15
8             3.19

Equipment is two quad-socket Opteron blades (Supermicro) with Mellanox Ex DDR cards. A single 24-port switch connects them.

Can anybody help with suggestions on what I can do to lower the latency?

Regards,
Jan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/999953c2/attachment.html

From tom.elken at qlogic.com Thu Jun 12 15:04:29 2008
From: tom.elken at qlogic.com (Tom Elken)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] MVAPICH2 and osu_latency
In-Reply-To: <6410235104.20080612232756@gmx.net>
References: <6410235104.20080612232756@gmx.net>
Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A0214C01D@AVEXCH1.qlogic.org>

So you're concerned with the gap between the 2.63 us that OSU measured and the 3.07 us you measured. I wouldn't be too concerned.

MPI latency can be quite dependent on the systems you use. OSU used dual-processor systems with 2.8 GHz processors. Such a system has ~60 ns latency to local memory. On your 4-socket Opteron system, your local memory latency is probably in the 90-100 ns range. Assuming you are also using MVAPICH2, this is probably the main reason for the latency shortfall you are seeing.

Another possibility is that the CPU you are running the MPI test on is not the closest CPU to the PCIe chipset. Thus, you may be taking some HT hops on the way to the PCIe bus and adapter card.

-Tom

________________________________
From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jan Heichler
Sent: Thursday, June 12, 2008 2:28 PM
To: Beowulf Mailing List
Subject: [Beowulf] MVAPICH2 and osu_latency

Dear all!

I found this

http://mvapich.cse.ohio-state.edu/performance/mvapich2/opteron/MVAPICH2-opteron-gen2-DDR.shtml

as reference value for MPI-latency of Infiniband.
I try to reproduce those numbers at the moment but i'm stuck with

# OSU MPI Latency Test v3.0
# Size        Latency (us)
0             3.07
1             3.17
2             3.16
4             3.15
8             3.19

Equipment is two quadsocket Opteron Blades (Supermicro) with Mellanox Ex DDR cards. Single 24 port switch connects them.

Can anybody help with suggestions what i can do to lower the latency?

Regards,
Jan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/16bf3067/attachment.html

From hahn at mcmaster.ca Thu Jun 12 15:26:20 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] User resource limits
In-Reply-To: <485180FB.2090402@ias.edu>
References: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> <485180FB.2090402@ias.edu>
Message-ID:

>> Unfortunately the kernel implementation of mmap() doesn't check
>> the maximum memory size (RLIMIT_RSS) or maximum data size (RLIMIT_DATA)
>> limits which were being set, but only the maximum virtual RAM size
>> (RLIMIT_AS) - this is documented in the setrlimit(2) man page.
>>
>> :-(

I think it's a perfectly reasonable choice. RSS enforcement means accounting and checks on what would otherwise be fast paths. besides, I think it also lacks transparency, since a process's RSS is affected by random other system events, other users, etc. using a memory limit that is triggered on actual allocation events (mmap, brk) makes a lot of sense to me, and that means virtual size, exactly what RLIMIT_AS does...

> limits.conf parlance) I would have to limit AS < RAM to keep a user from
> using all RAM. Since AS includes virtual memory, and VM = RAM + swap,
> wouldn't I be limiting users a little more than I'd hoped?

I don't follow that. why would you want to keep a user from using all ram (which assumes the ram is otherwise free/unused/wasted)?

the only real trick with RLIMIT_AS and vm.overcommit_memory=2 is that it's hard to predict the vsz of processes.
normally, vsz is modestly larger than rss, but sysv shm, mmaped libries perturb this, as well as the dubious practice (more common in fortran I think) of allocating max-sized arrays even if you only ever use a small part. my experience so far is that setting RLIMIT_AS to around ram size is reasonable. we have had good luck with swap=ram (or a little more), vm.overcommit_memory=2 and vm.overcommit_ratio=100. the overcommit settings alone do a poor job - you also need RLIMIT_AS. regards, mark hahn. From kyron at neuralbs.com Thu Jun 12 15:32:13 2008 From: kyron at neuralbs.com (Eric Thibodeau) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] size of swap partition In-Reply-To: <20080610135315.GA4894@dinamobile> References: <200806101313.m5ADC2t3014136@bluewest.scyld.com> <20080610135315.GA4894@dinamobile> Message-ID: <4851A3ED.5030206@neuralbs.com> You can check out the following: http://linux-mm.org/LinuxMM Guilherme Menegon Arantes wrote: > On Tue, Jun 10, 2008 at 06:13:00AM -0700, beowulf-request@beowulf.org wrote: > >> Date: Tue, 10 Jun 2008 00:58:12 -0400 (EDT) >> From: Mark Hahn >> Subject: Re: [Beowulf] size of swap partition >> To: Gerry Creager >> Cc: Mikhail Kuzminsky , beowulf@beowulf.org >> >> it is possible to tune how the kernel responds to memory crunches - >> for instance, you can always avoid OOM with the vm.overcommit_memory=2 >> sysctl (you'll need to tune vm.overcommit_ratio and the amount of swap >> to get the desired limits.) in this mode, the kernel tracks how much VM >> it actually needs (worst-case, reflected in Committed_AS in /proc/meminfo) >> and compares that to a commit limit that reflects ram and swap. >> >> ... >> >> their real/virtual ratio. the kernel only enforces RLIMIT_AS >> (vsz in ps),assuming a 2.6 kernel - I forget whether 2.4 did >> RLIMIT_RSS or not. 
>> > > > And that brings me to another related question: Where I can get more > information about VM usage/tunning options (such as vm.overcommit_ratio, > RLIMIT_AS, etc) and VM metrics (such as vmstat) for the current linux > kernels (>= 2.6.18)? I have looked on /usr/src/linux/Documentation/vm > but anywhere else with more accessible/digested info? > > Regards, > > Guilherme > > -- > > Guilherme Menegon Arantes, PhD S?o Paulo, Brasil > ______________________________________________________ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080612/8c8f7ca1/attachment.html From bernard at vanhpc.org Thu Jun 12 15:33:27 2008 From: bernard at vanhpc.org (Bernard Li) Date: Thu Mar 18 01:07:16 2010 Subject: [Beowulf] Roadrunner picture In-Reply-To: <061220082105.9640.48518F850002D8B0000025A82215586394089C040E99D20B9D0E080C079D@comcast.net> References: <061220082105.9640.48518F850002D8B0000025A82215586394089C040E99D20B9D0E080C079D@comcast.net> Message-ID: Dear all: Thanks for all the responses. I was at the Roadrunner booth at SC07. They had a handout explaining the Roadrunner architecture which also has a picture of racks of blades (maybe not of Roadrunner, but blades nevertheless). If I remember correctly they even have the blades on display. John's ComputerWorld link also has some pictures of the blades. So I guess I was just really trying to figure out what nodes those pictures are showing. Most likely the I/O nodes although there is also the off-chance that they are just random racks of servers ;-) Cheers, Bernard On Thu, Jun 12, 2008 at 2:05 PM, wrote: > All, > > Not a expert, but I know a thing or two. 
The triblade is two CB2 blades
> which each hold two PowerXCell processors in a cc-NUMA arrangement.
> They sandwich an LS21 blade that is connected to each through a 16x PCIe to HT
> bridge. These three are uni-body constructed. The CB2s resemble the QS22 blade
> that goes into the IBM BladeCenter H chassis. They are vertical full-height
> blades which fit 14 to an enclosure. The RoadRunner triblade is at least
> double-wide and maybe more. Do not know the measurements.
>
> The photo confuses me though, because I am pretty sure these are vertically
> racked.
>
> Another thing to note is that programming the triblade is tri-binary ...
> x86, Power, and SPE. MPI processes are doled out to the Opteron blade. The
> PowerXCells are programmed beneath MPI as SIMD accelerators. The system's
> processing power is largely resident in the PowerXCell (~200 peak Gflops per
> CB2); the Opteron only accounts for about 44 teraflops of the total peak
> performance, which is in the vicinity of 1400 teraflops. Linpack runs at about
> 85% efficiency on the system and is running on the SPE only, I am pretty sure.
> Running Linpack it generates 650 Mflops per watt, making it pretty Green I
> guess ... which is what you would expect from a DLP engine. As I recall Blue
> Gene is about 350 Mflops per watt. But the 650 number maybe does not count the
> LS21 power consumption. Anyway ...
>
> Hope that was useful ... now, can someone tell about the IEEE-754-ness of
> its eDP units?
>
> rbw
> --
>
> "Making predictions is hard, especially about the future."
>
> Niels Bohr
>
> --
>
> Richard Walsh
> Thrashing River Consulting--
> 5605 Alameda St.
> Shoreview, MN 55126
>
> Phone #: 612-382-4620
>
>
> -------------- Original message --------------
> From: "Peter St. John"
> Bernard,
>
> I'm looking forward to hearing from our resident experts, but
> meanwhile: http://en.wikipedia.org/wiki/IBM_Roadrunner explains the
> architecture some.
> The buzzword is "triblade", which is 3 blades (with an extension) employing two types of processors (AMD Opteron and IBM Cell) in a hybrid subsystem. I have no idea what a single Triblade looks like. The overall machine is then composed of zillions of triblades.
> Wow, imagine a Beowulf of those (jk :-)
>
> Peter (designing a Beowulf of abaci to fit his current budget)
>
> On Thu, Jun 12, 2008 at 3:45 PM, Bernard Li wrote:
>>
>> Hi all:
>>
>> I am sure most people have seen the following picture for Roadrunner circulating the Net:
>>
>> http://www.cnn.com/2008/TECH/06/09/fastest.computer.ap/index.html?iref=newssearch
>>
>> However, they don't look like blades to me, more like 2U IBM x series servers. Perhaps those are the I/O nodes?
>>
>> Cheers,
>>
>> Bernard
>
>
>
> ---------- Forwarded message ----------
> From: "Peter St.
John"
> To: "Bernard Li"
> Date: Thu, 12 Jun 2008 20:16:19 +0000
> Subject: Re: [Beowulf] Roadrunner picture
>

From jan.heichler at gmx.net Thu Jun 12 20:06:58 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] MVAPICH2 and osu_latency
In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E69E@mtiexch01.mti.com>
References: <6410235104.20080612232756@gmx.net> <9FA59C95FFCBB34EA5E42C1A8573784F0129E69E@mtiexch01.mti.com>
Message-ID: <1745371589.20080613050658@gmx.net>

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/c8422ac6/attachment.html

From jan.heichler at gmx.net Thu Jun 12 20:11:40 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] MVAPICH2 and osu_latency
In-Reply-To: <6DB5B58A8E5AB846A7B3B3BFF1B4315A0214C01D@AVEXCH1.qlogic.org>
References: <6410235104.20080612232756@gmx.net> <6DB5B58A8E5AB846A7B3B3BFF1B4315A0214C01D@AVEXCH1.qlogic.org>
Message-ID: <1788564189.20080613051140@gmx.net>

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/a767e25f/attachment.html

From apittman at concurrent-thinking.com Fri Jun 13 03:02:07 2008
From: apittman at concurrent-thinking.com (Ashley Pittman)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] MVAPICH2 and osu_latency
In-Reply-To: <1788564189.20080613051140@gmx.net>
References: <6410235104.20080612232756@gmx.net> <6DB5B58A8E5AB846A7B3B3BFF1B4315A0214C01D@AVEXCH1.qlogic.org> <1788564189.20080613051140@gmx.net>
Message-ID: <1213351327.6775.18.camel@bruce.priv.wark.uk.streamline-computing.com>

On Fri, 2008-06-13 at 05:11 +0200, Jan Heichler wrote:
>> So you're concerned with the gap between the 2.63 us that OSU measured and your 3.07 us you measured. I wouldn't be too concerned.
>
> 1st: I get a value of 2.96 with MVAPICH 1.0.0 - this is exactly the value that I find on the mvapich website ;-)
>
> It is not about being concerned not to get "optimal performance" - I know that such micro-benchmarks are of limited use... but I have a customer requirement. And since it seems possible it would be helpful to get there.

Rule #1. Don't commit yourself to a latency/bandwidth figure unless you have run the specific micro-benchmark *on the specific chipset/hardware configuration* you intend to deliver this commitment on.

0.4 us is well within the bounds of fluctuation you get from different PCI chipsets. You're probably best off going back to the hardware vendor to see if there is anything they can tweak (mmtr perhaps?) but beyond that I suspect your problem will have a political solution rather than a technical one.

> The value is the same every time. Shouldn't it be different on every run? And: how can I move the process? numactl or taskset just works on the local process, I assume. How can I move the "remote process" on the other host?

Run numactl on the other host as well.
That said it's unusual for the value to be the same every run; this probably won't change anything, as all numactl can do is stabilise the results towards the bottom of the range observed without it.

Ashley Pittman.

From prentice at ias.edu Fri Jun 13 05:38:13 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] User resource limits
In-Reply-To:
References: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> <485180FB.2090402@ias.edu>
Message-ID: <48526A35.6010609@ias.edu>

Mark Hahn wrote:
>>> Unfortunately the kernel implementation of mmap() doesn't check the maximum memory size (RLIMIT_RSS) or maximum data size (RLIMIT_DATA) limits which were being set, but only the maximum virtual RAM size (RLIMIT_AS) - this is documented in the setrlimit(2) man page.
>>>
>>> :-(
>
> I think it's a perfectly reasonable choice. RSS enforcement means accounting and checks on what would otherwise be fast paths. besides, I think it also lacks transparency, since a process's RSS is affected by random other system events, other users, etc.
>
> using a memory limit that is triggered on actual allocation events (mmap, brk) makes a lot of sense to me, and that means virtual size, exactly what RLIMIT_AS does...
>
>> limits.conf parlance) I would have to limit AS < RAM to keep a user from using all RAM. Since AS includes virtual memory, and VM = RAM + swap, wouldn't I be limiting users a little more than I'd hoped?
>
> I don't follow that. why would you want to keep a user from using all ram (which assumes the ram is otherwise free/unused/wasted)?
>

Because these are multi-user systems that are not managed by a queuing system, and users are running large jobs on them. Once every couple of days, we have to hard-reboot one of them b/c they become unresponsive when they run out of memory (OOM messages in the logs verify this). I think I explained this in more detail in my original e-mail.
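
As an illustration of the point quoted above (RLIMIT_AS is checked when address space is requested via mmap/brk, not when pages are touched), here is a minimal Python sketch. It was not posted in this thread: the helper name run_with_as_limit and the 1 GiB figure are assumptions made for the example, and it uses only the standard resource and subprocess modules on Linux.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def run_with_as_limit(limit_bytes, alloc_bytes):
    """Start a child Python under an RLIMIT_AS soft limit and try one allocation.

    Returns "allocated" if the child obtained the buffer, "refused" if the
    allocation failed with MemoryError.
    """
    def set_limit():
        # Runs in the child between fork and exec. Only the soft limit is
        # lowered, and never above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        new_soft = limit_bytes
        if hard != resource.RLIM_INFINITY:
            new_soft = min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))

    child_code = (
        "try:\n"
        f"    buf = bytearray({alloc_bytes})\n"
        "    print('allocated')\n"
        "except MemoryError:\n"
        "    print('refused')\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        preexec_fn=set_limit,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # A request below the limit succeeds; one above it is refused up front.
    print(run_with_as_limit(GIB, 16 * 1024 * 1024))
    print(run_with_as_limit(GIB, 2 * GIB))
```

The second request should fail at allocation time, before any of the memory is touched, which is the behaviour the RSS-based limits cannot give you.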
> the only real trick with RLIMIT_AS and vm.overcommit=2 is that it's hard to predict the vsz of processes. normally, vsz is modestly larger than rss, but sysv shm, mmapped libraries perturb this, as well as the dubious practice (more common in fortran I think) of allocating max-sized arrays even if you only ever use a small part.
>
> my experience so far is that setting RLIMIT_AS to around ram size is reasonable. we have had good luck with swap=ram (or a little more), vm.overcommit_memory=2 and vm.overcommit_ratio=100. the overcommit settings alone do a poor job - you also need RLIMIT_AS.

vm.overcommit? never heard of that before. I'm going to google that now.

--
Prentice

From landman at scalableinformatics.com Fri Jun 13 06:11:33 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] User resource limits
In-Reply-To: <48526A35.6010609@ias.edu>
References: <742070125.116971213070188917.JavaMail.root@zimbra.vpac.org> <485180FB.2090402@ias.edu> <48526A35.6010609@ias.edu>
Message-ID: <48527205.1030400@scalableinformatics.com>

Prentice Bisbal wrote:
> vm.overcommit? never heard of that before. I'm going to google that now.

vm tuning knobs/dials

/proc/sys/vm/overcommit_memory
/proc/sys/vm/overcommit_ratio

or via sysctl

landman@lightning:~$ sysctl -a | grep -i overcommit
...
vm.overcommit_memory = 0
vm.overcommit_ratio = 50

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615

From raq at cttc.upc.edu Fri Jun 13 08:19:28 2008
From: raq at cttc.upc.edu (Ramiro Alba Queipo)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To: <48513470.30308@scalableinformatics.com>
References: <1213119210.8051.143.camel@mundo> <48513470.30308@scalableinformatics.com>
Message-ID: <1213370368.5895.46.camel@mundo>

On Thu, 2008-06-12 at 10:36 -0400, Joe Landman wrote:
> Ramiro Alba Queipo wrote:
> > Hello everybody:
> >
> > We are about to build an HPC cluster with an InfiniBand network, starting from 22 dual-socket nodes with AMD quad-core processors, and in a year or so we will have about 120 nodes. We will be using InfiniBand both for calculation and for storage.
>
> Hi Ramiro:
>
> You may experience some contention issues in this case if your code is very latency sensitive, and you do lots of IO.

Our software is home-made CFD code using MPI (LAM until now and Open MPI from now on), but our solvers are neither very latency sensitive nor do lots of IO at the moment, so I think that for now it is a sensible decision. What do you think?

> > The question is that we need a modular solution and we have 3 candidates:
> >
> > a) Voltaire Grid Director SDR or DDR 288 ports (9988 or 2012 models) -> seems very good and well supported, but very expensive.
> >
> > b) Qlogic SilverStorm 9120 (144 ports) -> no price and support information yet
> >
> > c) Flextronics 10U 144 Port Modular -> very good price but little support => risky option?
>
> The Flextronics units are Mellanox IP/chips inside (as are, I believe, many/most of the others). That is, the risk is low from a "will it work" view.
> Flextronics is an ODM, so they may not provide the levels of support around the system that you might get with Voltaire et al.
>
> Do you want/need a 1:1 architecture (e.g. all ports are the same number of switch hops from each other), or are you able/willing to look into oversubscribed links? Part of this has to do with your traffic patterns, your code requirements on latency, and your storage bandwidth.

I do not know many details, but nowadays we are using solvers adapted to live with high latencies, and with InfiniBand we expect to scale better and to run tasks using 500 or more cores (8 cores/node), though not right now. In fact we are doing some tests at the MareNostrum supercomputer in Barcelona with about 1000 cores.

The question that worries me is whether we will be limited in the mid-term by a solution based on 24-port switches joined by, say, 4 ports (not a fat-tree topology, which wastes a lot more ports), and be losing latency/bandwidth that we will then need when we have 130 ports at the end of next year.

By the way:

a) How many hops does a Flextronics 10U 144 Port Modular take?
b) And the others?
c) How much latency am I losing in each hop? (In the case of Voltaire switches: ISR 9024 - 24 ports: 140 ns; ISR 2004 - 96 ports: 420 ns)
d) Does each port I use to connect one switch to another add its bandwidth to the total (20 Gb/s * 4 = 80 Gb/s when using 4 ports to connect)?

The alternatives are:

a) Start with a good 24-port switch and grow, losing latency and bandwidth
b) Buy a 48 or 96 port switch, spending more money to have more ports at full bandwidth/latency
c) Use the Flextronics 10U 144 Port Modular solution, which will allow us to scale well in a couple of years

Thanks for your answer

Regards

--
This message has been scanned by MailScanner for viruses and other dangerous content, and is believed to be clean.
For all your IT requirements visit: http://www.transtec.co.uk

From raq at cttc.upc.edu Fri Jun 13 08:28:06 2008
From: raq at cttc.upc.edu (Ramiro Alba Queipo)
Date: Thu Mar 18 01:07:16 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To:
References: <1213119210.8051.143.camel@mundo>
Message-ID: <1213370886.5895.53.camel@mundo>

On Thu, 2008-06-12 at 10:08 -0500, Don Holmgren wrote:
> Ramiro -
>
> You might want to also consider buying just a single 24-port switch for your 22 nodes, and then when you expand either replace with a larger switch, or build a distributed switch fabric with a number of leaf switches connecting into a central spine switch (or switches). By the time you expand to the larger cluster, switches based on the announced 36-port Mellanox crossbar silicon will be available and perhaps per port prices will have dropped sufficiently to justify the purchase delay and the disruption at the time of expansion.

Could you explain this solution to me? I did not know about it.

> If your applications can tolerate some oversubscription (less than a 1:1 ratio of leaf-to-spine uplinks to leaf-to-node connections), a distributed switch fabric (leaf and spine) has the advantage of shorter (and cheaper) cables between the leaf switches and your nodes, and relatively fewer longer cables from the leaves back to the spine, compared with a single central switch.

What do you mean by a distributed switch fabric? What is the difference from a modular solution?

Thanks for your answer

Regards
From egan at sense.net Fri Jun 13 08:43:31 2008
From: egan at sense.net (Egan Ford)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Roadrunner picture
In-Reply-To:
Message-ID: <02f001c8cd6c$3c5c1990$4f833909@oberon>

Perhaps this will help:

http://www.lanl.gov/roadrunner/

And:

http://www.lanl.gov/orgs/hpc/roadrunner/pdfs/Koch%20-%20Roadrunner%20Overview/RR%20Seminar%20-%20System%20Overview.pdf

Pages 20 - 29

IANS, the triblade is really a quadblade: blade 1 is the Opteron blade, blade 2 is a bridge, blades 3 and 4 are the Cell blades.

Lots of other good stuff here:

http://www.lanl.gov/orgs/hpc/roadrunner/rrseminars.shtml

> -----Original Message-----
> From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Bernard Li
> Sent: Thursday, June 12, 2008 4:33 PM
> To: richard.walsh@comcast.net
> Cc: Beowulf Mailing List
> Subject: Re: [Beowulf] Roadrunner picture
>
> Dear all:
>
> Thanks for all the responses. I was at the Roadrunner booth at SC07. They had a handout explaining the Roadrunner architecture which also has a picture of racks of blades (maybe not of Roadrunner, but blades nevertheless). If I remember correctly they even have the blades on display.
>
> John's ComputerWorld link also has some pictures of the blades.
>
> So I guess I was just really trying to figure out what nodes those pictures are showing. Most likely the I/O nodes although there is also the off-chance that they are just random racks of servers ;-)
>
> Cheers,
>
> Bernard

From jan.heichler at gmx.net Fri Jun 13 08:55:10 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To: <1213370368.5895.46.camel@mundo>
References:
<1213119210.8051.143.camel@mundo> <48513470.30308@scalableinformatics.com> <1213370368.5895.46.camel@mundo>
Message-ID: <1589632152.20080613175510@gmx.net>

Hello Ramiro,

On Friday, 13 June 2008, you wrote:

RAQ> By the way:
RAQ> a) How many hops a Flextronics 10U 144 Port Modular is doing?

3

RAQ> b) And the others?

3 too.

RAQ> c) How much latency am I losing in each hop? (In the case of Voltaire
RAQ> switches: ISR 9024 - 24 Ports: 140 ns ; ISR 2004 - 96 ports: 420 ns

150 ns is the usual value - maybe it is just 140.

As far as I know all the switch vendors use the same 24-port silicon by Mellanox. Because of that you find the same number of switch ports everywhere: 24, 144 and 288. 288 is the maximum number of ports you can get when you use a 24-port silicon as basis and build a 3-hop fat tree.

RAQ> The alternatives are:
RAQ> a) Start with a good 24 port switch and grow up losing latency and
RAQ> bandwidth

You can use the 24-port switches to create a full bisectional bandwidth network if you want that. Since all the big switches are based on the 24-port silicon this is no problem.

RAQ> c) Use the Flextronics 10U 144 Port Modular solution which will allow us
RAQ> to scale well in a couple years

But you have to pay now for an expansion that you might never get... Or want a fancy new network technology as soon as you upgrade.

Regards,

Jan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/2af742f7/attachment.html

From raq at cttc.upc.edu Fri Jun 13 09:52:45 2008
From: raq at cttc.upc.edu (Ramiro Alba Queipo)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To: <1589632152.20080613175510@gmx.net>
References: <1213119210.8051.143.camel@mundo> <48513470.30308@scalableinformatics.com> <1213370368.5895.46.camel@mundo> <1589632152.20080613175510@gmx.net>
Message-ID: <1213375965.5895.66.camel@mundo>

On Fri, 2008-06-13 at 17:55 +0200, Jan Heichler wrote:
> Hallo Ramiro,
>
> RAQ> The alternatives are:
> RAQ> a) Start with a good 24-port switch and grow, losing latency and bandwidth
>
> You can use the 24-port switches to create a full bisectional bandwidth network if you want that. Since all the big switches are based on the 24-port silicon this is no problem.

Yes, but how many ports must I waste if I do not want to lose too much bandwidth? And about latency, it could be negligible compared to the latency at the InfiniBand card (~3 microseconds versus 480 nanoseconds at the switch), right?

> RAQ> c) Use the Flextronics 10U 144 Port Modular solution, which will allow us to scale well in a couple of years
>
> But you have to pay now for an expansion that you might never get... Or want a fancy new network technology as soon as you upgrade.

Well, I will most probably need 48 ports in five months and 120 ports in about a year.

Thanks for your answer

Regards
From walid.shaari at gmail.com Fri Jun 13 10:31:38 2008
From: walid.shaari at gmail.com (Walid)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] RHEL5 network throughput/scalability
In-Reply-To: <588c11220806130856j34c2949dq3ac538ecc4409ca0@mail.gmail.com>
References: <588c11220806130856j34c2949dq3ac538ecc4409ca0@mail.gmail.com>
Message-ID:

2008/6/13 Jason Clinton :
> We've seen fairly erratic behavior induced by newer drivers for NVidia NForce-based NICs with forcedeth. If that's your source NIC in the above scenario, that could be the source of the issue as congestion timing has probably changed. Have you tried updating your source NIC driver to whichever is the newest? Nearly all NIC vendors that are incorporated on server motherboards put out updated drivers on their websites.

Jason,

The NIC is Broadcom (bnx2 driver). I have; however, I had to go to the RHEL5.1 kernel as it does not want to compile on the new kernels. However, it did not change much. I have also played with the different congestion settings, and even though Vegas seems to be the most performance capable, it still did not solve it. I had one test that I did not write down where it did perform well; I have saved the sysctl and will check what parameters made the difference.

regards

Walid

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/b7107d80/attachment.html

From jan.heichler at gmx.net Fri Jun 13 10:43:31 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To: <1213375965.5895.66.camel@mundo>
References: <1213119210.8051.143.camel@mundo> <48513470.30308@scalableinformatics.com> <1213370368.5895.46.camel@mundo> <1589632152.20080613175510@gmx.net> <1213375965.5895.66.camel@mundo>
Message-ID: <9410452169.20080613194331@gmx.net>

Hello Ramiro,

On Friday, 13 June 2008, you wrote:

RAQ> On Fri, 2008-06-13 at 17:55 +0200, Jan Heichler wrote:
>> You can use the 24-port switches to create a full bisectional
>> bandwidth network if you want that. Since all the big switches are
>> based on the 24-port silicon this is no problem.
RAQ> Yes, but how many ports must I waste if I do not want to lose too much
RAQ> bandwidth.

Exactly 50%. 12 ports go to clients and 12 ports go to the spine. I can send you a sketch if you are interested.

RAQ> And about latency, it could be negligible compared to latency at the
RAQ> InfiniBand card (~3 microsecs versus 480 nanosecs at switch),
RAQ> right?

The number of hops is always the same... 3 up to 288 ports. With the new 36-port silicon that will change. Up to 648 clients with 3 hops if I'm not wrong.

>> RAQ> c) Use the Flextronics 10U 144 Port Modular solution which will
>> allow us
>> RAQ> to scale well in a couple years
>> But you have to pay now for an expansion that you might never get...
>> Or want a fancy new network technology as soon as you upgrade.
RAQ> Well, I most probably will be needing 48 ports in five months and 120
RAQ> ports in about a year

48 ports is 6 24-port switches.
Here is a quick overview:

client ports    # of 24-port switches you need for full bisectional bandwidth
  24             1
  36             5
  48             6
  60             8
  72             9
  84            11
  96            12
 108            14
 120            15
 132            16
 144            18

Somewhere around 11 and 14 is normally the break even for a 144-port switch - depends on your cost prices of course ;-)

I hope I didn't miscalculate ;-)

RAQ> Thanks for your answer

Any time!

Regards,

Jan

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/a7ab7f56/attachment.html

From djholm at fnal.gov Fri Jun 13 12:05:22 2008
From: djholm at fnal.gov (Don Holmgren)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To: <1213370886.5895.53.camel@mundo>
References: <1213119210.8051.143.camel@mundo> <1213370886.5895.53.camel@mundo>
Message-ID:

On Fri, 13 Jun 2008, Ramiro Alba Queipo wrote:

> On Thu, 2008-06-12 at 10:08 -0500, Don Holmgren wrote:
>> Ramiro -
>>
>> You might want to also consider buying just a single 24-port switch for your 22 nodes, and then when you expand either replace with a larger switch, or build a distributed switch fabric with a number of leaf switches connecting into a central spine switch (or switches). By the time you expand to the larger cluster, switches based on the announced 36-port Mellanox crossbar silicon will be available and perhaps per port prices will have dropped sufficiently to justify the purchase delay and the disruption at the time of expansion.
>
> Could you explain me this solution? I did not know about it

As far as I know, all currently available commercial Infiniband switches are based on the Mellanox 24-port non-blocking silicon switch chip (InfiniScale III). The 96, 144, and 288 port modular switches from the various companies use a number of these individual chips in a layered (3-hop) design that provides full bisection bandwidth.
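
The switch counts in Jan's table can be reproduced with a little port arithmetic. The Python sketch below is an illustration, not something posted in the thread: it assumes a two-level leaf/spine layout in which every leaf uses half of its ports for nodes and half for uplinks, and the function names are made up for the example. Under that assumption the counts match the table except for 132 client ports, where the arithmetic gives 17 (11 leaves plus 6 spines) rather than the 16 listed. The second function gives the maximum port count of a 3-hop fabric for a given chip radix, matching the 288 and 648 figures mentioned in this thread.

```python
import math


def two_level_switch_count(node_ports, radix=24):
    """Estimate switches needed for a full-bisection leaf/spine fabric.

    Assumption: each leaf switch dedicates radix/2 ports to nodes and
    radix/2 ports to uplinks; spine switches use all ports for leaves.
    """
    if node_ports <= radix:
        return 1  # a single crossbar switch is enough
    down = radix // 2                      # node-facing ports per leaf
    leaves = math.ceil(node_ports / down)
    uplinks = leaves * (radix - down)      # one uplink per node-facing port
    spines = math.ceil(uplinks / radix)
    return leaves + spines


def max_three_hop_ports(radix):
    """Largest full-bisection fabric with at most 3 switch hops."""
    return radix * radix // 2


if __name__ == "__main__":
    for ports in range(24, 145, 12):
        print(ports, two_level_switch_count(ports))
    print(max_three_hop_ports(24), max_three_hop_ports(36))
```

For 144 node ports this gives 12 leaf switches plus 6 spine switches.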
One can also construct a full bisection bandwidth 144-port (say) switch out of twelve 24-port switches: out of the total 288 switch ports, 144 ports connect to nodes, and 144 ports connect to other switch ports. The latency should be identical to that of a 144-port chassis, as both would use three hops (disregarding the negligible ~ nanosecond per foot of extra cable length delay when using 24-port switches).

Usually the per port cost for a large switch is less than the per port cost for a bunch of 24-port switches. When you don't need a full 144-port switch, you can either buy the large chassis and only buy a limited number of blades, or go with a set of 24-port switches. For smaller networks a set of 24-port switches is cheaper.

The next generation switch silicon will be 36 ports (InfiniScale IV), rather than 24. Obviously I can't predict for certain that the large switches to be built out of this silicon will be cheaper than the current models, but it is reasonable to guess that this will be the case.

>> If your applications can tolerate some oversubscription (less than a 1:1 ratio of leaf-to-spine uplinks to leaf-to-node connections), a distributed switch fabric (leaf and spine) has the advantage of shorter (and cheaper) cables between the leaf switches and your nodes, and relatively fewer longer cables from the leaves back to the spine, compared with a single central switch.
>
> What do you mean with a distributed switch fabric?
> What is the difference with a modular solution?
>
> Thanks for your answer
>
> Regards

I think both of these questions are answered above. But to be clear, by "distributed" I mean that instead of one large switch chassis one would use a number of 24-port switches. In this case it is very natural to put the individual switches next to their nodes. See, for example, the "A New Approach to Clustering - Distributed Federated Switches" white paper at the Mellanox web site.
When the switches are next to the nodes, the cable plant can be a lot easier to deal with. Don't underestimate the pain of having 144 fairly hefty Infiniband cables all terminating into a 10U chassis.

One additional item of note when using a distributed fabric: if your typical jobs use a small number of nodes, then it is quite possible to configure your batch scheduler so that the nodes belonging to an individual job all connect to the same leaf switch. This means that your messages only have to go through one switch hop, so latency is reduced compared with going through three hops in a large modular switch chassis (although I seriously doubt that the quarter microsecond of latency difference here matters to many codes). Perhaps of more significance, though, is that you can use oversubscription to lower the cost of your fabric. Instead of connecting 12 ports of a leaf switch to nodes and using the other 12 ports as uplinks, you might get away with 18 nodes and 6 uplinks, or 20 nodes and 4 uplinks. As core counts are increasing, this is becoming more and more viable for some applications.

Don

From patrick at myri.com Fri Jun 13 12:43:31 2008
From: patrick at myri.com (Patrick Geoffray)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Infiniband modular switches
In-Reply-To:
References: <1213119210.8051.143.camel@mundo> <1213370886.5895.53.camel@mundo>
Message-ID: <4852CDE3.4030006@myri.com>

Hi Don,

Don Holmgren wrote:
> latency difference here matters to many codes). Perhaps of more significance, though, is that you can use oversubscription to lower the cost of your fabric. Instead of connecting 12 ports of a leaf switch to nodes and using the other 12 ports as uplinks, you might get away with 18 nodes and 6 uplinks, or 20 nodes and 4 uplinks. As core counts are increasing, this is becoming more and more viable for some applications.

It's important to note that the "full-bisection" touted by vendors is on paper only.
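
For reference, the leaf-switch splits Don mentions (12/12, 18/6, 20/4) and the one-hop versus three-hop latency difference can be put into numbers. The Python sketch below is only an illustration under stated assumptions: the 20 Gb/s link rate and the 140 ns per-hop figure are the values quoted by Ramiro earlier in this thread, the function names are invented for the example, and the per-node bandwidth is the nominal ("paper") worst case when every node on a leaf sends off-leaf at once.

```python
def oversubscription(node_ports, uplink_ports, link_gbps=20.0):
    """Return (ratio, worst-case off-leaf bandwidth per node in Gb/s)."""
    ratio = node_ports / uplink_ports
    # All nodes on the leaf share the uplinks when sending off-leaf.
    per_node = min(link_gbps, link_gbps * uplink_ports / node_ports)
    return ratio, per_node


def switch_latency_ns(hops, per_hop_ns=140):
    """Nominal switch latency for a path crossing `hops` switch chips."""
    return hops * per_hop_ns


if __name__ == "__main__":
    for nodes, uplinks in [(12, 12), (18, 6), (20, 4)]:
        ratio, bw = oversubscription(nodes, uplinks)
        print(f"{nodes} nodes / {uplinks} uplinks: {ratio:.0f}:1, "
              f"{bw:.1f} Gb/s per node off-leaf")
    extra = switch_latency_ns(3) - switch_latency_ns(1)
    print(f"3 hops vs 1 hop: {extra} ns extra")
```

With these inputs the three-hop path costs 280 ns more than staying on one leaf, which is the "quarter microsecond" referred to above.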
In reality, static routing provides full-bisection for a very small subset of patterns; the average effective bisection on a diameter-3 Clos is ~40% of link rate (adaptive routing improves that a lot, but breaks packet order on the wire, which is a requirement for some network protocols).

In practice, "paper" full-bisection is near free when using a single enclosure, since all spine cables are on the backplane. For larger networks, where you have to pay for real cables to the spine level, it may make sense to be oversubscribed if the effective bisection is already bad (static routing), or if your collective communication on large jobs is not bandwidth bound. However, the latter is often false on many-cores.

Patrick

From walid.shaari at gmail.com Fri Jun 13 12:55:23 2008
From: walid.shaari at gmail.com (Walid)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] RHEL5 network throughput/scalability
In-Reply-To:
References: <588c11220806130856j34c2949dq3ac538ecc4409ca0@mail.gmail.com>
Message-ID:

Dear All,

It is lame, however I managed to get the following kernel parameters to scale well in terms of both performance per node and scalability over a high-bandwidth, low-latency network:

net.ipv4.tcp_workaround_signed_windows = 1
net.ipv4.tcp_congestion_control = vegas
net.ipv4.tcp_tso_win_divisor = 8
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_wmem = 4096 16384 131072
net.ipv4.tcp_mem = 786432 1048576 1572864
net.ipv4.route.max_size = 8388608
net.ipv4.route.gc_thresh = 524288
net.ipv4.icmp_ignore_bogus_error_responses = 0
net.ipv4.icmp_echo_ignore_broadcasts = 0
net.ipv4.tcp_max_orphans = 262144
net.core.netdev_max_backlog = 2000

regards

Walid

2008/6/13 Walid :
> 2008/6/13 Jason Clinton :
>
>> We've seen fairly erratic behavior induced by newer drivers for NVidia NForce-based NICs with forcedeth. If that's your source NIC in the above scenario, that could be the source of the issue as congestion timing has probably changed.
Have you tried updating your source NIC driver to >> whichever is the newest? Nearly all NIC vendors that are incorporated on >> server motherboards put out updated drivers on their websites. >> >> Jason, > > The NIC is broadcom (Bnx2 driver), I have, however i had to go to the > RHEL5.1 kernel as it does not want to compile on the new kernels. however it > did not change much, i have also played with the different congestion > settings, and even though Vegas seems to be the most performance capable, > it still did not solve it, I had one test that i did not write down where it > did perform well, I have saved the sysctl and will check what parameters > have made the difference. > > regards > > Walid -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080613/8080d153/attachment.html From prentice at ias.edu Fri Jun 13 13:03:49 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] User resource limits In-Reply-To: <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> References: <484D4F29.9090704@ias.edu> <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> Message-ID: <4852D2A5.1050803@ias.edu> Lombard, David N wrote: > On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: >> I would like to impose some CPU and memory limits on users that are hard >> limits that can't be changed/overridden by the users. What is the best >> way to do this? All I know is environment variables or shell commands >> done as the user (ulimit, for example). > > pam_limits and /etc/security/limits.conf > In limits.conf, what is the units for as, is it bytes, or kb? It doesn't say in the man page for limits.conf, and the page for setrlimit says bytes RLIMIT_AS. 
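The unit question above can be checked on a running system. The sketch below is only an illustration in Python, using the standard `resource` module on Linux; the helper names `limits_conf_as_to_bytes` and `describe` are made up here. It assumes that pam_limits reads the `as` item in limits.conf as KB and scales it by 1024 before calling setrlimit(), which itself works in bytes. That assumption should be verified against the limits.conf man page or the pam_limits source shipped with your distribution.

```python
import resource

KB = 1024


def limits_conf_as_to_bytes(kb_value):
    """Convert a limits.conf 'as' value (assumed to be in KB) to bytes."""
    return kb_value * KB


def describe(limit):
    """Render a raw rlimit value, which the kernel keeps in bytes."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d bytes (%.1f MB)" % (limit, limit / (1024.0 * 1024.0))


if __name__ == "__main__":
    # getrlimit() reports RLIMIT_AS for the current process in bytes.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    print("RLIMIT_AS soft:", describe(soft))
    print("RLIMIT_AS hard:", describe(hard))
    # A hypothetical limits.conf entry "as 2097152" under the KB assumption.
    print("as 2097152 ->", describe(limits_conf_as_to_bytes(2097152)))
```

Running it in a login shell after editing limits.conf shows whether the value that reached the process matches the KB interpretation.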
-- Prentice From dnlombar at ichips.intel.com Fri Jun 13 15:58:40 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] User resource limits In-Reply-To: <4852D2A5.1050803@ias.edu> References: <484D4F29.9090704@ias.edu> <20080609161232.GB11155@nlxdcldnl2.cl.intel.com> <4852D2A5.1050803@ias.edu> Message-ID: <20080613225840.GA8097@nlxdcldnl2.cl.intel.com> On Fri, Jun 13, 2008 at 04:03:49PM -0400, Prentice Bisbal wrote: > Lombard, David N wrote: > > On Mon, Jun 09, 2008 at 11:41:29AM -0400, Prentice Bisbal wrote: > >> I would like to impose some CPU and memory limits on users that are hard > >> limits that can't be changed/overridden by the users. What is the best > >> way to do this? All I know is environment variables or shell commands > >> done as the user (ulimit, for example). > > > > pam_limits and /etc/security/limits.conf > > > In limits.conf, what is the units for as, is it bytes, or kb? It doesn't > say in the man page for limits.conf, and the page for setrlimit says > bytes RLIMIT_AS. As does the kernel's mm/mmap.c and, for >=2.6.25, fs/proc/base.c Note, mmap.c has been validating memory requests against RLIMIT_AS since at least 2.6.3 (earliest kernel sources I have online); 2.6.25 includes a /proc file "limits" that displays the limits (see fs/proc/base.c). HTH -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From perry at piermont.com Fri Jun 13 19:21:05 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] RHEL5 network throughput/scalability In-Reply-To: (Walid's message of "Fri\, 13 Jun 2008 22\:55\:23 +0300") References: <588c11220806130856j34c2949dq3ac538ecc4409ca0@mail.gmail.com> Message-ID: <87od64lpta.fsf@snark.cb.piermont.com> A number of these seem rather odd, or unrelated to performance. 
Walid writes: > It is lame, however i managed to get the following kernel paramter to scale > well in terms of both performance per node, and scalability over a high > bandwidth low latency network > > net.ipv4.tcp_workaround_signed_windows = 1 This is a workaround for a buggy remote TCP. If you have a homogeneous network of linux boxes, it will have no effect. > net.ipv4.tcp_congestion_control = vegas I'm under the impression that the Vegas congestion control policy is not well loved by the experts on TCP performance. > net.ipv4.route.max_size = 8388608 This sets the size of the routing cache. You've set it to a rather large and fairly random number. > net.ipv4.icmp_ignore_bogus_error_responses = 0 > net.ipv4.icmp_echo_ignore_broadcasts = 0 Why would paying attention to bogus ICMPs and to ICMP broadcasts help performance? Neither should be prevalent enough to make any difference, and one would naively expect performance to be improved by ignoring such things, not by paying attention to them... > net.ipv4.tcp_max_orphans = 262144 I'm not clear on why this would help unless you were expecting really massive numbers of unattached sockets -- also you're saying that up to 16M of kernel memory can be used for this purpose... > net.core.netdev_max_backlog = 2000 This implies your processes are going to get massive numbers of TCP connections per unit time. Are they? It is certainly not a *general* performance improvement... 
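To see what the running kernel actually uses for the settings discussed above, the values can be read from /proc/sys, where each dotted sysctl name corresponds to a file path. This is a minimal Python sketch; the helper names are illustrative, and the plain dot-to-slash mapping assumes that no component of the name contains a dot itself. Keys that do not exist on a given kernel are reported as None.

```python
from pathlib import Path

# The settings commented on in the message above.
SETTINGS = [
    "net.ipv4.tcp_workaround_signed_windows",
    "net.ipv4.tcp_congestion_control",
    "net.ipv4.route.max_size",
    "net.ipv4.icmp_ignore_bogus_error_responses",
    "net.ipv4.icmp_echo_ignore_broadcasts",
    "net.ipv4.tcp_max_orphans",
    "net.core.netdev_max_backlog",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the value as a list of fields, or None if it cannot be read."""
    try:
        return sysctl_path(name, root).read_text().split()
    except OSError:
        return None


if __name__ == "__main__":
    for name in SETTINGS:
        print(name, "=", read_sysctl(name))
```

Comparing this output between a node that scales well and one that does not is one way to test a single setting at a time.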
Perry

From csamuel at vpac.org Sat Jun 14 00:25:56 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] User resource limits
In-Reply-To: <638248235.163311213426794764.JavaMail.root@zimbra.vpac.org>
Message-ID: <511559027.163331213428356775.JavaMail.root@zimbra.vpac.org>

----- "David N Lombard" wrote:

> Note, mmap.c has been validating memory requests against RLIMIT_AS
> since at least 2.6.3 (earliest kernel sources I have online);

Yup, it was glibc that changed from using brk() all the time to using
mmap() for allocations > 128KB in 2.3.

Looking at mm/mmap.c it seems that sys_brk() only checks RLIMIT_DATA,
but I'm wondering if the call out to find_vma_intersection() ends up
calling may_expand_vm() and hence enforce RLIMIT_AS as well ?

> 2.6.25 includes a /proc file "limits" that displays the limits
> (see fs/proc/base.c).

Now that could be really handy, here's an example output from my home
desktop:

$ cat /proc/$$/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            ms
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             71680                71680                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       71680                71680                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O.
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From kus at free.net Sat Jun 14 17:26:24 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Powersave on Beowulf nodes Message-ID: What is about using of powersaved (and dbus and HAL daemons) on Beowulf nodes ? Currently I installed SuSE 10.3 where all the corresponding daemons are running (by default) at the runlevel=3. I simple added issuing of powersave -f at the end of booting. /proc/acpi/thermal_zone/ is empty, and powersave can't give me temperature and FANs information. I don't see now any serious advantages of powersaved daemon using in "performance" mode (using performance scheme). We have many jobs in SGE at every time moment, and underload situation (where it's reasonable to decrease CPUs frequency) is not the our danger :-) So I'm thinking about simple stopping of all the corresponding daemons. Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry Moscow From walid.shaari at gmail.com Sat Jun 14 22:32:29 2008 From: walid.shaari at gmail.com (Walid) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] RHEL5 network throughput/scalability In-Reply-To: <87od64lpta.fsf@snark.cb.piermont.com> References: <588c11220806130856j34c2949dq3ac538ecc4409ca0@mail.gmail.com> <87od64lpta.fsf@snark.cb.piermont.com> Message-ID: 2008/6/14 Perry E. Metzger : > > A number of these seem rather odd, or unrelated to performance. > > Walid writes: > > It is lame, however i managed to get the following kernel paramter to > scale > > well in terms of both performance per node, and scalability over a high > > bandwidth low latency network > > > > net.ipv4.tcp_workaround_signed_windows = 1 > > This is a workaround for a buggy remote TCP. If you have a homogeneous > network of linux boxes, it will have no effect. 
True, however one of the assumptions we have is that the NFS Filer
(running some modified version of BSD) is broken

> > net.ipv4.tcp_congestion_control = vegas
>
> I'm under the impression that the Vegas congestion control policy is
> not well loved by the experts on TCP performance.

we were working on the assumption that there is congestion involved, and
tried the several different algorithms involved, we did even try veno
(that is mainly for wireless) just for the sake of testing, and vegas
seemed to give us the boost in performance. my basic humble research did
not show much when it comes to low latency, high bandwidth LAN networks,
let me know if you do have any pointers, URLs that are against vegas or
recommend otherwise, also i can see that most of these algorithms have
options, that i just used their defaults, not sure if this is the correct
way to go about it?!

> > net.ipv4.route.max_size = 8388608
>
> This sets the size of the routing cache. You've set it to a rather
> large and fairly random number.

that's the value in RHEL4U6 system that is working on the same network
setup, just made a diff between the RHEL4 and RHEL5 and checked what
values are different, and worked by trial and error from there.

> > net.ipv4.icmp_ignore_bogus_error_responses = 0
> > net.ipv4.icmp_echo_ignore_broadcasts = 0
>
> Why would paying attention to bogus ICMPs and to ICMP broadcasts help
> performance? Neither should be prevalent enough to make any
> difference, and one would naively expect performance to be improved by
> ignoring such things, not by paying attention to them...
agree, have to test them in isolation, as i said i was working out from
trial and error based on the assumption that RHEL4 was working fine in
the current environment

> > net.ipv4.tcp_max_orphans = 262144
>
> I'm not clear on why this would help unless you were expecting really
> massive numbers of unattached sockets -- also you're saying that up to
> 16M of kernel memory can be used for this purpose...

removed, this was again from RHEL4, the default in RHEL5 is actually less

> > net.core.netdev_max_backlog = 2000
>
> This implies your processes are going to get massive numbers of TCP
> connections per unit time. Are they? It is certainly not a *general*
> performance improvement...

this is what looks like happening actually, and so far that's the one
parameter that looks like making it scale better, however i am going to
investigate this further, and can share the information back to you, no
response yet from RH.

regards

Walid

From diep at xs4all.nl Sun Jun 15 04:05:56 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Mersenne primes?
In-Reply-To: <7899080805011310xe372098y85a3dc01fbeb2be3@mail.gmail.com>
References: <7899080805011310xe372098y85a3dc01fbeb2be3@mail.gmail.com>
Message-ID: <638550ED-F8C6-484F-9B9A-C12579CE2111@xs4all.nl>

hi Mark,

Your question is not so much about Lucas-Lehmer, but rather about FFT and
DWT.

No doubt some attempts have already been made to parallelize FFT to
multiply big numbers.

DWT, as used by GIMPS, also takes the modulo implicitly, which gives
nearly a factor 2 speedup. That's what you lose when using some generic
approach to multiply.

Additionally the DWT from GIMPS has been written in SSE2 assembler, so it
only works on x86 hardware that has some sort of SSE2 support.
This is very bloody fast code optimized for intel P4 at the time. You won't be able to approach that speed writing C code. There is some code that parallellizes all this, yet it would prove more numbers to be composite when running it embarrassingly parallel. Not because it can't be done, but because that Woltman assembler is so bloody fast. In itself such FFT's can be done a lot faster in integers, just on paper. However all the cpu's seem to be faster for floating point. AMD is by far fastest right now for integer multiplication (oldie K8 beating core2 amazingly). See GMP mailing list for explanation. Intel seems to be faster for running the Woltman code. See benchmarks at mersenneforum.org Those x86/x64 cpu's are rather slow for multiplication. To be honest, finding a mersenne is not so interesting now, already too many join GIMPS. A single test single core is like a month with odds like 1 in a 100k it is a prime. Personally i'm trying to find some numbers p = (2^n + 1) / 3 busy around 1 million bits now. Note those won't win you the $100k, for 2 reasons. First of all they can't get proven prime easily, as they are "industry grade primes", secondly you need that army of 30 + Tflop or so that GIMPS is using. For future it would make sense IMHO to parallellize DWT within 1 socket. That allows better cache reusage. Bandwidth from RAM to the caches already is a problem. Let's not start discussing bandwidth that the highend network cards can deliver :) There is some C codes there that do this (note i have not managed to download it) and there is rumours Woltman also parallellized his own code one day, but most of those codes are indeed only well usable within 1 socket, already losing quite some throughput there. If your intention is to just do a single FFT as fast as possible, i'm sure there must be something out there. 
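For readers who have not seen it, the Lucas-Lehmer test itself is only a few lines; nearly all of the run time goes into the repeated squaring modulo 2^p - 1, which is the step that the DWT code discussed here accelerates. The Python sketch below uses plain built-in big integers rather than a DWT or FFT. It is not the GIMPS code and makes no attempt at speed; the function names are illustrative only.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime. p itself must be a prime."""
    if p == 2:
        return True  # 2**2 - 1 = 3; the recurrence below needs p > 2
    m = (1 << p) - 1
    s = 4
    # p - 2 squarings modulo m; this loop is where all the time goes.
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def is_small_prime(n):
    """Trial division, adequate for the small exponents used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def mersenne_exponents(limit):
    """Exponents p < limit for which 2**p - 1 is prime."""
    return [p for p in range(2, limit)
            if is_small_prime(p) and lucas_lehmer(p)]


if __name__ == "__main__":
    print(mersenne_exponents(130))
```

Each exponent is an independent test, which is why running one instance per core gives the best throughput; speeding up a single test means parallelizing the big multiplication inside the loop.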
Would be fun to optimize code like that, yet unpaid without cluster access it is hard to get it done :) Right now the quickest way to find a real big 100% prime is to join primegrids search for 321 (p = k * 2^n - 1 where k = 3). They recently found another one. It also can use the same transform. For k's above 31, the assembler code runs a lot slower, so you lose big throughput already. Because you've got to test that many numbers to find a single prime, nearly all the dudes in that prime world, when at home, are busy with throughput anyway and not so much highend clusters. Highend clusters in that domain get used for other purposes, those busy with that will shut up and not post here data that is interesting to read. For me it is easy to shout something there as i'm currently unemployed looking for a job (a job abroad outside Netherlands would be cool too, feel free to ask my CV/resume) and in meantime am busy with an innocent computer chess engine. Where GUI's sell real well in computer games, engines are not commercial at all, they just search for the holy grail massively parallel, thereby combining hundreds of algorithms/enhancements. Vincent On May 1, 2008, at 10:10 PM, Mark Reynolds wrote: > I'm fairly new to Beowulfery but am reading TFM from > http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php (great book by > the way). Does anyone know of any programs to find Mersenne primes > that are suited to parallelised environments such as a cluster made up > of nodes with multi-core processors? I've found mprimes from > http://www.mersenne.org/ but the FAQ states: > Although, a program could be written for dual-CPU systems (it would be > quite time-consuming), the machine will still get more throughput by > working on separate exponents. > Has anybody tried this and can Lucas-Lehmer testing be effectively > parallalised? Currently, you have to run an instance for each core for > it to get any benefit. 
Also I'd like to modify it so it could, for > example, search between certain numerical ranges independent of the > work units sent out through the website or to take advantage of 64-bit > hardware. Can someone more knowledgeable about this sort of thing tell > me whether this is practical or if it has been done already? > > Thanks. > > -- > "To win one hundred victories in one hundred battles is not the > highest skill. To subdue the enemy without fighting is the highest > skill."? Sun-Tzu > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From diep at xs4all.nl Sun Jun 15 04:31:12 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <1210016466.4924.1.camel@Vigor13> References: <1210016466.4924.1.camel@Vigor13> Message-ID: Seems the next CELL is 100% confirmed double precision. Yet if you look back in history, Nvidia promises on this can be found years back. The only problem with hardware like Tesla is that it is rather hard to get technical information; like which instructions does Tesla support in hardware? This is crucial to know in order to speedup your code. It is already tough to get realworld codes on GPU's faster than at cpu's. The equivalent CPU code has been optimized real bigtime, knowing everything about hardware. How fast is latency from RAM when all 128 SP's are busy with that? Nvidia gives out zero information and doesn't support anyone either for this. That has to change in order to get GPU calculations more into mainstream. When i calculate on paper for some applications, a GPU can be potentially factor 4-8 faster than a standard quadcore 2.4ghz is right now. 
Getting that performance out of the GPU is more than a fulltime task however, without having indepth technical hardware data on the GPU. Vincent On May 5, 2008, at 9:40 PM, John Hearns wrote: > On Fri, 2008-05-02 at 14:05 +0100, Ricardo Reis wrote: >> Does anyone knows if/when there will be double floating point on >> those >> little toys from nvidia? >> >> > Ricardo, > I think CUDA is a gret concept, and am starting to work with it at > home. > I recently went to a talk by David Kirk, as part of the "world tour". > I think the answer to your question is Real Soon Now. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From landman at scalableinformatics.com Sun Jun 15 06:51:44 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> Message-ID: <48551E70.7070507@scalableinformatics.com> Vincent Diepeveen wrote: > Seems the next CELL is 100% confirmed double precision. > Yet if you look back in history, Nvidia promises on this can be found > years back. [scratches head /] Vincent, it may be possible that some of us on this list may in fact be bound by NDA (non-disclosure agreements), and cannot talk about hardware which has not been announced. > The only problem with hardware like Tesla is that it is rather hard to > get technical information; like which instructions does Tesla support in > hardware? [scratches head /] Hmmm .... www.nvidia.com/cuda is a good starting point. I might suggest http://www.nvidia.com/object/cuda_what_is.html as a start on information. 
More to the point, you can look at http://www.nvidia.com/object/cuda_develop.html -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From Shainer at mellanox.com Sun Jun 15 09:08:20 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <4852CDE3.4030006@myri.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> > > latency difference here matters to many codes). Perhaps of more > > significance, though, is that you can use oversubscription to lower > > the cost of your fabric. Instead of connecting 12 ports of a leaf > > switch to nodes and using the other 12 ports as uplinks, > you might get > > away with > > 18 nodes and 6 uplinks, or 20 nodes and 4 uplinks. As core > counts are > > increasing, this is becoming more and more viable for some > applications. > > It's important to note that the "full-bisection" touted by > vendors is on paper only. In reality, static routing provides > full-bisection for a very small subset of patterns, the > average effective bisection on a > diameter-3 Clos is ~40% of link rate (adaptive routing > improves that a lot, but breaks packet order on the wire > which is a requirement for some network protocols). > Static routing is the best approach if your pattern is known. In other cases it depends on the applications. LANL and Mellanox have presented a paper on static routing and how to get the maximum of it last ISC. There are cases where adaptive routing will show a benefit, and this is why we see the IB vendors add adaptive routing support as well. But in general, the average effective bandwidth is much much higher than the 40% you claim. 
> In practice, "paper" full-bisection is near free when using a > single enclosure, since all spine cables are on the > backplane. For larger networks, where you have to pay for > real cables to the spine level, then it may make sense to be > oversubscribed if the effective bisection is already bad > (static routing), or if your collective communication on > large jobs are not bandwidth bounded. However, the later is > often false on many-cores. There are some vendors that uses only the 24 port switches to build very large scale clusters - 3000 nodes and above, without any oversubscription, and they find it more cost effective. Using single enclosures is easier, but the cables are not expensive and you can use the smaller components. I used the 24 ports switches to connect my 96 node cluster. I will replace my current setup with the new 36 InfiniBand port switches this month, since they provide lower latency and adaptive routing capabilities. And if you are bandwidth bounded, using IB QDR will help. You will be able to drive more than 3GB/s from each server. From landman at scalableinformatics.com Sun Jun 15 09:36:15 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> Message-ID: <485544FF.9090101@scalableinformatics.com> On a different note ... Gilad Shainer wrote: > routing capabilities. And if you are bandwidth bounded, using IB QDR > will help. You will be able to drive more than 3GB/s from each server. This is theoretical max, correct? What sort of real observed DMA bandwidth are people seeing with the QDR adapters? That is, how quickly can I copy 32MB from main memory to the PCIe (even gen2) based card? Or how quickly (e.g. 
what is the real observed data rate) for copying streaming data, say from a nice NFS over RDMA session, to the card? Please correct me if I am wrong, but I suspect that this will be one of rate limiting factors at least in the near term. Not that this is a bad thing, 3GB/s is quite good, but if the practical best case memory->PCIe bandwidth only let us get 2GB/s, or even 1.5 GB/s, this is important to know. If you or someone out there with QDR could measure this, I think a few of us (at least on the storage side) would like to know this. If someone wants to loan us cards and cables (cough cough), we would be happy to do this measurement between pairs of JackRabbits. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From diep at xs4all.nl Sun Jun 15 10:36:26 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48551E70.7070507@scalableinformatics.com> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> Message-ID: <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> Joseph, I'm even licensed CUDA developer by Nvidia for Tesla, but even the documents there are very very poor. Knowing latencies is really important. Another big problem is that if you write such a program measuring the latencies, no dude is gonna run it. The promises of nvidia was for videocards long ago released, all of them being single precision. We know that for sure as most of those videocards have been released years ago already. Not to mention the 8800. 
If you look however to the latest supercomputer announced, 1 peta instructions a second @ 12960 new generation CELL cpu's, that means a tad more than 77 gflop double precision for each CELL. It is a bit weird if you claim to be NDA bound, whereas the news has it in big capitals what the new IBM CELL can deliver. See for example: http://www.cbc.ca/cp/technology/080609/z060917A.html Also i calculated back that all powercosts together it is about 410 watt a node, each node having a dual cell cpu. That's network + harddrives and RAM together of course. I calculated 2.66 MW for the entire machine based upon this article. Sounds reasonably good to me. Now interesting to know is how they are gonna use that cpu power effectively. In any case it is a lot better than what the GPU's can deliver so far in practical situations known to me. So claims by programmers other than the nvidia engineer claims. Especially stuff like matrix calculations, as the weak part of the GPU hardware is the latency to and from the machine RAM (not to confuse with device RAM). From/to 8800 hardware a reliable person i know measured it at 3000 messages a second, which would make it about several hundreds of microseconds of communication latency speed. So a very reasonable question to ask is what the latency is from the stream processors to the device RAM. A 8800 document i read says 600 cycles. It doesn't mention for how many streamprocessors this is though. Also surprising to know i learned that RAM lookups do not get cached. That means a lot of extra work when 128 stream processors hammer regurarly onto the device RAM for stuff that CPU's simply cache in their L1 or L2 cache and todays even L3 caches. So knowing such technical data is total crucial as there is no way to escape the memory controllers latency in a lot of different software that searches for the holy grail. 
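The back-of-envelope figures above can be reproduced in a few lines. This Python sketch takes the numbers quoted in the message as given (a peak of 1e15 operations per second, 12960 Cell chips, two chips per node, about 410 W per node, 3000 messages per second); the function names are illustrative and nothing here is measured data.

```python
def gflops_per_chip(peak_flops, chips):
    """Peak rate per chip in GFLOPS, given the machine-wide peak."""
    return peak_flops / chips / 1e9


def machine_power_mw(chips, chips_per_node, watts_per_node):
    """Total power in MW, assuming every node draws watts_per_node."""
    nodes = chips // chips_per_node
    return nodes * watts_per_node / 1e6


def per_message_us(messages_per_second):
    """Average time per message in microseconds at a given message rate."""
    return 1e6 / messages_per_second


if __name__ == "__main__":
    print("GFLOPS per chip: %.1f" % gflops_per_chip(1e15, 12960))
    print("Machine power:   %.2f MW" % machine_power_mw(12960, 2, 410))
    print("Per message:     %.0f us" % per_message_us(3000))
```

The last figure is an average interval between messages, so it is an upper bound on sustained per-message cost rather than a direct latency measurement.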
Thanks, Vincent On Jun 15, 2008, at 3:51 PM, Joe Landman wrote: > > > Vincent Diepeveen wrote: >> Seems the next CELL is 100% confirmed double precision. >> Yet if you look back in history, Nvidia promises on this can be >> found years back. > > [scratches head /] > > Vincent, it may be possible that some of us on this list may in > fact be bound by NDA (non-disclosure agreements), and cannot talk > about hardware which has not been announced. > > >> The only problem with hardware like Tesla is that it is rather >> hard to >> get technical information; like which instructions does Tesla >> support in hardware? > > [scratches head /] > > Hmmm .... www.nvidia.com/cuda is a good starting point. > > I might suggest http://www.nvidia.com/object/cuda_what_is.html as a > start on information. More to the point, you can look at http:// > www.nvidia.com/object/cuda_develop.html > > > > -- > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics LLC, > email: landman@scalableinformatics.com > web : http://www.scalableinformatics.com > http://jackrabbit.scalableinformatics.com > phone: +1 734 786 8423 > fax : +1 866 888 3112 > cell : +1 734 612 4615 > From hahn at mcmaster.ca Sun Jun 15 10:44:21 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> Message-ID: > Static routing is the best approach if your pattern is known. In other sure, but how often is the pattern actually known? I mean in general: aren't most clusters used for multiple, shifting purposes? > There are some vendors that uses only the 24 port switches to build very > large scale clusters - 3000 nodes and above, without any > oversubscription, and they find it more cost effective. 
Using single so the switch fabric would be a 'leaf' layer with 12 up and 12 down, and a top layer with 24 down, right? so 3000 nodes means 250 leaves and 125 tops, 9000 total ports so 4500 cables. > enclosures is easier, but the cables are not expensive and you can use > the smaller components. in federated networks, I think cables wind up being 15-20% of the network price. for instance, if we take the simplest possible approach, and equip this 3000-node cluster with a non-blocking federated fabric (assuming just sdr) from colfax's current price list: subtot unit n what 375000 125 3000 ib nic 117000 39 3000 1m host ib cables 148500 99 1500 8m leaf-top ib cables 900000 2400 375 24pt unman switch 1540500 total (cable 17%) I'm still confused about IB pricing, since the street price of nics, cables and switches are dramatically more expensive than colfax. (to the paranoid, colfax would appear to be a mellanox shell company...) for completeness, here's the same bom with "normal" public prices: subtot unit n what 2100000 700 3000 ib nic 330000 110 3000 1m host ib cables 330000 220 1500 8m leaf-top ib cables 1500000 4000 375 24pt unman switch 4260000 total (cable 15%) interestingly, if nodes were about 3700 apiece (about what you'd expect for intel dual-socket quad-core 2G/core), the interconnect winds up being 28% of the cost. From Shainer at mellanox.com Sun Jun 15 14:46:05 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <485544FF.9090101@scalableinformatics.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F0129E864@mtiexch01.mti.com> Joe Landman wrote: > On a different note ... > > Gilad Shainer wrote: > > > routing capabilities. And if you are bandwidth bounded, > using IB QDR > > will help. You will be able to drive more than 3GB/s from > each server. > > This is theoretical max, correct? What sort of real observed > DMA bandwidth are people seeing with the QDR adapters? 
> That is, how quickly can I copy 32MB from main memory to the PCIe
> (even gen2) based card? Or how quickly (e.g. what is the
> real observed data rate) for copying streaming data, say from
> a nice NFS over RDMA session, to the card?
>

No, this is the actual number. IB QDR theoretical max is 4GB/s
(uni-directional BW).

> Please correct me if I am wrong, but I suspect that this will
> be one of rate limiting factors at least in the near term.
>
> Not that this is a bad thing, 3GB/s is quite good, but if the
> practical best case memory->PCIe bandwidth only let us get
> 2GB/s, or even 1.5 GB/s, this is important to know.
>

PCIe x8 Gen1 maxes out at around 1.5-1.6GB/s, and PCIe x8 Gen2 at around
3.3-3.4GB/s. There are multiple PCIe Gen2 servers out there from multiple
vendors. I am using Supermicro-based servers for my testing, but there
are other solutions. Here are some non-tuned MPI bandwidth results
(IB QDR on PCIe x8 Gen2 servers):

  Bytes  PingPong (MB/s)  SendRecv (MB/s)
    512           695.71           707.2
   1024          1160.13          1178.18
   2048          1664.83          1972.34
   4096          1957.76          2597.41
   8192          2165.76          3025.58
  16384          2642.91          4187.19
  32768          2962.5           5076.31
  65536          3165.03          5693.37
 131072          3274.05          6078.59
 262144          3332.28          6287.49
 524288          3362.12          6399.46
1048576          3377.28          6458.3
2097152          3385.08          6489.52
4194304          3388.02          6457.68

> If you or someone out there with QDR could measure this, I
> think a few of us (at least on the storage side) would like
> to know this. If someone wants to loan us cards and cables
> (cough cough), we would be happy to do this measurement
> between pairs of JackRabbits.

I might be able to help you here.

Gilad.

From james.p.lux at jpl.nasa.gov Sun Jun 15 15:42:27 2008
From: james.p.lux at jpl.nasa.gov (Jim Lux)
Date: Thu Mar 18 01:07:17 2010
Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> Message-ID: <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> Quoting Vincent Diepeveen , on Sun 15 Jun 2008 10:36:26 AM PDT: > Joseph, > > It is a bit weird if you claim to be NDA bound, whereas the news has it in > big capitals what the new IBM CELL can deliver. > I don't know that Joseph was claiming to be NDA bound, just that there might be people who could potentially comment who are. But, in any case, more than once I've been in a situation where I was unable to discuss or otherwise acknowledge information that was public because of either classification guidelines or NDAs. There's a big difference between carefully controlled press releases and someone responding to random questions. And, likewise there's a big difference between someone making intelligent speculation about a technology and having certain knowledge. From hahn at mcmaster.ca Sun Jun 15 21:47:21 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> Message-ID: > It is a bit weird if you claim to be NDA bound, whereas the news has it in > big capitals what the new IBM CELL can deliver. I thought he was referring to double-precision on Nvidia gpus, which have indeed not been shipped publicly (afaik). > So a very reasonable question to ask is what the latency is from the stream > processors to the device RAM. sure, they're GPUs, not general-purpose coprocessors. but both AMD and Intel are making noises about changing this. 
AMD seems to be moving GPU units on-chip, where they would presumably share L3, cache coherency, etc. Intel's Larrabee approach seems to be to add wider vector units to normal x86 cores (and more of them). I personally think the latter is much more promising from an HPC perspective. but then again, both AMD and Nvidia have major cred on the line - they have to deliver competitive levels of the traditional GPU programming model. From diep at xs4all.nl Mon Jun 16 08:11:56 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:17 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> Message-ID: Jim, Reality is that the person who SPECULATES that something is good also hides behind a DNA. This is another typical case of that. On the one hand claiming a NDA, on the other hand implying that is a very good product that will get released. My world is a bit binary there. Especially because of VERY BAD experiences in the past. If NVIDIA clearly indicates towards me that they aren't gonna release any technical data nor support anyone with technical data (the word NDA has not been used by the way by either party, it was a very clear NJET from 'em), and all information i've got so far is very dissapointing, then hiding behind a NDA is considered very bad manners. Either shut up entirely or do not hide behind a NDA. I won't quote persons nor the manufacturer they represent, but i remember such speculations all too well from the past. Hard statistics show that in like 5% of the cases the speculant was correct. In the other 95% of the cases the reason was that they didn't know a fok about what their competitors were gonna show up with. Tunnelvision is common. 
Good products don't need this type of NDA- promotion. In case of NVIDIA if you google a tad you will figure out that the double precision promise has been done more than once, many many years ago, and each time we got dissappointed. In the end even it was the case that the graphics cards cannot get used for gpu programming, for whatever technical reason (pixel units blocking bandwidth to the RAM or whatever giving a bottleneck of a factor 5 slowdown or so) Then instead of a $200 pci-e card, we needed to buy expensive Tesla's for that, without getting very relevant indepth technical information on how to program for that type of hardware. The few trying on those Tesla's, though they won't ever post this as their job is fulltime GPU programming, report so far very dissappointing numbers for applications that really matter for our nations. Truth is that you can always do a claim of 1 teraflop of computing power. If that doesn't get backupped by technical documents how to get it out of the hardware if your own testprograms show that you can't get that out of the hardware, it is rather useless to start programming for such a platform. It is questionable whether it is interesting to design some algorithms for GPU's; it takes endless testing of every tiny detail to figure out what the GPU can and cannot do and to get accurate timings. By the time you finish with that, you can also implement the same design in FPGA or ASIC/VLSI whatever. As that is of course the type of interested parties in GPU programming; considering the amount of computing power they need, for the same budget they can also make their own CPU's. For other companies that i tried to get interested, there is a lot of hesitation to even *investigate* that hardware, let alone give a contract job to port their software to such hardware. Nvidia for all those civilian and military parties is very very unattractive as of now. 
IBM now shows up with a working supercomputer using new generation CELL processors which have a sustained 77 Gflop double precision a chip which means a tad more than 150 Gflop for each node. Each rackmount node is relative cheap and performs very well. 1 Petaflop @ 2.66 MW, that's really good. Vincent On Jun 16, 2008, at 12:42 AM, Jim Lux wrote: > Quoting Vincent Diepeveen , on Sun 15 Jun 2008 > 10:36:26 AM PDT: > >> Joseph, >> >> It is a bit weird if you claim to be NDA bound, whereas the news >> has it in >> big capitals what the new IBM CELL can deliver. >> > > I don't know that Joseph was claiming to be NDA bound, just that > there might be people who could potentially comment who are. > > But, in any case, more than once I've been in a situation where I > was unable to discuss or otherwise acknowledge information that was > public because of either classification guidelines or NDAs. > There's a big difference between carefully controlled press > releases and someone responding to random questions. And, likewise > there's a big difference between someone making intelligent > speculation about a technology and having certain knowledge. > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From hahn at mcmaster.ca Mon Jun 16 08:22:55 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Powersave on Beowulf nodes In-Reply-To: References: Message-ID: > What is about using of powersaved (and dbus and HAL daemons) on Beowulf nodes > ? ... > thinking about simple stopping of all the corresponding daemons. I would not even have these desktop-ish things installed. IMO, a cluster compute node should be quite stripped down - no extra daemons, minimal UI stuff. 
From prentice at ias.edu Mon Jun 16 08:38:44 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Thu Mar 18 01:07:17 2010
Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To:
References: <1210016466.4924.1.camel@Vigor13>
Message-ID: <48568904.3020307@ias.edu>

Vincent Diepeveen wrote:
>
> That has to change in order to get GPU calculations more into mainstream.
>
> When i calculate on paper for some applications, a GPU can be potentially
> factor 4-8 faster than a standard quadcore 2.4ghz is right now.
>
> Getting that performance out of the GPU is more than a fulltime task
> however,
> without having indepth technical hardware data on the GPU.

Completely untrue. One of my colleagues, who does a lot of work with GPU
processors for astrophysics calculations, was able to increase the
performance of the MD5 algorithm by ~100x with about 1.5 days of work. He
called the code that he wrote "totally unoptimized, a straight CUDA C
implementation of Rivest's algorithm". He tinkered some more, adding some
optimizations, and I believe he ended up with a 350x performance
improvement.

Here, I quote his e-mail on his first round of coding that he sent me:

The other day in NYC on HPC-UG meeting someone mentioned that GPUs would be
perfect for password cracking, with which I wholeheartedly agreed (on
theoretical grounds). But theory is nothing without experiment :) , so I
spent the last night and this morning writing a GPU MD5 hash routine
(totally unoptimized, a straight CUDA C implementation of Rivest's
algorithm).

The results?

* GPU (single GeForce 8800 Ultra on cylon): 57,640,967.264473 hash/second
* The same algorithm on the CPU (Intel(R) Core(TM)2 Quad CPU Q6700 @
  2.66GHz on cylon): 543,839.652381 hash/second

A factor of ~100 difference. Sweet.

Another point of comparison: the fastest, assembly-level optimized x86 MD5
code, running on a _dual_ 3.2 GHz Xeon (see
http://c3rb3r.openwall.net/mdcrack/) can do 42e6 hash/sec.
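
As a rough sanity check on the throughput figures quoted above, the ratios
can be recomputed directly. This is only a back-of-the-envelope sketch: the
rates are copied from the quoted e-mail, the variable and function names are
made up for illustration, and the keyspace (8 characters drawn from
[0-9A-Za-z], i.e. 62 symbols) is the one the e-mail itself uses for its
brute-force estimate.

```python
# Throughput figures copied from the quoted e-mail (hashes per second).
gpu_rate = 57_640_967.264473   # GeForce 8800 Ultra, unoptimized CUDA C
cpu_rate = 543_839.652381      # same algorithm on a Core 2 Quad Q6700
mdcrack_rate = 42e6            # optimized x86 assembly, dual 3.2 GHz Xeon


def seconds_to_exhaust(alphabet_size, length, rate):
    """Time to try every string of `length` symbols at `rate` hashes/s.

    Hypothetical helper: it assumes a plain exhaustive search with no
    early exit, so it gives the worst case, not the average case.
    """
    return alphabet_size ** length / rate


gpu_vs_cpu = gpu_rate / cpu_rate
gpu_vs_mdcrack = gpu_rate / mdcrack_rate
weeks = seconds_to_exhaust(62, 8, gpu_rate) / (7 * 24 * 3600)

print(f"GPU vs. same code on CPU:       {gpu_vs_cpu:.0f}x")
print(f"GPU vs. optimized x86 assembly: {gpu_vs_mdcrack:.2f}x")
print(f"Worst-case search, 62^8 keys:   {weeks:.1f} weeks")
```

The first ratio supports the "factor of ~100" remark; the second suggests
that the unoptimized GPU code is already somewhat ahead of the hand-tuned
dual-Xeon figure.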
And remember, I wrote the CUDA code in a day and a half, with _no_ optimization. Nice. In another words, one GPU card with an amateurishly written MD5 code can brute-force crack an 8-character MD5 hashed password consisting of [0-9A-Za-z] in about 6 weeks. Now imagine if someone who knew what they were doing optimized the code, and got a cluster of Tesla's instead of a single gaming card that I used.... Cool :-) . -- Prentice From Craig.Tierney at noaa.gov Mon Jun 16 08:44:58 2008 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> Message-ID: <48568A7A.50509@noaa.gov> Mark Hahn wrote: >> It is a bit weird if you claim to be NDA bound, whereas the news has >> it in >> big capitals what the new IBM CELL can deliver. > > I thought he was referring to double-precision on Nvidia gpus, > which have indeed not been shipped publicly (afaik). An article posted today about the GTX280, which is to be release tomorrow, states that the GTX280 has "support for the IEEE-754R double-precision floating-point standard." http://www.maximumpc.com/sites/maximumpc.com/themes/maximumpc/wow.php?back=article/unveiled_nvidias_next_gen_gpu Craig > >> So a very reasonable question to ask is what the latency is from the >> stream processors to the device RAM. > > sure, they're GPUs, not general-purpose coprocessors. but both AMD and > Intel are making noises about changing this. AMD seems to be moving GPU > units on-chip, where they would presumably share L3, cache coherency, > etc. Intel's Larrabee approach seems to be to add wider vector units to > normal x86 cores (and more of them). I personally think the latter is > much more promising from an HPC perspective. 
but then again, both AMD > and Nvidia have major cred on the line - they have to deliver competitive > levels of the traditional GPU programming model. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Craig Tierney (craig.tierney@noaa.gov) From john.leidel at gmail.com Mon Jun 16 08:56:41 2008 From: john.leidel at gmail.com (John Leidel) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48568A7A.50509@noaa.gov> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <48568A7A.50509@noaa.gov> Message-ID: <1213631801.6549.0.camel@e521.site> Another article on the same card posted here: http://www.pcper.com/article.php?aid=578 On Mon, 2008-06-16 at 09:44 -0600, Craig Tierney wrote: > Mark Hahn wrote: > >> It is a bit weird if you claim to be NDA bound, whereas the news has > >> it in > >> big capitals what the new IBM CELL can deliver. > > > > I thought he was referring to double-precision on Nvidia gpus, > > which have indeed not been shipped publicly (afaik). > > An article posted today about the GTX280, which is to be release tomorrow, > states that the GTX280 has "support for the IEEE-754R double-precision > floating-point standard." > > http://www.maximumpc.com/sites/maximumpc.com/themes/maximumpc/wow.php?back=article/unveiled_nvidias_next_gen_gpu > > Craig > > > > > >> So a very reasonable question to ask is what the latency is from the > >> stream processors to the device RAM. > > > > sure, they're GPUs, not general-purpose coprocessors. but both AMD and > > Intel are making noises about changing this. AMD seems to be moving GPU > > units on-chip, where they would presumably share L3, cache coherency, > > etc. 
Intel's Larrabee approach seems to be to add wider vector units to > > normal x86 cores (and more of them). I personally think the latter is > > much more promising from an HPC perspective. but then again, both AMD > > and Nvidia have major cred on the line - they have to deliver competitive > > levels of the traditional GPU programming model. > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > From James.P.Lux at jpl.nasa.gov Mon Jun 16 09:09:04 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:17 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> Message-ID: <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> At 08:11 AM 6/16/2008, Vincent Diepeveen wrote: >Jim, > >Reality is that the person who SPECULATES that something is good >also hides behind a DNA. This is another typical case of that. perhaps.. >On the one hand claiming a NDA, on the other hand implying that is a >very good product that will get >released. Perhaps.. It's also that someone might be under NDA, be involved in the technical side of a development, but not be aware of the machinations of the marketing side. I'll bet more than one person has seen features added or removed because of "product positioning" after they last saw the thing they worked on. You might toil on a project, it gets released to internal manufacturing, and 6 to 12 months later, it pops out on the market and has significant differences from what you last saw. 
( At a place I used to work, there were always comments about not letting the engineers go to the trade show where the product was being demoed..) >My world is a bit binary there. Especially because of VERY BAD >experiences in the past. With NDAs? (I'm sure lots of people have had less than wonderful experiences in that regard) >Either shut up entirely or do not hide behind a NDA. That's kind of hard when one has expertise in an area, and one wants to correct a misinterpretation or misstatement made by someone without as many facts to hand. (On the other hand, there's always the risk that the commenter is themself missing part of the story...) In the other 95% of the cases the reason was that they didn't know a >fok about what their competitors were gonna show up with. >Tunnelvision is common. Good products don't need this type of NDA- promotion. Actually, they do. In a perfect world, the inherent quality would result in the world beating a path to your door to get your inherently superior product. In a real world, you might not have the resources to bring the product to market as quickly as someone else, so you need time to get IP protection in place, for instance. >In case of NVIDIA if you google a tad you will figure out that the >double precision promise has been done more than once, >many many years ago, and each time we got dissappointed. Well.. I suspect that you didn't actually pay Nvidia for that capability, so the promise is just a marketing pledge, which is of no real legal value. (except as noted below). I can see that Nvidia can make a wise business decision to support or not support some capability based on the cost to provide it vs revenue they'll get. (Note that if you're in a situation of market power, and you announce capabilities or products that you have no real intention of producing, just to scare off the competitors, you can get into trouble. IBM S/360 is a case in point. Proving it is another matter, eh?) 
>Then instead of a $200 pci-e card, we needed to buy expensive Tesla's >for that, without getting very relevant indepth technical >information on how to program for >that type of hardware. That's the price one pays for being on the fringes of the mainstream. Go out and pay $10-20K for a custom coprocessor card from a small volume company and the mfr will pay a lot more attention to you. For an Nvidia, with a half a billion a year in revenue, the niche supercomputing market is a pimple on a pimple on a pimple of their behind. >The few trying on those Tesla's, though they won't ever post this as >their job is fulltime GPU programming, report so far very >dissappointing numbers for applications that >really matter for our nations. if they really matter, then serious money needs to be thrown at it. While I'm not generally an apologist for the "fiduciary responsibility to the shareholder" mindset, merely because something is interesting or intellectually valuable doesn't get it funded. >Truth is that you can always do a claim of 1 teraflop of computing >power. If that doesn't get backupped by technical documents >how to get it out of the hardware if your own testprograms show that >you can't get that out of the hardware, it is rather useless to start > programming for such a platform. Yep.. that's why *I* always want to see the documents before committing significant development resources to a project. More than once, I've been burned by someone's great idea that didn't pan out. >It is questionable whether it is interesting to design some >algorithms for GPU's; it takes endless testing of every tiny detail >to figure out >what the GPU can and cannot do and to get accurate timings. This can appeal to a certain type of person. It's like tweaking the engine in a car, and one does it, usually, for the challenge, not because it's a cost effective way to solve a problem. It's also attractive to someone who has a lot more time than money. 
>By the >time you finish with that, you can also implement the same design in >FPGA or ASIC/VLSI whatever. One can, but the cost to make an ASIC is pretty high (figure $1M for a spin). You can buy an awful lot of tinkering and probing time for that that million bucks. (about 8000-10000 hours). FPGAs don't have the flops/watt efficiency that an ASIC can get to, although they are getting better. >As that is of course the type of >interested parties in GPU programming; >considering the amount of computing power they need, for the same >budget they can also make their own CPU's. > >For other companies that i tried to get interested, there is a lot of >hesitation to even *investigate* that hardware, let alone give a >contract job to port their software to such hardware. Nvidia for all those >civilian and military parties is very very unattractive as of now. Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's scale of operations. We face this all the time with NASA work. Semiconductor manufacturers have no real reason to produce special purpose or customized versions of their products for space use, because they can sell all they can make to the consumer market. More than once, I've had a phone call along the lines of this: "Jim: I'm interested in your new ABC321 part." "Rep: Great. I'll just send the NDA over and we can talk about it." "Jim: Great, you have my email and my fax # is..." "Rep: By the way, what sort of volume are you going to be using?" "Jim: Oh, 10-12.." "Rep: thousand per week, excellent..." "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every year." "Rep: Oh..." {Well, to be fair, it's not that bad, they don't hang up on you.. but that's the idea... 
and that's before we get into things like lot traceability} From raysonlogin at gmail.com Mon Jun 16 09:13:27 2008 From: raysonlogin at gmail.com (Rayson Ho) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Powersave on Beowulf nodes In-Reply-To: References: Message-ID: <73a01bf20806160913q112b521fvf92cfb4496597c27@mail.gmail.com> Grid Engine can directly control the nodes and do power saving, see: http://wiki.gridengine.info/wiki/index.php/PowerSaving http://code.google.com/p/rocks-solid/wiki/PowerSaveAlgorithm Rayson On Sat, Jun 14, 2008 at 7:26 PM, Mikhail Kuzminsky wrote: > What is about using of powersaved (and dbus and HAL daemons) on Beowulf > nodes ? > > Currently I installed SuSE 10.3 where all the corresponding daemons are > running (by default) at the runlevel=3. I simple added issuing of > > powersave -f > at the end of booting. > /proc/acpi/thermal_zone/ is empty, and powersave can't give me temperature > and FANs information. I don't see now any serious advantages of powersaved > daemon using in "performance" mode (using performance scheme). > We have many jobs in SGE at every time moment, and underload situation > (where it's reasonable to decrease CPUs frequency) is not the our danger :-) > So I'm thinking about simple stopping of all the corresponding daemons. > > Mikhail Kuzminsky > Computer Assistance to Chemical Research Center > Zelinsky Institute of Organic Chemistry > Moscow > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From gerry.creager at tamu.edu Mon Jun 16 09:39:41 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:07:17 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> Message-ID: <4856974D.1050100@tamu.edu> MOD +2: Informative. Jim Lux wrote: > At 08:11 AM 6/16/2008, Vincent Diepeveen wrote: >> Jim, >> >> Reality is that the person who SPECULATES that something is good >> also hides behind a DNA. This is another typical case of that. > > > perhaps.. > > >> On the one hand claiming a NDA, on the other hand implying that is a >> very good product that will get >> released. > > Perhaps.. It's also that someone might be under NDA, be involved in the > technical side of a development, but not be aware of the machinations of > the marketing side. I'll bet more than one person has seen features > added or removed because of "product positioning" after they last saw > the thing they worked on. You might toil on a project, it gets released > to internal manufacturing, and 6 to 12 months later, it pops out on the > market and has significant differences from what you last saw. > > ( At a place I used to work, there were always comments about not > letting the engineers go to the trade show where the product was being > demoed..) > > > >> My world is a bit binary there. Especially because of VERY BAD >> experiences in the past. > > > With NDAs? (I'm sure lots of people have had less than wonderful > experiences in that regard) > > >> Either shut up entirely or do not hide behind a NDA. > > That's kind of hard when one has expertise in an area, and one wants to > correct a misinterpretation or misstatement made by someone without as > many facts to hand. (On the other hand, there's always the risk that the > commenter is themself missing part of the story...) 
> > > In the other 95% of the cases the reason was that they didn't know a >> fok about what their competitors were gonna show up with. >> Tunnelvision is common. Good products don't need this type of NDA- >> promotion. > > Actually, they do. In a perfect world, the inherent quality would > result in the world beating a path to your door to get your inherently > superior product. In a real world, you might not have the resources to > bring the product to market as quickly as someone else, so you need time > to get IP protection in place, for instance. > > > > >> In case of NVIDIA if you google a tad you will figure out that the >> double precision promise has been done more than once, >> many many years ago, and each time we got dissappointed. > > Well.. I suspect that you didn't actually pay Nvidia for that > capability, so the promise is just a marketing pledge, which is of no > real legal value. (except as noted below). I can see that Nvidia can > make a wise business decision to support or not support some capability > based on the cost to provide it vs revenue they'll get. > > (Note that if you're in a situation of market power, and you announce > capabilities or products that you have no real intention of producing, > just to scare off the competitors, you can get into trouble. IBM S/360 > is a case in point. Proving it is another matter, eh?) > > > >> Then instead of a $200 pci-e card, we needed to buy expensive Tesla's >> for that, without getting very relevant indepth technical information >> on how to program for >> that type of hardware. > > That's the price one pays for being on the fringes of the mainstream. > Go out and pay $10-20K for a custom coprocessor card from a small volume > company and the mfr will pay a lot more attention to you. For an > Nvidia, with a half a billion a year in revenue, the niche > supercomputing market is a pimple on a pimple on a pimple of their behind. 
> > >> The few trying on those Tesla's, though they won't ever post this as >> their job is fulltime GPU programming, report so far very >> dissappointing numbers for applications that >> really matter for our nations. > > if they really matter, then serious money needs to be thrown at it. > While I'm not generally an apologist for the "fiduciary responsibility > to the shareholder" mindset, merely because something is interesting or > intellectually valuable doesn't get it funded. > > > >> Truth is that you can always do a claim of 1 teraflop of computing >> power. If that doesn't get backupped by technical documents >> how to get it out of the hardware if your own testprograms show that >> you can't get that out of the hardware, it is rather useless to start >> programming for such a platform. > > Yep.. that's why *I* always want to see the documents before committing > significant development resources to a project. More than once, I've > been burned by someone's great idea that didn't pan out. > > >> It is questionable whether it is interesting to design some >> algorithms for GPU's; it takes endless testing of every tiny detail >> to figure out >> what the GPU can and cannot do and to get accurate timings. > > This can appeal to a certain type of person. It's like tweaking the > engine in a car, and one does it, usually, for the challenge, not > because it's a cost effective way to solve a problem. It's also > attractive to someone who has a lot more time than money. > > > >> By the >> time you finish with that, you can also implement the same design in >> FPGA or ASIC/VLSI whatever. > > One can, but the cost to make an ASIC is pretty high (figure $1M for a > spin). You can buy an awful lot of tinkering and probing time for that > that million bucks. (about 8000-10000 hours). > > FPGAs don't have the flops/watt efficiency that an ASIC can get to, > although they are getting better. 
> > >> As that is of course the type of >> interested parties in GPU programming; >> considering the amount of computing power they need, for the same >> budget they can also make their own CPU's. >> >> For other companies that i tried to get interested, there is a lot of >> hesitation to even *investigate* that hardware, let alone give a >> contract job to port their software to such hardware. Nvidia for all >> those >> civilian and military parties is very very unattractive as of now. > > Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's > scale of operations. We face this all the time with NASA work. > Semiconductor manufacturers have no real reason to produce special > purpose or customized versions of their products for space use, because > they can sell all they can make to the consumer market. More than once, > I've had a phone call along the lines of this: > > "Jim: I'm interested in your new ABC321 part." > "Rep: Great. I'll just send the NDA over and we can talk about it." > "Jim: Great, you have my email and my fax # is..." > "Rep: By the way, what sort of volume are you going to be using?" > "Jim: Oh, 10-12.." > "Rep: thousand per week, excellent..." > "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every > year." > "Rep: Oh..." > > {Well, to be fair, it's not that bad, they don't hang up on you.. but > that's the idea... 
and that's before we get into things like lot > traceability} > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From christian.bell at qlogic.com Mon Jun 16 10:32:17 2008 From: christian.bell at qlogic.com (Christian Bell) Date: Thu Mar 18 01:07:17 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> References: <4852CDE3.4030006@myri.com> <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> Message-ID: <20080616173217.GS9479@mv.qlogic.com> On Sun, 15 Jun 2008, Gilad Shainer wrote: > Static routing is the best approach if your pattern is known. In other > cases it depends on the applications. LANL and Mellanox have presented a > paper on static routing and how to get the maximum of it last ISC. There > are cases where adaptive routing will show a benefit, and this is why we > see the IB vendors add adaptive routing support as well. But in general, > the average effective bandwidth is much much higher than the 40% you > claim. As Mark Hahn pointed out, how are so-called "known patterns" representative of any real system? Even a single application with a "known pattern" doesn't translate as-is in practice, even less so on a capacity system. While the "shift all-to-all pattern" referred to in the paper you cite is interesting (on paper) in that it stresses the entire connectivity of a FBB fabric, it remains a simulation carried out in isolation. 
Sticking to simulation, I find that looking at the observed switch latency at the egress ports as a function of switch loading using random communication patterns to be a more interesting data point. However, it can be even more revealing to try to scale an expensive communication operation on a real system, only to notice that this is where the paper FBB breaks down. 40% looks like a large number but it's not uncommon to see application writers report large speedups by breaking down bandwidth-bound problems into latency-sensitive ones. This looks counter-intuitive because the paper FBB can be available on the fabric, but suggests that systems don't always deliver FBB *even if* the pattern is known and the fabric is otherwise idle. Given a capacity system, there's huge potential in looking at alternative routing methods -- all of which is orthogonal to system size and the ability to describe and understand one's communication pattern. > There are some vendors that uses only the 24 port switches to build very > large scale clusters - 3000 nodes and above, without any > oversubscription, and they find it more cost effective. Using single > enclosures is easier, but the cables are not expensive and you can use > the smaller components. I used the 24 ports switches to connect my 96 > node cluster. I will replace my current setup with the new 36 InfiniBand > port switches this month, since they provide lower latency and adaptive > routing capabilities. And if you are bandwidth bounded, using IB QDR > will help. You will be able to drive more than 3GB/s from each server. Along similar lines (and with less product placement), buying more cores and less IB can help you solve (and scale) larger problems. In a world of quick inferences, one is also permitted to conclude that implementors find it *performance cost effective* to *not* pay top dollar for paper FBB when only a fraction of the FBB performance can be achieved. Don't need the oversubscription for those cases. 
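
To make the trade-off between a full-bisection fabric and an oversubscribed
one more concrete, here is a small port-counting sketch for a two-level
fabric built from 24-port switches, using the 96-node cluster size mentioned
in the quoted text. It is an illustration under simplifying assumptions
(every leaf split into fixed "down" and "up" ports, top-level switches
counted by dividing the uplinks by the switch radix, no attention to wiring
or routing), not a description of any vendor's design. The function name and
its parameters are invented for this example.

```python
import math


def two_level_fabric(nodes, radix=24, down_per_leaf=12):
    """Count switches and cables for a simple two-level leaf/top fabric.

    Hypothetical helper: each leaf uses `down_per_leaf` ports for hosts
    and the remaining ports as uplinks; top-level switches are sized
    purely by the total number of uplinks.
    """
    up_per_leaf = radix - down_per_leaf
    leaves = math.ceil(nodes / down_per_leaf)
    uplinks = leaves * up_per_leaf
    tops = math.ceil(uplinks / radix)
    return {
        "leaves": leaves,
        "tops": tops,
        "switches": leaves + tops,
        "host_cables": nodes,
        "uplink_cables": uplinks,
        "oversubscription": down_per_leaf / up_per_leaf,
    }


# 96 nodes, non-blocking: 12 down / 12 up per leaf.
full = two_level_fabric(96, down_per_leaf=12)
# 96 nodes, 2:1 oversubscribed: 16 down / 8 up per leaf.
half = two_level_fabric(96, down_per_leaf=16)

for name, fabric in (("non-blocking", full), ("2:1 oversubscribed", half)):
    print(name, fabric)
```

Under these assumptions the oversubscribed layout needs fewer switches and
half as many uplink cables for the same 96 hosts, which is the kind of saving
that could be spent on more cores instead. Note that a two-level layout of
this kind tops out at radix x down_per_leaf hosts (288 in the non-blocking
24-port case); larger fabrics need a third switching level, which this
sketch does not model.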
Based on some of the points provided so far, it's as if "known patterns" are already well served by IB static routing and that the "other cases" will now benefit from newer IB "adaptive routing support". As someone who works for an IB vendor, and for anyone out there who spends a lot of their time solving bandwidth-bound problems or working on capacity systems, the discussion can't possibly be reduced to this. The design space for routing methods based on modular switches is much larger than what is currently offered by the Mellanox portfolio. So I'd suggest that *no*, static routing is not necessarily the best approach even if your pattern is known. . . christian -- christian.bell@qlogic.com QLogic Host Solutions Group From landman at scalableinformatics.com Mon Jun 16 11:30:39 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> Message-ID: <4856B14F.9020104@scalableinformatics.com> Vincent Diepeveen wrote: > Jim, > > Reality is that the person who SPECULATES that something is good > also hides behind a DNA. This is another typical case of that. Hiding behind DNA? Gotta be thin ... (ok ok, I'll keep my day job ...) > On the one hand claiming a NDA, on the other hand implying that is a > very good product that will get > released. An NDA is a device to allow a company to get its act together while getting feedback on the product they want to release. It is quite useful. [...] > If NVIDIA clearly indicates towards me that they aren't gonna release > any technical data nor support anyone with technical er ... have you looked at the site I indicated?
> data (the word NDA has not been used by the way by either party, it was > a very clear NJET from 'em), > and all information i've got so far is very dissapointing, then hiding > behind a NDA is considered very bad manners. No, it's not. It seems to me that given the number of applications out there now using their kit, that rumors of a lack of information to use their kit might be ... er ... not well founded. > Either shut up entirely or do not hide behind a NDA. Ah ... Ok. Manners, manners Vincent. [...] > In case of NVIDIA if you google a tad you will figure out that the > double precision promise has been done more than once, Hmmm.... I would suggest you sign an NDA with them, and learn what it is they are talking about to their customers, though I suspect they may not wish to pre-divulge information to you given what you have indicated here. [...] > IBM now shows up with a working supercomputer using new generation CELL > processors which have a sustained 77 Gflop double precision > a chip which means a tad more than 150 Gflop for each node. Each > rackmount node is relative cheap and performs very well. Cell is good. Now try to program it. Yes, it is a SMOP ( http://www.catb.org/jargon/html/S/SMOP.html ) > 1 Petaflop @ 2.66 MW, that's really good. Yup. Exaflop at 2.66 GW. Waaa hoooo!!!! -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From deadline at eadline.org Mon Jun 16 14:40:56 2008 From: deadline at eadline.org (Douglas Eadline) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] A few words about NVidia and HPC Message-ID: <53810.192.168.1.213.1213652456.squirrel@mail.eadline.org> I visited NVidia last week for HPC editors day. I am working on some write-ups, but let me make a few general comments: - NVidia is very serious about HPC. 
There is of course a limited amount of real estate on the die, so there are trade-offs as to video/HPC - There will be double precision support across the product line (which is) * The GeForce is intended for the entertainment market * The Tesla is intended for the HPC market and will use high grade memory for 24x7 HPC use, and support up to 4 GB of on-board RAM * The Quadro is for the high design/graphics market - In the very short time Cuda has been available (about a year) there are some applications that have seen large improvements in performance (not all apps need double precision) - Cuda provides an interesting abstraction layer which hides the dynamic thread execution hardware on the chip - there is more, but then I would be writing an article. Now, should you put your cluster on Ebay and buy video cards? I have no idea. What I do know is the consumer market has provided us with another interesting hardware platform for HPC applications. Plus, the companies (Nvidia, and AMD/ATI) are helping out as well. Interesting times. -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Mon Jun 16 14:47:15 2008 From: deadline at eadline.org (Douglas Eadline) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] June New York/Jersey HPC users meeting Message-ID: <34281.192.168.1.213.1213652835.squirrel@mail.eadline.org> If you live or work in the New York/North Jersey Metropolitan area, mark your calendar for this Thursday, June 19th. The NYCA-HUG (New York City Area HPC Users Group) will be trying to answer the ultimate question: Torque or Sun Grid Engine? We will be discussing the pros/cons of each scheduler for HPC clusters. Come and add your experiences, wants, and rants. Then you decide.
Links for the details http://www.clustermonkey.net//content/view/229/1/ http://www.linux-mag.com/id/4140 -- Doug From patrick at myri.com Mon Jun 16 15:21:51 2008 From: patrick at myri.com (Patrick Geoffray) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Infiniband modular switches In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F0129E83A@mtiexch01.mti.com> Message-ID: <4856E77F.7080000@myri.com> Gilad Shainer wrote: > Static routing is the best approach if your pattern is known. In other If your pattern is known, and if it is persistent, and it is perfectly synchronized, and if you have a single job running on the fabric, and if you have total control of the process/node mapping and if there is no down/bad links, and if there is no other traffic pattern in the application, then yes static routing is the best. In real life, where there are multiple jobs running at once on various parts of a cluster, where there are always some marginal links, when you cannot guarantee on which nodes a job will be allocated, and applications have multiple communication patterns (collectives) and load is usually unbalanced, static routing is the worst. > cases it depends on the applications. LANL and Mellanox have presented a > paper on static routing and how to get the maximum of it last ISC. There Single app, dedicated machine, total control of the network. Similarly, I could have a pretty good shot at predicting the next lotto numbers if I would know the position (and speed) of all atoms in the universe (Dr Brown, this is for you !). > are cases where adaptive routing will show a benefit, and this is why we > see the IB vendors add adaptive routing support as well. But in general, > the average effective bandwidth is much much higher than the 40% you > claim.
Have a look at slides 17 and 19 of the following set of slides (and slides 21 and 22 to illustrate my point above): http://www.openib.org/archives/spring2007sonoma/Monday%20April%2030/Leininger-Seager-Adaptive-Routing-OFA-Sonoma-2007-v03.pdf Hoefler et al. have shown an average effective bisection of ~40% on Infiniband (OMNeT simulations) in a paper submitted to Cluster2008. In a paper to be presented at Hot Interconnects this year, I have measured the effective bisection (SendRecv on random pairs) on a 512-node Myri-10G cluster (single enclosure, 32-port crossbars) under various routing implementations. Below is the link to pretty graphs with static and probing adaptive routing: http://patrick.geoffray.googlepages.com/staticvsadaptiverouting You can see that the worst case static routing goes quickly below 40%, but the average eventually goes there as well. > There are some vendors that uses only the 24 port switches to build very > large scale clusters - 3000 nodes and above, without any > oversubscription, and they find it more cost effective. Using single > enclosures is easier, but the cables are not expensive and you can use Price of cables usually depends on the length (copper and fiber). Using small switches at the edges allows the use of very short cables to the hosts (in-rack) but you still have to use the same number of longer cables to connect to the spine. With a single enclosure, you may need longer cables to reach the hosts (different rack), but you don't need cables to the spine as they are on the switch backplane (and PCB is free). Short cables may not be expensive, but they are not free. Furthermore, physical cables are much less reliable than wire on PCB, and they take more space, more power. Patrick From bill at cse.ucdavis.edu Mon Jun 16 17:51:21 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> Message-ID: <48570A89.3040608@cse.ucdavis.edu> Vincent Diepeveen wrote: > Jim, > > Reality is that the person who SPECULATES that something is good > also hides behind a DNA. This is another typical case of that. Er, who's hiding? Get your credit card out and buy one. CUDA is easily available. Today's new nvidia release has resulted in a handful at www.newegg.com, I'm sure many others will sell them to you as well. > On the one hand claiming a NDA, on the other hand implying that is a > very good product that will get > released. Indeed, there were numerous rumors, even mentions from researchers with pre-release hardware who claimed 2nd half 07 for double precision availability. > If NVIDIA clearly indicates towards me that they aren't gonna release > any technical data nor support anyone with technical > data (the word NDA has not been used by the way by either party, it was > a very clear NJET from 'em), > and all information i've got so far is very dissapointing, then hiding > behind a NDA is considered very bad manners. The CUDA docs look pretty good to me, as usual the definitions of latency, bandwidth, message passing, flops, and overall performance varies. So whatever you have in mind the best bet is to actually try it. Fortunately nvidia makes it rather easy to get going. If you write a microbenchmark I suspect you could get it run. > Then instead of a $200 pci-e card, we needed to buy expensive Tesla's > for that, without getting Even tesla's were single precision I believe, at least until today. > The few trying on those Tesla's, though they won't ever post this as > their job is fulltime GPU programming, > report so far very dissappointing numbers for applications that really > matter for our nations. 
People seem happy, of course they are far from general purpose, but the progress (especially with CUDA) seems pretty good. > Truth is that you can always do a claim of 1 teraflop of computing > power. If that doesn't get backupped by technical documents You mean like (mentioned on the list previously) http://arxiv.org/abs/0709.3225 > how to get it out of the hardware if your own testprograms show that you > can't get that out of the hardware, > it is rather useless to start programming for such a platform. GPUs are hardly suited for everyone, that doesn't make them useless. Personally finding a port of McCalpin's stream seeing 50GB/sec or so caught my attention. Sure it's not a recompile and go, but at least it's fairly c like. Function Rate (MB/s) Avg time Min time Max time Copy: 50077.8888 0.0013 0.0013 0.0013 Scale: 50637.4974 0.0013 0.0013 0.0013 Add: 51090.5662 0.0019 0.0019 0.0019 Triad: 50527.6617 0.0019 0.0019 0.0019 www.nvidia.com claims said card has a 57GB/sec memory system, today's new 260/280 are in the 130-140GB/sec range (advertised, not stream). > It is questionable whether it is interesting to design some algorithms > for GPU's; it takes endless testing of every tiny detail to figure out > what the GPU can and cannot do and to get accurate timings. By the time > you finish with that, you can also implement the same design in > FPGA or ASIC/VLSI whatever. As that is of course the type of interested Sounds wildly off base to me. But if you implement a FPGA/ASIC + memory system that costs less than $200 qty 1 and can be programmed in a few hours to implement stream at or above 50GB/sec let me know. > parties in GPU programming; > considering the amount of computing power they need, for the same budget > they can also make their own CPU's. Even assuming man months of highly paid programmers I don't see how you get from that cost to the budget for making your own CPU. 
> For other companies that i tried to get interested, there is a lot of > hesitation to even *investigate* that hardware, let alone give a contract > job to port their software to such hardware. Nvidia for all those > civilian and military parties is very very unattractive as of now. CUDA is the start of practical use of a GPU for non graphics related jobs, seems like a good start to me, sure not everything fits, but it's progress. Today's new chips add double precision which definitely helps. > IBM now shows up with a working supercomputer using new generation CELL > processors which have a sustained 77 Gflop double precision > a chip which means a tad more than 150 Gflop for each node. Each > rackmount node is relative cheap and performs very well. Sure, for rather specialized tasks. Each SPE has what, 32KB of memory? I think of it more as a DSP than a CPU. BTW the previous generation cell did double precision as well, it was just 1/10th as fast as single precision. The new revision is around 100GFlops/sec up from 14 ish. From lindahl at pbm.com Mon Jun 16 18:14:00 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48570A89.3040608@cse.ucdavis.edu> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <48570A89.3040608@cse.ucdavis.edu> Message-ID: <20080617011359.GB6176@bx9.net> On Mon, Jun 16, 2008 at 05:51:21PM -0700, Bill Broadley wrote: > Personally finding a port of McCalpin's stream seeing 50GB/sec or so > caught my attention. Well, given that GPUs don't do so hot on Linpack, one should hope that they're close to peak on *something* ! For a while I was trying to incite some friends to go do a CPU that plugged into VRAM, but it didn't get very far. 
-- greg From perry at piermont.com Mon Jun 16 18:48:55 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48568904.3020307@ias.edu> (Prentice Bisbal's message of "Mon\, 16 Jun 2008 11\:38\:44 -0400") References: <1210016466.4924.1.camel@Vigor13> <48568904.3020307@ias.edu> Message-ID: <87hcbs96go.fsf@snark.cb.piermont.com> Prentice Bisbal writes: > Completely untrue. One of my colleagues, who does a lot of work with GPU > processors for astrophysics calculations, was able to increase the > performance of the MD5 algorithm by ~100x with about 1.5 days of work. That's rather surprising. MD5 is a pure integer algorithm, and is well known for being unfriendly to vectorization. There is also extensive work by Keromytis et al on the use of GPUs for accelerating cryptographic operations, and I don't think they achieved anything like that sort of performance improvement. I'll point out, by the way... >* GPU (single GeForce 8800 Ultra on cylon): > 57,640,967.264473 hash/second ...that implies moving at least 3.7e9 bytes of data (MD5 operates on blocks of 64 bytes) into the GPU per second, entirely ignoring the 64 Feistel rounds within the GPU. Each round is 4 xors and a rotate, and they can't be done in parallel, so we get a total of about 1.8e10 integer ops (entirely ignoring the world shuffling) per second. That's... rather a lot. Perry -- Perry E. Metzger perry@piermont.com From bill at cse.ucdavis.edu Mon Jun 16 18:58:40 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <20080617011359.GB6176@bx9.net> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <48570A89.3040608@cse.ucdavis.edu> <20080617011359.GB6176@bx9.net> Message-ID: <48571A50.8020002@cse.ucdavis.edu> Greg Lindahl wrote: > On Mon, Jun 16, 2008 at 05:51:21PM -0700, Bill Broadley wrote: > >> Personally finding a port of McCalpin's stream seeing 50GB/sec or so >> caught my attention. > > Well, given that GPUs don't do so hot on Linpack, one should hope that > they're close to peak on *something* ! Heh. Is there a published linpack for some CUDA based solution? Or possibly code available? > For a while I was trying to incite some friends to go do a CPU that > plugged into VRAM, but it didn't get very far. Well hopefully ATI and AMD will figure out a way to give a $300 CPU as much memory bandwidth as a $200 video card. Seems like with the continuing increase in cores, not to mention the yet again increased width of the SSE registers would make increased bandwidth more usable. I'm pretty sure hypertransport allows for a significant number of outstanding memory transactions, so even a single gpu/cpu hybrid could farm out a 100GB/sec memory system to numerous sockets.... sounds like a good justification for HT3 to me. From perry at piermont.com Mon Jun 16 19:06:49 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <87hcbs96go.fsf@snark.cb.piermont.com> (Perry E. Metzger's message of "Mon\, 16 Jun 2008 21\:48\:55 -0400") References: <1210016466.4924.1.camel@Vigor13> <48568904.3020307@ias.edu> <87hcbs96go.fsf@snark.cb.piermont.com> Message-ID: <87d4mg95mu.fsf@snark.cb.piermont.com> "Perry E. 
Metzger" writes: > ...that implies moving at least 3.7e9 bytes of data (MD5 > operates on blocks of 64 bytes) into the GPU per second, entirely > ignoring the 64 Feistel rounds within the GPU. Each round is 4 xors > and a rotate, and they can't be done in parallel, so we get a total of > about 1.8e10 integer ops (entirely ignoring the world shuffling) per > second. That's... rather a lot. By the way, as an aside, dedicated IPSec hardware can keep up with doing HMAC-MD5 at gigabit ethernet speeds -- I don't think anyone has shown hardware capable of doing HMAC-MD5 faster than 10G ethernet. (I'm not even sure anyone has hardware that will keep up on 10GigE). Your friend is claiming he can do faster -- about 30Gbit/sec -- beating custom hardware optimized purely for doing MD5. That would clearly be of a lot of interest to many people if it were true. Perry From shaeffer at neuralscape.com Mon Jun 16 19:52:43 2008 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <20080617011359.GB6176@bx9.net> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <48570A89.3040608@cse.ucdavis.edu> <20080617011359.GB6176@bx9.net> Message-ID: <20080617025243.GC8621@synapse.neuralscape.com> On Mon, Jun 16, 2008 at 06:14:00PM -0700, Greg Lindahl wrote: > On Mon, Jun 16, 2008 at 05:51:21PM -0700, Bill Broadley wrote: > > > Personally finding a port of McCalpin's stream seeing 50GB/sec or so > > caught my attention. > > Well, given that GPUs don't do so hot on Linpack, one should hope that > they're close to peak on *something* ! > > For a while I was trying to incite some friends to go do a CPU that > plugged into VRAM, but it didn't get very far. > > -- greg Hi all, I don't want anyone to take this the wrong way. 
I think Nvidia is very serious about their technology. But they do appear to have some QA issues with their technologies. I might want to consider that, if I was an early adopter of new technology from Nvidia. BTW, I have done no serious evaluation of that statement, although I have experience finding some issues with their technology in the past. Thanks, Karen -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer@neuralscape.com http://www.neuralscape.com From hahn at mcmaster.ca Mon Jun 16 20:32:00 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <87hcbs96go.fsf@snark.cb.piermont.com> References: <1210016466.4924.1.camel@Vigor13> <48568904.3020307@ias.edu> <87hcbs96go.fsf@snark.cb.piermont.com> Message-ID: > That's rather surprising. MD5 is a pure integer algorithm, and is well > known for being unfriendly to vectorization. There is also extensive the quoted performance was throughput, not latency, so perhaps they were simply doing a bunch of md5's at once. >> * GPU (single GeForce 8800 Ultra on cylon): >> 57,640,967.264473 hash/second > > ...that implies moving at least 3.7e9 bytes of data (MD5 > operates on blocks of 64 bytes) into the GPU per second, entirely the only comparable number I've heard quoted was in parallel rendering/compositing, where fairly recent systems bragged about reading back the framebuffer at 2 GB/s. From richard.walsh at comcast.net Tue Jun 17 06:52:10 2008 From: richard.walsh at comcast.net (richard.walsh@comcast.net) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? Message-ID: <061720081352.323.4857C18A00084455000001432215568884089C040E99D20B9D0E080C079D@comcast.net> -------------- Original message -------------- From: Vincent Diepeveen > Seems the next CELL is 100% confirmed double precision. Mmm ... "it seems ... 100% confirmed" ... 
this is confusing, it is either certain or "it seems ...", not both. I have not seen it confirmed anywhere that the PowerXCell (the new chip just announced) is 100% IEEE 754 double-precision compatible. The old generation chip was, but at 1/4 the speed. The new chip runs double precision at ~100 Gflops, and I am guessing that now its double precision has limited IEEE 754 compatibility like single precision did/does... not that this is important to all applications. If you "know" otherwise and can point me to supporting documentation, I would be interested. rbw -- "Making predictions is hard, especially about the future." Niels Bohr -- Richard Walsh Thrashing River Consulting-- 5605 Alameda St. Shoreview, MN 55126 From james.p.lux at jpl.nasa.gov Tue Jun 17 06:52:42 2008 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48570A89.3040608@cse.ucdavis.edu> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <48570A89.3040608@cse.ucdavis.edu> Message-ID: <20080617065242.dy9c9t4xco4kw080@webmail.jpl.nasa.gov> Quoting Bill Broadley , on Mon 16 Jun 2008 05:51:21 PM PDT: > > >> parties in GPU programming; >> considering the amount of computing power they need, for the same >> budget they can also make their own CPU's. > > Even assuming man months of highly paid programmers I don't see how you > get from that cost to the budget for making your own CPU. There are some open source CPU cores in VHDL for things like the SPARC V8 around, if you want to try this out. There's also open source VHDL bus interfaces, memory, etc.
Get yourself a copy of the appropriate tools and an appropriate FPGA eval board, and you're off and running. Once you have a solid design in FPGA, there are a number of fabs that will take it and turn it into a (faster, smaller, lower power) ASIC, with some form of guarantee (i.e. if it works in FPGA, and you follow their design rules, you're guaranteed a working ASIC). They do the fabs as multiproject wafers (like MOSIS) Typical overall cost is in sub $100K range, from what I understand. From james.p.lux at jpl.nasa.gov Tue Jun 17 07:01:16 2008 From: james.p.lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <4856F7EC.7060000@ussarna.se> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> Message-ID: <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> Quoting Linus Harling , on Mon 16 Jun 2008 04:31:56 PM PDT: > Vincent Diepeveen skrev: > >> >> Then instead of a $200 pci-e card, we needed to buy expensive Tesla's >> for that, without getting >> very relevant indepth technical information on how to program for that >> type of hardware. >> >> The few trying on those Tesla's, though they won't ever post this as >> their job is fulltime GPU programming, >> report so far very dissappointing numbers for applications that really >> matter for our nations. > > > Tomography is kind of important to a lot of people: > > http://tech.slashdot.org/tech/08/05/31/1633214.shtml > http://www.dvhardware.net/article27538.html > http://fastra.ua.ac.be/en/index.html > > But of course, that was done with regular $500 cards, not Teslas. Mind you, if you go and get a tomographic scan today, they already use fast hardware to do it. 
Only researchers on limited budgets tolerate taking days to reduce the data on a desktop PC. And, while the concept of doing faster processing with a <10KEuro box is attractive in that environment, I suspect it's a long way from being commercially viable in that role. The current tomographic technology (e.g. GE Lightspeed) is pretty impressive. They slide you in, and 10-15 seconds later, there's 3 d rendered models and slices on the screen. The equipment is pretty hassle free, the UI straightforward from what I could see, etc. And, of course, people are willing (currently) to pay many millions for a machine to do this. I suspect that the other costs of running a CT scanner (both capital and operating) overwhelm the cost of the computing power, so going from a $100K box to a $20K box is a drop in the bucket. When you're talking MRI, for instance, there's the cost of the liquid helium for the magnets. That's a long way from a bunch of grad students racking up a bunch of PCs. From prentice at ias.edu Tue Jun 17 09:38:12 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <87hcbs96go.fsf@snark.cb.piermont.com> References: <1210016466.4924.1.camel@Vigor13> <48568904.3020307@ias.edu> <87hcbs96go.fsf@snark.cb.piermont.com> Message-ID: <4857E874.7000300@ias.edu> Perry E. Metzger wrote: > Prentice Bisbal writes: >> Completely untrue. One of my colleagues, who does a lot of work with GPU >> processors for astrophysics calculations, was able to increase the >> performance of the MD5 algorithm by ~100x with about 1.5 days of work. > > That's rather surprising. MD5 is a pure integer algorithm, and is well > known for being unfriendly to vectorization. There is also extensive > work by Keromytis et al on the use of GPUs for accelerating > cryptographic operations, and I don't think they achieved anything > like that sort of performance improvement. 
> > I'll point out, by the way... > >> * GPU (single GeForce 8800 Ultra on cylon): >> 57,640,967.264473 hash/second > > ...that implies moving at least 3.7e9 bytes of data (MD5 > operates on blocks of 64 bytes) into the GPU per second, entirely > ignoring the 64 Feistel rounds within the GPU. Each round is 4 xors > and a rotate, and they can't be done in parallel, so we get a total of > about 1.8e10 integer ops (entirely ignoring the world shuffling) per > second. That's... rather a lot. > > Perry Perry, I was just passing the information along from my colleague. I forwarded your response to him. Maybe he will reply himself and go into more detail about his findings for you. -- Prentice From diep at xs4all.nl Tue Jun 17 10:07:54 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> Message-ID: Jim, I feel you notice a very important thing here. That is that mainly for hobbyists like me a GPU is interesting to program for, or for companies who have a budget to buy less than a dozen of them in total. For ISPs the only thing that matters is the power consumption and for encryption at a low TCP/IP layer it's too easy to equip all those cheapo cpu's with encryption coprocessors which are like 1 watt or so and are delivering enough work to get the 100 mbit/1 gigabit nics fully busy, in case of public key it's in fact at a speed that you won't reach at a GPU when managing to parallellize it and it to work in a great manner. 
The ISPs look for full scalable stuff of course such machines, quite the opposite of having 1 card @ 250 watt. In fact it would be quite interesting to know how fast you can run RSA on a GPU. Where are the benchmarks there? I tend to remember that i posted some solution to do a fast generic modulo (of course not a new idea, but that you always hear after figuring something out), with a minimum of code under the condition you already have multiplying code. How fast can you multiply big numbers on those GPU's? 4096 x 4096 bits is most interesting there. Then of course take the modulo quickly and repeat this for the entire exponent-squaring. That is the only interesting question IMHO, what amount of throughput does it deliver for RSA4096. I tend to remember a big bug it has in such a case is that the older cards (8800 etc) only can do 16 x 16 bits == 32 bits, whereas at CPU's you can use 64 x 64 bits == 128 bits. BIG difference in speed. Yet those hobbyists who are the interested persons in GPU programming have limited time to get software to run and have a budget far smaller than $10k. They're not even gonna buy as much Tesla's as NASA will. Not a dozen. The state in which gpu programming is now is that some big companies can have a person toy fulltime with 1 gpu, as of course the idea of having a cpu with hundreds of cores is very attractive and looks like a realistic future, so companies must explore that future. Of course every GPU/CPU company is serious in their aim to produce products that perform well, we all do not doubt it. Yet it is only attractive to hobbyists and those hobbyists are not gonna get any interesting technical data needed to get the maximum out of the GPU's from Nvidia. This is a big problem. Those hobbyists have very limited time to get their sparetime products done to do numbercrunching, so being busy fulltime writing testprograms to know everything about 1 specific GPU is not something they all like to do for a hobby. 
Just having that information will attract the hobbyists, as they are willing to take the risk of buying 1 Tesla and spending time on it. That produces software. That software will have a certain performance, and based upon that performance some companies might get interested. Intel and AMD will be doing better there, I hope. Note that testing CUDA is also suboptimal: on the card that drives your display a kernel just runs for 5 seconds or so max, so you need a machine with a 2nd videocard. That requires a mainboard with at least 2x PCI-E 16x. How to cluster that? My cluster cards are PCI-X, not PCI-E: Quadrics QM400's. I can get boards @ 139 euro with 1 PCI-E 16x slot and build quadcore Q6600 nodes @ 500 euro, as soon as I have a job again. My MacBook Pro 17'' has no free PCI-E 16x slot though. So for number crunching, a cluster always wins from a single Nvidia videocard. The latency of communication over PCI-E from the videocards is too high to parallelize software that is not embarrassingly parallel. The majority of hobbyists will have a similar problem with Nvidia, very sad in itself. A good CUDA setup that can beat a simplistic cluster is not as cheap to build, nor as easy to program for, as that cluster is. Also the memory scales better in those clusters than it does for the cards. Even if 1 node can do less work than 1 GPU can, it is still easier to get that exponential speedup by having a shared cache across all nodes (this is true for a lot of modern crunching algorithms). With a GPU you're forced to do all calculation, including caching, within the GPU and within the limited device RAM. Now, contrary to what most people tend to believe, there are usually methods to get away with a limited amount of RAM using modern overwriting methods of caching; even when that loses a factor 2, there are ways to overcome that. The biggest limitation is that communication with other nodes is really hard.
Scaling to more nodes is just not gonna happen of course, as the latency between the nodes is really bad and the GPU adds an extra slow hop to that latency: first from device RAM to host RAM, then from RAM to the network card, and on the other side from card to RAM and from RAM to device RAM. Let's make a list of the problems that most clusters here calculate on, and you'll see how much the GPU concept still needs to mature before it works well for most codes. Software that needs low latency interconnects you could therefore only get to work within 1 card, provided the RAM access is not a bottleneck. Yet all reports so far indicate it is. Having no information there is just not very encouraging, and for professional crunching work, where companies do have a big budget, building or buying your own low power co-processor that so to speak even fits into an iPod is just too easy. So in the end I guess some stupid extension of SSE will give a bigger increase in crunching power than the in itself attractive GPGPU hardware, the biggest limitation being development time from hobbyists. Vincent On Jun 17, 2008, at 4:01 PM, Jim Lux wrote: > Quoting Linus Harling , on Mon 16 Jun 2008 > 04:31:56 PM PDT: > >> Vincent Diepeveen wrote: >> >>> >>> Then instead of a $200 pci-e card, we needed to buy expensive >>> Tesla's >>> for that, without getting >>> very relevant indepth technical information on how to program for >>> that >>> type of hardware. >>> >>> The few trying on those Tesla's, though they won't ever post this as >>> their job is fulltime GPU programming, >>> report so far very disappointing numbers for applications that >>> really >>> matter for our nations. >> >> >> Tomography is kind of important to a lot of people: >> >> http://tech.slashdot.org/tech/08/05/31/1633214.shtml >> http://www.dvhardware.net/article27538.html >> http://fastra.ua.ac.be/en/index.html >> >> But of course, that was done with regular $500 cards, not Teslas.
> > > Mind you, if you go and get a tomographic scan today, they already > use fast hardware to do it. Only researchers on limited budgets > tolerate taking days to reduce the data on a desktop PC. And, while > the concept of doing faster processing with a <10KEuro box is > attractive in that environment, I suspect it's a long way from > being commercially viable in that role. > > The current tomographic technology (e.g. GE Lightspeed) is pretty > impressive. They slide you in, and 10-15 seconds later, there's 3 > d rendered models and slices on the screen. The equipment is > pretty hassle free, the UI straightforward from what I could see, etc. > > And, of course, people are willing (currently) to pay many millions > for a machine to do this. I suspect that the other costs of > running a CT scanner (both capital and operating) overwhelm the > cost of the computing power, so going from a $100K box to a $20K > box is a drop in the bucket. When you're talking MRI, for > instance, there's the cost of the liquid helium for the magnets. > > That's a long way from a bunch of grad students racking up a bunch > of PCs. > > > > > > > From James.P.Lux at jpl.nasa.gov Tue Jun 17 10:43:07 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> Message-ID: <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> At 10:07 AM 6/17/2008, Vincent Diepeveen wrote: >Jim, > >I feel you notice a very important thing here. > >That is that mainly for hobbyists like me a GPU is interesting to >program for, or for companies who have a budget to buy less than a dozen >of them in total. 
And, as you say, that's a hobby/R&D market where you're willing to spend more in labor than in hardware. >For ISPs the only thing that matters is the power consumption and for >encryption at a low TCP/IP layer it's too easy to equip all those >cheapo cpu's with encryption coprocessors which are like 1 watt or so >and are delivering enough work to get the 100 mbit/1 gigabit nics >fully busy, in case of public key it's in fact at a speed that you >won't reach at a GPU when managing to parallellize it and it to work >in a great >manner. The ISPs look for full scalable stuff of course such >machines, quite the opposite of having 1 card @ 250 watt. I don't know much about the economies of running an ISP. While electrical power (and cooling,etc.) might be a big chunk of their budget, I suspect that mundane business stuff like advertising, billing, account management, etc. might actually be a bigger slice. For instance, do co-lo facilities charge you for power, or is it like office space, where you rent it by the square foot, and an assumed amount of power and HVAC comes with the price. >Yet those hobbyists who are the interested persons in GPU programming >have limited time >to get software to run and have a budget far smaller than $10k. >They're not even gonna buy as much Tesla's as NASA will. >Not a dozen. There, I think you're wrong. There's lots of hobbyists and tinkerer's of one sort or another out there. I'd bet they sell at least thousands of them. >The state in which gpu programming is now is that some big companies >can have a person toy fulltime with 1 gpu, >as of course the idea of having a cpu with hundreds of cores is very >attractive and looks like a realistic future, >so companies must explore that future. The various flavors of multi-core in a field of RAM have been around for decades, because it's (superficially?) attractive from a scalability standpoint. The problem, as everyone on this list is aware, is effectively using such an architecture.. 
parallelizing isn't trivial. There's a reason they still sell mainframe computers, but, hope does spring eternal. >Of course every GPU/CPU company is serious in their aim to produce >products that perform well, we all do not doubt it. Not necessarily, unless your performance metric is shareholder return. It is the rare company that can make a business of selling solely on top-end performance (e.g. Cray). There's also several strategies and target markets. If you have good manufacturing capability for large quantities, you adjust your performance to what consumers will buy at a price you can make money on. If you're in a more "fee for service" model, then you likely are doing smaller unit volumes, but the units cost a lot more (I suspect that most of the cluster vendors on this list fall in this category), but still, in the long run the cost to do the job MUST be less than what the customer is willing to pay (unless the owner is some sort of philanthropist, naive, or a fool) >Yet it is only attractive to hobbyists and those hobbyists are not >gonna get any interesting technical data needed to get the maximum >out of the GPU's from Nvidia. This is a big problem. Those hobbyists >have very limited time to get their sparetime products done >to do numbercrunching, So it's basically an investment decision. How much value do you want to get out of your investment of time or money? If you're only willing to spend a few hours, then you must not value the end state of the work very highly (or, more correctly, you value something else more highly...). > so being busy fulltime writing testprograms to >know everything about 1 specific GPU is not something they >all like to do for a hobby. Just having that information will attract >the hobbyists as they are willing to take the risk to buy 1 Tesla and >spend time there. That produces software. That software will have a >certain performance, >based upon that performance perhaps some companies might get interested. 
To a certain extent, this is the "build it and they will come" model. It's not one that is going to make any real difference to Nvidia's bottom line, so they're unlikely to invest more than a token amount in it. >So in the end i guess some stupid extension of SSE will give a bigger >increase in crunching power than the in itself attractive gpgpu >hardware. >The biggest limitation being development time from hobbyists. And HPC hobbyists are a very tiny market, not worth very much commercially. From lindahl at pbm.com Tue Jun 17 11:23:56 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> Message-ID: <20080617182356.GA28059@bx9.net> On Tue, Jun 17, 2008 at 10:43:07AM -0700, Jim Lux wrote: > For instance, do co-lo facilities charge you for power, or is > it like office space, where you rent it by the square foot, and an > assumed amount of power and HVAC comes with the price. The typical colo charges by space, with a certain amount of power per rack, but in practice the limit is power. As an example, the new colo I'm moving into only gives me enough power for 16 machines per rack. But older colos are only 2/3 of that. These are HPCish nodes. I was pretty surprised by this, as most HPC machine rooms allow a significantly higher power density. > but still, in > the long run the cost to do the job MUST be less than what the > customer is willing to pay (unless the owner is some sort of > philanthropist, naive, or a fool) Mmmmmmmmmmmmf. MMMMF! 
-- greg From James.P.Lux at jpl.nasa.gov Tue Jun 17 11:41:34 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <20080617182356.GA28059@bx9.net> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> Message-ID: <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> At 11:23 AM 6/17/2008, Greg Lindahl wrote: >On Tue, Jun 17, 2008 at 10:43:07AM -0700, Jim Lux wrote: > > > For instance, do co-lo facilities charge you for power, or is > > it like office space, where you rent it by the square foot, and an > > assumed amount of power and HVAC comes with the price. > >The typical colo charges by space, with a certain amount of power per >rack, but in practice the limit is power. As an example, the new colo >I'm moving into only gives me enough power for 16 machines per >rack. But older colos are only 2/3 of that. These are HPCish nodes. > >I was pretty surprised by this, as most HPC machine rooms allow a >significantly higher power density. > > > but still, in > > the long run the cost to do the job MUST be less than what the > > customer is willing to pay (unless the owner is some sort of > > philanthropist, naive, or a fool) > >Mmmmmmmmmmmmf. MMMMF! > >-- greg Well.. to be fair, there were (and still are) businesses out there (particularly a few years ago) that didn't fully understand the concept of needing net profit. (ah yes, the glory days of startups "buying market share" in the dot-com bubble) And, some folks made a fine living in the mean time. 
(But, then, those folks weren't the owners, were they, or if they were, in a limited sense, they now have some decorative wallpaper..) From lindahl at pbm.com Tue Jun 17 11:53:08 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48571A50.8020002@cse.ucdavis.edu> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <48570A89.3040608@cse.ucdavis.edu> <20080617011359.GB6176@bx9.net> <48571A50.8020002@cse.ucdavis.edu> Message-ID: <20080617185307.GB28059@bx9.net> On Mon, Jun 16, 2008 at 06:58:40PM -0700, Bill Broadley wrote: > Heh. Is there a published linpack for some CUDA based solution? Or > possibly code available? http://www.hp.com/techservers/hpccn/hpccollaboration/ADCatalyst/downloads/accelerating-HPCUsing-GPUs.pdf says that some folks at Berkeley wrote some SGEMM code and that it achieves ~ 165 Gflops out of a 4-GPU Tesla setup. That's waaaaay down from the alleged peak of that box. > I'm pretty sure hypertransport allows for a significant number of > outstanding memory transactions, so even a single gpu/cpu hybrid could farm > out a 100GB/sec memory system to numerous sockets.... sounds like a good > justification for HT3 to me. Hypertransport is just a bus, it's the northbridge+cpu that determines how many outstanding transactions you can have. -- greg From richard.walsh at comcast.net Tue Jun 17 12:00:58 2008 From: richard.walsh at comcast.net (richard.walsh@comcast.net) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? Message-ID: <061720081900.1570.485809EA0009C560000006222214756402089C040E99D20B9D0E080C079D@comcast.net> -------------- Original message -------------- From: Jim Lux > Well.. 
to be fair, there were (and still are) businesses out there > (particularly a few years ago) that didn't fully understand the > concept of needing net profit. (ah yes, the glory days of startups > "buying market share" in the dot-com bubble) And, some folks made a > fine living in the mean time. (But, then, those folks weren't the > owners, were they, or if they were, in a limited sense, they now have > some decorative wallpaper..) > Hey Jim, Gold rushes are good (and greed I guess too ... ;-) ...) ... there IS often gold in them there hills, it is just that very few if anyone knows exactly where. So, the less risk averse among us and those with more money than sense (thankfully, I say) starting digging. Most of their trials end in error, but the rest of us benefit from the few that are lucky/smart enough to find it. I think you are assuming that the futures are far more predictable than it in fact is, even for the best and brightest like yourself ... what percentage of the HPC market will accelerators have at this time next year? Regards, rbw -- "Making predictions is hard, especially about the future." Niels Bohr -- Richard Walsh Thrashing River Consulting-- 5605 Alameda St. Shoreview, MN 55126 -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080617/8dd5911c/attachment.html From hahn at mcmaster.ca Tue Jun 17 12:05:33 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <20080617182356.GA28059@bx9.net> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> Message-ID: > I'm moving into only gives me enough power for 16 machines per > rack. But older colos are only 2/3 of that. These are HPCish nodes. I guess I'm not surprised, but then again, it does make me scratch my head about vendors who are still in love with blades. you know, headlines like "HP puts 1000 cores in a rack". and who buys them? I picture PHB's buying blade chassis and then putting just one per rack for heat-density reasons. maybe blades are creating a giant market for blanking panels ;) From shaeffer at neuralscape.com Tue Jun 17 12:59:34 2008 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> References: <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> Message-ID: <20080617195934.GB20890@synapse.neuralscape.com> On Tue, Jun 17, 2008 at 11:41:34AM -0700, Jim Lux wrote: > > Well.. to be fair, there were (and still are) businesses out there > (particularly a few years ago) that didn't fully understand the > concept of needing net profit. 
(ah yes, the glory days of startups > "buying market share" in the dot-com bubble) And, some folks made a > fine living in the mean time. (But, then, those folks weren't the > owners, were they, or if they were, in a limited sense, they now have > some decorative wallpaper..) > Hi Jim, I think you have the common view about this. The reality is many of those same companies would be making money today. They were just ahead of their time -- which is very common here in Silicon Valley. Having good ideas or exceptional technology is not enough -- you need to time it right -- or you don't survive. I love the example of IP telephony. I first heard of friends in IP telephony startups in the 1993-4 time frame. They and a long line of others went bankrupt, until Skype hit it big. The .com bust is about as hyped as you can get. They used to love to bash San Francisco as the symbol of the bust. Take a look at this month's cover of San Francisco magazine. Timing is everything in life. (smiles ;) http://www.sanfranmag.com/ And even worse, they lumped the failure of big iron companies and especially Sun Microsystems during that time with the .com bust, when, in reality, everyone was moving to commodity hardware and Linux during that time frame -- the business press just didn't want to report that at the time... Go figure. And timing isn't just about being too early. A good example is all those new Internet search companies that are littering Silicon Valley today. Most of them will fail, because they are too late. Some will get bought out -- but most are going to fail. Timing works both ways. Just my view from the trenches of Silicon Valley, Karen -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer@neuralscape.com http://www.neuralscape.com From James.P.Lux at jpl.nasa.gov Tue Jun 17 13:25:49 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <20080617195934.GB20890@synapse.neuralscape.com> References: <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> <20080617195934.GB20890@synapse.neuralscape.com> Message-ID: <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> At 12:59 PM 6/17/2008, Karen Shaeffer wrote: >On Tue, Jun 17, 2008 at 11:41:34AM -0700, Jim Lux wrote: > > > > Well.. to be fair, there were (and still are) businesses out there > > (particularly a few years ago) that didn't fully understand the > > concept of needing net profit. (ah yes, the glory days of startups > > "buying market share" in the dot-com bubble) And, some folks made a > > fine living in the mean time. (But, then, those folks weren't the > > owners, were they, or if they were, in a limited sense, they now have > > some decorative wallpaper..) > > > >Hi Jim, >I think you have the common view about this. The reality is many of >those same companies would be making money today. They were just >ahead of their time -- which is very common here in Silicon Valley. Hmmm.. I don't know about that.. having a solution with no problem that needs to be solved isn't a valid business model. You could equally well argue that a company with lots of smart employees, but no ideas, is also ahead of it's time. Either you have all the elements, or you don't. >Having good ideas or exceptional technology is not enough -- you >need to time it right -- or you don't survive. I love the example >of IP telephony. I first heard of friends in IP telephony startups >in 1993-4 time frame. They and a long line of others went bankrupt, >until Skype hit it big. 
Back in 93, the telcos were still talking ISDN and ATM to the desktop as their model..whether it delivered IP packets (X.25, anyone?) or circuit switched digital voice was sort of immaterial. Big firms already mixed voice/data on their data connections (e.g. Micom was building such data/voice switches for decades by then), but what made IP telephony take off, I think, was sufficient installed density of sufficiently high speed internet access with flat rate billing. It's the flat rate billing that really did it, because it got around the "pay per minute" model for standard telephony, which was already on the way out for big customers, but very entrenched for the consumer. The long distance resellers really cashed in, because they could buy flat rate bulk and resell by the minute for 5c/minute or less. >The .com bust is about as hyped as you can get. They used to love >to bash San Francisco as the symbol of the bust. Take a look at this >month's cover of San Francisco magazine. Timing is everything in >life. (smiles ;) > >http://www.sanfranmag.com/ > >And even worse, they lumped the failure of big iron companies >and especially Sun Microsystems during that time with the .com >bust, when, in reality, everyone was moving to commodity hardware >and Linux during that time frame -- the business press just didn't >want to report that at the time... Go figure. To a certain extent, the big iron companies had troubles, though, because of the .com bubble in general. All this cash flooding into the market looking for investments, so really, really speculative ventures got funded (sort of like cash flooding into the collateralized debt obligation/mortgage backed securities market), and that funding drove deposits on equipment from the big iron companies, etc. From landman at scalableinformatics.com Tue Jun 17 14:12:00 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... 
where's my double floating point? In-Reply-To: <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> References: <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> <20080617195934.GB20890@synapse.neuralscape.com> <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> Message-ID: <485828A0.5040701@scalableinformatics.com> Jim Lux wrote: > At 12:59 PM 6/17/2008, Karen Shaeffer wrote: >> On Tue, Jun 17, 2008 at 11:41:34AM -0700, Jim Lux wrote: >> > >> > Well.. to be fair, there were (and still are) businesses out there >> > (particularly a few years ago) that didn't fully understand the >> > concept of needing net profit. (ah yes, the glory days of startups >> > "buying market share" in the dot-com bubble) And, some folks made a >> > fine living in the mean time. (But, then, those folks weren't the >> > owners, were they, or if they were, in a limited sense, they now have >> > some decorative wallpaper..) >> > >> >> Hi Jim, >> I think you have the common view about this. The reality is many of >> those same companies would be making money today. They were just >> ahead of their time -- which is very common here in Silicon Valley. > > Hmmm.. I don't know about that.. having a solution with no problem that > needs to be solved isn't a valid business model. You could equally well [hmmm.... how do we get Jim to lecture to VC's about funding the next social wannabe linky-linky site? ] [...] 
>> And even worse, they lumped the failure of big iron companies >> and especially Sun Microsystems during that time with the .com >> bust, when, in reality, everyone was moving to commodity hardware >> and Linux during that time frame -- the business press just didn't >> want to report that at the time... Go figure. Yup. Most of them still don't want to admit it. > To a certain extent, the big iron companies had troubles, though, > because of the .com bubble in general. All this cash flooding into the > market looking for investments, so really, really speculative ventures > got funded (sort of like cash flooding into the collateralized debt > obligation/mortgage backed securities market), and that funding drove > deposits on equipment from the big iron companies, etc. Well, there were secondary affects. Cisco having a $0.5T valuation was one of those. At SGI, one of the "oh feces" moments we had was when TJ, who wasn't throwing up in a hotel pool somewhere, noted that some SGI gear made for great web servers in 1995 or so. Then we (while I was there) managed to completely miss that market. Coulda rode the bubble up ... More seriously, getting the business model right is hard. Making it work is hard. Getting investments from groups that understand these things is hard. There was too much speculation, not enough focus upon the model. I would argue there is lots of that today with the social bits. How many social websites are there? What are they really worth? What is their revenue model? How will they make money (apart from being sold to larger companies)? Why are VCs funding them? 
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From diep at xs4all.nl Tue Jun 17 14:28:24 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <061720081900.1570.485809EA0009C560000006222214756402089C040E99D20B9D0E080C079D@comcast.net> References: <061720081900.1570.485809EA0009C560000006222214756402089C040E99D20B9D0E080C079D@comcast.net> Message-ID: Richard, The big question we should raise in order to answer your HPC market question is whether assembler-coded x64 SSE2 code also counts. There is nothing as weird as all those bizarre SSE2 instructions. There is no way you can make an objective analysis of why some weirdo instructions are in and why some very useful stuff is not. Let's zoom in on one detail. What we really need in SSE2, to speed up some FFT type transforms bigtime as well as to make it attractive for all kinds of small sized codes, is an instruction lowbitsmultiply. Say we represent a register as 2 x 64 bit integers, A:B, and we multiply it with C:D. Then we want to be able to execute the following 2 instructions, especially on Intel processors:
lowbitsmultiply : A:B * C:D ===> AC (mod 2^64) : BD (mod 2^64)
highbitsmultiply : A:B * C:D ===> AC >> 64 : BD >> 64
where '>> 64' means shift right by 64 bits. Basically the Itanium could have instructions like the above in floating point, so why not the PC processors too, for 64 bits? Of course we need these 2 instructions to be a tad faster than multiplying an integer currently is in the integer units. We want a throughput of 1 cycle, without blocking other execution units of course.
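To make the two proposed operations concrete, here is a minimal sketch of their semantics on plain Python integers. The function names and the tuple-as-register representation are assumptions made for illustration only; they are not real SSE2 instructions or intrinsics, and the sketch says nothing about speed.

```python
# Sketch of the two lane-wise multiplies described above, on plain Python ints.
# Names and the (A, B) tuple layout are illustrative assumptions, not a real ISA.

MASK64 = (1 << 64) - 1  # keeps the low 64 bits of a product


def low_bits_multiply(ab, cd):
    """(A, B) * (C, D) -> (A*C mod 2^64, B*D mod 2^64)."""
    return tuple((x * y) & MASK64 for x, y in zip(ab, cd))


def high_bits_multiply(ab, cd):
    """(A, B) * (C, D) -> (A*C >> 64, B*D >> 64)."""
    return tuple((x * y) >> 64 for x, y in zip(ab, cd))


def full_products(ab, cd):
    """Recombine both halves into the full 128-bit product of each lane."""
    lo = low_bits_multiply(ab, cd)
    hi = high_bits_multiply(ab, cd)
    return tuple((h << 64) | l for h, l in zip(hi, lo))


if __name__ == "__main__":
    ab = (2**63 + 5, 3)
    cd = (2**62 + 7, 11)
    # The two halves together carry exactly the information of A*C and B*D.
    assert full_products(ab, cd) == (ab[0] * cd[0], ab[1] * cd[1])
    print(low_bits_multiply(ab, cd), high_bits_multiply(ab, cd))
```

Having both halves available per lane is what a big-number multiply needs: the low half is the digit that stays in place, the high half is the carry into the next digit.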
When further vectorizing the SSE units, it is of course not nice when a single instruction in one of the execution units blocks all the others. There is no really perfect hardware from an HPC viewpoint. Fiddling with SSE2 instructions to get their thing done in a cryptic manner is something very few programmers are good at, as algorithms and caches haven't become simpler since the 80s. So I would argue that the number crunching market is already dominated by SSE2 acceleration and will see even more of that type of stuff, which basically means a return to the 80s in some sense for programmers who want to get the maximum out of a CPU. It gets interesting now when Intel comes with something and AMD with yet unother crunching koncept (YUCK), making sure more cash gets pocketed by programmers. So that is good news for the jobless low level programmers (me, says the fool) :) Note that if GPU's got equipped with this type of instruction, even when just 32 bits, that would already be a big step, as that allows CRT (Chinese Remainder Theorem) to get the job done, which is how it currently gets solved on the PC. The point is that register size and which types of instructions your hardware supports matter. It especially matters to publish which ones you support. Videocards simply have the luck that our coordinate system can easily be expressed in 32 bit floating point, and when needed even 16 bits. So I'd not wait for a GPU that is entirely 64 bit double precision and/or 64 bit integers, as that would slow them down for their biggest market. Note that it would on paper be possible right now to make a chess program run really fast within a GPU, if the RAM accesses were latency optimized (so 2 random accesses to the RAM by all stream processors every 10000 cycles, the first access being a read, the second one a write). All other theoretical problems I solved on paper.
Yet this small 'problem' is dominating it that much that i'm not gonna gamble programming in order to find out perhaps that my assumption is correct :) However to reveal some of the math to compare the GPU's versus the CPU's when i would design a new 'braindead' chessprogram: Let's say we have 240 cores 32 bits at 0.675 Ghz Let's assume now we need 10k cycles for each node at each streamprocessor: (1 nps = 1 node per second = 1 chessposition a second searching for the holy grail) I assume on average 1 instruction a cycle (dangerous assumption considering there is 2 RAM hits also) 675000k / 10k = 67.5k nps 240 * 67.5k nps = 16.2 million nps Now if we compare with what i fight against in tournaments, that is a skulltrail mainboard with 2 xeon cpu's overclocked to 4Ghz, so 8 cores in total with big RAM. If i see what practical nps is that fast chessprograms get at it right now, then that is about 20 million nps. PC faster than GPU. I will skip all kind of technical discussions, such as where PC loses something; its memory controller is not near fast enough to serve every node, so it loses bigtime there last plies, at least 20-30%, and that we assume the same speedup for 8 cores versus 240 cores, where game tree search is one of the hardest to parallellize challenges; so you effectively will lose a lot more at 240 cores than at 8 cores. In fact i would guess it's 30% efficiency for the GPU versus 87.5% for the PC. Yet there is things to discover there which make up for an interesting challenge so i skip that algorithmic discussion entirely here, for now. In short for problems that the past were latency oriented, the dominating factor is: "how many instructions per second can you execute". The PC simply wins it from the GPU here, this for a typical 32 bits problem (in case of my chess software). The PC simply can effectively execute more instructions a cycle, when also counting the instructions it can skip by taking branches. 
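The back-of-envelope numbers above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is made up for illustration, and applying the guessed 30% parallel efficiency directly to the peak figure is an assumption about how that guess is meant to be used.

```python
def nodes_per_second(cores, clock_hz, cycles_per_node, efficiency=1.0):
    """Search speed if every core spends cycles_per_node cycles on each node."""
    return cores * (clock_hz / cycles_per_node) * efficiency


# Figures quoted in the post: 240 stream processors at 0.675 GHz, 10k cycles/node.
gpu_peak = nodes_per_second(240, 0.675e9, 10_000)

# Assumption: scale the peak by the guessed 30% efficiency at 240 cores.
gpu_guess = nodes_per_second(240, 0.675e9, 10_000, efficiency=0.30)

pc_practical = 20e6  # practical figure quoted for the overclocked 8-core box

if __name__ == "__main__":
    print(f"GPU peak : {gpu_peak / 1e6:.1f} million nps")
    print(f"GPU guess: {gpu_guess / 1e6:.2f} million nps")
    print(f"PC       : {pc_practical / 1e6:.1f} million nps")
```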
Note that the PC also wins it in powerconsumption at 4Ghz @ 2 sockets. It has taken me many months to redo the above math, trying to find a solution to get somehow GPU faster than PC. Didn't manage so far. The biggest 2 differences between a CPU and a GPU when doing such math is the low clock of the GPU versus the high clock of the CPU. CPU at events already wins a factor 6 nearly just based upon clockspeed. I've showed up at a world championship in 2003 with a supercomputer with 500Mhz cpu's and fought against opponents at 2.8Ghz MP Xeon cpu's. Also a factor 6 difference nearly already in clockspeed. That is really a big dang in your selfconfidence. Maybe it is good that Greg Lindahl didn't join in that event, he would've googled a tad and showed up with that one-liner of Seymour Cray. Vincent On Jun 17, 2008, at 9:00 PM, richard.walsh@comcast.net wrote: > > -------------- Original message -------------- > From: Jim Lux > > > Well.. to be fair, there were (and still are) businesses out there > > (particularly a few years ago) that didn't fully understand the > > concept of needing net profit. (ah yes, the glory days of startups > > "buying market share" in the dot-com bubble) And, some folks made a > > fine living in the mean time. (But, then, those folks weren't the > > owners, were they, or if they were, in a limited sense, they now > have > > some decorative wallpaper..) > > > > Hey Jim, > > Gold rushes are good (and greed I guess too ... ;-) ...) ... there > IS often gold in them there hills, it is just that very few if > anyone knows exactly where. So, the less risk averse among us and > those with more money than sense (thankfully, I say) starting > digging. Most of their trials end in error, but the rest of us > benefit from the few that are lucky/smart enough to find it. I > think you are assuming that the futures are far more predictable > than it in fact is, even for the best and brightest like > yourself ... 
what percentage of the HPC market will accelerators > have at this time next year? > > Regards, > > rbw > > -- > > "Making predictions is hard, especially about the future." > > Niels Bohr > > -- > > Richard Walsh > Thrashing River Consulting-- > 5605 Alameda St. > Shoreview, MN 55126 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From shaeffer at neuralscape.com Tue Jun 17 14:51:34 2008 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> References: <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> <20080617195934.GB20890@synapse.neuralscape.com> <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> Message-ID: <20080617215134.GA21429@synapse.neuralscape.com> On Tue, Jun 17, 2008 at 01:25:49PM -0700, Jim Lux wrote: > At 12:59 PM 6/17/2008, Karen Shaeffer wrote: > >Hi Jim, > >I think you have the common view about this. The reality is many of > >those same companies would be making money today. They were just > >ahead of their time -- which is very common here in Silicon Valley. > > Hmmm.. I don't know about that.. having a solution with no problem > that needs to be solved isn't a valid business model. You could > equally well argue that a company with lots of smart employees, but > no ideas, is also ahead of it's time. > > Either you have all the elements, or you don't. Have you ever worked in Silicon Valley? Much of the thinking is about what is emerging, and how long it will take to emerge. 
This is the mindset for the big winners, but it is also the highest risk. Once all the elements are in place, then that is the secondary market of what I call the carpet baggers. Actually there are a lot of those folks too. > >The .com bust is about as hyped as you can get. They used to love > >to bash San Francisco as the symbol of the bust. Take a look at this > >month's cover of San Francisco magazine. Timing is everything in > >life. (smiles ;) > > > >http://www.sanfranmag.com/ > > > >And even worse, they lumped the failure of big iron companies > >and especially Sun Microsystems during that time with the .com > >bust, when, in reality, everyone was moving to commodity hardware > >and Linux during that time frame -- the business press just didn't > >want to report that at the time... Go figure. > > > To a certain extent, the big iron companies had troubles, though, > because of the .com bubble in general. All this cash flooding into > the market looking for investments, so really, really speculative > ventures got funded (sort of like cash flooding into the > collateralized debt obligation/mortgage backed securities market), > and that funding drove deposits on equipment from the big iron companies, > etc. I think you nailed that. Silicon Valley is driven by greed. And no group is more full of greed than the investors of Silicon Valley -- as it should be. We all live by the golden rule -- Those with the gold make the rule. What is almost comical is how they engage in herd investing. They watch what their counterparts are investing in -- and they often need to invest in a competing company to avoid looking clueless. There is cover in that model -- If it hits big, you can say you were playing in the market, even if your investment lost. And if that market flops, you can say that everyone else made the same mistake as you. This is very common here in Silicon Valley. 
And it definitely was at play in the hyper-investments that set up the .com bust -- herd mentality that required everyone to be dumping money into the emerging Internet paradigm -- early. You gotta love Silicon Valley. I wouldn't live any other place in the world. (smiles ;) Thanks for your comments. They are always interesting. I actually have work to get done today. Gotta go. Karen -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer@neuralscape.com http://www.neuralscape.com From James.P.Lux at jpl.nasa.gov Tue Jun 17 15:18:02 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <485828A0.5040701@scalableinformatics.com> References: <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> <20080617195934.GB20890@synapse.neuralscape.com> <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> <485828A0.5040701@scalableinformatics.com> Message-ID: <6.2.5.6.2.20080617141600.02fa7660@jpl.nasa.gov> At 02:12 PM 6/17/2008, Joe Landman wrote: >Jim Lux wrote: >>At 12:59 PM 6/17/2008, Karen Shaeffer wrote: >>>On Tue, Jun 17, 2008 at 11:41:34AM -0700, Jim Lux wrote: >>> > >>> > Well.. to be fair, there were (and still are) businesses out there >>> > (particularly a few years ago) that didn't fully understand the >>> > concept of needing net profit. (ah yes, the glory days of startups >>> > "buying market share" in the dot-com bubble) And, some folks made a >>> > fine living in the mean time. (But, then, those folks weren't the >>> > owners, were they, or if they were, in a limited sense, they now have >>> > some decorative wallpaper..) 
>>> > >>> >>>Hi Jim, >>>I think you have the common view about this. The reality is many of >>>those same companies would be making money today. They were just >>>ahead of their time -- which is very common here in Silicon Valley. >>Hmmm.. I don't know about that.. having a solution with no problem >>that needs to be solved isn't a valid business model. You could equally well > >[hmmm.... how do we get Jim to lecture to VC's about funding the >next social wannabe linky-linky site? ] Ahh... those sites have advertising, which is their business model. The links are just there to get you to see the ads and are more socially acceptable and allow a narrower tailoring to specific target markets allowing higher ad rates, than something of broad appeal (mentos & coke or porn). >Well, there were secondary affects. Cisco having a $0.5T valuation >was one of those. At SGI, one of the "oh feces" moments we had was >when TJ, who wasn't throwing up in a hotel pool somewhere, noted >that some SGI gear made for great web servers in 1995 or so. Then >we (while I was there) managed to completely miss that >market. Coulda rode the bubble up ... > >More seriously, getting the business model right is hard. Making it >work is hard. Getting investments from groups that understand these >things is hard. There was too much speculation, not enough focus >upon the model. I would argue there is lots of that today with the >social bits. How many social websites are there? What are they >really worth? What is their revenue model? How will they make >money (apart from being sold to larger companies)? Why are VCs funding them? The VCs funding them do so because it's a high risk gamble that can potentially pay off big. Not everyone wants to invest in CDs or mutual funds. You could, for instance, invest in something that has a probability of paying off of 0.01, but has a return of 1500% and come out ahead, in an expected value sense. 
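A quick sketch of the expected-value check, assuming the whole stake is lost when the bet fails (an assumption, not something stated above) and reading a 1500% return as a gain of 15 times the stake. With those particular numbers the bet falls short of break-even, which at p = 0.01 needs a gain of about 99 times the stake; the general point holds once the payoff or the odds are large enough. The function names are illustrative only.

```python
def expected_net_return(p_win, gain_multiple, loss_fraction=1.0):
    """Expected net gain per unit staked.

    p_win: probability the bet pays off.
    gain_multiple: net gain as a multiple of the stake (1500% -> 15.0).
    loss_fraction: share of the stake lost on failure (assumed 1.0).
    """
    return p_win * gain_multiple - (1.0 - p_win) * loss_fraction


def break_even_gain(p_win, loss_fraction=1.0):
    """Gain multiple at which the expected net return is exactly zero."""
    return (1.0 - p_win) * loss_fraction / p_win


if __name__ == "__main__":
    print(expected_net_return(0.01, 15.0))   # negative: below break-even
    print(break_even_gain(0.01))             # roughly 99x the stake
    print(expected_net_return(0.01, 150.0))  # positive once the payoff is large
```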
There's also all manner of money to be made in the process of doing the investing (i.e. fees). As they say in the horse racing business, the guys shoveling the manure always make money, even the horse owners lose money overall. From James.P.Lux at jpl.nasa.gov Tue Jun 17 15:24:54 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <20080617215134.GA21429@synapse.neuralscape.com> References: <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> <6.2.5.6.2.20080617113801.02f7ca70@jpl.nasa.gov> <20080617195934.GB20890@synapse.neuralscape.com> <6.2.5.6.2.20080617131658.02f713e8@jpl.nasa.gov> <20080617215134.GA21429@synapse.neuralscape.com> Message-ID: <6.2.5.6.2.20080617151932.02fc5a58@jpl.nasa.gov> At 02:51 PM 6/17/2008, Karen Shaeffer wrote: >On Tue, Jun 17, 2008 at 01:25:49PM -0700, Jim Lux wrote: > > At 12:59 PM 6/17/2008, Karen Shaeffer wrote: > > >Hi Jim, > > >I think you have the common view about this. The reality is many of > > >those same companies would be making money today. They were just > > >ahead of their time -- which is very common here in Silicon Valley. > > > > Hmmm.. I don't know about that.. having a solution with no problem > > that needs to be solved isn't a valid business model. You could > > equally well argue that a company with lots of smart employees, but > > no ideas, is also ahead of it's time. > > > > Either you have all the elements, or you don't. > >Have you ever worked in Silicon Valley? Much of the thinking >is about what is emerging, and how long it will take to emerge. Thinking yes, profit/revenue no. >This is the mindset for the big winners, but it is also the >highest risk. 
I would say that it's the folks who think and identify what the elements are and when they will be in place that have the highest returns. >Once all the elements are in place, then that is >the secondary market of what I call the carpet baggers. Actually >there are a lot of those folks too. I believe the term is "fast followers" > > > > To a certain extent, the big iron companies had troubles, though, > > because of the .com bubble in general. All this cash flooding into > > the market looking for investments, so really, really speculative > > ventures got funded (sort of like cash flooding into the > > collateralized debt obligation/mortgage backed securities market), > > and that funding drove deposits on equipment from the big iron companies, > > etc. > >I think you nailed that. Silicon Valley is driven by greed. And no group is >more full of greed than the investors of Silicon Valley -- as it should be. I suspect that there are folks in other places who are substantially greedier and who think that "they're the smartest guys in the room", and who play in far more ephemeral and speculative venues than actual ideas and products. Structured Investment Vehicles with multiple tranches are just an example.. entirely a construct of paper and one economic model against another to hopefully have a positive return. No technology ideas or better mousetraps needed. It's all moving future money from one place to another, betting that your risk model and corresponding asset allocation strategy is better than someone else's. >We all live by the golden rule -- Those with the gold make the rule. What >is almost comical is how they engage in herd investing. They watch what >their counterparts are investing in -- and they often need to invest in >a competing company to avoid looking clueless. There is cover in that >model -- If it hits big, you can say you were playing in the market, >even if your investment lost. 
And if that market flops, you can say that >everyone else made the same mistake as you. This is very common here in >Silicon Valley. And in the entertainment business. Look how many clone "high concept" movies/TV shows there are. From perry at piermont.com Tue Jun 17 18:26:17 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:18 2010 Subject: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: (Mark Hahn's message of "Tue\, 17 Jun 2008 15\:05\:33 -0400 \(EDT\)") References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <4856F7EC.7060000@ussarna.se> <20080617070116.nqbmzub87zks8084@webmail.jpl.nasa.gov> <6.2.5.6.2.20080617102708.02fa63a0@jpl.nasa.gov> <20080617182356.GA28059@bx9.net> Message-ID: <87d4mf7cue.fsf@snark.cb.piermont.com> Mark Hahn writes: >> I'm moving into only gives me enough power for 16 machines per >> rack. But older colos are only 2/3 of that. These are HPCish nodes. > > I guess I'm not surprised, but then again, it does make me scratch > my head about vendors who are still in love with blades. you know, > headlines like "HP puts 1000 cores in a rack". and who buys them? > I picture PHB's buying blade chassis and then putting just one per > rack for heat-density reasons. I have worked with companies using them quite successfully, in fully populated racks. If you have a machine room in an office tower in midtown Manhattan (and some people have no choice but to do that), your main concern is space, not power. There are places where you can buy more power far more cheaply than more space. -- Perry E. 
Metzger perry@piermont.com From sean at duke.edu Wed Jun 18 07:37:31 2008 From: sean at duke.edu (Sean Dilda) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] HPC Position available in Durham, NC Message-ID: <48591DAB.80902@duke.edu>

(I'm resending this as the first version had an html attachment that was too big for this list)

If there are any experienced HPC Sysadmins in the Raleigh-Durham area or looking to move to it, please take a look at this job posting. We're stepping up our HPC and Research Computing efforts and are looking for an extra Systems Administrator to help with the extra load. If you have any questions about the position, please feel free to contact me off-list.

Thanks,
Sean

*If you are interested in this position, please apply on-line at http://www.hr.duke.edu/jobs/external.html and use 400214967 in the requisition number field.

POSITION AVAILABLE

POSITION TITLE: Analyst, IT (High Performance Computing)
JOB CODE: 2423
JOB BAND / LEVEL: C
JOB FAMILY: 08 (Information Technology)
WORK SCHEDULE: Normal hours worked
DEPARTMENT CONTACT: Stephen Galla (galla@duke.edu) 919-668-4236

POSITION SUMMARY:
High Performance Computing is an important component of shared services provided by Duke's Office of Information Technology (OIT). Work in this position involves planning, implementing, and supporting Duke's Shared Cluster Resource (DSCR) and other high performance computing environments in support of research computing. The DSCR is a High Performance Computing Cluster (HPCC) designed for parallel and single-threaded jobs. The DSCR runs Linux and Sun Grid Engine (batch scheduler). This position reports to the Sr. Manager of Collaborative Systems and will work closely with OIT's management, systems administrators and researchers to provide operational support of high performance computing. A working knowledge of Red Hat Enterprise Linux (RHEL) or CentOS is required. Familiarity with programming languages (including Shell, Perl, Java, PHP, and Ruby) is preferred.

DUTIES & WORK PERFORMED:
- Administration of Duke's Shared Cluster Resource (DSCR) and other high performance computing environments.
- Interfacing with peer administrators supporting Linux based systems.
- Interfacing with researchers to plan and implement participation within the cluster.
- Proactively initiate measures to ensure operational availability and performance of the DSCR.
- Respond, troubleshoot, resolve and document system incidents.
- Maintain monitoring, logging, backup and restoration of the DSCR infrastructure.

SOFT SKILLS:
- Ability to work within a team in a demanding, fast-paced environment.
- Must have good planning and organizational skills, strong verbal and written communication skills.
- Highly motivated individual able to drive projects to completion.
- Ability to handle multiple concurrent activities and have a flexible, positive attitude.
- Demonstrated ability to track, organize, prioritize and execute project and operational workloads.
- Excellent analytical and problem solving skills.

QUALIFICATIONS & EXPERIENCE:
The most qualified candidates will exhibit demonstrated ability or experience in the following areas.
- Solid working knowledge of Red Hat Enterprise Linux (RHEL) or CentOS, networking and account management.
- Experience packaging, installing and supporting 3rd party applications.
- Experience with scripting and programming languages (Shell, Perl, Java, PHP, and Ruby).
- Experience with kick start and the management of a large number of systems.
- Experience with NFS and NIS.
- Experience with NetApp filers.
- Ability to write end-user documentation for non-technical staff.
- Experience developing project plans and time estimates.
- Proven design and debugging skills.
- Familiarity with SGE (Sun Grid Engine) and other distributed computing tools.

EDUCATION & EXPERIENCE:
Required:
Equivalent combination of relevant education and experience to a BA or BS in Math or Computer Science or related field. DATE OF POSTING: June 17, 2008 Pos#: 50468910 Req#: 400214967 From prentice at ias.edu Wed Jun 18 07:51:04 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" Message-ID: <485920D8.2030309@ias.edu> I seem to have muddied the waters of the original NVIDIA/CUDA post. Someone made the inaccurate statement that CUDA programming is difficult and time-consuming. I cited the MD5 example of my colleague as an example of how easy it is to port the code, and how significant the performance improvements could be. Some subscribers to this list questioned these results, and the discussion quickly turned away from NVIDIA/CUDA/GPUs to MD5. I forwarded the e-mails to my colleague. Since he doesn't subscribe to this list, I'm replying based on information he has provided me. He hasn't asked me to reply on his behalf; I'm doing this on my own to contribute to the discussion. My colleague who did this work, Mario Juric, is a member at the Institute for Advanced Study (member = postdoc) studying Astrophysics. Contrary to the assertion by Vincent that CUDA/GPUs are only for hobbyists, Mario is very interested in using GPUs to speed up his astrophysics research. The biggest hindrance to doing "real" work with GPUs is the lack of double-precision capabilities. As we all know, that hindrance was eliminated yesterday. In November of 2007, Mario organized the AstroGPU Workshop here at the Institute to discuss the use of GPUs in Astrophysics (http://www.astrogpu.org/). This workshop serves as proof that there are scientists serious about using GPUs for real work (not just hobbyists). Now about that MD5 discussion... I forwarded Mario the replies to my post, and he replied thusly: Hi Prentice, The guy ...
is right -- it's throughput for a lot of simultaneous computations, not for a computation of a single hash. If you attempt to compute a single hash on an entire card, you won't get any improvement. Same as you wouldn't if you tried it on a single vs. quad core CPU. But if you compute four hashes, then single vs. quad makes a huge difference. And the GPU cards are effectively 128 core CPUs, so when you need to compute millions of hashes... Feel free to e-mail them my online writeup at: http://majuric.org/software/cudamd5/ The full source code is there as well. That should clear a lot of things up. For those of you with questions about the MD5 performance, please see the link above, and better yet, take a look at Mario's code. The link includes lots of pretty graphs ;) Elcomsoft did the same thing, and sells it commercially: http://www.engadget.com/2007/10/24/elcomsoft-turns-your-pc-into-a-password-cracking-supercomputer/ NVIDIA has promised us some new GPUs through their Professor Partner program. I'm sure once we get our hands on them, we'll do more coding/benchmarking. Not sure if they'll be DP-capable units. -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From kus at free.net Wed Jun 18 08:25:51 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Tyan S2932 and lm_sensors Message-ID: Sorry, does somebody have a correct sensors.conf file for the Tyan S2932 motherboard?
There is no lm_sensors configuration file for this mobo on the Tyan site :-( Yours Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry Moscow From diep at xs4all.nl Wed Jun 18 13:14:28 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <485920D8.2030309@ias.edu> References: <485920D8.2030309@ias.edu> Message-ID: Prentice, No one doubts your colleague. Taking an md5 is very fast on PC processors. The code is very simple, just a few lines. On my RAID10 array I take a lot of md5sums of big files (chess endgame table bases), and the limiting factor is the I/O speed; in my case the I/O delivers about 100 MB/s. Running 128 parallel sessions of md5sum is not so interesting at all, we all believe this can be done fast. Even if you had a hard drive that could deliver 20 GB/s, the limiting factor for taking an md5sum with a GPU would still be the PCI-e bandwidth, which is, say, 2 GB/s or so. So you can never stream enough data to the video card to make it useful for handling your md5sums, when your main processor, even a cheap PC processor from years ago, is already doing more than fine there, considering the bottleneck. So this experiment of your highly esteemed colleague is an example of an application that you do not want to buy those cards for. Of course the reason he did it is obvious: he wanted to know something about the throughput the card can deliver. Yet the limitation of that throughput, namely that you have it only within the card's register file/local cache, is basically bad news. Maybe it is an idea for your colleague to measure how many instructions per second he managed to get executed. For the new cards, getting close to, say, 0.15 tera-instructions per second would be interesting information to have.
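The bottleneck argument above amounts to taking the minimum over the stages of a streaming pipeline. A minimal sketch, using the disk and PCI-e figures quoted in the post; the GPU hashing rate is a made-up placeholder, not a measurement, and the function name is illustrative.

```python
def pipeline_bottleneck(stages_mb_per_s):
    """Return (name, rate) of the slowest stage, which caps the whole pipeline."""
    name = min(stages_mb_per_s, key=stages_mb_per_s.get)
    return name, stages_mb_per_s[name]


# Rates in MB/s. "gpu_md5" is a hypothetical placeholder value.
todays_box = {"disk": 100, "pcie": 2_000, "gpu_md5": 10_000}
dream_disk = {"disk": 20_000, "pcie": 2_000, "gpu_md5": 10_000}

if __name__ == "__main__":
    print(pipeline_bottleneck(todays_box))   # the disk limits everything
    print(pipeline_bottleneck(dream_disk))   # now the PCI-e link does
```

This only covers the case of hashing data streamed from disk; workloads that generate their inputs on the card are not limited this way.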
Knowing the effective number of gigabytes per second it can process for md5 is however not interesting at all. Because the above is so trivial, i guess that caused some here. Interesting of course is to parallellize taking a md5. Yet being able to parallellize taking a md5sum of course defeats the original purpose of the algorithm, as it means of course by deduction of that fact that it is possible to fool md5sum (modifying a file such that it contains the content you want it to have meanwhile giving the same md5sum output). MD5 is hopelessly outdated for that reason, as it is very insecure, giving online hackers opportunities to do bad things at your file system. Vincent On Jun 18, 2008, at 4:51 PM, Prentice Bisbal wrote: > I seem to have muddied the waters of the original NVIDIA/CUDA post. > Someone made the inaccurate statement that CUDA programming is > difficult > and time-consuming. I cited the MD5-example of my colleague as an > example of how easy it is to port the code, and how significant the > performance improvements could be. > > Some subscribers to this list questions these results,and the > discussion > quicky turned away from NVIDIA/CUDA/GPUs to MD5. I forward the e-mails > to my colleague. Since he doesn't subscribe to this list, I'm replying > based on information he has provided me. He hasn't asked me to > reply on > his behalf, I'm doing this on my onw to contribute to the discussion. > > My colleague who did this work, Mario Juric, is a member at the > Institute for Advanced Study (member = postdoc) studying Astrophysics. > Contrary to the assertion by Vincent that CUDA/GPUs are only for > hobbyists, Mario is very interested in using GPUs to speed up his > astrophysics research. The biggest hindrance to doing "real" work with > GPUs is the lack of dual-precision capabilities. As we all know, that > hindrance was eliminated yesterday. 
> > In November of 2007, Mario organized the AstroGPU Workshop here at the > Institute to discuss the use of GPUs in Astrophysics > (http://www.astrogpu.org/). This workshop serves as proof that > there are > scientists serious about using GPUs for real work (not just > hobbyists). > > Now about that MD5 discussion... I forwarded Mario the replies to my > post, and he replied thusly: > > > > Hi Prentice, > The guy ... is right -- it's throughput for a lot of simultaneous > computations, not for a computation of a single hash. If you > attempt to > compute a single hash on an entire card, you won't get any > improvement. > Same as you wouldn't if you tried it on a single vs. quad core CPU. > But > if you compute four hashes, than single vs. quad makes a huge > difference. And the GPU cards are effectively 128 core CPUs, so > when you > need to compute millions of hashes... > > Feel free to e-mail them my online writeup at: > > http://majuric.org/software/cudamd5/ > > The full source code is there as well. That should clear a lot of > things > up. > > > > For those of you with questions about the MD5 performance, please see > the link above, and better yet, take a look at Mario's code. The link > includes lots of pretty graphs ;) > > > > Elcomsoft did the same thing, and sells it commercially: > > http://www.engadget.com/2007/10/24/elcomsoft-turns-your-pc-into-a- > password-cracking-supercomputer/ > > > > NVIDIA has promised us some new GPUs through their Professor Partner > program. I'm sure once we get our hands on them, we'll do more > coding/benchmarking. Not sure if they'll be DP-capable units. 
> > -- > Prentice Bisbal > Linux Software Support Specialist/System Administrator > School of Natural Sciences > Institute for Advanced Study > Princeton, NJ > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From prentice at ias.edu Wed Jun 18 14:13:30 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> Message-ID: <48597A7A.9070101@ias.edu> Vincent Diepeveen wrote: > > Running 128 parallel sessions of md5sum is not so interesting at all, we > all believe this can be done fast. > Vincent, That is the whole point of my original posting. The point was NEVER to demonstrate the use of GPUs for streaming MD5-encrypted data. This was the point of my posting: 1. To prove that CUDA programming is NOT as difficult as you made it out to be. 2. To demonstrate the performance improvement you can get by parallelizing an operation using CUDA. The MD5 algorithm was perfect for this. No claims were ever made as to the need for parallelizing MD5. There is value, however, if your goal is to recover (discover?) an MD5-hashed password through a brute-force attack. Last time I checked, MD5 passwords are the default for most Linux distros. 3. To show that more than just "hobbyists" are investigating GPUs.
-- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From lindahl at pbm.com Wed Jun 18 14:46:04 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <485920D8.2030309@ias.edu> References: <485920D8.2030309@ias.edu> Message-ID: <20080618214603.GC31594@bx9.net> On Wed, Jun 18, 2008 at 10:51:04AM -0400, Prentice Bisbal wrote: > Someone made the inaccurate statement that CUDA programming is difficult > and time-consuming. One data point cannot prove that CUDA is easy. There are people out there claiming that FPGAs are easy to program, because they're one of the 7 people on the planet for whom programming an FPGA is easy. I've looked over CUDA and some examples, and while it's better looking than some of the other GPU programming languages out there, it's clear that it is more difficult and time-consuming than using traditional languages on traditional cpus. -- greg From kus at free.net Wed Jun 18 14:54:18 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Tyan S2932 and lm_sensors In-Reply-To: Message-ID: In message from "Seth Bardash" (Wed, 18 Jun 2008 10:32:17 -0600): >ftp://ftp.tyan.com/softwave/lms/2932.sensors.conf > > >Seth Bardash > >Integrated Solutions and Systems >1510 Old North Gate Road >Colorado Springs, CO 80921 > >719-495-5866 >719-495-5870 Fax >719-337-4779 Cell > >http://www.integratedsolutions.org > >Failure can not cope with knowledge and perseverance! > >-----Original Message----- >From: beowulf-bounces@beowulf.org >[mailto:beowulf-bounces@beowulf.org] >On Behalf Of Mikhail Kuzminsky >Sent: Wednesday, June 18, 2008 9:26 AM >To: beowulf@beowulf.org >Subject: [Beowulf] Tyan S2932 and lm_sensors > >Sorry, do somebody have correct sensors.conf file for Tyan S2932 >motherboard ? 
There is no lm_sensors configuration file for this >mobos > >on Tyan site :-( > >Yours >Mikhail Kuzminsky >Computer Assistance to Chemical Research Center >Zelinsky Institute of Organic Chemistry >Moscow >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf >No virus found in this incoming message. >Checked by AVG. >Version: 8.0.100 / Virus Database: 270.4.0/1507 - Release Date: >6/18/2008 7:09 AM > From landman at scalableinformatics.com Wed Jun 18 14:57:12 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <48597A7A.9070101@ias.edu> References: <485920D8.2030309@ias.edu> <48597A7A.9070101@ias.edu> Message-ID: <485984B8.2050409@scalableinformatics.com> Prentice Bisbal wrote: > Vincent Diepeveen wrote: >> Running 128 parallel sessions of md5sum is not so interesting at all, we >> all believe this can be done fast. >> > > Vincent, > > That is the whole point of my original posting. The point was NEVER to > demonstrate the use of GPUs for streaming MD5-encrypted data. This was > the point of my posting: > > 1. To prove that CUDA programming is NOT as difficult as you made it out > to be. Hi Prentis: There is a general impression that CUDA is hard. I am not sure precisely where this is coming from, but this is what I am hearing from multiple quarters. Usually from people whom have not tried it. > > 2. To demonstrate the performance improvement you can get by > parallelizing an operation using CUDA. The MD5 algorithm was perfect for > this. No claims were ever made as to the need for parallelizing MD5. > There is value, however, if your goal is to recover (discover?) an > MD5-hashed password through a brute-force attack. Last time I checked, > MD5 password s are the default for most Linux distros. > > 3. 
To show that more than just "hobbyists" are investigating GPUs. I think I can comment on some papers submitted to various conferences. I am privy to some work not yet published, so I can't recount that. In short, we have seen papers on segmentation of medical image data sets (Liver segmentation to be precise) on CUDA platforms, getting ~70x performance over a single machine. I am aware of some unnamed bioinformatics applications seeing ... nice ... speedups on CUDA. None of these are hobbyist things. The CUDA eco-system is growing rapidly, with real users. We have 3 CUDA machines in house, one of them my laptop :). I just need to get on a few planes so I can spend that time coding ... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From kus at free.net Wed Jun 18 14:58:51 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Tyan S2932 and lm_sensors In-Reply-To: Message-ID: In message from "Seth Bardash" (Wed, 18 Jun 2008 10:32:17 -0600): >ftp://ftp.tyan.com/softwave/lms/2932.sensors.conf > > >Seth Bardash Thank you very much !! It's strange, but I didn't find this file on Tyan archive! Mikhail > >Integrated Solutions and Systems >1510 Old North Gate Road >Colorado Springs, CO 80921 > >719-495-5866 >719-495-5870 Fax >719-337-4779 Cell > >http://www.integratedsolutions.org > >Failure can not cope with knowledge and perseverance! > >-----Original Message----- >From: beowulf-bounces@beowulf.org >[mailto:beowulf-bounces@beowulf.org] >On Behalf Of Mikhail Kuzminsky >Sent: Wednesday, June 18, 2008 9:26 AM >To: beowulf@beowulf.org >Subject: [Beowulf] Tyan S2932 and lm_sensors > >Sorry, do somebody have correct sensors.conf file for Tyan S2932 >motherboard ? 
There is no lm_sensors configuration file for this >mobos > >on Tyan site :-( > >Yours >Mikhail Kuzminsky >Computer Assistance to Chemical Research Center >Zelinsky Institute of Organic Chemistry >Moscow >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf >No virus found in this incoming message. >Checked by AVG. >Version: 8.0.100 / Virus Database: 270.4.0/1507 - Release Date: >6/18/2008 7:09 AM > From lindahl at pbm.com Wed Jun 18 15:11:54 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] SuperMicro and lm_sensors Message-ID: <20080618221153.GA24788@bx9.net> Speaking of lm_sensors, does anyone have configs for recent SuperMicro mobos? My SuperMicro support contact doesn't have ay idea, and running sensors-detect leaves me with lots of readings which are miscalibrated. -- greg From kilian at stanford.edu Wed Jun 18 15:21:46 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <20080618214603.GC31594@bx9.net> References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> Message-ID: <200806181521.47115.kilian@stanford.edu> On Wednesday 18 June 2008 02:46:04 pm Greg Lindahl wrote: > One data point cannot prove that CUDA is easy. There are people out > there claiming that FPGAs are easy to program, because they're one of > the 7 people on the planet for whom programming an FPGA is easy. > > I've looked over CUDA and some examples, and while it's better > looking than some of the other GPU programming languages out there, > it's clear that it is more difficult and time-consuming than using > traditional languages on traditional cpus. I'll add my 2 cents here. I've recently attended a "CUDA technical training session" given by NVIDIA in Santa Clara, CA. 
The presentation came along with practical exercises (laptops were provided for the hands-on session, sweet). It wasn't about going too deep into details, but from what I've seen, they've made a lot of effort to make the language understandable and usable for whoever has some programming experience.

CUDA is regular C, with a small set of extensions. You definitely have to wrap your mind around some concepts (loops are irrelevant, for instance, since everything GPU-related is implicitly parallel; you have to divide your work into smaller units which can fit in the CUDA programming model, and which are likely to keep the GPU pipes busy; you don't really control how the work units are scheduled to the processors, etc.), but nothing fundamentally different from your average parallel programming.

So, yes, of course, there's a learning curve, and I only scratched the surface, but it doesn't seem to me that any experienced programmer would need more than a couple days to begin writing efficient CUDA code.

We've also encountered some oddities, like CUDA code freezing a machine running X.org (and using the proprietary NVIDIA driver), compiler segfaults or code returning incoherent results. We didn't spend too much time on debugging those, so they may very well have been related to the code itself, and not to the compiler or the driver. And moreover, it was version 1.1, and I think they released a version 2.0 since then.

Anyway, CUDA is definitely not a UML-like language (it's not enough to draw boxes, you still have to write code, and NVIDIA already ported a bunch of standard libs to help you port your applications), but it definitely looks easier than what ATI used to call CTM.
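[Editor's sketch] The decomposition described in this message (no explicit loop in the worker code, one small unit of work per thread, a bounds guard because the units rarely divide the data evenly) can be mimicked on a CPU. What follows is a plain-Python emulation of that indexing scheme, not CUDA code; the names scale_kernel and launch are hypothetical and chosen only for illustration.

```python
# CPU-only emulation of a CUDA-style decomposition; this is NOT CUDA code.
# scale_kernel and launch are hypothetical names used for illustration.

def scale_kernel(block_idx, thread_idx, block_dim, dst, src, scalar, n):
    """Work done by one 'thread': handle a single array element."""
    idx = block_idx * block_dim + thread_idx  # global element index
    if idx < n:  # guard: the last block may extend past the end of the data
        dst[idx] = scalar * src[idx]


def launch(kernel, grid_dim, block_dim, *args):
    """Run the kernel once per (block, thread) pair.

    A GPU would run these invocations in parallel and in no guaranteed
    order; here they run one after another, which gives the same result
    only because each invocation writes a different element.
    """
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)


if __name__ == "__main__":
    n = 10
    block_dim = 4                                # threads per block
    grid_dim = (n + block_dim - 1) // block_dim  # enough blocks to cover n
    src = [float(i) for i in range(n)]
    dst = [0.0] * n
    launch(scale_kernel, grid_dim, block_dim, dst, src, 3.0, n)
    print(dst)
```

The point of the sketch is only that the worker function contains no loop over the data: the "loop" lives in the launch, which on real hardware is the scheduler the programmer does not control.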
Cheers, -- Kilian From jakob at unthought.net Wed Jun 18 15:45:23 2008 From: jakob at unthought.net (Jakob Oestergaard) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> Message-ID: <20080618224523.GA18052@unthought.net> On Wed, Jun 18, 2008 at 10:14:28PM +0200, Vincent Diepeveen wrote: > Prentice, > > No one doubts your collegue. > Ok the following has nothing to do with any part of the discussion, but I just needed to get this out :) ... > So this experiment of your high esteemed collegue is an example of an > application that you do not want to buy those cards for. For attacks against password hashes (and similar) this is very useful. You generate strings, hash them, compare the hash to your target value. If it matches, you've found your source string (password). Disks are never involved. Your brute-force of md5 (or something similar) will run as fast as the string generators (processors) allow. Some time ago I looked into buying an FPGA card to speed up a bitsliced 3des to brute force certain hashes, but never actually got around to doing this. From an algorithmic point of view, though, this could have rocked seriously on even a low end FPGA :) The point being; this is a useful application. Even though the md5 implementation may just have been "for show" and not intended as a final product, it is very close to something that has direct uses. 
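[Editor's sketch] The generate, hash, compare loop described above can be written in a few lines. This is a minimal CPU sketch, assuming plain unsalted MD5 over short lowercase strings; the function name brute_force_md5 and its parameters are hypothetical. The salted, iterated MD5-crypt scheme used for Linux password files costs considerably more per guess and is not what is shown here.

```python
import hashlib
import itertools
import string


def brute_force_md5(target_hex, alphabet=string.ascii_lowercase, max_len=4):
    """Return the first candidate whose MD5 hex digest equals target_hex.

    Candidates are generated in order of increasing length. Returns None
    if nothing up to max_len characters matches.
    """
    for length in range(1, max_len + 1):
        for letters in itertools.product(alphabet, repeat=length):
            candidate = "".join(letters)
            digest = hashlib.md5(candidate.encode("ascii")).hexdigest()
            if digest == target_hex:
                return candidate
    return None


if __name__ == "__main__":
    # Toy target: the digest of a known three-letter string.
    target = hashlib.md5(b"abc").hexdigest()
    print(brute_force_md5(target))
```

Every candidate is independent of every other, and nothing touches the disk, which is why this kind of search maps so readily onto many parallel processors.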
And sure, no, people won't buy GPUs to speed up the occational file hashing with md5sum, and I'm sure that wasn't the intention either :) -- / jakob From diep at xs4all.nl Wed Jun 18 16:16:27 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <20080618224523.GA18052@unthought.net> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> Message-ID: <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> Jakob, You are speaking here of a hobby project of a system administrator to crack a few passwords in the slowest possible way? In general very stupid algorithms do not benefit much from caching things in RAM, let alone caches and are lightyears slower than what is possible. Game tree search is a great example of this, but lately also in weather research they've found more sophisticated methods if i read this list correctly. Apart from all this. See www.wassenaar.org That is a treaty that USA wanted and many nations signed it. Called the Wassenaar treaty. Maybe therefore a new embassy from USA gets built now in the town Wassenaar (close to The Hague). Treaty, if i read it well, makes problems using public key above 512 bits. So according to that treaty, if i read it correct, trying to do anything with a SSH which uses usually 1024 bits RSA, is already meaning you are a criminal category 5 (highest category). In fact programming that SSH 1024 bits code already makes you a big enemy of the state as it is above 512 bits. So your posting rises the question with me, without thinking yet about the CUDA component of your solution, Is it legal what your friend is doing for his hobby in his sparetime? Thanks, Vincent checkout: http://www.primenumbers.net/prptop/prptop.php On Jun 19, 2008, at 12:45 AM, Jakob Oestergaard wrote: > On Wed, Jun 18, 2008 at 10:14:28PM +0200, Vincent Diepeveen wrote: >> Prentice, >> >> No one doubts your collegue. 
>> > > Ok the following has nothing to do with any part of the discussion, > but I just > needed to get this out :) > > ... >> So this experiment of your high esteemed collegue is an example of an >> application that you do not want to buy those cards for. > > For attacks against password hashes (and similar) this is very useful. > > You generate strings, hash them, compare the hash to your target > value. If it > matches, you've found your source string (password). > > Disks are never involved. > > Your brute-force of md5 (or something similar) will run as fast as > the string > generators (processors) allow. Some time ago I looked into buying > an FPGA card > to speed up a bitsliced 3des to brute force certain hashes, but > never actually > got around to doing this. From an algorithmic point of view, > though, this > could have rocked seriously on even a low end FPGA :) > > The point being; this is a useful application. Even though the md5 > implementation may just have been "for show" and not intended as a > final > product, it is very close to something that has direct uses. > > And sure, no, people won't buy GPUs to speed up the occational file > hashing > with md5sum, and I'm sure that wasn't the intention either :) > > -- > > / jakob > > From jlforrest at berkeley.edu Wed Jun 18 16:31:21 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <200806181521.47115.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> <200806181521.47115.kilian@stanford.edu> Message-ID: <48599AC9.3040100@berkeley.edu> Kilian CAVALOTTI wrote: > We've also encountered somme oddities, like CUDA code freezing a machine > running X.org (and using the proprietary NVIDIA driver), I'm glad you mentioned this. I've read through much of the information on their web site and I still don't understand the usage model for CUDA. 
By that I mean, on a desktop machine, are you supposed to have 2 graphics cards, 1 for running CUDA code and one for regular graphics? If you only need 1 card for both, how do you avoid the problem you mentioned, which was also mentioned in the documentation? Or, if you have a compute node that will sit in a dark room, you aren't going to be running an X server at all, so you won't have to worry about anything hanging? How does this behavior change, if at all, when running Windows? I'm planning on starting a pilot program to get the chemists in my department to use CUDA, but I'm waiting for V2 of the SDK to come out. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest@berkeley.edu From bill at cse.ucdavis.edu Wed Jun 18 16:34:59 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <20080618214603.GC31594@bx9.net> References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> Message-ID: <48599BA3.1050105@cse.ucdavis.edu> Greg Lindahl wrote: > On Wed, Jun 18, 2008 at 10:51:04AM -0400, Prentice Bisbal wrote: > >> Someone made the inaccurate statement that CUDA programming is difficult >> and time-consuming. > > One data point cannot prove that CUDA is easy. There are people out > there claiming that FPGAs are easy to program, because they're one of > the 7 people on the planet for whom programming an FPGA is easy. *chuckle*. > I've looked over CUDA and some examples, and while it's better looking > than some of the other GPU programming languages out there, it's clear > that it is more difficult and time-consuming than using traditional > languages on traditional cpus. Agreed. Traditional languages are easier, but don't express parallelism well. 
One approach is of course OpenMP, a few pragmas, and parallel friendly (loop index independent) loops and you can get reasonable speedups on SMP machines.

CUDA seems to take a different approach, instead of trying to auto-parallelize a loop, it requires a function pointer to the code, and the function must declare its exit condition.

CUDA seems rather similar to OpenMP. Massimiliano Fatica of NVIDIA did a stream port, and I'll quote pieces of his code below.

So instead of:

  for (j=0; j<N; j++) [...]

You get:

  [...]<<<[...]>>>(d_b, d_c, scalar, N);
  cudaThreadSynchronize();
  times[1][k]= mysecond() - times[1][k];

Instead of:

  static double a[N+OFFSET]

You get:

  cudaMalloc((void**)&d_a, sizeof(float)*N);

Instead of:

  for (j=0; j<N; j++) [...]

You get:

  [...]<<<[...]>>>(d_a, 2.f, N);
  set_array<<<[...]>>>(d_b, .5f, N);
  set_array<<<[...]>>>(d_c, .5f, N);

So yes, it's a change, but it does seem pretty reasonable.

From perry at piermont.com Wed Jun 18 16:37:02 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Thu Mar 18 01:07:18 2010
Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists"
In-Reply-To: <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> (Vincent Diepeveen's message of "Thu\, 19 Jun 2008 01\:16\:27 +0200")
References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl>
Message-ID: <87r6auiach.fsf@snark.cb.piermont.com>

Not that this mailing list is an appropriate place to discuss this, but...

Vincent Diepeveen writes:
> Treaty, if i read it well, makes problems using public key above 512
> bits.
>
> So according to that treaty, if i read it correct, trying to do anything
> with a SSH which uses usually 1024 bits RSA, is already meaning you
> are a criminal category 5 (highest category).

Er, no. That's totally false in every way. There may be countries where this is true, but certainly in the United States, there is no problem implementing and using crypto with any length keys you like.
You are also allowed to make available arbitrary open source crypto software for export so long as you send one email to alert a particular agency that you're doing so, and if software is available commercially it can also be exported with minimal trouble. Ten or fifteen years ago, things were different, but that was ten or fifteen years ago. And yes, I am an expert on this subject. I run a very large mailing list on security and cryptography. If you want to discuss this, I suspect this is not the right place to do it. Perry -- Perry E. Metzger perry@piermont.com From kilian at stanford.edu Wed Jun 18 16:57:59 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <48599AC9.3040100@berkeley.edu> References: <485920D8.2030309@ias.edu> <200806181521.47115.kilian@stanford.edu> <48599AC9.3040100@berkeley.edu> Message-ID: <200806181657.59782.kilian@stanford.edu> On Wednesday 18 June 2008 04:31:21 pm you wrote: > I'm glad you mentioned this. I've read through much of the > information on their web site and I still don't understand the usage > model for CUDA. By that I mean, on a desktop machine, are you > supposed to have 2 graphics cards, 1 for running CUDA code and one > for regular graphics? Well, we used laptops for the hands-on session, so one graphics card is sufficient. Everything is handled by the driver. I have no clue about the internals, but somehow, the CUDA code you compile generates GPU PTX assembly, which is passed to the driver for execution. That what I meant by mentioning you don't really control the scheduling of your threads: I have no idea how the GPU decides if it should better render an 3D object for your screensaver, or execute your code. It seems to be able to do both at the same time, though, since we've been shown some CUDA applications involving OpenGL rendering. And we didn't even mention the (Tri)SLI setups. 
I'm actually curious to know how their Tesla boxes work. Are the four graphics cards treated as independent processing units (meaning there should be a scheduler somewhere to apportion the work amongst the GPUs)? Or are they treated as a single GPU, à la SLI?

> If you only need 1 card for both, how do you
> avoid the problem you mentioned, which was also mentioned in the
> documentation?

Well, you can't. :) It's not fundamentally different from what you do with a regular CPU: if your code locks it up, your whole machine is dead, along with the other running applications. Although it seems a bit easier to lock up a GPU than a CPU. Except for those which are specifically designed for this, of course (MIPS-X, anyone? hsc instruction, page 65 in [1]).

> How does this behavior change, if at all, when running Windows?

I think that's pretty much the same. Since proper execution of your code depends on the graphical driver's goodwill, and given their reputation on the Windows platform regarding stability, you'd better take that into account.

> I'm planning on starting a pilot program to get the
> chemists in my department to use CUDA, but I'm waiting
> for V2 of the SDK to come out.

Looks like v2 beta2 is out ([2]).

[1] ftp://reports.stanford.edu/pub/cstr/reports/csl/tr/86/289/CSL-TR-86-289.pdf
[2] http://www.nvidia.com/object/cuda_get.html

Cheers,
--
Kilian

From James.P.Lux at jpl.nasa.gov Wed Jun 18 17:03:16 2008
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Thu Mar 18 01:07:18 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl>
References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl>
Message-ID: <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov>

At 04:16 PM 6/18/2008, Vincent Diepeveen wrote:
>Jakob,
>
>
>Is it legal what your friend is doing for his hobby in his sparetime?
> >Thanks, >Vincent There are many things that one might do, perhaps with a cluster, that can get you into trouble with various treaties and agreements. For example, the ITAR describes a heck of a lot of really interesting things that if inappropriately transferred to someone might result in some attention from the authorities. In general, fundamental research is not subject to export controls, so if you frame your problem in terms of abstract mathematical problems, you're not going to be treading on any toes. However, start distributing it as "Jim Lux's superduper encryptor/password cracker, now with 1024 bit capability!" and it's moved from fundamental research to a product. "Defense service" is a particularly tricky area since it's pretty vague in definition. There's tricky guidelines like this: Providing guidance or instruction to a foreign person on where to find data or information related to controlled items may be a defense service even if the data is in the public domain, if the data addresses a specific problem and is being provided to help with that problem. So.. if your (foreign person) buddy is designing thermonuclear devices in their garage, and they complain about how slow it is to run the hydrocodes to simulate stuff, better not hand them that old copy of Sterling, et al., or even worse, give them rgb's website. (the latter would be too suspicious, since rgb *is* a physicist, doing monte carlo simulations no less, while Tom Sterling is *just a computer scientist*) From bernard at vanhpc.org Wed Jun 18 17:03:53 2008 From: bernard at vanhpc.org (Bernard Li) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: <20080618221153.GA24788@bx9.net> References: <20080618221153.GA24788@bx9.net> Message-ID: Hi Greg: On Wed, Jun 18, 2008 at 3:11 PM, Greg Lindahl wrote: > Speaking of lm_sensors, does anyone have configs for recent SuperMicro > mobos? 
My SuperMicro support contact doesn't have ay idea, and running > sensors-detect leaves me with lots of readings which are > miscalibrated. Have you tried updating lm_sensors to the 3.x series? Supermicro also has their own system health monitoring tool called superodoctor which you can get here: ftp://ftp.supermicro.com/utility/Supero_Doctor_II/Linux/ Cheers, Bernard From mathog at caltech.edu Wed Jun 18 17:04:23 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] OT: LTO Ultrium (3) throughput? Message-ID: If any of you have an LTO Ultrium-3 drive what kind of speeds are you observing? On one Linux system here (kernel 2.6.24-19) we have an HP Ultrium-3 attached to an Adaptec ASC-29320ALP U320 controller. There is nothing else on that SCSI bus, termination and cable seem good. Getting into scsi-select from the BIOS shows everything set to 320. No error messages or warnings are appearing. Yet: dd if=/dev/zero of=/dev/nst0 bs=8192 count=10000 only moves 21.3MB/sec. The HP documentation indicates 432GB/hour, (compressed) which is 120 Mb/sec, so we're off by 6X (or maybe 3X for 2:1 compression, either way, a lot). The system's CPU and memory aren't rate limiting as dd if=/dev/zero of=/dev/null bs=8192 count=10000 moves 6.9GB/sec. Any thoughts where the bottleneck might be? Thanks, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Wed Jun 18 17:19:14 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> Message-ID: <4859A602.2040900@scalableinformatics.com> Jim Lux wrote: > So.. 
if your (foreign person) buddy is designing thermonuclear devices > in their garage, and they complain about how slow it is to run the > hydrocodes to simulate stuff, better not hand them that old copy of > Sterling, et al., or even worse, give them rgb's website. (the latter > would be too suspicious, since rgb *is* a physicist, doing monte carlo > simulations no less, while Tom Sterling is *just a computer scientist*) Hold on ... does this mean RGB is a munition? man .... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From landman at scalableinformatics.com Wed Jun 18 17:24:20 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] OT: LTO Ultrium (3) throughput? In-Reply-To: References: Message-ID: <4859A734.8090401@scalableinformatics.com> David Mathog wrote: > If any of you have an LTO Ultrium-3 drive what kind of speeds are you > observing? On one Linux system here (kernel 2.6.24-19) we have an > HP Ultrium-3 attached to an Adaptec ASC-29320ALP U320 controller. > There is nothing else on that SCSI bus, termination and cable seem good. > Getting into scsi-select from the BIOS shows everything set to 320. > No error messages or warnings are appearing. Yet: > > dd if=/dev/zero of=/dev/nst0 bs=8192 count=10000 > > only moves 21.3MB/sec. The HP documentation indicates 432GB/hour, > (compressed) which is 120 Mb/sec, so we're off by 6X (or maybe 3X for > 2:1 compression, either way, a lot). The system's CPU and memory > aren't rate limiting as > > dd if=/dev/zero of=/dev/null bs=8192 count=10000 > > moves 6.9GB/sec. > > Any thoughts where the bottleneck might be? Have you adjusted the block size up? 
maybe try

  dd if=/dev/zero of=/dev/nst0 bs=8M count=1000

this is about 8 GB, so it hopefully shouldn't take too long. At 21 MB/s, this is about 50 seconds per gigabyte, so if this takes 400 seconds, you may have a problem.

This said, the tape drive numbers we see quoted are "best case" scenarios, with optimal block sizes, a strong wind at the tape's back, and no monopole flux nearby. Burning incense may or may not help the speed, YMMV...

This said, have you tried the "obvious" tests, such as what does lsscsi report, and lspci -v for that adapter, and ...

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From perry at piermont.com Wed Jun 18 17:41:47 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Thu Mar 18 01:07:18 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> (Jim Lux's message of "Wed\, 18 Jun 2008 17\:03\:16 -0700")
References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov>
Message-ID: <87iqw6i7ck.fsf@snark.cb.piermont.com>

Jim Lux writes:
> In general, fundamental research is not subject to export controls, so
> if you frame your problem in terms of abstract mathematical problems,
> you're not going to be treading on any toes. However, start
> distributing it as "Jim Lux's superduper encryptor/password cracker,
> now with 1024 bit capability!" and it's moved from fundamental
> research to a product.

Under current rules, provided it is open source or generally available to any buyer, you can distribute and export cryptographic code quite freely. There are some minimal reporting requirements, but they're barely worth mentioning.
That's the reason you can freely distribute things like Kerberos, OpenSSL, pgp/gpg, etc. Perry -- Perry E. Metzger perry@piermont.com From jlb17 at duke.edu Wed Jun 18 17:45:29 2008 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] OT: LTO Ultrium (3) throughput? In-Reply-To: <4859A734.8090401@scalableinformatics.com> References: <4859A734.8090401@scalableinformatics.com> Message-ID: On Wed, 18 Jun 2008 at 8:24pm, Joe Landman wrote > David Mathog wrote: >> If any of you have an LTO Ultrium-3 drive what kind of speeds are you >> observing? On one Linux system here (kernel 2.6.24-19) we have an >> HP Ultrium-3 attached to an Adaptec ASC-29320ALP U320 controller. ISTR hearing some rumblings at some point about some LTO tape drives not liking Adapted SCSI. Mine run on LSI boards, and run quite nicely. >> There is nothing else on that SCSI bus, termination and cable seem good. >> Getting into scsi-select from the BIOS shows everything set to 320. >> No error messages or warnings are appearing. Yet: >> >> dd if=/dev/zero of=/dev/nst0 bs=8192 count=10000 >> >> only moves 21.3MB/sec. The HP documentation indicates 432GB/hour, >> (compressed) which is 120 Mb/sec, so we're off by 6X (or maybe 3X for >> 2:1 compression, either way, a lot). The system's CPU and memory >> aren't rate limiting as >> >> dd if=/dev/zero of=/dev/null bs=8192 count=10000 >> >> moves 6.9GB/sec. >> Any thoughts where the bottleneck might be? > > Have you adjusted the block size up? maybe try > > dd if=/dev/zero of=/dev/nst0 bs=8M count=1000 A very good suggestion. In my testing, 32KB blocks (the default for amanda) yielded only about 27MB/s (in line with your 8KB blocks above) while 2M blocks got me to 60MB/s (all testing done with tar and a very fast disk system). Note that I couldn't go much bigger than 2M or I started getting errors like "/dev/nst1: Cannot write: Value too large for defined data type". 
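[Editor's sketch] The throughput figures traded in this thread are easy to check by hand. The sketch below redoes the arithmetic, assuming decimal units for the vendor figure (1 GB = 1000 MB) and the GNU dd meaning of bs=8M (8 * 1024 * 1024 bytes); the function names are hypothetical.

```python
def gb_per_hour_to_mb_per_s(gb_per_hour):
    """Convert GB/hour to MB/s using decimal units (1 GB = 1000 MB)."""
    return gb_per_hour * 1000.0 / 3600.0


def transfer_seconds(block_bytes, count, mb_per_s):
    """Seconds needed to move count blocks of block_bytes at mb_per_s."""
    return block_bytes * count / (mb_per_s * 1e6)


if __name__ == "__main__":
    quoted = gb_per_hour_to_mb_per_s(432)  # HP's compressed figure
    observed = 21.3                        # MB/s reported for the dd run
    print(f"quoted compressed rate   : {quoted:.0f} MB/s")
    print(f"shortfall vs compressed  : {quoted / observed:.1f}x")
    print(f"shortfall vs native (2:1): {quoted / 2 / observed:.1f}x")

    # dd if=/dev/zero of=/dev/nst0 bs=8M count=1000 at a steady 21 MB/s
    secs = transfer_seconds(8 * 1024 * 1024, 1000, 21.0)
    print(f"8M x 1000 at 21 MB/s     : {secs:.0f} s")
```

Under those assumptions 432 GB/hour works out to 120 MB/s (megabytes, not the "Mb" of the original post), the observed 21.3 MB/s is roughly 5.6 times short of the compressed figure and a little under 3 times short of the native one, and the suggested 8M x 1000 test would take just under 400 seconds if the drive were still stuck at 21 MB/s.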
> this is about 8 GB, so it hopefully shouldn't take too long. At 21 MB/s, > this is about 50 seconds per gigabyte, so if this takes 400 seconds, you may > have a problem. > > This said, the tape drive numbers we see quoted are "best case" scenarios, > with optimal block sizes, a strong wind at the tapes back, and no monopole > flux nearby. Burning incense may or may not help the speed, YMMV... In addition, 2x hardware compression is almost always a lovely fairly tale with absolutely no basis in reality. That said, the LTO hardware compression is rather nice, in that it recognizes uncompressible data and doesn't try to further compress it. And, as always, be sure you used the proper color goat in your ceremonies. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From mathog at caltech.edu Wed Jun 18 17:56:43 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:07:18 2010 Subject: [Beowulf] Re: [OT: LTO Ultrium (3) throughput? Message-ID: Joe Landman wrote: > Have you adjusted the block size up? maybe try > > dd if=/dev/zero of=/dev/nst0 bs=8M count=1000 The tape has a 64Mb on board buffer and somewhere in the manual it said that it shouldn't matter. But no, I have not tried that yet and will do so tomorrow. > This said, have you tried the "obvious" tests, such as what does lsscsi > report, and lscpi -v for that adapter, and ... Maybe it will mean more to you than it does to me... 
% lsscsi --long
[8:0:0:0] tape HP Ultrium 3-SCSI D21D /dev/st0
  state=running queue_depth=32 scsi_level=4 type=1 device_blocked=0 timeout=900

lspci -v
04:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
        Subsystem: Adaptec ASC-29320LPE PCIe U320
        Flags: bus master, 66MHz, slow devsel, latency 32, IRQ 19
        I/O ports at 4400 [disabled] [size=256]
        Memory at d8400000 (64-bit, non-prefetchable) [size=8K]
        I/O ports at 4000 [disabled] [size=256]
        [virtual] Expansion ROM at d8600000 [disabled] [size=512K]
        Capabilities: [dc] Power Management version 2
        Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1 Enable-
        Capabilities: [94] PCI-X non-bridge device

/proc/interrupts shows that this controller is the only thing on IRQ 19.

Thanks,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From diep at xs4all.nl Wed Jun 18 18:02:26 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu Mar 18 01:07:18 2010
Subject: [Beowulf] OT: LTO Ultrium (3) throughput?
In-Reply-To: 
References: 
Message-ID: <929B39C3-7FCE-4890-BAEE-22E32D8221AD@xs4all.nl>

Hi Dave,

Probably the i/o is your limiting factor. Using raid10 or something?

Ignore that factor 2 compression by the way, that's just marketing paper.

My LTO-2 gets about 28MB/s uncompressed streamspeed, for compressed files (7-zip is far superior to anything else currently under linux). Usually i also store longterm data with 10% par2 data (free linux tool is there for it as well), so that i can recover bit errors later on.

That 28MB/s is at least what i observe, could be tad optimistic :)

The initial problems i had also with streamspeed and also reading back was all software based. A lot of backup software was real ugly bad simply.

What mattered at the raid10 array also was the file system used and how far loaded it already was. Reading files from a raid10 is not necessarily fast.
Harddrives are relatively fast when reading/writing blocks of 128-512KB
or so, whereas filesystems use far smaller blocksizes to store in
reality. Very bad to stream a couple of hundreds of GB to tape.

The software used is important. Default backup software was dubious in
my case and the biggest reason for problems. After that was fixed, the
i/o speed is a problem when storing tiny files at tape.

But well i'm a total layman with backups. Perhaps you already figured
out your problem.

Vincent

On Jun 19, 2008, at 2:04 AM, David Mathog wrote:

> If any of you have an LTO Ultrium-3 drive what kind of speeds are you
> observing? On one Linux system here (kernel 2.6.24-19) we have an
> HP Ultrium-3 attached to an Adaptec ASC-29320ALP U320 controller.
> There is nothing else on that SCSI bus, termination and cable seem
> good.
> Getting into scsi-select from the BIOS shows everything set to 320.
> No error messages or warnings are appearing. Yet:
>
> dd if=/dev/zero of=/dev/nst0 bs=8192 count=10000
>
> only moves 21.3MB/sec. The HP documentation indicates 432GB/hour,
> (compressed) which is 120 MB/sec, so we're off by 6X (or maybe 3X for
> 2:1 compression, either way, a lot). The system's CPU and memory
> aren't rate limiting as
>
> dd if=/dev/zero of=/dev/null bs=8192 count=10000
>
> moves 6.9GB/sec.
>
> Any thoughts where the bottleneck might be?
>
> Thanks,
>
> David Mathog
> mathog@caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From landman at scalableinformatics.com Wed Jun 18 18:09:30 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] Re: [OT: LTO Ultrium (3) throughput?
In-Reply-To:
References:
Message-ID: <4859B1CA.9070006@scalableinformatics.com>

David Mathog wrote:
> Joe Landman wrote:

[...]

> Maybe it will mean more to you than it does to me...
>
> % lsscsi --long
> [8:0:0:0] tape HP Ultrium 3-SCSI D21D /dev/st0
>   state=running queue_depth=32 scsi_level=4 type=1 device_blocked=0
> timeout=900

Hmmm... queue_depth of 32? On a tape? Weird. There has been some
strangeness in the TCQ/NCQ layers. Might want to try to set this to 1
and see if it helps

echo 1 > /sys/block/st0/device/queue_depth

(note: some drivers don't support changing the queue_depth)

> lspci -v
> 04:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
>         Subsystem: Adaptec ASC-29320LPE PCIe U320
>         Flags: bus master, 66MHz, slow devsel, latency 32, IRQ 19
>         I/O ports at 4400 [disabled] [size=256]
>         Memory at d8400000 (64-bit, non-prefetchable) [size=8K]
>         I/O ports at 4000 [disabled] [size=256]
>         [virtual] Expansion ROM at d8600000 [disabled] [size=512K]
>         Capabilities: [dc] Power Management version 2
>         Capabilities: [a0] Message Signalled Interrupts: Mask- 64bit+ Queue=0/1
> Enable-
>         Capabilities: [94] PCI-X non-bridge device

Looks like a PCI-x adapter. This shouldn't be an issue though.

Try changing the block size. 2M to 8M, and see what happens.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From lindahl at pbm.com Wed Jun 18 18:14:20 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] SuperMicro and lm_sensors
In-Reply-To: <20080618221153.GA24788@bx9.net>
References: <20080618221153.GA24788@bx9.net>
Message-ID: <20080619011419.GA7040@bx9.net>

On Wed, Jun 18, 2008 at 03:11:54PM -0700, Greg Lindahl wrote:

> Speaking of lm_sensors, does anyone have configs for recent SuperMicro
> mobos?
My SuperMicro support contact doesn't have any idea, and running
> sensors-detect leaves me with lots of readings which are
> miscalibrated.

Thanks to the 3 people who wrote me with suggestions.

It appears that the magic is this:

* Upgrade to a newer version of lm_sensors (atrpms has 2.10.6,
  dunno if you can find lm_sensors3 rpms anywhere)
* Make sure your kernel has a driver for what sensors-detect
  detects

Since I'm on CentOS 5.1, I was behind in both respects, and the result
is that "sensors" prints nonsense -- it doesn't help that all of my
boxes have 2 winbond chips in them, with the older one being detected
and having a driver and having none of the interesting signals.

With both updates I get reasonable numbers.

-- greg

From diep at xs4all.nl Wed Jun 18 18:35:21 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <87iqw6i7ck.fsf@snark.cb.piermont.com>
References: <485920D8.2030309@ias.edu>
	<20080618224523.GA18052@unthought.net>
	<5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl>
	<6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov>
	<87iqw6i7ck.fsf@snark.cb.piermont.com>
Message-ID: <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl>

USA and allowing encryption beyond the cracking capabilities of a 1st
year computer science student... ...hmmm

I remember how i did do big efforts to get vista ultimate bitlocker to
work. On paper it sounds ok. AES 256 bits CBC.

The idea is : your usb stick has the encryption key and only that thing
has it. So no one can decrypt the partition without that usb stick.

In case you forget to load that usb stick having the encryption key,
there is a general manner to unlock the machine by feeding it a long code.

Looks great isn't it?

So far the paper...

But now the usual bug; the implementation that practical was allowed by
one those guys on the Perry-Sport mailing list.
Of course we don't want to tire ourselves too much typing too long
unlock codes...

That unlock code, stored at a different USB stick is 48 digits.

By the way 48 digits is how many bits?

Right, that's less than 48 * (log 10/log 2) = 160 bits.

So the problem for our first year student has been brought back from
256 to 160 bits already.

Not that it is a hobby mine, but one day i rebooted the machine and had
forgotten to put in the USB encryption stick.

By accident i mistyped the key. Windows then told me:
"you made a mistake somewhere in those 5 digits of the 48 digit key,
please retype them".

Doh.

So my guess is that soon i do not need to worry when by accident i lose
that USB stick...

Vincent

On Jun 19, 2008, at 2:41 AM, Perry E. Metzger wrote:

>
> Jim Lux writes:
>> In general, fundamental research is not subject to export
>> controls, so
>> if you frame your problem in terms of abstract mathematical problems,
>> you're not going to be treading on any toes. However, start
>> distributing it as "Jim Lux's superduper encryptor/password cracker,
>> now with 1024 bit capability!" and it's moved from fundamental
>> research to a product.
>
> Under current rules, provided it is open source or generally available
> to any buyer, you can distribute and export cryptographic code quite
> freely. There are some minimal reporting requirements, but they're
> barely worth mentioning.
>
> That's the reason you can freely distribute things like Kerberos,
> OpenSSL, pgp/gpg, etc.
>
>
> Perry
> --
> Perry E.
Metzger perry@piermont.com

From shaeffer at neuralscape.com Wed Jun 18 18:47:44 2008
From: shaeffer at neuralscape.com (Karen Shaeffer)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] SuperMicro and lm_sensors
In-Reply-To: <20080619011419.GA7040@bx9.net>
References: <20080618221153.GA24788@bx9.net> <20080619011419.GA7040@bx9.net>
Message-ID: <20080619014744.GA10075@synapse.neuralscape.com>

On Wed, Jun 18, 2008 at 06:14:20PM -0700, Greg Lindahl wrote:
> On Wed, Jun 18, 2008 at 03:11:54PM -0700, Greg Lindahl wrote:
>
> > Speaking of lm_sensors, does anyone have configs for recent SuperMicro
> > mobos? My SuperMicro support contact doesn't have any idea, and running
> > sensors-detect leaves me with lots of readings which are
> > miscalibrated.
>
> Thanks to the 3 people who wrote me with suggestions.
>
> It appears that the magic is this:
>
> * Upgrade to a newer version of lm_sensors (atrpms has 2.10.6,
>   dunno if you can find lm_sensors3 rpms anywhere)
> * Make sure your kernel has a driver for what sensors-detect
>   detects
>
> Since I'm on CentOS 5.1, I was behind in both respects, and the result
> is that "sensors" prints nonsense -- it doesn't help that all of my
> boxes have 2 winbond chips in them, with the older one being detected
> and having a driver and having none of the interesting signals.
>
> With both updates I get reasonable numbers.
>
> -- greg

Hi Greg,

I have some experience with lm_sensors. And you are correct that the
2.10.x release is required for many current system boards and recent
kernels. But that isn't enough, depending on your system board. In
particular, I have seen some cases where the temperatures are
mislabeled. Unless the system board manufacturer gives you a
configuration file, you might need to verify the mapping of the
temperature labels to specific devices. YMMV.

Thanks,
Karen
--
Karen Shaeffer
Neuralscape, Palo Alto, Ca.
94306
shaeffer@neuralscape.com http://www.neuralscape.com

From john.hearns at streamline-computing.com Thu Jun 19 00:17:07 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists"
In-Reply-To: <48599AC9.3040100@berkeley.edu>
References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net>
	<200806181521.47115.kilian@stanford.edu>
	<48599AC9.3040100@berkeley.edu>
Message-ID: <1213859837.4784.9.camel@Vigor13>

On Wed, 2008-06-18 at 16:31 -0700, Jon Forrest wrote:
> Kilian CAVALOTTI wrote:

> I'm glad you mentioned this. I've read through much of the information
> on their web site and I still don't understand the usage model for
> CUDA. By that I mean, on a desktop machine, are you supposed to have
> 2 graphics cards, 1 for running CUDA code and one for regular
> graphics? If you only need 1 card for both, how do you avoid the
> problem you mentioned, which was also mentioned in the documentation?

Actually, I should imagine Kilian is referring to something else,
not the inbuilt timeout which is in the documentation. But I can't speak
for him.

> Or, if you have a compute node that will sit in a dark room,
> you aren't going to be running an X server at all, so you won't
> have to worry about anything hanging?

I don't work for Nvidia, so I can't say!
But the usage model is as you say - you can prototype applications which
will run for a short time on the desktop machine, but long runs are
meant to be done on a dedicated back-end machine.
If you want a totally desk-side solution, they sell a companion box
which goes alongside and attaches via a ribbon cable. I guess the art
here is finding a motherboard with the right number and type of
PCI-express slots to take both the companion box and a decent graphics
card for X use.

>
> I'm planning on starting a pilot program to get the
> chemists in my department to use CUDA, but I'm waiting
> for V2 of the SDK to come out.
>
Why wait? The hardware will be the same, and you can dip your toe in the
water now.

From john.hearns at streamline-computing.com Thu Jun 19 00:24:38 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] SuperMicro and lm_sensors
In-Reply-To: <20080618221153.GA24788@bx9.net>
References: <20080618221153.GA24788@bx9.net>
Message-ID: <1213860288.4784.15.camel@Vigor13>

On Wed, 2008-06-18 at 15:11 -0700, Greg Lindahl wrote:
> Speaking of lm_sensors, does anyone have configs for recent SuperMicro
> mobos? My SuperMicro support contact doesn't have any idea, and running
> sensors-detect leaves me with lots of readings which are

Can't help you on the lm_sensors front, we always use IPMI these days.
But hearing of miscalibrated values does not surprise me.

From dnlombar at ichips.intel.com Thu Jun 19 06:50:20 2008
From: dnlombar at ichips.intel.com (Lombard, David N)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] SuperMicro and lm_sensors
In-Reply-To: <20080619011419.GA7040@bx9.net>
References: <20080618221153.GA24788@bx9.net> <20080619011419.GA7040@bx9.net>
Message-ID: <20080619135020.GA8918@nlxdcldnl2.cl.intel.com>

On Wed, Jun 18, 2008 at 06:14:20PM -0700, Greg Lindahl wrote:
> On Wed, Jun 18, 2008 at 03:11:54PM -0700, Greg Lindahl wrote:
>
> > Speaking of lm_sensors, does anyone have configs for recent SuperMicro
> > mobos? My SuperMicro support contact doesn't have any idea, and running
> > sensors-detect leaves me with lots of readings which are
> > miscalibrated.
>
> Thanks to the 3 people who wrote me with suggestions.
>
> It appears that the magic is this:
>
> * Upgrade to a newer version of lm_sensors (atrpms has 2.10.6,
>   dunno if you can find lm_sensors3 rpms anywhere)
> * Make sure your kernel has a driver for what sensors-detect
>   detects

Did you look for

  /proc/acpi/thermal_zone/*/temperature

The glob is for your BIOS-defined ID.
If it does exist, that's the value that drives

  /proc/acpi/thermal_zone/*/trip_points

See also

  /proc/acpi/thermal_zone/*/polling_frequency

--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.

From shaeffer at neuralscape.com Thu Jun 19 07:40:38 2008
From: shaeffer at neuralscape.com (Karen Shaeffer)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] SuperMicro and lm_sensors
In-Reply-To: <1213860288.4784.15.camel@Vigor13>
References: <20080618221153.GA24788@bx9.net> <1213860288.4784.15.camel@Vigor13>
Message-ID: <20080619144038.GA18151@synapse.neuralscape.com>

On Thu, Jun 19, 2008 at 08:24:38AM +0100, John Hearns wrote:
> On Wed, 2008-06-18 at 15:11 -0700, Greg Lindahl wrote:
> > Speaking of lm_sensors, does anyone have configs for recent SuperMicro
> > mobos? My SuperMicro support contact doesn't have any idea, and running
> > sensors-detect leaves me with lots of readings which are
>
> Can't help you on the lm_sensors front, we always use IPMI these days.
> But hearing of miscalibrated values does not surprise me.

Hi John,

Yes, miscalibrated values and even the mapping of labels to expected
devices can be in error, even when the lm_sensors subsystem is fully
operational. This is inherent in the architecture of the system, if you
will. The configuration file must have information specific to the
system board implementation, rather than merely related to specific
sensor chips. The generic configuration files found in distributions
often are not correct (even if they look reasonable) for specific system
boards. And, if you are attempting to support an appliance product line,
where there are numerous system boards in various appliances, then this
becomes quite a chore to manage, due to the architecture of lm_sensors
subsystem. IPMI is clearly a better way to go. But lm_sensors can always
be made to work, it just requires some in-house expertise in many cases.
Thanks, Karen -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer@neuralscape.com http://www.neuralscape.com From rgb at phy.duke.edu Thu Jun 19 06:58:44 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> Message-ID: On Wed, 18 Jun 2008, Jim Lux wrote: > So.. if your (foreign person) buddy is designing thermonuclear devices in > their garage, and they complain about how slow it is to run the hydrocodes to > simulate stuff, better not hand them that old copy of Sterling, et al., or > even worse, give them rgb's website. (the latter would be too suspicious, > since rgb *is* a physicist, doing monte carlo simulations no less, while Tom > Sterling is *just a computer scientist*) Ah, taking my name in vain I see...;-) Don't forget, at one time, the shocking subversive rgb had a mirror of the NWFAQ: http://nuclearweaponarchive.org/Nwfaq/Nfaq0.html on his personal website, which contains nearly all the instructions for making nuclear bombs one could ever need even without a cluster. At this point the whole idea of computers (clusters or not) as weapons is just plain silly. My cell phone and PDA are "weapons" by the standards set a mere fifteen years ago -- as far as being capable of running fast enough complex enough codes to solve the problems of designing "good enough" nuclear weapons is concerned. Well, I'm not sure about my cell phone, but my PDA has 64 MB of memory and a 400 MHz processor -- that is more than enough. Or my son's PS3 playstation, if you don't like the floating point on my PDA. Who cares? 
Compare the computing available to the people who actually DESIGNED the early thermonuclear devices to a single dual CPU, quad core (8 core total) 64 bit, over the counter compute "node" with 16 GB of memory (total cost WAY less than what I paid for my original IBM PC with its 15 MB of attached storage even ignoring the significant inflation accumulation since 1982). A joke, man, a joke. My laptop -- even my old one and not my new one or my still newer one that should get here tomorrow or Monday -- can design nuclear bombs, thermo or otherwise, and any halfway competent computational physicist can write the code needed to do so using the excellent numerical libraries that are readily available within any distribution of linux. Remember, building nuclear bombs ranges from VERY easy to not terribly difficult, until you want to build a BIG one, or a small one, or one that has to be "precise" in its performance. Terrorists need none of these things -- sloppy to the point of being a fizzle of sorts is still more than good enough. Building a kiloton-range U-235 suicide bomb (given the U-235) I could do in roughly a day, reusing a 12 gauge shotgun already in the house. I'd probably have to buy a metal lathe or small metal smelter, I admit, depending on how the Uranium was delivered to me. To make a really GOOD (efficient) bomb and get into the 10 KT range might take me a week or more to run down a few more materials -- a neutron source and reflector, some concrete, a remote trigger (suicide not being all that appealing to me), a superior propellant to gunpowder (e.g. a lump of TNT, homemade or otherwise). 
Handling or making the explosives would be the most dangerous or difficult part of the process as I'm not a chemist and it is very easy to blow yourself up, but I'm certain that it is still quite easy to get TNT and commercial triggers if you really really want to and have timescale months to acquire them legally and have no criminal record and have a legitimate construction project of some sort that requires blowing up some rocks on your property in the country. Building a plutonium bomb (given the plutonium) is considerably more difficult and not a project for MY garage even with a lathe. Plutonium is downright dangerous to handle, and the construction requires shaped lensing charges, which in turn requires an ability to make precision casts of at least two different explosives with differential burn rates and to set them off with high speed triggers at exactly the same time. However, "exactly the same time" very probably means something quite different now from what it did in 1945. Again, my PC contains nanosecond clocks; over the counter electronics can probably provide enough switching speed and power to get within the range that will suffice for an implosion device, especially one slightly overengineered in other respects. Sure, the government controls known fast switches, but I very much doubt that they control the knowledge of how to make them, and I doubt that they are that hard to make. I'd say a plutonium bomb (given the plutonium) is a project that would cost somewhere in the $100K range up to a few million, for a small team that includes an explosive expert, an electronics expert, a physicist, and a computer geek, to get (again) to the 10+ KT range. Thermonuclear fusion and the 100+ KT to MT range are similarly straightforward. 
From what I recall, one can just monkey around with building bigger bombs surrounded by more fissile material and get close to the latter, adding fusile material such as tritium (expensive and dangerous) or deuterium (plentiful and harmless) and lithium into spaces between trigger and a U-238 casing. To get to MT, one has to build a proper dual implosion device so that the trigger causes both heating, compression, and neutrons to all happen at the same time to a significant volume of fusile material. The NWFAQ doesn't contain engineering specs (as it proudly and irrelevantly announces). So it is entirely possible that one could try for MT and only end up with a 100 KT or so. However, range of total destruction scales like the 1/3 power or thereabouts of the energy, so a 1 MT device only does roughly twice the damage of a 100 KT device anyway. Code and my laptop might up the odds of making a MT+ device work the first time, but... ...from the point of view of strategic war or tactical war these differences matter, I suppose. A 100 KT "fizzle" might let a hard shelter survive where a 1 MT non-fizzle would kill it. Getting too big or two small an explosion can either kill your own troops or not kill all of the enemy on an actual battlefield. Tactical devices like neutron bombs require significant engineering and experimentation to achieve and are not garage projects, I suspect -- get them wrong on the one side and they're thermonuclear devices that are far more powerful than you anticipate, get them wrong on the other and they don't make a significant flux of neutrons and the enemy soldiers overrun your position. However, from the point of view of terrorist bombs NOBODY CARES -- or should care -- a 1 KT "near fizzle" bomb is the moral equivalent of two million pounds of TNT, 100 panel trucks loaded full of TNT and set off all at once. 
Set off in the right place, it would do billions of dollars in damage and kill as many as hundreds of thousands of people, especially if it were surrounded by e.g. a ton or so of cobalt. I could do that with my shotgun. A plutonium bomb that COMPLETELY fizzled would fizzle by prematurely fission of imperfectly compressed ball of plutonium blowing apart before a large "enough" fraction of the total mass fissions, spreading highly radioactive plutonium and various radioactive fission byproducts over a few hundred acres of presumably expensive and densely populated real estate and STILL would likely achieve at least billions of dollars in damages and a death toll in the thousands to tens of thousands, the latter spread out over ten or twenty years and from horrible things like leukemia and other cancers. "Chernobyl" in the middle of the city of your choice. The only solution to the threat of nuclear terrorism is to change the conditions that give rise to terrorism in the first place. Poverty, ignorance, scriptural theistic religion (with its absolutes and myths that lead to irrational action), hopelessness, nationalism, overpopulation, scarcity, human greed. If we as a species fail to do this, then SOONER or LATER, somebody is going to end up with few chunks of sufficiently pure U-235, a garage, and a shotgun, or more likely will end up being provided a properly engineered and tested design plutonium bomb all "ready to use" and will then -- not terribly surprisingly -- use it on us. Bad as it is, this beats the hell out of the condition I grew up in -- living just outside of DC but well within the radius of TD expected for a 10 MT airburst over the Washington Monument, with MIRV'd ICBMs targeted on both sides and a single moment of insanity away from MAD -- but it isn't terribly desireable. No matter what the holding action, no matter what the defenses, it is just too easy. 
Put a shotgun-bomb into a freighter as machine parts or a nameless lump of concrete down in the bilge, sail it up the Saint Clair river, there goes Detroit, or maybe Chicago. SF, NY, Miami, New Orleans, Washington, Baltimore, Boston -- all vulnerable. Or unload it, put it on a truck, and anyplace is a target. The government could uncover and stop thirty such plans in a row, but the thirty-first loses a city. Guantanamo isn't the answer, because it doesn't address the right questions, eliminate the root cause, it is at best a finger in the dyke that creates still more leaks from its own nontrivial contribution to the sheer injustice of it all. Poverty, despair, scriptural religion, social injustice on a global scale ... nothing else will do but eliminating them, and it will take fifty years of CORRECTLY directed action to do much about them, all the while sticking fingers into the dyke whereever we discover a leak. In the meantime, well, one day we'll lose a city. Or worse. All it takes for evil to triumph is for humans of goodwill to do nothing, and we've been doing the WRONG thing (much of it being nothing) for far too long to escape unscathed. Depressingly and politically yours (and not ENTIRELY off topic -- I did talk for a BIT about computing at the very beginning:-), rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From rgb at phy.duke.edu Thu Jun 19 07:26:45 2008 From: rgb at phy.duke.edu (Robert G. 
Brown)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl>
References: <485920D8.2030309@ias.edu>
	<20080618224523.GA18052@unthought.net>
	<5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl>
	<6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov>
	<87iqw6i7ck.fsf@snark.cb.piermont.com>
	<6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl>
Message-ID:

On Thu, 19 Jun 2008, Vincent Diepeveen wrote:

> USA and allowing encryption beyond the cracking capabilities of a 1st year
> computer science student... ...hmmm

Dear Vincent,

IIRC almost any of the high-end encryption routines available within
linux are effectively uncrackable, certainly uncrackable to somebody
with less than NSA-class resources. Most of the routines scale -- if
1024 bit keys still leave you worried, use a larger one. Brute force
searches of keyspace are easily driven past the capabilities of any
computer; all that remains are solving certain "hard" problems in e.g.
number theory implicit to the method.

I haven't looked at the literature recently, but to the best of my
knowledge e.g. the integer factorization problem cannot be solved in
polynomial time by any known algorithm, and factoring a single 663 bit
integer took in the ballpark of a GFLOP-century of effort for the record
as of 2005. NP means that a 2048 bit key would very likely require
TFLOP-centuries of effort if not more -- maybe NSA trying really really
hard could do it, maybe IBM using big blue could do it, maybe the top
ten of the top-500 could do it -- in a year or ten of unbroken effort --
or maybe not.
Moore's Law and the possible advent of quantum computing (where there is
reportedly a P-time algorithm but no hardware to run it on) might change
this, a truly significant advance in number theory or human cleverness
might change it -- training a humongous neural network to perform 2048
bit factorizations well enough that its partial and horrendously
parallel and nonlinear solutions serve to reduce the search space to
where NP searches can find it in human-accessible time, for example.
Barring that, you're pretty safe with any of the open source encryption
methods with a large key.

   rgb

> I remember how i did do big efforts to get vista ultimate bitlocker to work.
> On paper it sounds ok. AES 256 bits CBC.

AES is a "good" encryption IIRC. The weakness, of course, is guarding
the key (as always). ssh is quite secure, but not if you have both of my
public/private keys. Symmetric methods are quite secure as well, but not
if someone takes your key. And on Vista, it doesn't matter whether or
not the key is on a USB stick if somebody cracks the OS, which is the
REAL weak link in the chain, and inserts code to copy your key the next
time you insert the USB stick and then decrypt and transmit the entire
contents of your encrypted drive.

It takes NSA resources, perhaps, to find long keys or factor long keys
or solve elliptic curve or discrete logarithm problems in number theory
that have only NP algorithms that scale badly as the keysize gets large.
Does anyone seriously think that it takes NSA resources to crack Vista
itself? Especially in a consumer environment (that is, systems run by
anyone less knowledgeable than a computer systems engineer or sysadmin)?

   rgb

>
> The idea is : your usb stick has the encryption key and only that thing has
> it.
> So no one can decrypt the partition without that usb stick.
>
> In case you forget to load that usb stick having the encryption key,
> there is a general manner to unlock the machine by feeding it a long code.
>
> Looks great isn't it?
>
> So far the paper...
>
> But now the usual bug; the implementation that practical was allowed by one
> those guys on the Perry-Sport mailing list.
> Of course we don't want to tire ourselves too much typing too long unlock
> codes...
>
> That unlock code, stored at a different USB stick is 48 digits.
>
> By the way 48 digits is how many bits?
>
> Right, that's less than 48 * (log 10/log 2) = 160 bits.
>
> So the problem for our first year student has been brought back from 256 to
> 160 bits already.
>
> Not that it is a hobby mine, but one day i rebooted the machine and had
> forgotten to put in the USB encryption stick.
>
> By accident i mistyped the key. Windows then told me:
> "you made a mistake somewhere in those 5 digits of the 48 digit key,
> please retype them".
>
> Doh.
>
> So my guess is that soon i do not need to worry when by accident i lose that
> USB stick...
>
> Vincent
>
>
>
> On Jun 19, 2008, at 2:41 AM, Perry E. Metzger wrote:
>
>>
>> Jim Lux writes:
>>> In general, fundamental research is not subject to export controls, so
>>> if you frame your problem in terms of abstract mathematical problems,
>>> you're not going to be treading on any toes. However, start
>>> distributing it as "Jim Lux's superduper encryptor/password cracker,
>>> now with 1024 bit capability!" and it's moved from fundamental
>>> research to a product.
>>
>> Under current rules, provided it is open source or generally available
>> to any buyer, you can distribute and export cryptographic code quite
>> freely. There are some minimal reporting requirements, but they're
>> barely worth mentioning.
>>
>> That's the reason you can freely distribute things like Kerberos,
>> OpenSSL, pgp/gpg, etc.
>>
>>
>> Perry
>> --
>> Perry E.
Metzger perry@piermont.com
>

--
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From kilian at stanford.edu Thu Jun 19 09:45:21 2008
From: kilian at stanford.edu (Kilian CAVALOTTI)
Date: Thu Mar 18 01:07:19 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To:
References: <485920D8.2030309@ias.edu>
	<6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov>
Message-ID: <200806190945.21604.kilian@stanford.edu>

On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote:
> Getting too big
> or too small an explosion can either kill your own troops or not kill
> all of the enemy on an actual battlefield.

To add some more OT stuff to this thread, I don't think a nuclear weapon
has ever been used (or even considered being used) to kill troops on a
battlefield. Some cluster bombs (hey, back on topic! :)) are probably
enough for this purpose.

IMHO, a nuclear weapon is mainly a dissuasion weapon, i.e., one you
claim you own to make your enemies think twice before they strike you.
Or that you use against civilians to make your point louder, and let
your enemies understand they'd better surrender. That's why I find the
association between "nuclear weapon" and "battlefield" a bit irrelevant.

Other than that, pretty interesting stuff. I'm unfortunately supporting
your conclusions.

Cheers,
--
Kilian

From peter.st.john at gmail.com Thu Jun 19 09:52:26 2008
From: peter.st.john at gmail.com (Peter St.
John) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <1213859837.4784.9.camel@Vigor13> References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> <200806181521.47115.kilian@stanford.edu> <48599AC9.3040100@berkeley.edu> <1213859837.4784.9.camel@Vigor13> Message-ID: I dug up this pdf from Nvidia: http://www.nvidia.com/docs/IO/43395/tesla_product_overview_dec.pdf Since I can't imagine coding a graphics card while it serves my X :-) I supposed one might put the PCIE card in a box with a cheap SVGA for the least-cost CUDA experiment (one GPU, 128 "thread processors" per GPU). The deskside is 2 GPU with 3 GB, the rackable is 4 GPU with 6 GB; they have PCIE adapter cards to talk to your workstation. I think one plan might be like the GraPE (Gravity Pipe); maybe one of the rackables alternating with a CPU MB acting as net host and fileserver, so each (2-board) node has 4 GPU for array computing. Peter P.S. incidentally, while I was browsing Nvidia, I spec'd out a fantasy gaming rig. $23K, 256GB of solid state "disk", 3x 1GB video cards, 180W just to liquid-cool the CPU :-) Maybe next year. On 6/19/08, John Hearns wrote: > > On Wed, 2008-06-18 at 16:31 -0700, Jon Forrest wrote: > > > Kilian CAVALOTTI wrote: > > > I'm glad you mentioned this. I've read through much of the information > > on their web site and I still don't understand the usage model for > > CUDA. By that I mean, on a desktop machine, are you supposed to have > > 2 graphics cards, 1 for running CUDA code and one for regular > > graphics? If you only need 1 card for both, how do you avoid the > > problem you mentioned, which was also mentioned in the documentation? > > Actually, I should imagine Kilian is referring to something else, > not the inbuilt timeout which is in the documentation. But I can't speak > for im. 
> > > > > > Or, if you have a compute node that will sit in a dark room, > > you aren't going to be running an X server at all, so you won't > > have to worry about anything hanging? > > > I don't work for Nvidia, so I can't say! > But the usage model is as you say - you can prototype applications which > will run for a short time on the desktop machine, but long runs are > meant to be done on a dedicated back-end machine. > If you want a totally desk-side solution, they sell a companion box > which goes alongside and attaches via a ribbon cable. I guess the art > here is finding a motherboard with the right number and type of > PCI-express slots to take both the companion box and a decent graphics > card for X use. > > > > > > > I'm planning on starting a pilot program to get the > > chemists in my department to use CUDA, but I'm waiting > > for V2 of the SDK to come out. > > > > > Why wait? The hardware will be the same, and you can dip your toe in the > water now. From kilian at stanford.edu Thu Jun 19 10:05:15 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <1213859837.4784.9.camel@Vigor13> References: <485920D8.2030309@ias.edu> <48599AC9.3040100@berkeley.edu> <1213859837.4784.9.camel@Vigor13> Message-ID: <200806191005.15794.kilian@stanford.edu> On Thursday 19 June 2008 12:17:07 am John Hearns wrote: > Actually, I should imagine Kilian is referring to something else, > not the inbuilt timeout which is in the documentation. But I can't > speak for im.
I don't know about this timeout. As I said, we didn't really have the time or the resources to investigate the crashes. But I've definitely seen a machine freeze when launching a binary containing CUDA code. > I guess the art > here is finding a motherboard with the right number and type of > PCI-express slots to take both the companion box and a decent > graphics card for X use. AFAIK, the multi-GPU Tesla boxes contain up to 4 Tesla processors, but are hooked to the controlling server with only 1 PCIe link, right? Does this spell "bottleneck" to anyone? Sure, moving data between host memory and GPU memory is not what you do most often, but still. Cheers, -- Kilian From glen.beane at jax.org Thu Jun 19 10:11:18 2008 From: glen.beane at jax.org (Glen Beane) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806190945.21604.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: <485A9336.8010303@jax.org> Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: >> Getting too big >> or two small an explosion can either kill your own troops or not kill >> all of the enemy on an actual battlefield. > > To add some more OT stuff to this thread, I don't think a nuclear weapon > has ever been used (or even considered being used) to kill troops on a > battlefield. Look up "tactical nukes". These were the USA's only hope of defending Europe from a Soviet ground invasion. -- Glen L.
Beane Software Engineer The Jackson Laboratory Phone (207) 288-6153 From landman at scalableinformatics.com Thu Jun 19 10:19:28 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806190945.21604.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: <485A9520.2080508@scalableinformatics.com> Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: >> Getting too big >> or two small an explosion can either kill your own troops or not kill >> all of the enemy on an actual battlefield. > > To add some more OT stuff to this thread, I don't think a nuclear weapon > has ever been used (or even considered being used) to kill troops on a > battlefield. Some cluster bombs (hey, back on topic! :)) are probably > enough for this purpose. Tactical nukes (aimed at armies) were on the table for a few of the NATO scenarios involving responses to Soviet invasion of western Europe (based upon some of the historical reading, though I am not sure how serious they were). The western Europeans were understandably un-enthusiastic about such scenarios. > IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one you claim > you own to make your ennemies think twice before they strike you. Or > that you use against civilians to make your point louder, and let your > ennemies understand they'd better surrender. This makes a number of fundamental presumptions, which may not be true in all cases. First and foremost, it assumes that the potential recipient of the attack or response to attack is a rational actor. As seen on the world stage, this isn't always the case. A rational head of government will balance the lives lost in an attack against any possible gains. 
An irrational apocalyptic head of government will likely not make that calculation, but rather an alternative one in which they somehow come out a winner regardless of the events. Second, it assumes that the potential recipient of the attack or response to attack is state-based. This has also been demonstrated to be problematic on today's world stage. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From peter.st.john at gmail.com Thu Jun 19 10:22:50 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806190945.21604.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: Actually, nuclear weapons have indeed been considered for killing troops on the battlefield. At one time, the possibility of the Soviet Union invading western Europe seemed not so remote. Here is a link (wiki), with pictures, to what is basically a bazooka-launched fission bomb: http://en.wikipedia.org/wiki/Davy_Crockett_%28nuclear_device%29 ("Recoilless gun" means a rocket fired from a tube.) The idea would be for a relatively small force to cope with a relatively large number of tanks. Peter On 6/19/08, Kilian CAVALOTTI wrote: > > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: > > Getting too big > > or two small an explosion can either kill your own troops or not kill > > all of the enemy on an actual battlefield. > > > To add some more OT stuff to this thread, I don't think a nuclear weapon > has ever been used (or even considered being used) to kill troops on a > battlefield. Some cluster bombs (hey, back on topic! :)) are probably > enough for this purpose.
> > IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one you claim > you own to make your ennemies think twice before they strike you. Or > that you use against civilians to make your point louder, and let your > ennemies understand they'd better surrender. > > That's why I find the association between "nuclear weapon" > and "battlefield" a bit irrelevant. > > Other than that, pretty interesting stuff. I'm unfortunately supporting > your conclusions. > > Cheers, > -- > > Kilian From perry at piermont.com Thu Jun 19 10:27:41 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: (Robert G. Brown's message of "Thu\, 19 Jun 2008 10\:26\:45 -0400 \(EDT\)") References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <87iqw6i7ck.fsf@snark.cb.piermont.com> <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl> Message-ID: <87k5glfi7m.fsf@snark.cb.piermont.com> Again, this is not a cryptography list, but I'll correct a few small things... "Robert G. Brown" writes: > I haven't looked at the literature recently, but to the best of my > knowledge e.g the integer factorization problem cannot be solved in > polynomial time for any known algorithm, Correct. > and factoring a single 663 bit integer in a test that took > ballpark of a GFLOP-century of effort for the record as of 2005.
I don't remember the record, but at this point it is considered (theoretically) feasible to attack 1024 bit RSA keys using GNFS and similar methods -- Dan Bernstein has published on this, you can doubtless find the paper on his site. Serious users are using 2048 bit RSA keys at this point. There aren't nearly such good methods for attacking elliptic curve based systems, and many people have migrated to those for performance reasons -- you can use shorter keys with (it is believed) equal security. > ssh is quite secure, but not if you have both of my public/private > keys. That depends on what you mean by "secure". There are two forms of security provided by SSH. One is protection from people trying to break in to your account, the other is protection from people reading your traffic over the network. I can log in using your credentials if I have your private key and you are using SSH with public key authentication. However, even if I have both of your private and public keys, the ephemeral key used for a particular session is agreed to using Diffie-Hellman key exchange, and mere knowledge of your long term keys will not allow anyone to read your session traffic. This property is known as "Perfect Forward Secrecy." (Technically, this is only true of sshv2 -- sshv1 used random nonces exchanged under RSA for the key material, but sshv1 is no longer in wide use because it has a number of security issues.) -- Perry E. Metzger perry@piermont.com From jmdavis1 at vcu.edu Thu Jun 19 10:39:41 2008 From: jmdavis1 at vcu.edu (Mike Davis) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806190945.21604.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: <485A99DD.1030707@vcu.edu> To continue to be OT. Look up the Davey Crockett. 
http://en.wikipedia.org/wiki/Davy_Crockett_(nuclear_device) It is a weapon meant exclusively for battlefield use. It is also almost a suicide weapon. Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: > >> Getting too big >> or two small an explosion can either kill your own troops or not kill >> all of the enemy on an actual battlefield. >> > > To add some more OT stuff to this thread, I don't think a nuclear weapon > has ever been used (or even considered being used) to kill troops on a > battlefield. Some cluster bombs (hey, back on topic! :)) are probably > enough for this purpose. > > IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one you claim > you own to make your ennemies think twice before they strike you. Or > that you use against civilians to make your point louder, and let your > ennemies understand they'd better surrender. > > That's why I find the association between "nuclear weapon" > and "battlefield" a bit irrelevant. > > Other than that, pretty interesting stuff. I'm unfortunately supporting > your conclusions. > > Cheers, > From kilian at stanford.edu Thu Jun 19 10:50:50 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485A9520.2080508@scalableinformatics.com> References: <485920D8.2030309@ias.edu> <200806190945.21604.kilian@stanford.edu> <485A9520.2080508@scalableinformatics.com> Message-ID: <200806191050.51115.kilian@stanford.edu> On Thursday 19 June 2008 10:19:28 am you wrote: > This makes a number of fundamental presumptions, which may not be > true in all cases. > > First and foremost, it assumes that the potential recipient of the > attack or response to attack is a rational actor. That's a very valid point. It's all been described in deterrence theories, and the paradigm shifted from the Cold War's "deterrence of the strong by the weak" to today's "deterrence of the crazy by the weak".
> As seen on the > world stage, this isn't always the case. A rational head of > government will balance the lives lost in an attack against any > possible gains. An irrational apocalyptic head of government will > not likely make that calculation, but an alternative one in which > they somehow come out a winner regardless of the events. > > Second, it assumes that the potential recipient of the attack or > response to attack is state based. This has also been demonstrated > to be problematic on today's world stage. Agreed. Then how adequate is a tactical nuke against military troops in today's world stage? Cheers, -- Kilian From vanallsburg at hope.edu Thu Jun 19 10:52:50 2008 From: vanallsburg at hope.edu (Paul Van Allsburg) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485A99DD.1030707@vcu.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> <485A99DD.1030707@vcu.edu> Message-ID: <485A9CF2.1050408@hope.edu> Mike Davis wrote: > To continue to be OT. Look up the Davey Crockett. > > > http://en.wikipedia.org/wiki/Davy_Crockett_(nuclear_device) > > > > It is a weapon meant exclusively for battlefield use. It is also > almost a suicide weapon. 
> and a live video at http://www.sonicbomb.com/modules.php?name=Content&pa=showpage&pid=56 -- Paul Van Allsburg Computational Science & Modeling Facilitator Natural Sciences Division, Hope College 35 East 12th Street Holland, Michigan 49423 616-395-7292 http://www.hope.edu/academic/csm/ From kilian at stanford.edu Thu Jun 19 11:00:28 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485A9336.8010303@jax.org> References: <485920D8.2030309@ias.edu> <200806190945.21604.kilian@stanford.edu> <485A9336.8010303@jax.org> Message-ID: <200806191100.29026.kilian@stanford.edu> On Thursday 19 June 2008 10:11:18 am you wrote: > > To add some more OT stuff to this thread, I don't think a nuclear > > weapon has ever been used (or even considered being used) to kill > > troops on a battlefield. > > look up "tactical nukes". These were the USA's only hope of > defending Europe from a Soviet ground invasion. Well, what would have been the effect of launching nuclear weapons to defend Europe in case of a Soviet invasion? They would have been launched either at where the Soviet troops actually were, i.e., in Europe, with the main effect of wiping out the countries they were supposed to protect. Not so appealing. Or, and this is probably the most plausible scenario, they would have been aimed at the USSR, most likely at major cities, where they would have killed mostly civilians, not troops, in the hope that the Soviet government would withdraw from Europe. That's why I think nuclear weapons are hardly a means of killing military troops on a battlefield. I concede that tactical nukes are still weapons, and that the main purpose of a weapon is to hurt your enemy. But that is not the only purpose: building and showing off bigger weapons can also be a way to frighten him, in the hope that this will be enough to dissuade him from attacking.
Cheers, -- Kilian From James.P.Lux at jpl.nasa.gov Thu Jun 19 11:08:48 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> Message-ID: <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> At 06:58 AM 6/19/2008, Robert G. Brown wrote: >On Wed, 18 Jun 2008, Jim Lux wrote: > >>So.. if your (foreign person) buddy is designing thermonuclear >>devices in their garage, and they complain about how slow it is to >>run the hydrocodes to simulate stuff, better not hand them that old >>copy of Sterling, et al., or even worse, give them rgb's website. >>(the latter would be too suspicious, since rgb *is* a physicist, >>doing monte carlo simulations no less, while Tom Sterling is *just >>a computer scientist*) > >Ah, taking my name in vain I see...;-) hardly in vain.. (hoping you actually get to read this, in some venue other than your detention hearing, ) >Remember, building nuclear bombs ranges from VERY easy to not terribly >difficult, until you want to build a BIG one, or a small one, or one >that has to be "precise" in its performance. Terrorists need none of >these things -- sloppy to the point of being a fizzle of sorts is still >more than good enough. Nicholas Freeling, "Gadget" is a thriller/detective story about someone (like rgb) being kidnapped and forced to design a smallish device for some nefarious sorts. A better story, in my opinion, than, say, Clancy's "sum of all fears". Building a plutonium bomb (given the plutonium) is considerably more >difficult and not a project for MY garage even with a lathe. 
Plutonium >is downright dangerous to handle, and the construction requires shaped >lensing charges, which in turn requires an ability to make precision >casts of at least two different explosives with differential burn rates >and to set them off with high speed triggers at exactly the same time. One can look at the dual-use list at ORNL and see what sort of precision is considered "dangerous" to bound your design effort. >However, "exactly the same time" very probably means something quite >different now from what it did in 1945. Again, my PC contains >nanosecond clocks; over the counter electronics can probably provide >enough switching speed and power to get within the range that will >suffice for an implosion device, especially one slightly overengineered >in other respects. Sure, the government controls known fast switches, >but I very much doubt that they control the knowledge of how to make >them, and I doubt that they are that hard to make. Simultaneous is easy.. equal length coax, for instance. It's the switches that are tough (and export controlled) A bit trickier than you might think, to get the jitter down in the subnanosecond range, and to have the high peak currents needed. And no, you can't use spare xenon flashtubes you've scavenged out of disposable cameras. If you're worried about small (always an issue when thinking about nanoseconds.. 30cm/nanosecond for free space propagation), there's also the energy storage to consider. To optimize the yield, too, the timing of your zipper relative to your compression is important. >Thermonuclear fusion and the 100+ KT to MT range are similarly >straightforward. From what I recall, one can just monkey around with >building bigger bombs surrounded by more fissile material and get close >to the latter, adding fusile material such as tritium (expensive and >dangerous) or deuterium (plentiful and harmless) and lithium into spaces >between trigger and a U-238 casing. Uh.. probably not.. 
That's the "alarm clock" or "layer cake" design and has some issues. I refer interested readers to Morland's 1977 article in "The Progressive" for more information. > To get to MT, one has to build a >proper dual implosion device so that the trigger causes both heating, >compression, and neutrons to all happen at the same time to a >significant volume of fusile material. The NWFAQ doesn't contain >engineering specs (as it proudly and irrelevantly announces). So it is >entirely possible that one could try for MT and only end up with a 100 >KT or so. However, range of total destruction scales like the 1/3 power >or thereabouts of the energy, so a 1 MT device only does roughly twice >the damage of a 100 KT device anyway. Code and my laptop might up the >odds of making a MT+ device work the first time, but... cube scaling is the rule in explosives (all the way down to sub kilo amounts), however, for very large devices, where the "point source" assumption is invalid (e.g. that 10km fireball), it does break down. There's also the issue of the propagation of a shock wave in air and air not being a linear medium. >...from the point of view of strategic war or tactical war these >differences matter, I suppose. A 100 KT "fizzle" might let a hard >shelter survive where a 1 MT non-fizzle would kill it. Getting too big >or two small an explosion can either kill your own troops or not kill >all of the enemy on an actual battlefield. Or, knocking the lid off a silo, which requires both precision targeting AND suitable yield AND something that fits in a package that can be appropriately delivered. >Tactical devices like >neutron bombs require significant engineering and experimentation to >achieve and are not garage projects, I suspect -- get them wrong on the >one side and they're thermonuclear devices that are far more powerful >than you anticipate, get them wrong on the other and they don't make a >significant flux of neutrons and the enemy soldiers overrun your >position. 
Hence the interest in things like the National Ignition Facility. >However, from the point of view of terrorist bombs NOBODY CARES -- or >should care -- a 1 KT "near fizzle" bomb is the moral equivalent of two >million pounds of TNT, 100 panel trucks loaded full of TNT and set off >all at once. Set off in the right place, it would do billions of >dollars in damage and kill as many as hundreds of thousands of people, >especially if it were surrounded by e.g. a ton or so of cobalt. The study done a couple years ago that postulated a "nominal yield"(e.g. 20kT) fission device in Los Angeles/Long Beach port showed that the majority of damage and death resulted from things like traffic jams and accidents in the crowds fleeing and overloading hospitals. The actual damage radius is fairly small (in the context of Los Angeles, which is >100km across) and the fallout (from a particularly dirty surface burst.. i.e. the shipping container on the dock sucking up the dirt) wasn't all that bad, in terms of dose. The panic, on the other hand, is lethal. > I could >do that with my shotgun. Hmm. Mcrit for good quality U235 is fairly high, especially unreflected. I don't know if your shotgun has sufficient oomph to assemble it quickly enough without predetonation or a fizzle. I've seen numbers for required assy speed in the 1000 m/sec sort of range (i.e. you've got to move from a noncrit to a crit configuration in the amount of time between spontaneous neutrons appearing to get things rolling). Accelerating 10kg, say, to 1000 m/sec, takes a heap o'joules >Bad as it is, this beats the hell out of the condition I grew up in -- >living just outside of DC but well within the radius of TD expected for >a 10 MT airburst over the Washington Monument, with MIRV'd ICBMs >targeted on both sides and a single moment of insanity away from MAD -- >but it isn't terribly desireable. No matter what the holding action, no >matter what the defenses, it is just too easy. 
Put a shotgun-bomb into >a freighter as machine parts or a nameless lump of concrete down in the >bilge, sail it up the Saint Clair river, there goes Detroit, or maybe >Chicago. SF, NY, Miami, New Orleans, Washington, Baltimore, Boston -- >all vulnerable. Or unload it, put it on a truck, and anyplace is a >target. why worry about ICBMs when DHL/FedEx will deliver it to your selected doorstep? From peter.st.john at gmail.com Thu Jun 19 11:09:09 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485A99DD.1030707@vcu.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> <485A99DD.1030707@vcu.edu> Message-ID: War is hell, but I wouldn't call it a "suicide device". The range of the rocket is on the order of 1000 meters, and the effective radius at the target is on the order of 100 m. You wouldn't want to shoot yourself in the foot with one of these, you'd take out your own batallion, but you are aiming at a valley or a hillside, not an oncoming tank. Peter On 6/19/08, Mike Davis wrote: > > To continue to be OT. Look up the Davey Crockett. > > > http://en.wikipedia.org/wiki/Davy_Crockett_(nuclear_device) > > > > It is a weapon meant exclusively for battlefield use. It is also almost a > suicide weapon. > > > > > > Kilian CAVALOTTI wrote: > >> On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: >> >> >>> Getting too big >>> or two small an explosion can either kill your own troops or not kill >>> all of the enemy on an actual battlefield. >>> >>> >> >> To add some more OT stuff to this thread, I don't think a nuclear weapon >> has ever been used (or even considered being used) to kill troops on a >> battlefield. Some cluster bombs (hey, back on topic! :)) are probably enough >> for this purpose. 
>> >> IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one you claim >> you own to make your ennemies think twice before they strike you. Or that >> you use against civilians to make your point louder, and let your ennemies >> understand they'd better surrender. >> >> That's why I find the association between "nuclear weapon" and >> "battlefield" a bit irrelevant. >> >> Other than that, pretty interesting stuff. I'm unfortunately supporting >> your conclusions. >> >> Cheers, >> >> From glen.beane at jax.org Thu Jun 19 11:13:36 2008 From: glen.beane at jax.org (Glen Beane) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806191100.29026.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <200806190945.21604.kilian@stanford.edu> <485A9336.8010303@jax.org> <200806191100.29026.kilian@stanford.edu> Message-ID: <485AA1D0.1080209@jax.org> Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 10:11:18 am you wrote: >>> To add some more OT stuff to this thread, I don't think a nuclear >>> weapon has ever been used (or even considered being used) to kill >>> troops on a battlefield. >> look up "tactical nukes". These were the USA's only hope of >> defending Europe from a Soviet ground invasion. > > Well, what would have been the effect of launching nuclear weapons to > defend Europe in case of a Soviet invasion? They would have been either > launched to where the Soviet troops actually were, ie, on Europe, with > the main effect of wiping up the countries they were supposed to > protect. Not so appealing.
> Or, and it's probably the most plausible scenario, they would have been > aimed to USSR, and likely to major cities, where they would have killed > mostly civilians, not troops. With the hope that the Soviet government > would withdraw from Europe. This is terribly off topic, but you are thinking of strategic nuclear weapons. You couldn't aim a tactical nuke at a city in the USSR unless you were a mile or so from the city. These were small rocket- or artillery-fired warheads of less than 100 pounds. Tactical nukes are not large enough to destroy cities. The main effect would not be wiping out the countries they were meant to protect; it would be to wipe out large tank formations and temporarily irradiate small areas to block the movement of troops. > > That's why I think nuclear weapons are hardly a mean to kill military > troops on a battlefield. Strategic nukes, no. Tactical nukes, yes. -- Glen L. Beane Software Engineer The Jackson Laboratory Phone (207) 288-6153 From jmdavis1 at vcu.edu Thu Jun 19 11:25:05 2008 From: jmdavis1 at vcu.edu (Mike Davis) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> <485A99DD.1030707@vcu.edu> Message-ID: <485AA481.2030000@vcu.edu> According to one weapons designer, the only safe way to use it was to fire from a jeep on a hilltop into a valley and then drive like hell into the next valley. If you are within 150m you will receive an instantly lethal dose. Within 400m you will receive up to 600 rem, which is lethal. Even out to 1000m you would likely get enough radiation to interfere with your warfighting capacity. Peter St. John wrote: > War is hell, but I wouldn't call it a "suicide device". The range of > the rocket is on the order of 1000 meters, and the effective radius at > the target is on the order of 100 m.
You wouldn't want to shoot > yourself in the foot with one of these, you'd take out your own > batallion, but you are aiming at a valley or a hillside, not an > oncoming tank. > Peter > > On 6/19/08, *Mike Davis* > > wrote: > > To continue to be OT. Look up the Davey Crockett. > > > http://en.wikipedia.org/wiki/Davy_Crockett_(nuclear_device) > > > > > It is a weapon meant exclusively for battlefield use. It is also > almost a suicide weapon. > > > > > > Kilian CAVALOTTI wrote: > > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: > > > Getting too big > or two small an explosion can either kill your own troops > or not kill > all of the enemy on an actual battlefield. > > > > To add some more OT stuff to this thread, I don't think a > nuclear weapon has ever been used (or even considered being > used) to kill troops on a battlefield. Some cluster bombs > (hey, back on topic! :)) are probably enough for this purpose. > > IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one > you claim you own to make your ennemies think twice before > they strike you. Or that you use against civilians to make > your point louder, and let your ennemies understand they'd > better surrender. > > That's why I find the association between "nuclear weapon" and > "battlefield" a bit irrelevant. > > Other than that, pretty interesting stuff. I'm unfortunately > supporting your conclusions. 
> > Cheers, > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From bernard at vanhpc.org Thu Jun 19 11:28:08 2008 From: bernard at vanhpc.org (Bernard Li) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: <20080619135020.GA8918@nlxdcldnl2.cl.intel.com> References: <20080618221153.GA24788@bx9.net> <20080619011419.GA7040@bx9.net> <20080619135020.GA8918@nlxdcldnl2.cl.intel.com> Message-ID: Hi David: On Thu, Jun 19, 2008 at 6:50 AM, Lombard, David N wrote: > Did you look for /proc/acpi/thermal_zone/*/temperature The glob is for > your BIOS-defined ID. If it does exist, that's the value that drives > /proc/acpi/thermal_zone/*/trip_points > > See also /proc/acpi/thermal_zone/*/polling_frequency I have always wondered about /proc/acpi/thermal_zone. I noticed that on some servers, the files exist, but on others, that directory is empty. I guess this is dependent on whether the BIOS exposes the information to the kernel? Or are there modules that I need to install to get it working? Thanks, Bernard From jmdavis1 at vcu.edu Thu Jun 19 11:29:40 2008 From: jmdavis1 at vcu.edu (Mike Davis) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806191100.29026.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <200806190945.21604.kilian@stanford.edu> <485A9336.8010303@jax.org> <200806191100.29026.kilian@stanford.edu> Message-ID: <485AA594.1060601@vcu.edu> The plan as per my limited understanding was to engage at Fulda Gap, slow the advance and then hit them with tactical nukes in E. Germany and further behind the lines. the Davey Crockett could be used in multiples in such a situation (and in combination with helicopter and ground anti-tank units) to stall the advance. 
The tactical and theatre weapons were to be used on the battlefield, and concentrations of troops behind the lines. Mike Davis Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 10:11:18 am you wrote: > >>> To add some more OT stuff to this thread, I don't think a nuclear >>> weapon has ever been used (or even considered being used) to kill >>> troops on a battlefield. >>> >> look up "tactical nukes". These were the USA's only hope of >> defending Europe from a Soviet ground invasion. >> > > Well, what would have been the effect of launching nuclear weapons to > defend Europe in case of a Soviet invasion? They would have been either > launched to where the Soviet troops actually were, ie, on Europe, with > the main effect of wiping up the countries they were supposed to > protect. Not so appealing. > > Or, and it's probably the most plausible scenario, they would have been > aimed to USSR, and likely to major cities, where they would have killed > mostly civilians, not troops. With the hope that the Soviet government > would withdraw from Europe. > > That's why I think nuclear weapons are hardly a mean to kill military > troops on a battlefield. I concede that tactical nukes are still > weapons, and that the main purpose of a weapon is to hurt your ennemy. > But not only: building and showing off bigger weapons can also be a way > to frighten him, hoping that it will be enough to dissuade him to > attack. > > Cheers, > From kilian at stanford.edu Thu Jun 19 11:31:13 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485AA1D0.1080209@jax.org> References: <485920D8.2030309@ias.edu> <200806191100.29026.kilian@stanford.edu> <485AA1D0.1080209@jax.org> Message-ID: <200806191131.13502.kilian@stanford.edu> On Thursday 19 June 2008 11:13:36 am you wrote: > Strategic nukes, no. Tactical nukes, yes. That's right, I think I've been confused over the terms. 
Cheers, -- Kilian From kus at free.net Thu Jun 19 11:50:25 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: Message-ID: In message from "Bernard Li" (Thu, 19 Jun 2008 11:28:08 -0700): >Hi David: > >On Thu, Jun 19, 2008 at 6:50 AM, Lombard, David N > wrote: > >> Did you look for /proc/acpi/thermal_zone/*/temperature The glob is >>for >> your BIOS-defined ID. If it does exist, that's the value that >>drives >> /proc/acpi/thermal_zone/*/trip_points >> >> See also /proc/acpi/thermal_zone/*/polling_frequency > >I have always wondered about /proc/acpi/thermal_zone. I noticed that >on some servers, the files exist, but on others, that directory is >empty. I guess this is dependent on whether the BIOS exposes the >information to the kernel? Or are there modules that I need to >install to get it working? AFAIK it depends from BIOS. On my Tyan S2932 w/last BIOS version this directory is empty. Mikhail > >Thanks, > >Bernard >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pbm.com Thu Jun 19 11:57:39 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <48599BA3.1050105@cse.ucdavis.edu> References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> <48599BA3.1050105@cse.ucdavis.edu> Message-ID: <20080619185737.GB4491@bx9.net> On Wed, Jun 18, 2008 at 04:34:59PM -0700, Bill Broadley wrote: > Cuda seems to take a different approach, instead of trying to > auto-parallelize > a loop, it requires a function pointer to the code, and the function must > declare it's exit condition. In the end, CUDA doesn't end up being that much weirder than array processors. 
Which were considered to be significantly harder to program than, say, vector processors. Your post talked about how difficult it was to write code that worked. Well, the harder part is not getting some simple code written, it's getting your actual algorithm into that straight-jacket, and getting it to run fast. If I had more free time, I'd experiment with getting my pretty simple loopy hydrocode working. I suspect it would be pretty tedious, since the 1D operator at the heart of the algorithm is 4,000 lines long. -- greg From lindahl at pbm.com Thu Jun 19 12:08:02 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: References: Message-ID: <20080619190801.GD4491@bx9.net> On Thu, Jun 19, 2008 at 10:50:25PM +0400, Mikhail Kuzminsky wrote: > AFAIK it depends from BIOS. On my Tyan S2932 w/last BIOS version this > directory is empty. On the Opteron box that I got lm_sensors working on, the k8temp module was required for some of the temperatures. But that directory is empty. "hit or miss" comes to mind... -- greg From peter.st.john at gmail.com Thu Jun 19 12:14:51 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> <485A99DD.1030707@vcu.edu> Message-ID: It's off-topic, certainly. The wiki entry indicates ranges from 2 to 4 thousand meters (depending on model) and effective radius in hundreds of meters. Certainly dangerous for everyone, but so is a howitzer battery barrage. By "order" I meant nearest order of magnitude (factor of ten). The range of the weapon is around ten times as great as the effective killing radius. Peter On 6/19/08, R P Herrold wrote: > > On Thu, 19 Jun 2008, Peter St. John wrote: > > War is hell, but I wouldn't call it a "suicide device". 
The range of the >> rocket is on the order of 1000 meters, and the effective radius at the >> target is on the order of 100 m. >> > > the video posted indicated 4 kliks range, and a greater lethal radius ... > > but how is this on topic here? > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080619/034a176a/attachment.html From rgb at phy.duke.edu Thu Jun 19 10:14:09 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806190945.21604.kilian@stanford.edu> References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: On Thu, 19 Jun 2008, Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 06:58:44 am Robert G. Brown wrote: >> Getting too big >> or two small an explosion can either kill your own troops or not kill >> all of the enemy on an actual battlefield. > > To add some more OT stuff to this thread, I don't think a nuclear weapon > has ever been used (or even considered being used) to kill troops on a > battlefield. Some cluster bombs (hey, back on topic! :)) are probably > enough for this purpose. Never used, but then, only two have been used on humans at all, both strategic. This is a GOOD thing...;-) However, since the early fifties or sixties, MOST of the nuclear weapons on the planet were, and probably remain, "tactical" nukes that are indeed intended (at least nominally and in terms of their delivery systems) for use on battlefields or to achieve tactical advantage in hypothesized less-than-total wars, not strategic dissuasion from those wars happening. NATO and the Warsaw pact were basically a row of armies supported by tactical nukes on both sides for most of my lifetime up to the fall of the Soviet Union (which wasn't all that long ago). 
You might think your tanks were better, but you knew that the other side would nuke your tanks if they ever broke through the line of defense-in-depth opposing them EVEN IF they shrank from a total MAD strategic exchange. Enhanced radiation weapons (neutron bombs) were designed almost totally as tank (or small city, small fortification) killers -- you detonate one over a concentration of armor, the neutrons go right through the tank and both bodies in the tank (killing the latter by paralyzing them in a matter of minutes) and an hour later your tanks roar past all the more or less undamaged armor (seizing it for your own use, dumping the bodies, dead yet or not, as desired) with little danger from residual radiation. Or ditto with city-killing -- kill the people, preserve much of the infrastructure. > IMHO, a nuclear weapon is mainly a dissuasion weapon, ie, one you claim > you own to make your ennemies think twice before they strike you. Or > that you use against civilians to make your point louder, and let your > ennemies understand they'd better surrender. > > That's why I find the association between "nuclear weapon" > and "battlefield" a bit irrelevant. I'm merely reciting the doctrine that was openly avowed government policy for most of my life. Strategic nuclear arms were and are primarily connected to long range delivery systems. At one point they were megaton bombs on B-52s run by SAC (strategic air command), at one point ICBMs in silos with multiple warheads scattered all over the midwest, at one point submarine launched ballistic missiles, at one point air or sub or ship launched cruise missiles (that could swing either tactical or strategic where ballistic missiles were pretty much strategic). Mutual Assured Destruction of both military and non-military targets a long ways away on both sides. HOWEVER, at the same time MAD was in place, the east and west fought a number of wars and went through a number of crises associated with those wars. 
Korea, Viet Nam, Warsaw pact rebellions in Poland and Czeckoslovakia, the Cuba Missile Crisis. And the US and USSR were faced off in Europe that entire time. There was a very strong feeling that the USSR would invade Europe and try to conquer it the first moment they thought they could get away with it. There was a very strong feeling that either side, if they got to the point where they had an overwhelming strategic advantage in nukes, would be tempted to launch a first strike and incapacitate the strategic deterrence of other side, then achieve tactical victory at their leisure -- the tactical nukes made the prospect of success in the latter negligible in any war that would leave something worth invading unbombed in the strategic war. I used to study all this, seriously. I was obsessed with nuclear war from roughly when I was nine or ten until, well, I still am. I understood fission and knew how to build a Uranium bomb by the time I was ten or eleven, and knew "how" to build a plutonium bomb at about the same, although I didn't learn all the relevant details of the latter well enough to understand until I read through the NWFAQ. I recall writing a paper in fifth grade on how to build bombs, radius of total destruction, and so on. I visited Hiroshima when I was ten and went through the atom bomb museum there with the negative images of human fingers scorched onto farm tools and the tattered observatory. So I'd have to disagree. IIRC, most of the remaining nuclear bombs on the planet are TACTICAL, not strategic, or at least are mixed use -- aim at army and they are tactical, aim at a city and they are strategic, with very few "hot mounted" on strategic (long range) delivery systems. The biggest ones have been dismantled, as have many of the means of immediate long-range delivery that were the basis of the MAD strategy. 
I'm guessing that a large fraction of the 10,000 or so left in the US and Russia are cruise missile mounted weapons in the 10-50 KT range (where most cruise missiles have a range of less than 1000 miles, making them unsuitable for use as truly strategic weapons, although of course subs or stealth aircraft can make up the difference). The thousand or so in the possession of the rest of the nuclear club are similarly very likely to be MOSTLY tactically mounted, although most of these countries could probably remount in strategic delivery vehicles quickly enough if tensions were to mount for any reason. This is more than sufficient to guarantee that no land, sea or air based assault on the major powers or nuclear club nations (even the smaller ones) can succeed even if they DON'T resort to strategic punishment of the attacker (with serious reasons not to do the latter even if attacked). Nuclear subs and stealth bombers still give us more than ample strategic OR tactical delivery capabilities, and I'm sure Russia and China and England and France have very similar capabilities on a decreasing scale. Then there are the mini-nuclear-powers in opposition. Israel has tactical scale nukes that are ready for either strategic or tactical use. Tactical scale because fallout doesn't remain at home and the Middle East is tiny -- nuke Syria or Iraq or Iran the wrong time of year and you'll shower your own farmland with fallout if you're not careful and maybe even if you are. One ten megaton bomb on Jerusalem and there wouldn't be much Israel left -- or Jordan, or Giza, and depending on the wind I wouldn't want to be living in Egypt or Syria or Iraq. No army can invade Israel and succeed, and everybody knows it. At the MOMENT Israel can invade any of its neighbors with conventional arms and PROBABLY dominate them with relative impunity, but if/when Iran gets two or three nukes, that won't be true. Pakistan and India are a classical case.
India has had (probably small) nukes for a rather long time, giving them again both a strategic and a tactical advantage over Pakistan in their eternal war (it was going on when I lived in India in the 60's and continues today). Pakistan had basically no chance of tactical victory in Kashmir or elsewhere as long as India could tactically nuke an armored invasion through the mountains, which would be slow and vulnerable for days. Pakistan has its own small nukes, and while they probably lack reliable ranged strategic delivery systems and are still heavily outnumbered and outgunned in any likely tactical conflict, it changes the balance of power considerably, in part because there are always extremists (Islamic and otherwise) in the Pakistani military who are just crazy enough to engage in first use under non-emergent circumstances. India would then proceed to crush Pakistan both strategically and tactically (so far cooler heads in Pakistan and India recognize this and if anything the new balance has them moving closer to peace, at least THIS year:-) but it is hard to envision any sort of strategic use of nukes in this theater that didn't kill millions of people. North Korea is yet another theater of interest as it "probably" has nukes. South can't invade the North with or without the US's help because China unfortunately endorses the North, giving them a strategic deterrent. Tactically, at this point an invasion of the North might be over in a matter of days even without nukes -- our weapons systems have improved that much -- just as they would have been over in Viet Nam in a matter of months if it weren't for the fact that we "couldn't" invade the north while the north could invade the south. So far, NK has no nukes of its own. 
If it did, well, once again you have a batshit crazy government with nuclear arms, and NK is already working hard on strategic delivery systems, not just tactical (the difference is primarily range -- tactical delivery is order of 10 to 100 miles, targeting e.g. concentrations of military forces of immediate interest, which may or may not be cities but probably not as they would be used to eliminate threats, not make points or punish, at least as long as there were immediate threats to be faced). If NK can deliver a nuke to Tokyo, though, they can say "do as we say or we'll shoot this dog" and their very insanity becomes a bluff nobody would be eager to call. It isn't clear how MANY they have -- it might be none -- but even ten small nukes become a formidable tactical weapon in a country that geographically concentrates any invading force, and with long range delivery they can make the cost of invasion (human and otherwise) very high indeed. We could, of course, nuke them into oblivion in a five minute long time on target cruise-missile based attack and almost certainly eliminate their nuclear arms and delivery systems in the process (and this is a not unlikely response to any first use or use at all on their part) but with China backing them and with world opinion something that MATTERS to us, it is almost certain we would do this short of nuclear provocation and MAYBE not even then. If China repudiated them on a first use (they have far too few weapons to be a credible strategic OR tactical threat) we could probably eliminate them on a tactical battlefield in short order IF they squandered their tactical nukes on strategic targets. To conclude todays briefing, the level of strategic nuclear risk globally continues to be quite low (as it has been for maybe 15 years now). 
No major conflicts appear imminent between the major powers, although Russia has been doing a bit of sabre rattling as they dream of their former "glory" as an empire and live through the slow process of building a truly stable and robust modern society. The level of nuclear risk in the Koreas is high, with a small risk of strategic escalation to engagement between the US and China (who both have STRONG reasons to avoid it). The risk of nuclear conflict between India and Pakistan is momentarily low, but chaos in the government of Pakistan continues and I am certain that there are extremist factions in the military and society in general that could at any time seize de facto control of their small nuclear arsenal or launch a destabilizing military action in Kashmir or back an assassination of Indian government officials. IMO a war there would be won by India in a matter of days, probably won by the systematic elimination of Pakistan's cities and military forces (India has over 100 nukes and both strategic and tactical delivery systems that are known to work). It would be at a huge cost -- I wouldn't be surprised at millions to tens of millions of deaths, the latter if Pakistan managed to reach any large Indian city e.g. Mumbai or New Delhi. India might forbear to use their weapons against the most densely populated cities in Pakistan if their major cities were not reached, but even limited use against "mostly tactical" targets would kill millions in that densely populated part of the world. Similarly, at the moment Israel's 100+ nukes are unopposed, making it simply incredible that they are at any sort of serious tactical risk. Nobody can invade Israel and everybody knows it. If it ever looked like they were actually succeeding, the invasion would end, permanently, a matter of hours later and it would end in such a way that civilization would have to re-emerge from the ashes of their former enemies' lands before they ever again became a credible threat.
This makes it actually quite stable in the region -- much posturing and a high risk of conventional miniwars, but otherwise little risk of Syria, Egypt, Iraq, Jordan, Lebanon actually mounting an attack. I can't quite predict what Iran's imminent nuclear arsenal will do in the region. On the one hand, there are plenty of wackos (as there are in Pakistan). On the other hand, there are plenty of well-educated and sane people who have absolutely no interest in launching a nuclear war over Iraq and Jordan into Israel in the certain knowledge that it would mean the end of Iran when Israel retaliated. I >>think<< that it would primarily give Iran a strategic advantage in its dealings with the surrounding ISLAMIC countries, and a strong tactical advantage in the event of anything like a US-led invasion (where they could nuke our invading forces, but where politically we couldn't nuke them back). One possibility is that in five years it could actually have a positive influence in the region as a new equilibrium is reached. Another is that nukes are transferred to "terrorists". Terrorists and extremists are the major nuclear risk. In the modern world (as opposed to those whose heads are somehow stuck in the 70's or 80's) control of nuclear arms means nothing more nor less than controlling access to fissile bomb-grade material. Any country that has research-grade physics taught and performed in real universities, that has an engineering school, that has even modest manufacturing, chemical, and electronic manufacturing capabilities can build a nuclear bomb given bomb grade material. Highly advanced countries like the US extend this bomb-building capability to sufficiently motivated private citizens. Furthermore, sorry, it isn't that difficult to accumulate bomb-grade fissile material or enriched Uranium capable of being cooked into Plutonium if one can get Uranium at all.
Israel (and the US, Russia, and I expect China) has managed it a clever way I will not discuss that costs far less than 1940's technology with its huge centrifuges etc. I SUSPECT that they have worked out the means of producing pure Pu 239 (as opposed to 20% Pu 240 reactor grade) Plutonium. If this is true, then a shotgun design becomes possible for Uranium and building a bomb is trivial once again. Countries with Uranium deposits are all potential members of the nuclear club. I estimate the probability density of nuclear terrorism to be order of a few percent per year, making it nearly certain that at least one nuclear device will be used as a terrorist act over the next century UNLESS the conditions that could lead to it are changed. Fortunately, there isn't that much Uranium on the planet, and we're burning it at a furious rate. We could conceivably deplete the currently known reserves in a matter of decades; it is NOT a long term solution to the problem of fossil fuel use. rgb > > Other than that, pretty interesting stuff. I'm unfortunately supporting > your conclusions. > > Cheers, > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977
From rgb at phy.duke.edu Thu Jun 19 10:26:49 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <87k5glfi7m.fsf@snark.cb.piermont.com> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <87iqw6i7ck.fsf@snark.cb.piermont.com> <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl> <87k5glfi7m.fsf@snark.cb.piermont.com> Message-ID: On Thu, 19 Jun 2008, Perry E. Metzger wrote: > I can log in using your credentials if I have your private key and you > are using SSH with public key authentication. However, even if I have > both of your private and public keys, the ephemeral key used for a > particular session is agreed to using Diffie-Hellman key exchange, and > mere knowledge of your long term keys will not allow anyone to read > your session traffic. This property is known as "Perfect Forward > Secrecy." (Technically, this is only true of sshv2 -- sshv1 used > random nonces exchanged under RSA for the key material, but sshv1 > is no longer in wide use because it has a number of security issues.)
They do enable man in the middle attacks, however, so that while your connection cannot be snooped "passively", somebody in the middle (say, in possession of any intermediary router) can pretend to be both sides by establishing simulations of the connections requested and forwarding the traffic. Similarly, if somebody has both my public and private keys they very likely can get into my system and insert trojans into it and directly snoop everything I do, access kmem, own my view of the universe through completely bugged network and peripheral eyes. Encryption is never any better than the physical and network and systems security on which it is implemented, as it is a weak-link problem. But otherwise sure. Similar things for WPA vs WEP as I recall -- WEP doesn't change the ephemeral keys. But encryption is more a hobby associated with my interest in information theory and random numbers than a speciality. I didn't realize that they'd made 1024 bit keys vulnerable at this point. I'm guessing that "vulnerable" still means "vulnerable to people with obscene amounts of free computer time and not enough to do" as opposed to "vulnerable" as in airsnort makes WEP vulnerable to pimply faced kids with old laptops, but still, worth knowing, thanks! rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From landman at scalableinformatics.com Thu Jun 19 12:39:43 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] found this amusing Message-ID: <485AB5FF.70705@scalableinformatics.com> Not wholly OT ... Running some benchmarks for a customer, and doing a little baseline performance gathering. This is Windows 2008 RC2 32 bit on one of our JackRabbit units. 
Well, it appears that there is either a bug in the performance counter, or the JackRabbit is simply too fast ... Have a look at http://scalability.org/images/Screenshot-small-1.png . The other measures seem to be working properly. Average is ~1.35 GB (writes) but it bounces around a bit. Looks like it overflows at 2^31-1 or so ... So we hit these bursty regions we get into that ... negative ... performance :) Either that or we are doing something horribly wrong (won't discount that either). At least it is amusing. Back to your regularly scheduled cluster. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From perry at piermont.com Thu Jun 19 12:43:19 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: (Robert G. Brown's message of "Thu\, 19 Jun 2008 13\:26\:49 -0400 \(EDT\)") References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <87iqw6i7ck.fsf@snark.cb.piermont.com> <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl> <87k5glfi7m.fsf@snark.cb.piermont.com> Message-ID: <87skv9dxd4.fsf@snark.cb.piermont.com> "Robert G. Brown" writes: >> I can log in using your credentials if I have your private key and you >> are using SSH with public key authentication. However, even if I have >> both of your private and public keys, the ephemeral key used for a >> particular session is agreed to using Diffie-Hellman key exchange, and >> mere knowledge of your long term keys will not allow anyone to read >> your session traffic. This property is known as "Perfect Forward >> Secrecy." 
(Technically, this is only true of sshv2 -- sshv1 used >> random nonces exchanged under RSA for the key material, but sshv1 >> is no longer in wide use because it has a number of security issues.) > > They do enable man in the middle attacks, however, so that while your > connection cannot be snooped "passively", somebody in the middle (say, > in possession of any intermediary router) can pretend to be both sides > by establishing simulations of the connections requested and forwarding > the traffic. Not quite. If they only have your key, but not the remote host's key, they can pretend to be you to the remote host, but they can't pretend to be the remote host to you. (Similarly if they've stolen the host's key but not yours.) > Similarly, if somebody has both my public and private keys they very > likely can get into my system I said that. See "I can log in using your credentials", above. > But otherwise sure. Similar things for WPA vs WEP as I recall -- WEP > doesn't change the ephemeral keys. The latter is true (in that WEP has no such mechanisms at all), but WPA in pre shared key mode doesn't change the base keys either, and the TKIP keys are derived from it, so there is no perfect forward secrecy. In "enterprise" mode, a shared key is created for each user, but I'm not sure that all modes of the protocol end up providing perfect forward secrecy. > I didn't realize that they'd made 1024 bit keys vulnerable at this > point. I'm guessing that "vulnerable" still means "vulnerable to > people with obscene amounts of free computer time and not enough to > do" I think the NSA types think of this sort of activity as quite justified, as do their equivalents in other nations and in the, er, "private sector". In particular, if you spend any of your time working for banks, the issue is not ignorable because if money is at stake, people will spend money to get at it. If you are mostly concerned about your family members reading your email, the situation is quite different. 
That said, typing "2048" instead of "1024" in to PGP or OpenSSL is no more expensive, so there is no point in using shorter keys even if you have little to worry about. > as opposed to "vulnerable" as in airsnort makes WEP vulnerable to > pimply faced kids with old laptops, but still, worth knowing, > thanks! Perry -- Perry E. Metzger perry@piermont.com From rgb at phy.duke.edu Thu Jun 19 10:51:15 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> Message-ID: On Thu, 19 Jun 2008, Jim Lux wrote: > Uh.. probably not.. That's the "alarm clock" or "layer cake" design and has > some issues. > I refer interested readers to Morland's 1977 article in "The Progressive" for > more information. Fusion boosting of fission, OTOH, is relatively easy to accomplish and a common practice, and recall that the yield range I was speaking of was LESS than a MT, which is the upper bound of layer cakes in any event. But my point was that throwing in a bunch of deuterium and/or tritium is almost certainly going to increase your yield, moreso for a "bad design" (near fizzle) than for a really good implosion device. The extra neutrons basically fission a lot of stuff that would otherwise be blown apart without fissioning, and you do get SOME increased yield from fusion, although it may well be small unless you do the "good" designs. For strategic weapons, this matters. For tactical weapons, it totally matters. 
For terrorist-class weapons or homemade weapons, getting a "good fusion design" doesn't matter, because boosting a KT to 5 KT or 10 KT might as well be infinity -- a blast capable of causing kilodeaths and gigabuck damages is more important than details of the yield. > The study done a couple years ago that postulated a "nominal yield"(e.g. > 20kT) fission device in Los Angeles/Long Beach port showed that the majority > of damage and death resulted from things like traffic jams and accidents in > the crowds fleeing and overloading hospitals. The actual damage radius is > fairly small (in the context of Los Angeles, which is >100km across) and the > fallout (from a particularly dirty surface burst.. i.e. the shipping > container on the dock sucking up the dirt) wasn't all that bad, in terms of > dose. The panic, on the other hand, is lethal. Sure, maybe. OTOH Hiroshima with a 20KT device killed order of 10^5 people. 20KT in a dense population of any sort surrounded by wood frame dwellings that can play in a fire storm might not make a megadeath, but it would dwarf 9/11. > Hmm. Mcrit for good quality U235 is fairly high, especially unreflected. I > don't know if your shotgun has sufficient oomph to assemble it quickly enough > without predetonation or a fizzle. I've seen numbers for required assy speed > in the 1000 m/sec sort of range (i.e. you've got to move from a noncrit to a > crit configuration in the amount of time between spontaneous neutrons > appearing to get things rolling). Accelerating 10kg, say, to 1000 m/sec, > takes a heap o'joules I can only quote Oppenheimer, that if you drop a subcrit piece of U235 off of a table to assemble with one on the floor you are likely to get a creditable nuclear blast. 
I don't know what he meant by 'creditable', but I'm assuming KT range, about the same as the davy crockett (nice movie:-), and that even if he was exaggerating about the table, a shotgun is better than a table;-) But sure, real explosives would be better, a real gun barrel would be useful, and both are readily available. I would argue that a U235 device is within "garage" assembly either way. "Pure" Pu-239 gun designs are likely to be a lot trickier, but still fairly accessible. It's getting rid fo the Pu-240 that is the problem -- it puts you right back in the isotope separation game only worse, as the mass difference is even smaller. > why worry about ICBMs when DHL/FedEx will deliver it to your selected > doorstep? Well, even a small bomb would be pretty heavy. Kind of at the boundary of what FedEx will delivery;-) rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From perry at piermont.com Thu Jun 19 12:53:37 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> (Jim Lux's message of "Thu\, 19 Jun 2008 11\:08\:48 -0700") References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> Message-ID: <87mylhdwvy.fsf@snark.cb.piermont.com> Jim Lux writes: >>Thermonuclear fusion and the 100+ KT to MT range are similarly >>straightforward. 
From what I recall, one can just monkey around with >>building bigger bombs surrounded by more fissile material and get close >>to the latter, adding fusile material such as tritium (expensive and >>dangerous) or deuterium (plentiful and harmless) and lithium into spaces >>between trigger and a U-238 casing. > > Uh.. probably not.. That's the "alarm clock" or "layer cake" design > and has some issues. I refer interested readers to Morland's 1977 > article in "The Progressive" for more information. "Dark Sun" by Richard Rhodes is probably better (and more entertaining). It describes the Teller-Ulam mechanism in some detail. I will point out that, if the description of the initial Ivy Mike test is to be taken as an indication, it actually requires quite a bit of effort to put together such a thing -- one need worry about amateurs building fission devices, but probably no fusion... Perry -- Perry E. Metzger perry@piermont.com From perry at piermont.com Thu Jun 19 12:55:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: <20080619190801.GD4491@bx9.net> (Greg Lindahl's message of "Thu\, 19 Jun 2008 12\:08\:02 -0700") References: <20080619190801.GD4491@bx9.net> Message-ID: <87iqw5dwsb.fsf@snark.cb.piermont.com> Greg Lindahl writes: > On Thu, Jun 19, 2008 at 10:50:25PM +0400, Mikhail Kuzminsky wrote: > >> AFAIK it depends from BIOS. On my Tyan S2932 w/last BIOS version this >> directory is empty. > > On the Opteron box that I got lm_sensors working on, the k8temp module > was required for some of the temperatures. But that directory is > empty. "hit or miss" comes to mind... And what does this have to do with nuclear weapons? You would think this was an HPC mailing list or something... Perry From rgb at phy.duke.edu Thu Jun 19 10:59:06 2008 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <87mylhdwvy.fsf@snark.cb.piermont.com> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> <87mylhdwvy.fsf@snark.cb.piermont.com> Message-ID: On Thu, 19 Jun 2008, Perry E. Metzger wrote: > "Dark Sun" by Richard Rhodes is probably better (and more > entertaining). It describes the Teller-Ulam mechanism in some > detail. I will point out that, if the description of the initial Ivy > Mike test is to be taken as an indication, it actually requires quite > a bit of effort to put together such a thing -- one need worry about > amateurs building fission devices, but probably no fusion... Not "good" as in high yield fusion, agreed. I was really just pointing out that fusion or neutron enhancement in the less than MT range is still quite accessible. rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From jcownie at cantab.net Thu Jun 19 13:02:37 2008 From: jcownie at cantab.net (James Cownie) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <87skv9dxd4.fsf@snark.cb.piermont.com> References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <87iqw6i7ck.fsf@snark.cb.piermont.com> <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl> <87k5glfi7m.fsf@snark.cb.piermont.com> <87skv9dxd4.fsf@snark.cb.piermont.com> Message-ID: Returning the thread slightly nearer its original topic, I'm surprised that no one mentioned this... 
http://www.engadget.com/2008/05/21/open-source-ogd1-graphics-card-up-for-pre-order/ as a good place for hobbyists who want to play with FPGAs for calculation. -- -- Jim -- James Cownie From peter.st.john at gmail.com Thu Jun 19 13:03:33 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu> Message-ID: Regarding: [RGB] Fortunately, there isn't that much Uranium on the planet, and we're burning it at a furious rate. We could conceivably deplete the currently known reserves in a matter of decades; it is NOT a long term solution to the problem of fossil fuel use. rgb Isn't there a scheme for creating fissionable material from more common stuff (hafnium?) by bombarding it with neutrons excited out of their nuclei by laser pulses? So one might make bombs without U235? (See e.g. http://en.wikipedia.org/wiki/Hafnium#Applications) Peter P.S. Geothermal is a long term solution, maybe; see http://geothermal.inel.gov/publications/future_of_geothermal_energy.pdf but I'm still hoping for practicable fusion. -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080619/5016ab2c/attachment.html From James.P.Lux at jpl.nasa.gov Thu Jun 19 13:45:40 2008 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <87iqw6i7ck.fsf@snark.cb.piermont.com> <6D3FFF5E-20CA-4AFC-88B2-18AA74C7EB28@xs4all.nl> <87k5glfi7m.fsf@snark.cb.piermont.com> <87skv9dxd4.fsf@snark.cb.piermont.com> Message-ID: <6.2.5.6.2.20080619133800.02ccc488@jpl.nasa.gov> At 01:02 PM 6/19/2008, James Cownie wrote: >Returning the thread slightly nearer its original topic, I'm >surprised that no one mentioned this... > >http://www.engadget.com/2008/05/21/open-source-ogd1-graphics-card-up-for-pre-order/ > >as a good place for hobbyists who want to play with FPGAs for calculation. > >-- Indeed.. There's also a variety of eval cards from the manufacturers in this sort of around a kilobuck price range. Xilinx has a Spartan-3 PCIe with the XC3S1000 part (1/4 size of the one on the card above) for $350 From john.hearns at streamline-computing.com Thu Jun 19 14:22:02 2008 From: john.hearns at streamline-computing.com (John Hearns) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <20080618214603.GC31594@bx9.net> <200806181521.47115.kilian@stanford.edu> <48599AC9.3040100@berkeley.edu> <1213859837.4784.9.camel@Vigor13> Message-ID: <1213910532.7651.4.camel@Vigor13> On Thu, 2008-06-19 at 12:52 -0400, Peter St. 
John wrote: > I dug up this pdf from Nvidia: > http://www.nvidia.com/docs/IO/43395/tesla_product_overview_dec.pdf > Since I can't imagine coding a graphics card while it serves my X :-) > I supposed one might put the PCIE card in a box with a cheap SVGA for > the least-cost CUDA experiment One thing I've never understood, and hopefully someone on here can explain clearly, is why the onboard graphics is normally disabled when you add a PCI-e card. If this was an option in the BIOS it would be useful in this case (my own workstation is a Sun Ultra 20). From dnlombar at ichips.intel.com Thu Jun 19 14:23:04 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] SuperMicro and lm_sensors In-Reply-To: References: <20080618221153.GA24788@bx9.net> <20080619011419.GA7040@bx9.net> <20080619135020.GA8918@nlxdcldnl2.cl.intel.com> Message-ID: <20080619212304.GB8918@nlxdcldnl2.cl.intel.com> On Thu, Jun 19, 2008 at 11:28:08AM -0700, Bernard Li wrote: > Hi David: Hey Bernard! > On Thu, Jun 19, 2008 at 6:50 AM, Lombard, David N > wrote: > > > Did you look for /proc/acpi/thermal_zone/*/temperature? The glob is for > > your BIOS-defined ID. If it does exist, that's the value that drives > > /proc/acpi/thermal_zone/*/trip_points > > > > See also /proc/acpi/thermal_zone/*/polling_frequency > > I have always wondered about /proc/acpi/thermal_zone. I noticed that > on some servers, the files exist, but on others, that directory is > empty. I guess this is dependent on whether the BIOS exposes the > information to the kernel? Or are there modules that I need to > install to get it working? Yes, a specific config is needed; but a given kernel that sees it on one system will see it on all systems IFF the BIOS sets it up. For any distro, it's a BIOS issue. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own.
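[Editor's sketch] The check described above, globbing /proc/acpi/thermal_zone/*/temperature and treating an empty directory as a BIOS matter, could be scripted roughly as follows. This is only a minimal sketch: the function name read_thermal_zones is made up for illustration, and the parser assumes each temperature file holds a line of the form "temperature: 45 C", which is an assumption about the 2.6-era ACPI procfs layout rather than anything stated in the thread.

```python
import glob
import os
import re


def read_thermal_zones(base="/proc/acpi/thermal_zone"):
    """Return {zone_id: degrees_C} for each zone found under base.

    zone_id is the BIOS-defined directory name (the '*' in the glob).
    An empty result means the directory has no zones, which per the
    thread points at the BIOS rather than the distro.
    """
    readings = {}
    for path in sorted(glob.glob(os.path.join(base, "*", "temperature"))):
        zone = os.path.basename(os.path.dirname(path))
        with open(path) as fh:
            text = fh.read()
        # Assumed file format: "temperature:             45 C"
        match = re.search(r"(-?\d+)\s*C\b", text)
        if match:
            readings[zone] = int(match.group(1))
    return readings


if __name__ == "__main__":
    zones = read_thermal_zones()
    if not zones:
        print("no ACPI thermal zones exposed (directory empty or missing)")
    for zone, temp in zones.items():
        print(f"{zone}: {temp} C")
```

Taking the base directory as a parameter keeps the sketch usable on machines where the zones live elsewhere, and lets it be tried against a scratch directory.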
From john.hearns at streamline-computing.com Thu Jun 19 14:38:59 2008 From: john.hearns at streamline-computing.com (John Hearns) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: References: <485920D8.2030309@ias.edu> <20080618224523.GA18052@unthought.net> <5819D187-01A1-4623-9839-3D2771ADAF3F@xs4all.nl> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <6.2.5.6.2.20080619105301.02cc8fd8@jpl.nasa.gov> Message-ID: <1213911549.7651.10.camel@Vigor13> On Thu, 2008-06-19 at 13:51 -0400, Robert G. Brown wrote: > > > why worry about ICBMs when DHL/FedEx will deliver it to your selected > > doorstep? > > Well, even a small bomb would be pretty heavy. Kind of at the boundary > of what FedEx will delivery;-) Recall that the first British nuclear test was a device on board a ship, the frigate HMS Plym. This reflected fears that the British had of a nuclear weapon being smuggled into a harbour aboard a ship. The British, being a strong naval power at the time, were also of course interested in the effects of nuclear weapons on ships. ps. any chance of hauling this discussion back on track by mentioning Roadrunner? Does anyone know if there are projected ASCI type machines using Nvidia cards? I'd very much suppose that if anyone DOES have an answer to that one then a cell is waiting in Leavenworth, or the Tower. From kilian at stanford.edu Thu Jun 19 14:39:42 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <1213910532.7651.4.camel@Vigor13> References: <485920D8.2030309@ias.edu> <1213910532.7651.4.camel@Vigor13> Message-ID: <200806191439.43134.kilian@stanford.edu> On Thursday 19 June 2008 02:22:02 pm John Hearns wrote: > One thing I've never understood, and hopefully someone on here can > explain clearly, is why the onboard graphics is normally disabled > when you add a PCI-e card. 
It probably comes from the times when dual-head configurations were still bleeding-edge, and when BIOS manufacturers assumed that if you had an AGP/PCIe graphics card on your bus, you wanted to use it, rather than the usually cheapo onboard chip. Nowadays, NVIDIA's Hybrid-SLI [1] precisely aims to aggregate graphical power from the onboard chip and add-in card(s) in heterogeneous configurations. So hopefully this trend will come to an end. At least on NV mobos. [1]http://www.nvidia.com/object/hybrid_sli.html Cheers, -- Kilian From csamuel at vpac.org Thu Jun 19 16:32:11 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <447096711.34491213918066691.JavaMail.root@zimbra.vpac.org> Message-ID: <1381552971.34571213918331904.JavaMail.root@zimbra.vpac.org> ----- "Kilian CAVALOTTI" wrote: > AFAIK, the multi GPU Tesla boxes contain up to 4 Tesla processors, but > are hooked to the controlling server with only 1 PCIe link, right? > Does this spell like "bottleneck" to anyone? The nVidia website says: http://www.nvidia.com/object/tesla_tech_specs.html # 6 GB of system memory (1.5 GB dedicated memory per GPU) [...] # Connects to host via cabling to a low power PCI # Express x8 or x16 adapter card So my guess is that you'd be using local RAM not the host system's RAM whilst computing. I took a photo of an open Tesla box at SC'07: http://flickr.com/photos/chrissamuel/2267613381/in/set-72157603919719911/ (click on "All sizes" for a larger version), my guess is that the DIMMS are hidden under the shrouds. There's a lot of fans there.. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O.
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Thu Jun 19 16:41:31 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <980339235.34611213918666184.JavaMail.root@zimbra.vpac.org> Message-ID: <1169577922.34631213918891004.JavaMail.root@zimbra.vpac.org> ----- "Robert G. Brown" wrote: > IIRC almost any of the high-end encryption routines available within > linux are effectively uncrackable, certainly uncrackable to somebody > with less than NSA-class resources. As long as the implementation is correct.. Debian SSL. :-) Humans are always the weak links in these things, whether that be implementation, crypto security or just doing plain dumb things like sending an email confirmation in the clear containing plain text passwords that were submitted over SSL. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From landman at scalableinformatics.com Thu Jun 19 17:08:43 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <1169577922.34631213918891004.JavaMail.root@zimbra.vpac.org> References: <1169577922.34631213918891004.JavaMail.root@zimbra.vpac.org> Message-ID: <485AF50B.2080807@scalableinformatics.com> Chris Samuel wrote: > ----- "Robert G. Brown" wrote: > >> IIRC almost any of the high-end encryption routines available within >> linux are effectively uncrackable, certainly uncrackable to somebody >> with less than NSA-class resources. > > As long as the implementation is correct.. Debian SSL. :-) N-tro-PEE? We dont need no steen-keen N-tro-PEE! Get yer fresh hot bits here, all 15 of them. 
> Humans are always the weak links in these things, > whether that be implementation, crypto security or > just doing plain dumb things like sending an email > confirmation in the clear containing plain text > passwords that were submitted over SSL. People spend lots of time and effort on security theater. Make up odd rules for passwords. Make them hard to guess and crack. Well, is that the vector for break-ins? Weak passwords? I saw a linux machine (a cluster) rooted. It was rooted because of a person with a windows laptop that happened to catch a key logger. Crackers had been attempting to break in to that machine for a long time, and here goes a grad student, and gives them the password. Worse, this grad student acted in a way we advised against, and ran jobs from root. Yeah, I know. Security theater is troubling. It gives us sheep the appearance of being secure, without any real additional value. Opie and multi-factor are hard to beat. And no theater needed. Even better, no worries about replay attacks with opie, or with a multi-factor that disables a password upon use. But even with these, you still need good *real* practices. A non-security theater practice would limit the damage one can do in a non-privileged setting. SElinux and Apparmor try to limit the damage even in a secure setting, though I am not sure how well they do there. 
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From kilian at stanford.edu Thu Jun 19 17:16:41 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] NVIDIA GPUs, CUDA, MD5, and "hobbyists" In-Reply-To: <1381552971.34571213918331904.JavaMail.root@zimbra.vpac.org> References: <1381552971.34571213918331904.JavaMail.root@zimbra.vpac.org> Message-ID: <200806191716.41743.kilian@stanford.edu> On Thursday 19 June 2008 04:32:11 pm Chris Samuel wrote: > ----- "Kilian CAVALOTTI" wrote: > > AFAIK, the multi GPU Tesla boxes contain up to 4 Tesla processors, > > but are hooked to the controlling server with only 1 PCIe link, > > right? Does this spell like "bottleneck" to anyone? > > The nVidia website says: > > http://www.nvidia.com/object/tesla_tech_specs.html > > # 6 GB of system memory (1.5 GB dedicated memory per GPU) The latest S1070 has even more than that: 4GB per GPU as it seems, according to [1]. But I think this refers to the "global memory", as described in [2] (slide 12, "Kernel Memory Access"). It's the graphics card's main memory, the kind which is used to store textures in games, for instance. Each GPU core also has what they call "shared memory" and which is really only shared between threads on the same core (it's more like a L2 cache actually). > So my guess is that you'd be using local RAM not the > host systems RAM whilst computing. Right, but at some point, you do need to transfer data from the host memory to the GPU memory, and back. That's where there's probably a bottleneck if all 4 GPUs want to read/dump data from/to the host at the same time. Moreover, I don't think that the different GPUs can work together, i.e. exchange data and participate in the same parallel computation.
Unless they release something along the lines of a CUDA-MPI, those 4 GPUs sitting in the box would have to be considered as independent processing units. So as I understand it, the scaling benefits from your application's parallelization would be limited to one GPU, no matter how many you got hooked to your machine. I don't even know how you choose (or even if you can choose) on which GPU you want your code to be executed. It has to be handled by the driver on the host machine somehow. > There's a lot of fans there.. They probably get hot. At least the G80 do. They say "Typical Power Consumption: 700W" for the 4 GPUs box. Given that a modern gaming rig featuring a pair of 8800GTX in SLI already requires a 1kW PSU, I would put this on the optimistic side. [1]http://www.nvidia.com/object/tesla_s1070.html [2]http://www.mathematik.uni-dortmund.de/~goeddeke/arcs2008/C1_CUDA.pdf Cheers, -- Kilian From kilian at stanford.edu Thu Jun 19 17:25:22 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485AF50B.2080807@scalableinformatics.com> References: <1169577922.34631213918891004.JavaMail.root@zimbra.vpac.org> <485AF50B.2080807@scalableinformatics.com> Message-ID: <200806191725.23005.kilian@stanford.edu> On Thursday 19 June 2008 05:08:43 pm Joe Landman wrote: > SElinux and Apparmor try to limit the damage > even in a secure setting, though I am not sure how well they do > there. If you want/need to use things like Lustre, for instance, you can forget about SELinux and AppArmor; they simply don't work with it. Isn't it a common practice in HPC to keep security rules relatively relaxed *inside* a cluster (passwordless logins between compute nodes for instance), whilst trying to harden the links to the external world? I mean, most of the scientific applications haven't precisely been designed with security as their first concern, have they?
Cheers -- Kilian From landman at scalableinformatics.com Thu Jun 19 17:33:40 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:07:19 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <200806191725.23005.kilian@stanford.edu> References: <1169577922.34631213918891004.JavaMail.root@zimbra.vpac.org> <485AF50B.2080807@scalableinformatics.com> <200806191725.23005.kilian@stanford.edu> Message-ID: <485AFAE4.1050705@scalableinformatics.com> Kilian CAVALOTTI wrote: > On Thursday 19 June 2008 05:08:43 pm Joe Landman wrote: >> SElinux and Apparmor try to limit the damage >> even in a secure setting, though I am not sure how well they do >> there. > > If you want/need to use things like Lustre, for instance, you can forgot > about SELinux and AppArmor, it simply doesn't work. In the Lustre release notes: "Do not laugh in the presence of Lustre, for it is subtle and quick to anger". (for those who aren't sure, this is an attempt at humor ... no need to skewer me over "anti-Lustre" comments ...) > > Isn't it a common practice in HPC to keep security rules relatively > relaxed *inside* a cluster (passwordless logins between compute nodes > for instance), whilst trying to harden the links to the external world? Yes. Leave the window wide open while bolting the door tight ... :( > > I mean, most of the scientific applications haven't precisely been > designed with security as their first concern, have they? I would be happy if they had better code than if (!(fdopen( ... 
)) { printf "I died\n"; exit(-1); } > > Cheers -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From csamuel at vpac.org Thu Jun 19 17:43:59 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:20 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <485AF50B.2080807@scalableinformatics.com> Message-ID: <2031586795.35641213922639112.JavaMail.root@zimbra.vpac.org> ----- "Joe Landman" wrote: > People spend lots of time and effort on security theater. Make up odd > rules for passwords. Make them hard to guess and crack. Well, is > that the vector for break-ins? Weak passwords? Yeah - sadly.. :-( -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Thu Jun 19 17:52:57 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:07:20 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: Message-ID: <1193904974.35781213923177281.JavaMail.root@zimbra.vpac.org> ----- "Peter St. John" wrote: > War is hell, but I wouldn't call it a "suicide device". http://www.damninteresting.com/?p=783 # The Davy Crockett's timer allowed a minimum shot # distance of about 1,000 feet, but such inept use # of the weapon would certainly result in the deaths # of the firing team. In most cases, the approaching # Soviets would be at least one mile away, leaving the # Atomic Battle Group personnel outside of the hazard # zone. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. 
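Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From rgb at phy.duke.edu Thu Jun 19 19:33:47 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:07:20 2010 Subject: [Beowulf] Re: "hobbyists" In-Reply-To: <2031586795.35641213922639112.JavaMail.root@zimbra.vpac.org> References: <2031586795.35641213922639112.JavaMail.root@zimbra.vpac.org> Message-ID: On Fri, 20 Jun 2008, Chris Samuel wrote: > > ----- "Joe Landman" wrote: > >> People spend lots of time and effort on security theater. Make up odd >> rules for passwords. Make them hard to guess and crack. Well, is >> that the vector for break-ins? Weak passwords? > > Yeah - sadly.. :-( Do you have any recent evidence for that? I mean, back in the 80's and 90's, when I could use ypx to grab anybody's encrypted password files and run crack on them and get a dozen hits in a few hours of work, sure, but since MD5 became near-universal and since /etc/shadow was invented and since they fixed the worst of the holes that let "anybody" get at the encrypted password list, since password changing programs no longer let you use a REALLY bad password (or at least bitch about it if they do), since sysadmins started routinely running crack on the encrypted list defensively and forcing the change of particularly weak ones, since most systems can be configured with tools that bitch or slow down or flag repeated brute force attacks, I'd have thought that wasn't so true anymore. We run log scanners that count the attacks on our systems in a 24 hour period and break them down by e.g. originating IP number and so on, and truth be told they are nearly continuous, but I haven't heard of any of those attacks SUCCEEDING on any linux box run by any non-complete-idiot for years now. Password TRAPS are a pretty common vector; the only cases I tend to hear of at all commonly anymore for crackings (of linux boxes, not Windows systems that are cracked or infected almost at will) tend to be somebody who goes home for the summer, uses an infected, trojanned, vile spewpot of a Windows box to login back at duke from home via e.g. putty or some other related interface, and has their keystrokes logged as they do. Quite a lot of the Windows viruses install trojan spyware that does full keystroke logging and so on; I got to watch one attempt this on one of my kids' boxes when it was infected, and had to change one of my passwords after cleaning it up because (sigh) I had to use it to get Duke to get the site license software I needed to do the cleaning. There are also still -- relatively rarely -- buffer overwrite attacks discovered. Most coders "get it" that one shalt not use the non-n string commands to manipulate buffers these days, although there is still legacy code in existence (I'm sure) that has it. I personally last got nailed by the slammer attack, because I got lazy about updates (this was barely pre-yum) and didn't patch my web software in time. Kernel bugs, and MAYBE a rare race condition, still sometimes allow promotion to root. But weak passwords that are brute force guessed or cracked from the shadow file? Only on a poorly managed network, one where the sysadmin doesn't bother to check and fails to inform the users of how to choose a good one, AND where users manage to gain access to the shadow file in the first place. rgb (of course MY passwd is just rgbbgr -- that's secure enough don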
Password TRAPS are a pretty common vector; the only cases I tend to hear of at all commonly anymore for crackings (of linux boxes, not Windows systems that are cracked or infected almost at will) tend to be somebody who goes home for the summer, uses an infected, trojanned, vile spewpot of a Windows box to login back at duke from home via e.g. putty or some other related interface, and has their keystrokes logged as they do. Quite a lot of the Windows viruses install trojan spyware that does full keystroke logging and so on; I got to watch one attempt this on one of my kids boxes when it was infected, and had to change one of my passwords after cleaning it up because (sigh) I had to use it to get Duke to get the site license software I needed to do the cleaning. There are also still -- relatively rarely -- buffer overwrite attacks discovered. Most coders "get it" that one shalt not use the non-n string commands to manipulate buffers these days, although there is still legacy code in existence (I'm sure) that has it. I personally last got nailed by the slammer attack, because I got lazy about updates (this was barely pre-yum) and didn't patch my web software in time. Kernel bugs, and MAYBE a rare race condition, still sometimes allow promotion to root. But weak passwords that are brute force guessed or cracked from the shadow file? Only on a poorly managed network, one where the sysadmin doesn't bother to check and fails to inform the users of how to choose a good one, AND where users manage to gain access to the shadow file in the first place. rgb (of course MY passwd is just rgbbgr -- that's secure enough don