From stewart at serissa.com Sun Aug 1 07:02:50 2010 From: stewart at serissa.com (Lawrence Stewart) Date: Sun, 1 Aug 2010 10:02:50 -0400 Subject: [Beowulf] Scale modl Cray-1 In-Reply-To: <4C52ECAB.1020502@ldeo.columbia.edu> References: <68A57CCFD4005646957BD2D18E60667B1154A77D@milexchmb1.mil.tagmclarengroup.com> <4C52ECAB.1020502@ldeo.columbia.edu> Message-ID: <37C88C5B-B3D8-460F-928C-3AFC14B7BCD3@serissa.com> http://simh.trailing-edge.org or google simh there are many machines and an active community. the effort is coordinated by Bob Supnik, ex from many machine projects at Digital. he credits me with the original idea for the project, but that is way too much credit. simh mostly has minicomputers and older machines. software is available for most of them. for a while Bob was my boss at SiCortex, and one year at SC we had 12 emulators running simultaneously on one of the 72 core deskside machines. fun. -Larry On Jul 30, 2010, at 11:15 AM, Gus Correa wrote: > Hearns, John wrote: >> Enjoy. http://www.theregister.co.uk/2010/07/29/cray_1_replica/ >> The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > > In this age of virtualization, > I was wondering if there are simulators in software (say, for Linux) > of famous old computers: PDP-11, VAX, Cray-1, IBM 1130, IBM/360, > CDC 6600, even the ENIAC perhaps. > From instruction set, to OS, to applications. > > Any references? > > Thanks, > Gus Correa > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gus at ldeo.columbia.edu Sun Aug 1 19:02:09 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Sun, 01 Aug 2010 22:02:09 -0400 Subject: [Beowulf] Scale modl Cray-1 In-Reply-To: <37C88C5B-B3D8-460F-928C-3AFC14B7BCD3@serissa.com> References: <68A57CCFD4005646957BD2D18E60667B1154A77D@milexchmb1.mil.tagmclarengroup.com> <4C52ECAB.1020502@ldeo.columbia.edu> <37C88C5B-B3D8-460F-928C-3AFC14B7BCD3@serissa.com> Message-ID: <4C562721.5030002@ldeo.columbia.edu> Thank you all who responded: Larry Stewart, David Lombard, Franklin Jones, Douglas Guptill, Thomasz Rolla, Jim Lux. Glad to see that many of you had not only thought of this, but actually implemented simulators for many outstanding computers. I compiled SIMH. However, I suppose I need to load software to actually simulate each computer (Assembler? OS? other more specific sw?), correct? If yes, is this software available, and where? Any documentation on how to run them? Many thanks, Gus Correa Lawrence Stewart wrote: > http://simh.trailing-edge.org or google simh > > there are many machines and an active community. the effort is coordinated by Bob Supnik, ex from many machine projects at Digital. he credits me with the original idea for the project, but that is way too much credit. > > simh mostly has minicomputers and older machines. software is available for most of them. 
> > for a while Bob was my boss at SiCortex, and one year at SC we had 12 emulators running simultaneously on one of the 72 core deskside machines. fun. > > -Larry > > > On Jul 30, 2010, at 11:15 AM, Gus Correa wrote: > >> Hearns, John wrote: >>> Enjoy. http://www.theregister.co.uk/2010/07/29/cray_1_replica/ >>> The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> >> In this age of virtualization, >> I was wondering if there are simulators in software (say, for Linux) >> of famous old computers: PDP-11, VAX, Cray-1, IBM 1130, IBM/360, >> CDC 6600, even the ENIAC perhaps. >> From instruction set, to OS, to applications. >> >> Any references? >> >> Thanks, >> Gus Correa >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dnlombar at ichips.intel.com Mon Aug 2 08:15:51 2010 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Mon, 2 Aug 2010 08:15:51 -0700 Subject: [Beowulf] Scale modl Cray-1 In-Reply-To: <4C562721.5030002@ldeo.columbia.edu> References: <68A57CCFD4005646957BD2D18E60667B1154A77D@milexchmb1.mil.tagmclarengroup.com> <4C52ECAB.1020502@ldeo.columbia.edu> <37C88C5B-B3D8-460F-928C-3AFC14B7BCD3@serissa.com> <4C562721.5030002@ldeo.columbia.edu> Message-ID: <20100802151551.GA4013@nlxcldnl2.cl.intel.com> On Sun, Aug 01, 2010 at 07:02:09PM -0700, Gus Correa wrote: > Thank you all who responded: > Larry Stewart, David Lombard, Franklin Jones, > Douglas Guptill, Thomasz Rolla, Jim Lux. > > Glad to see that many of you had not only thought of this, > but actually implemented simulators for many outstanding computers. > > I compiled SIMH. > > However, I suppose I need to load software to actually simulate > each computer (Assembler? OS? other more specific sw?), correct? > > If yes, is this software available, and where? The 1130 sw is available from ibm1130.org . I expect that type of software will usually only be available from the people and projects that are interested in the simulated system. > Any documentation on how to run them? The 1130 sw is sufficiently documented to run it. I've run it--and as expected--it *does* run faster than the original hw. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From rpnabar at gmail.com Thu Aug 5 14:47:19 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 5 Aug 2010 16:47:19 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? Message-ID: I wanted to buy some 1 Terabyte SATA drives for our storage array and wanted to stay away from the cheap desktop stuff. But each manufacturer has some "enterprise class drives". But is there something specific to look for? 
Most of those seem to have a MTBF of around 1.2 million hours and a URE of about 1 in 10^15. The S.M.A.R.T. abilities seem fairly standard. Is there a list somewhere of well tested drives? Or any recommendations? -- Rahul From jlforrest at berkeley.edu Thu Aug 5 15:33:26 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Thu, 05 Aug 2010 15:33:26 -0700 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: References: Message-ID: <4C5B3C36.1070406@berkeley.edu> On 8/5/2010 2:47 PM, Rahul Nabar wrote: > I wanted to buy some 1 Terabyte SATA drives for our storage array and > wanted to stay away from the cheap desktop stuff. But each > manufacturer has some "enterprise class drives". But is there > something specific to look for? Most of those seem to have a MTBF of > around 1.2 million hours and a URE of about 1 in 10^15. The S.M.A.R.T. > abilities seem fairly standard. Is there a list somewhere of well > tested drives? Or any recommendations? We've talked about this topic on this list before. There are several schools of thought. Some people base their opinions on the manufacturer's claims, and some people base their opinions on the famous papers from Google and CMU that came out a couple of years ago that described how very large numbers of drives really work. It's an interesting topic, no doubt. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From rpnabar at gmail.com Thu Aug 5 15:42:13 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 5 Aug 2010 17:42:13 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: <4C5B3C36.1070406@berkeley.edu> References: <4C5B3C36.1070406@berkeley.edu> Message-ID: On Thu, Aug 5, 2010 at 5:33 PM, Jon Forrest wrote: > We've talked about this topic on this list > before. There are several schools of thought. > Some people base their opinions on the manufacturer's > claims, and some people base their opinions > on the famous papers from Google and CMU that > came out a couple of years ago that described > how very large numbers of drives really work. > > It's an interesting topic, no doubt. Ah! The google hard disk list. Thanks! I totally forgot about that one. I'll look there. -- Rahul From rpnabar at gmail.com Thu Aug 5 15:44:56 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 5 Aug 2010 17:44:56 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: <20100805183414.32e5e92b@jabberwock.cb.piermont.com> References: <20100805183414.32e5e92b@jabberwock.cb.piermont.com> Message-ID: On Thu, Aug 5, 2010 at 5:34 PM, Perry E. Metzger wrote: > On Thu, 5 Aug 2010 16:47:19 -0500 Rahul Nabar > wrote: >> I wanted to buy some 1 Terabyte SATA drives for our storage array >> and wanted to stay away from the cheap desktop stuff. But each >> manufacturer has some "enterprise class drives". But is there >> something specific to look for? Most of those seem to have a MTBF of >> around 1.2 million hours and a URE of about 1 in 10^15. The >> S.M.A.R.T. abilities seem fairly standard. ?Is there a list >> somewhere of well tested drives? Or any recommendations? > > Why do you want to pay more for drives? > > If you have hundreds or thousands of machines, you will get failures > no matter what, so you will need to set up your software to deal with > failures no matter what. 
Assuming that a failure doesn't cause you > much harm, you might as well simply accept a slightly higher failure > rate in exchange for being able to pay less per node, which lets you > buy more nodes. You can always keep spares, and indeed, you will have > to in either case. Sure, I do have a RAID level on it so a failure per-se isn't disaster. And I wouldn't pay a $1000 dollar premium for it. But I wouldn't mind paying $50 more if it translates to less trips to the cluster room and fewer RAID rebuilds. That's why I'm trying to buy something better than a cheap run-of-the mill from newegg. -- Rahul From sabujp at gmail.com Thu Aug 5 15:57:00 2010 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Thu, 5 Aug 2010 17:57:00 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: <4C5B3C36.1070406@berkeley.edu> References: <4C5B3C36.1070406@berkeley.edu> Message-ID: Hi, For SATA drives, my take is that if it has a 3YR warranty then it's a consumer line drive. If it has a 5YR warranty from the manufacturer then I consider it an "enterprise" class drive, even if it's not branded as such. That being said, I think I saw some western digital black drives with 5YR warranties (on newegg) that are not branded as "enterprise" so they don't cost twice as much. However, there are (or at least used to be) features of enterprise class drives which are not available on the consumer line of drives. Whether or not these features are necessary for operation on your brand of storage array is something you should check. The good companies that make storage arrays have a matrix/list of drives+firmware versions that they've tested and qualified for use with their arrays. If your array is a bunch of rack servers, make sure the controllers will accept non-branded/certified drives (e.g. make sure it doesn't require a Dell, HP, or IBM branded drive). After having determine that your array will take COTS components, and if you can go with slightly slower drives, I'd go with the WD Caviar green series, black if you want something faster. I'd stay away from the really cheap 5400 RPM Seagate drives. On Thu, Aug 5, 2010 at 5:33 PM, Jon Forrest wrote: > On 8/5/2010 2:47 PM, Rahul Nabar wrote: >> >> I wanted to buy some 1 Terabyte SATA drives for our storage array and >> wanted to stay away from the cheap desktop stuff. But each >> manufacturer has some "enterprise class drives". But is there >> something specific to look for? Most of those seem to have a MTBF of >> around 1.2 million hours and a URE of about 1 in 10^15. The S.M.A.R.T. >> abilities seem fairly standard. ?Is there a list somewhere of well >> tested drives? Or any recommendations? From coutinho at dcc.ufmg.br Thu Aug 5 16:28:22 2010 From: coutinho at dcc.ufmg.br (Bruno Coutinho) Date: Thu, 5 Aug 2010 20:28:22 -0300 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: References: <20100805183414.32e5e92b@jabberwock.cb.piermont.com> Message-ID: Seagate claims that their ES.2 SATA disks have higher rotational vibration tolerance. This could be useful if you have several disks working close to each other. 2010/8/5 Rahul Nabar > On Thu, Aug 5, 2010 at 5:34 PM, Perry E. Metzger > wrote: > > On Thu, 5 Aug 2010 16:47:19 -0500 Rahul Nabar > > wrote: > >> I wanted to buy some 1 Terabyte SATA drives for our storage array > >> and wanted to stay away from the cheap desktop stuff. But each > >> manufacturer has some "enterprise class drives". But is there > >> something specific to look for? 
Most of those seem to have a MTBF of > >> around 1.2 million hours and a URE of about 1 in 10^15. The > >> S.M.A.R.T. abilities seem fairly standard. Is there a list > >> somewhere of well tested drives? Or any recommendations? > > > > Why do you want to pay more for drives? > > > > If you have hundreds or thousands of machines, you will get failures > > no matter what, so you will need to set up your software to deal with > > failures no matter what. Assuming that a failure doesn't cause you > > much harm, you might as well simply accept a slightly higher failure > > rate in exchange for being able to pay less per node, which lets you > > buy more nodes. You can always keep spares, and indeed, you will have > > to in either case. > > Sure, I do have a RAID level on it so a failure per-se isn't disaster. > And I wouldn't pay a $1000 dollar premium for it. But I wouldn't mind > paying $50 more if it translates to less trips to the cluster room and > fewer RAID rebuilds. > > That's why I'm trying to buy something better than a cheap run-of-the > mill from newegg. > > -- > Rahul > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From landman at scalableinformatics.com Thu Aug 5 16:39:17 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 05 Aug 2010 19:39:17 -0400 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: References: Message-ID: <4C5B4BA5.2030508@scalableinformatics.com> On 08/05/2010 05:47 PM, Rahul Nabar wrote: > I wanted to buy some 1 Terabyte SATA drives for our storage array and > wanted to stay away from the cheap desktop stuff. But each > manufacturer has some "enterprise class drives". But is there > something specific to look for? Most of those seem to have a MTBF of > around 1.2 million hours and a URE of about 1 in 10^15. The S.M.A.R.T. > abilities seem fairly standard. Is there a list somewhere of well > tested drives? Or any recommendations? > Understand that these designations have more to do with product feature set groupings (marketing groupings) than anything else. The feature set variations are mostly in the firmware for large groups of product offerings (lowers BOM costs by using the same physical hardware on all units). A "desktop" drive will work much harder to recover an error than an "enterprise" drive. The former is assumed not to be in a RAID so that it has to handle errors itself, and the latter is assumed to pass errors up to a RAID controller. The head settling logic (not head hardware) may be different, as "desktop" drives, again, assumed not to be in RAIDs, shouldn't have to worry about all the vibration maxima at 120Hz, 166.7Hz, and 250Hz (for 7200, 10k and 15k RPM drives), or their beat frequencies and octaves. "Enterprise" drives have to worry about insufficient vibration damping, so the firmware pack could have a different head settling/seeking code within it. "Desktop" drives are assumed to not be spinning 24x7, and have some interesting power down capabilities, which are anathema to many RAID controllers (and MD RAID for that matter). Can you use "desktop" drives in your RAIDs? In some cases, yes. Just be careful with this. This said, I'd strongly recommend extended testing. 
As we have discovered during some of our testing in this area, some "desktop" (and in this vendors case, their "enterprise") drives take ... er ... liberties with the specs, and can, and do wreak havoc on RAIDs (which is in part why we no longer ship this particular vendor's drives). -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From rpnabar at gmail.com Thu Aug 5 23:25:03 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 6 Aug 2010 01:25:03 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: References: <4C5B3C36.1070406@berkeley.edu> Message-ID: On Thu, Aug 5, 2010 at 5:57 PM, Sabuj Pattanayek wrote: > If your array is a bunch of rack servers, make sure the controllers > will accept non-branded/certified drives (e.g. make sure it doesn't > require a Dell, HP, or IBM branded drive). > > After having determine that your array will take COTS components, and > if you can go with slightly slower drives, I'd go with the WD Caviar > green series, black if you want something faster. I'd stay away from > the really cheap 5400 RPM Seagate drives. Thanks for the very specific tips about the 5 year warranty and also Caviar Green. I've specifically selected the JBOD so that I'm not tied to any one manufacturer's overpriced drives. -- Rahul From rpnabar at gmail.com Thu Aug 5 23:36:00 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 6 Aug 2010 01:36:00 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: <4C5B4BA5.2030508@scalableinformatics.com> References: <4C5B4BA5.2030508@scalableinformatics.com> Message-ID: On Thu, Aug 5, 2010 at 6:39 PM, Joe Landman wrote: > On 08/05/2010 05:47 PM, Rahul Nabar wrote: >> >> I wanted to buy some 1 Terabyte SATA drives for our storage array and >> wanted to stay away from the cheap desktop stuff. But each >> manufacturer has some "enterprise class drives". But is there >> something specific to look for? Most of those seem to have a MTBF of >> around 1.2 million hours and a URE of about 1 in 10^15. The S.M.A.R.T. >> abilities seem fairly standard. ?Is there a list somewhere of well >> tested drives? Or any recommendations? >> > > Understand that these designations have more to do with product feature set > groupings (marketing groupings) than anything else. ?The feature set > variations are mostly in the firmware for large groups of product offerings > (lowers BOM costs by using the same physical hardware on all units). Thanks very much Joe for your helpful insights (as always! ) !! -- Rahul From pal at di.fct.unl.pt Fri Aug 6 05:15:20 2010 From: pal at di.fct.unl.pt (Paulo Afonso Lopes) Date: Fri, 6 Aug 2010 13:15:20 +0100 (WEST) Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: References: <4C5B3C36.1070406@berkeley.edu> Message-ID: <59390.193.136.122.17.1281096920.squirrel@webmail.fct.unl.pt> > On Thu, Aug 5, 2010 at 5:57 PM, Sabuj Pattanayek wrote: >> If your array is a bunch of rack servers, make sure the controllers >> will accept non-branded/certified drives (e.g. make sure it doesn't >> require a Dell, HP, or IBM branded drive). >> >> After having determine that your array will take COTS components, and >> if you can go with slightly slower drives, I'd go with the WD Caviar >> green series, black if you want something faster. 
I'd stay away from >> the really cheap 5400 RPM Seagate drives. > > Thanks for the very specific tips about the 5 year warranty and also > Caviar Green. I've specifically selected the JBOD so that I'm not > tied to any one manufacturer's overpriced drives. > Rahul, As others have said, beware :-) Some array vendors do NOT allow you to use off-the-shelf disks. For example, I remember DG CLARiiON did not allow you to use off-the-shelf disks, as they flashed in new firmware into the drives. In the begining Field Support Techs had access to a hidden option in the array software that allowed them to flash COTS drives, but this was later removed. I suspect that EMC (and others, say HP, IBM,...) do the same. The disks do not "show up" in the array, so it does not matter if you want to config them as JBOD or RAID-something :-))) (sidenote on "proprietary" stuff: I have 2 Avocent 8-port non-IP KVMs, one IBM branded and the other HP branded; they look exactly the same model. If you move a KVM-to-RJ45 adapter from the IBM to the HP (or vice-versa) it does not work, it displays something like "Unrecognised adapter") paulo -- Paulo Afonso Lopes | Tel: +351- 21 294 8536 Departamento de Inform?tica | 294 8300 ext.10702 Faculdade de Ci?ncias e Tecnologia | Fax: +351- 21 294 8541 Universidade Nova de Lisboa | e-mail: poral at fct.unl.pt 2829-516 Caparica, PORTUGAL From rpnabar at gmail.com Fri Aug 6 05:39:52 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 6 Aug 2010 07:39:52 -0500 Subject: [Beowulf] what defines "enterprise class" hard drives? In-Reply-To: <59390.193.136.122.17.1281096920.squirrel@webmail.fct.unl.pt> References: <4C5B3C36.1070406@berkeley.edu> <59390.193.136.122.17.1281096920.squirrel@webmail.fct.unl.pt> Message-ID: On Fri, Aug 6, 2010 at 7:15 AM, Paulo Afonso Lopes wrote: > As others have said, beware :-) > > Some array vendors do NOT allow you to use off-the-shelf disks. For > example, I remember DG CLARiiON did not allow you to use off-the-shelf > disks, as they flashed in new firmware into the drives. In the begining > Field Support Techs had access to a hidden option in the array software > that allowed them to flash COTS drives, but this was later removed. I > suspect that EMC (and others, say HP, IBM,...) do the same. Thanks for the tip Paulo! But I have been bitten by this before. :) So these days I make sure that's one of the first questions I ask the vendors: "Will it play well with other drives?" And this (along with cost) is the reason I am _not_ buying HP-IBM. >(sidenote on "proprietary" stuff: I have 2 Avocent 8-port non-IP KVMs, one >IBM branded and the other HP branded; they look exactly the same model. If >you move a KVM-to-RJ45 adapter from the IBM to the HP (or vice-versa) it >does not work, it displays something like "Unrecognised adapter") That's a new example of "unrecognized foobar". Another one that surprised my recently was my Cisco 10gigE Switch that refused to recognize ethernet cables made by anyone other than Cisco. Well, not cables really but the SFP's so I suppose they have a chip in there that the switch can query and check if the cable is Cisco-made or not. -- Rahul From walid.shaari at gmail.com Fri Aug 13 23:16:19 2010 From: walid.shaari at gmail.com (Walid) Date: Sat, 14 Aug 2010 09:16:19 +0300 Subject: [Beowulf] Kernel action relevant to us In-Reply-To: <20091217020548.GC19867@bx9.net> References: <20091217020548.GC19867@bx9.net> Message-ID: Greg, do we know if that have made it to any Linux Kernel? 
kind regards Walid On 17 December 2009 05:05, Greg Lindahl wrote: > The following patch, not yet accepted into the kernel, should allow > local TCP connections to start up faster, while remote ones keep the > same behavior of slow start. > > ----- Forwarded message from chavey at google.com ----- > > From: chavey at google.com > Date: Tue, 15 Dec 2009 13:15:28 -0800 > To: davem at davemloft.net > CC: netdev at vger.kernel.org, therbert at google.com, chavey at google.com, > eric.dumazet at gmail.com > Subject: [PATCH] Add rtnetlink init_rcvwnd to set the TCP initial receive > window > X-Mailing-List: netdev at vger.kernel.org > > Add rtnetlink init_rcvwnd to set the TCP initial receive window size > advertised by passive and active TCP connections. > The current Linux TCP implementation limits the advertised TCP initial > receive window to the one prescribed by slow start. For short lived > TCP connections used for transaction type of traffic (i.e. http > requests), bounding the advertised TCP initial receive window results > in increased latency to complete the transaction. > Support for setting initial congestion window is already supported > using rtnetlink init_cwnd, but the feature is useless without the > ability to set a larger TCP initial receive window. > The rtnetlink init_rcvwnd allows increasing the TCP initial receive > window, allowing TCP connection to advertise larger TCP receive window > than the ones bounded by slow start. > > Signed-off-by: Laurent Chavey > --- > include/linux/rtnetlink.h | 2 ++ > include/net/dst.h | 2 -- > include/net/tcp.h | 3 ++- > net/ipv4/syncookies.c | 3 ++- > net/ipv4/tcp_output.c | 17 +++++++++++++---- > net/ipv6/syncookies.c | 3 ++- > 6 files changed, 21 insertions(+), 9 deletions(-) > > diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h > index adf2068..db6f614 100644 > --- a/include/linux/rtnetlink.h > +++ b/include/linux/rtnetlink.h > @@ -371,6 +371,8 @@ enum > #define RTAX_FEATURES RTAX_FEATURES > RTAX_RTO_MIN, > #define RTAX_RTO_MIN RTAX_RTO_MIN > + RTAX_INITRWND, > +#define RTAX_INITRWND RTAX_INITRWND > __RTAX_MAX > }; > > diff --git a/include/net/dst.h b/include/net/dst.h > index 5a900dd..6ef812a 100644 > --- a/include/net/dst.h > +++ b/include/net/dst.h > @@ -84,8 +84,6 @@ struct dst_entry > * (L1_CACHE_SIZE would be too much) > */ > #ifdef CONFIG_64BIT > - long __pad_to_align_refcnt[2]; > -#else > long __pad_to_align_refcnt[1]; > #endif > /* > diff --git a/include/net/tcp.h b/include/net/tcp.h > index 03a49c7..6f95d32 100644 > --- a/include/net/tcp.h > +++ b/include/net/tcp.h > @@ -972,7 +972,8 @@ static inline void tcp_sack_reset(struct > tcp_options_received *rx_opt) > /* Determine a window scaling and initial window to offer. 
*/ > extern void tcp_select_initial_window(int __space, __u32 mss, > __u32 *rcv_wnd, __u32 *window_clamp, > - int wscale_ok, __u8 *rcv_wscale); > + int wscale_ok, __u8 *rcv_wscale, > + __u32 init_rcv_wnd); > > static inline int tcp_win_from_space(int space) > { > diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c > index a6e0e07..d43173c 100644 > --- a/net/ipv4/syncookies.c > +++ b/net/ipv4/syncookies.c > @@ -356,7 +356,8 @@ struct sock *cookie_v4_check(struct sock *sk, struct > sk_buff *skb, > > tcp_select_initial_window(tcp_full_space(sk), req->mss, > &req->rcv_wnd, &req->window_clamp, > - ireq->wscale_ok, &rcv_wscale); > + ireq->wscale_ok, &rcv_wscale, > + dst_metric(&rt->u.dst, RTAX_INITRWND)); > > ireq->rcv_wscale = rcv_wscale; > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c > index fcd278a..ee42c75 100644 > --- a/net/ipv4/tcp_output.c > +++ b/net/ipv4/tcp_output.c > @@ -179,7 +179,8 @@ static inline void tcp_event_ack_sent(struct sock *sk, > unsigned int pkts) > */ > void tcp_select_initial_window(int __space, __u32 mss, > __u32 *rcv_wnd, __u32 *window_clamp, > - int wscale_ok, __u8 *rcv_wscale) > + int wscale_ok, __u8 *rcv_wscale, > + __u32 init_rcv_wnd) > { > unsigned int space = (__space < 0 ? 0 : __space); > > @@ -228,7 +229,13 @@ void tcp_select_initial_window(int __space, __u32 mss, > init_cwnd = 2; > else if (mss > 1460) > init_cwnd = 3; > - if (*rcv_wnd > init_cwnd * mss) > + /* when initializing use the value from init_rcv_wnd > + * rather than the default from above > + */ > + if (init_rcv_wnd && > + (*rcv_wnd > init_rcv_wnd * mss)) > + *rcv_wnd = init_rcv_wnd * mss; > + else if (*rcv_wnd > init_cwnd * mss) > *rcv_wnd = init_cwnd * mss; > } > > @@ -2254,7 +2261,8 @@ struct sk_buff *tcp_make_synack(struct sock *sk, > struct dst_entry *dst, > &req->rcv_wnd, > &req->window_clamp, > ireq->wscale_ok, > - &rcv_wscale); > + &rcv_wscale, > + dst_metric(dst, RTAX_INITRWND)); > ireq->rcv_wscale = rcv_wscale; > } > > @@ -2342,7 +2350,8 @@ static void tcp_connect_init(struct sock *sk) > &tp->rcv_wnd, > &tp->window_clamp, > sysctl_tcp_window_scaling, > - &rcv_wscale); > + &rcv_wscale, > + dst_metric(dst, RTAX_INITRWND)); > > tp->rx_opt.rcv_wscale = rcv_wscale; > tp->rcv_ssthresh = tp->rcv_wnd; > diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c > index 6b6ae91..c8982aa 100644 > --- a/net/ipv6/syncookies.c > +++ b/net/ipv6/syncookies.c > @@ -267,7 +267,8 @@ struct sock *cookie_v6_check(struct sock *sk, struct > sk_buff *skb) > req->window_clamp = tp->window_clamp ? :dst_metric(dst, > RTAX_WINDOW); > tcp_select_initial_window(tcp_full_space(sk), req->mss, > &req->rcv_wnd, &req->window_clamp, > - ireq->wscale_ok, &rcv_wscale); > + ireq->wscale_ok, &rcv_wscale, > + dst_metric(dst, RTAX_INITRWND)); > > ireq->rcv_wscale = rcv_wscale; > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > > ----- End forwarded message ----- > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From samuel at unimelb.edu.au Sat Aug 14 16:13:50 2010 From: samuel at unimelb.edu.au (Chris Samuel) Date: Sun, 15 Aug 2010 09:13:50 +1000 Subject: [Beowulf] Kernel action relevant to us In-Reply-To: References: <20091217020548.GC19867@bx9.net> Message-ID: <201008150913.51088.samuel@unimelb.edu.au> On Sat, 14 Aug 2010 04:16:19 pm Walid wrote: > do we know if that have made it to any Linux Kernel? Looks like it was merged for 2.6.34-rc1 according to gitk, so yes, it should be in the current kernel. Author: laurent chavey 2009-12-15 22:15:28 Committer: David S. Miller 2009-12-24 09:13:30 Parent: 068a2de57ddf4f472e32e7af868613c574ad1d88 (net: release dst entry while cache-hot for GSO case too) Branches: master, remotes/origin/master Follows: v2.6.33-rc1 Precedes: v2.6.34-rc1 cheers! Chris -- Christopher Samuel Senior Systems Administrator VLSCI - Victorian Life Sciences Computational Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ From lindahl at pbm.com Mon Aug 16 18:50:13 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Mon, 16 Aug 2010 18:50:13 -0700 Subject: [Beowulf] Kernel action relevant to us In-Reply-To: <201008150913.51088.samuel@unimelb.edu.au> References: <20091217020548.GC19867@bx9.net> <201008150913.51088.samuel@unimelb.edu.au> Message-ID: <20100817015013.GA30904@bx9.net> On Sun, Aug 15, 2010 at 09:13:50AM +1000, Chris Samuel wrote: > On Sat, 14 Aug 2010 04:16:19 pm Walid wrote: > > > do we know if that have made it to any Linux Kernel? > > Looks like it was merged for 2.6.34-rc1 according to gitk, so > yes, it should be in the current kernel. It needs to be explicitly configured, if I understand it correctly. It'd be nice if someone posted how to do that. -- greg From gdjacobs at gmail.com Mon Aug 16 19:59:22 2010 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Mon, 16 Aug 2010 21:59:22 -0500 Subject: [Beowulf] Scale modl Cray-1 In-Reply-To: <20100802151551.GA4013@nlxcldnl2.cl.intel.com> References: <68A57CCFD4005646957BD2D18E60667B1154A77D@milexchmb1.mil.tagmclarengroup.com> <4C52ECAB.1020502@ldeo.columbia.edu> <37C88C5B-B3D8-460F-928C-3AFC14B7BCD3@serissa.com> <4C562721.5030002@ldeo.columbia.edu> <20100802151551.GA4013@nlxcldnl2.cl.intel.com> Message-ID: <4C69FB0A.2010309@gmail.com> David N. Lombard wrote: > On Sun, Aug 01, 2010 at 07:02:09PM -0700, Gus Correa wrote: >> Thank you all who responded: >> Larry Stewart, David Lombard, Franklin Jones, >> Douglas Guptill, Thomasz Rolla, Jim Lux. >> >> Glad to see that many of you had not only thought of this, >> but actually implemented simulators for many outstanding computers. >> >> I compiled SIMH. >> >> However, I suppose I need to load software to actually simulate >> each computer (Assembler? OS? other more specific sw?), correct? >> >> If yes, is this software available, and where? > > The 1130 sw is available from ibm1130.org > . > > I expect that type of software will usually only be available from > the people and projects that are interested in the simulated system. > >> Any documentation on how to run them? > > The 1130 sw is sufficiently documented to run it. I've run it--and > as expected--it *does* run faster than the original hw. > There was a project to emulate the YMP, but it looks fairly dead. Also, there's a project to emulate CDC6400 hardware. http://members.iinet.net.au/~tom-hunter/ It may not be the same as having your own emulated machine to play with, but you can play around with real iron at Cray Cyber. 
http://www.cray-cyber.org/access/index.php -- Geoffrey D. Jacobs From samuel at unimelb.edu.au Mon Aug 16 22:15:11 2010 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 17 Aug 2010 15:15:11 +1000 Subject: [Beowulf] Kernel action relevant to us In-Reply-To: <20100817015013.GA30904@bx9.net> References: <20091217020548.GC19867@bx9.net> <201008150913.51088.samuel@unimelb.edu.au> <20100817015013.GA30904@bx9.net> Message-ID: <4C6A1ADF.5030204@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 17/08/10 11:50, Greg Lindahl wrote: > It needs to be explicitly configured, if I understand it correctly. > It'd be nice if someone posted how to do that. Looks like it's done via this patch to iproute2: http://patchwork.ozlabs.org/patch/41224/ It adds a initrwnd option to set that initial receive window size. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computational Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxqGt4ACgkQO2KABBYQAh9Z4gCbBjkPWDJm1Z3PpIIFAlJj+oCc 89IAn26v7Cl74PhMpuga2fBnwRWtzVR7 =eeNM -----END PGP SIGNATURE----- From john.hearns at mclaren.com Tue Aug 17 04:59:33 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Tue, 17 Aug 2010 12:59:33 +0100 Subject: [Beowulf] Snoracle and HPC - Register article Message-ID: <68A57CCFD4005646957BD2D18E60667B116D6637@milexchmb1.mil.tagmclarengroup.com> http://www.theregister.co.uk/2010/08/17/oracle_hpc/ John Hearns | CFD Hardware Specialist | McLaren Racing Limited McLaren Technology Centre, Chertsey Road, Woking, Surrey GU21 4YH, UK T: +44 (0) 1483 261000 D: +44 (0) 1483 262352 F: +44 (0) 1483 261010 E: john.hearns at mclaren.com W: www.mclaren.com The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From kilian.cavalotti.work at gmail.com Tue Aug 17 05:34:14 2010 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Tue, 17 Aug 2010 14:34:14 +0200 Subject: [Beowulf] Snoracle and HPC - Register article In-Reply-To: <68A57CCFD4005646957BD2D18E60667B116D6637@milexchmb1.mil.tagmclarengroup.com> References: <68A57CCFD4005646957BD2D18E60667B116D6637@milexchmb1.mil.tagmclarengroup.com> Message-ID: On Tue, Aug 17, 2010 at 1:59 PM, Hearns, John wrote: > http://www.theregister.co.uk/2010/08/17/oracle_hpc/ On the same subject, some information from the inside, regarding SGE^wOGE. http://blogs.sun.com/templedf/entry/not_dead_yet Cheers, -- Kilian From john.hearns at mclaren.com Tue Aug 17 06:23:54 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Tue, 17 Aug 2010 14:23:54 +0100 Subject: [Beowulf] Snoracle and HPC - Register article In-Reply-To: References: <68A57CCFD4005646957BD2D18E60667B116D6637@milexchmb1.mil.tagmclarengroup.com> Message-ID: <68A57CCFD4005646957BD2D18E60667B11761D32@milexchmb1.mil.tagmclarengroup.com> > -----Original Message----- > On the same subject, some information from the inside, regarding > SGE^wOGE. > http://blogs.sun.com/templedf/entry/not_dead_yet > I have a light bulb moment. Of course, if you are provisioning a cloud for general purpose computing you need to start and stop virtual machines on your real machines. 
A load-based scheduler system would do this job nicely! Especially one where you can create custom load sensors. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From Glen.Beane at jax.org Tue Aug 17 07:01:41 2010 From: Glen.Beane at jax.org (Glen Beane) Date: Tue, 17 Aug 2010 10:01:41 -0400 Subject: [Beowulf] Snoracle and HPC - Register article In-Reply-To: <68A57CCFD4005646957BD2D18E60667B11761D32@milexchmb1.mil.tagmclarengroup.com> Message-ID: On 8/17/10 9:23 AM, "Hearns, John" wrote: > > >> -----Original Message----- >> On the same subject, some information from the inside, regarding >> SGE^wOGE. >> http://blogs.sun.com/templedf/entry/not_dead_yet >> > > I have a light bulb moment. Of course, if you are provisioning a cloud for > general purpose computing > you need to start and stop virtual machines on your real machines. A > load-based scheduler system would > do this job nicely! Especially one where you can create custom load sensors. Like the Moab scheduler? Most of their revenue now comes from this application rather than HPC job scheduling, and it has for a few years. -- Glen L. Beane Software Engineer The Jackson Laboratory Phone (207) 288-6153 From stuartb at 4gh.net Fri Aug 20 10:34:25 2010 From: stuartb at 4gh.net (Stuart Barkley) Date: Fri, 20 Aug 2010 13:34:25 -0400 (EDT) Subject: [Beowulf] Cluster Metrics? (Upper management view) Message-ID: What sort of business management level metrics do people measure on clusters? Upper management is asking for us to define and provide some sort of "numbers" which can be used to gage the success of our cluster project. We currently have both SGE and Torque/Moab in use and need to measure both if possible. I can think of some simple metrics (well sort-of, actual technical definition/measurement may be difficult): - 90/95th percentile wait time for jobs in various queues. Is smaller better meaning the jobs don't wait long and users are happy? Is larger better meaning that we have lots of demand and need more resources? - core-hours of user computation (per queue?) both as raw time and percentage of available time. Again, which is better (management view) higher or lower? - Availability during scheduled hours (ignoring scheduled maintenance times). Common metric, but how do people actually measure/compute this? What about down nodes? Some scheduled percentage (5%?) assumed down? - Number of new science projects performed. Vague, but our applications support people can just count things occasionally. Misses users who just use the system without interaction with us. Misses "production" work that just keeps running. Any comments or ideas are welcome. Thanks, Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From mdidomenico4 at gmail.com Fri Aug 20 11:26:51 2010 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Fri, 20 Aug 2010 14:26:51 -0400 Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: I think measuring a clusters success based on the number of jobs run or cpu's used is a bad measure of true success. 
I would be more inclined to consider a cluster a success by speaking with the people who use it and find out not only whether they can use it effectively and/or what new science having cluster is being enabled by them. then only thing i find most of the below metrics overly useful is figuring out whether or not we need a bigger cluster. which i guess is a form of measurable success, but not one in which i would consider the "cluster" to be a success. it could just be dopes running thousands of "/bin/hostname" jobs trying to figure out how to use the cluster I also think you need to ask the "business" people what measure they would consider a cluster as a worthwhile investment, it doesn't sound as if you have that from your email. On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley wrote: > What sort of business management level metrics do people measure on > clusters? ?Upper management is asking for us to define and provide > some sort of "numbers" which can be used to gage the success of our > cluster project. > > We currently have both SGE and Torque/Moab in use and need to measure > both if possible. > > I can think of some simple metrics (well sort-of, actual technical > definition/measurement may be difficult): > > - 90/95th percentile wait time for jobs in various queues. ?Is smaller > better meaning the jobs don't wait long and users are happy? ?Is > larger better meaning that we have lots of demand and need more > resources? > > - core-hours of user computation (per queue?) both as raw time and > percentage of available time. ?Again, which is better (management > view) higher or lower? > > - Availability during scheduled hours (ignoring scheduled maintenance > times). ?Common metric, but how do people actually measure/compute > this? ?What about down nodes? ?Some scheduled percentage (5%?) assumed > down? > > - Number of new science projects performed. ?Vague, but our > applications support people can just count things occasionally. > Misses users who just use the system without interaction with us. > Misses "production" work that just keeps running. > > Any comments or ideas are welcome. > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?-- ?Daniel Boone > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From prentice at ias.edu Fri Aug 20 12:05:44 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Fri, 20 Aug 2010 15:05:44 -0400 Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: <4C6ED208.2010700@ias.edu> I couldn't have said it better myself. Be wary of suits asking for asking for numbers. Michael Di Domenico wrote: > I think measuring a clusters success based on the number of jobs run > or cpu's used is a bad measure of true success. I would be more > inclined to consider a cluster a success by speaking with the people > who use it and find out not only whether they can use it effectively > and/or what new science having cluster is being enabled by them. > > then only thing i find most of the below metrics overly useful is > figuring out whether or not we need a bigger cluster. which i guess > is a form of measurable success, but not one in which i would consider > the "cluster" to be a success. 
it could just be dopes running > thousands of "/bin/hostname" jobs trying to figure out how to use the > cluster > > I also think you need to ask the "business" people what measure they > would consider a cluster as a worthwhile investment, it doesn't sound > as if you have that from your email. > > > > On Fri, Aug 20, 2010 at 1:34 PM, Stuart Barkley wrote: >> What sort of business management level metrics do people measure on >> clusters? Upper management is asking for us to define and provide >> some sort of "numbers" which can be used to gage the success of our >> cluster project. >> >> We currently have both SGE and Torque/Moab in use and need to measure >> both if possible. >> >> I can think of some simple metrics (well sort-of, actual technical >> definition/measurement may be difficult): >> >> - 90/95th percentile wait time for jobs in various queues. Is smaller >> better meaning the jobs don't wait long and users are happy? Is >> larger better meaning that we have lots of demand and need more >> resources? >> >> - core-hours of user computation (per queue?) both as raw time and >> percentage of available time. Again, which is better (management >> view) higher or lower? >> >> - Availability during scheduled hours (ignoring scheduled maintenance >> times). Common metric, but how do people actually measure/compute >> this? What about down nodes? Some scheduled percentage (5%?) assumed >> down? >> >> - Number of new science projects performed. Vague, but our >> applications support people can just count things occasionally. >> Misses users who just use the system without interaction with us. >> Misses "production" work that just keeps running. >> >> Any comments or ideas are welcome. >> >> Thanks, >> Stuart Barkley >> -- >> I've never been lost; I was once bewildered for three days, but never lost! >> -- Daniel Boone >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Prentice Bisbal Linux Software Support Specialist/System Administrator School of Natural Sciences Institute for Advanced Study Princeton, NJ From reuti at staff.uni-marburg.de Fri Aug 20 12:21:14 2010 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 20 Aug 2010 21:21:14 +0200 Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: Hi, Am 20.08.2010 um 19:34 schrieb Stuart Barkley: > What sort of business management level metrics do people measure on > clusters? Upper management is asking for us to define and provide > some sort of "numbers" which can be used to gage the success of our > cluster project. > > We currently have both SGE and Torque/Moab in use and need to measure > both if possible. > > I can think of some simple metrics (well sort-of, actual technical > definition/measurement may be difficult): > > - 90/95th percentile wait time for jobs in various queues. Is smaller > better meaning the jobs don't wait long and users are happy? Is > larger better meaning that we have lots of demand and need more > resources? > > - core-hours of user computation (per queue?) both as raw time and > percentage of available time. 
Again, which is better (management > view) higher or lower? > > - Availability during scheduled hours (ignoring scheduled maintenance > times). Common metric, but how do people actually measure/compute > this? What about down nodes? Some scheduled percentage (5%?) assumed > down? > > - Number of new science projects performed. Vague, but our > applications support people can just count things occasionally. I use the -A option in SGE (it's also in Torque) to fill this field with the type of application used for the job. For SGE it's just a comment and not taken into account for any share tree policy. This field is also recorded in the accounting file. For somes type of jobs we even record the used submission command for the job in the context of the job (`qsub -ac ...`). > Misses users who just use the system without interaction with us. With a JSV (job submission verifier) which fills -A automatically, maybeyou can find the people who are not interacting with you. > Misses "production" work that just keeps running. == It's not so straight forward to measure success, like already mentioned. You can have 75% CPU load because: - your parallel jobs are not really scaling with the number of slots used (it can be possible to run additonal serial jobs on these nodes with a nice of 19, just to gather the otherwise wasted CPU cycles for some types of parallel applications; when these "background" jobs are happy that they run slower because of the nice value) or 75% slot load: - you request resources like memory or for parallel jobs slots, these resource might become reserved and when there is no small job available for backfilling, they are just idling (it can be possible to run additonal serial jobs in a queue which gets suspended when the main queue gets actually used; when these "background" jobs are happy with the non-reserved resources) -- Reuti > Any comments or ideas are welcome. > > Thanks, > Stuart Barkley > -- > I've never been lost; I was once bewildered for three days, but never lost! > -- Daniel Boone > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rpnabar at gmail.com Fri Aug 20 13:40:15 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 20 Aug 2010 15:40:15 -0500 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? Message-ID: What's the typical protocol about the cleanup of /tmp folders? Do people clean them on each reboot or at intervals with a cron (sounds a bad idea). I was always under the impression that a reboot cleans them but apparantly not on my CentOS distro, by default. I was burnt earlier today when ompi-ps acted erratically and I diagnosed it to be caused by stale state information in the /tmp folder. The remnant of some old dead jobs that had somehow crashed. One other option that I've seen mentioned is mounting /tmp on a tmpfs. Is that a good idea? The risk of using up too much RAM if a program gets out of hand writing to /tmp. On the other hand compute-nodes can go a long time without any reboots; so a more frequent cleanup cycle on /tmp might be desirable? I suppose most programs ought to cleanup behind them on /tmp but then again there are bound to be bad apples. Any comments? 
-- Rahul From jlforrest at berkeley.edu Fri Aug 20 13:51:34 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Fri, 20 Aug 2010 13:51:34 -0700 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? In-Reply-To: References: Message-ID: <4C6EEAD6.7070209@berkeley.edu> On 8/20/2010 1:40 PM, Rahul Nabar wrote: > What's the typical protocol about the cleanup of /tmp folders? Do > people clean them on each reboot or at intervals with a cron (sounds a > bad idea). I was always under the impression that a reboot cleans them > but apparantly not on my CentOS distro, by default. I have a cronjob that removes anything in /tmp and in /scratch, which not everybody uses, that's older than a week old. I determined this 1 week old policy by asking my users what the maximum length of any of their jobs could be. For some people this might be longer, and for some it might be shorter. Also, since an open file isn't actually removed, you probably can't damage things if you guess wrong, although the file might not be visible in a directory listing. > One other option that I've seen mentioned is mounting /tmp on a tmpfs. > Is that a good idea? The risk of using up too much RAM if a program > gets out of hand writing to /tmp. Right. I don't think this is a good idea for scratch space for the reasons you mention. It does make sense for things like compilers and other programs that create very transient and small files. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From reuti at staff.uni-marburg.de Fri Aug 20 14:40:35 2010 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 20 Aug 2010 23:40:35 +0200 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? In-Reply-To: References: Message-ID: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> Hi, Am 20.08.2010 um 22:40 schrieb Rahul Nabar: > What's the typical protocol about the cleanup of /tmp folders? Do > people clean them on each reboot or at intervals with a cron (sounds a > bad idea). I was always under the impression that a reboot cleans them > but apparantly not on my CentOS distro, by default. > > I was burnt earlier today when ompi-ps acted erratically and I > diagnosed it to be caused by stale state information in the /tmp > folder. The remnant of some old dead jobs that had somehow crashed. > > One other option that I've seen mentioned is mounting /tmp on a tmpfs. > Is that a good idea? The risk of using up too much RAM if a program > gets out of hand writing to /tmp. > > On the other hand compute-nodes can go a long time without any > reboots; so a more frequent cleanup cycle on /tmp might be desirable? > I suppose most programs ought to cleanup behind them on /tmp but then > again there are bound to be bad apples. are you using any queuing system? I try to get all applications set up in such a way, that they write all their stuff to $TMPDIR. It's in [OS]GE and I think also in Torque for some time now, to be created automatically (as job specific directory on a node) and removed after the job. A load sensor which checks the space on a node in /scratch and put the queue instance into alarm state, if it falls under a certain value, can in addition prevent a black hole in the cluster, where one after the other job crashes due to missing scratch space. -- Reuti > Any comments? 
> > -- > Rahul > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rpnabar at gmail.com Fri Aug 20 15:53:52 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 20 Aug 2010 17:53:52 -0500 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? In-Reply-To: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> Message-ID: On Fri, Aug 20, 2010 at 4:40 PM, Reuti wrote: > Am 20.08.2010 um 22:40 schrieb Rahul Nabar: Thanks Jon and Reuti! > are you using any queuing system? I try to get all applications set up in such a way, that they write all their stuff to $TMPDIR. It's in [OS]GE and I think also in Torque for some time now, to be created automatically (as job specific directory on a node) and removed after the job. Yes. I'm using Torque. That's an interesting feature! I'll check it out. I guess one other option is to put an epilogue that does rm -fr /tmp/* Do you use a HDD temp or a tmpfs in RAM? > > A load sensor which checks the space on a node in /scratch and put the queue instance into alarm state, if it falls under a certain value, can in addition prevent a black hole in the cluster, where one after the other job crashes due to missing scratch space. > That seems neat too! But I am not sure if torque can do an alarm state on a queue too that way. -- Rahul From alscheinine at tuffmail.us Fri Aug 20 16:26:42 2010 From: alscheinine at tuffmail.us (Alan Louis Scheinine) Date: Fri, 20 Aug 2010 18:26:42 -0500 Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: <4C6F0F32.3050301@tuffmail.us> The measure of a cluster depends on how it is intended to be used. Big computer centers tend to measure the percentage of time the CPUs are running jobs. In contrast, if a cluster is used for program develop there must be idle CPUs in order to reduce that wait time, keeping in mind when people want computer power is not uniform. Wait time that delays program development can be costly. Another tendency is that labs like to buy small clusters so they can have something available when they need it even if CPUs are idle when they don't need it. What does your management define as success? Your situation is odd because they not only want statistics, they are leaving it up to you (you'all) to define what are the goals, it seems. What do project leaders and lower managers want to be the goals? What upper management wants might, in the end, reflect what lower level management (those whose groups use the cluster) want, and lower level managers might have time to talk with you'all. Alan Scheinine 200 Georgann Dr., Apt. E6 Vicksburg, MS 39180 Email: alscheinine at tuffmail.us Mobile phone: 225 288 4176 http://www.flickr.com/photos/ascheinine From mdidomenico4 at gmail.com Fri Aug 20 17:34:46 2010 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Fri, 20 Aug 2010 20:34:46 -0400 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? 
In-Reply-To:
References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de>
Message-ID:

Redhat and the like have a utility called 'tmpwatch'

On Fri, Aug 20, 2010 at 6:53 PM, Rahul Nabar wrote:
> On Fri, Aug 20, 2010 at 4:40 PM, Reuti wrote:
>> Am 20.08.2010 um 22:40 schrieb Rahul Nabar:
>
> Thanks Jon and Reuti!
>
>> are you using any queuing system? I try to get all applications set up in such a way, that they write all their stuff to $TMPDIR. It's in [OS]GE and I think also in Torque for some time now, to be created automatically (as job specific directory on a node) and removed after the job.
>
> Yes. I'm using Torque. That's an interesting feature! I'll check it
> out. I guess one other option is to put an epilogue that does rm -fr
> /tmp/*
>
> Do you use a HDD temp or a tmpfs in RAM?
>
>>
>> A load sensor which checks the space on a node in /scratch and put the queue instance into alarm state, if it falls under a certain value, can in addition prevent a black hole in the cluster, where one after the other job crashes due to missing scratch space.
>>
>
> That seems neat too! But I am not sure if torque can do an alarm state
> on a queue too that way.
>
> --
> Rahul
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

From hahn at mcmaster.ca Fri Aug 20 20:32:30 2010
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 20 Aug 2010 23:32:30 -0400 (EDT)
Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs?
In-Reply-To:
References:
Message-ID:

> What's the typical protocol about the cleanup of /tmp folders? Do

we just leave the default (2 week) cron-driven tmpwatch in place.
we don't have a lot of users who bother with /tmp, though, since we
have reasonable lustre-based storage on all our big clusters.
if we took the time to do it right, we'd probably make per-job
subdirectories in /tmp, then remove the tree after some delay (a few days).

> people clean them on each reboot

on reboot doesn't make sense to me - why should the user care whether
we've rebooted a node?

> or at intervals with a cron (sounds a bad idea).

why?

> One other option that I've seen mentioned is mounting /tmp on a tmpfs.
> Is that a good idea? The risk of using up too much RAM if a program
> gets out of hand writing to /tmp.

well, tmpfs can be given a max size.

> I suppose most programs ought to cleanup behind them on /tmp but then
> again there are bound to be bad apples.

to me, /tmp is for transient files: created during a job and normally
not expected to live beyond the job. but providing a delay so users can
grab files (say, logs after a job crash) is a little less BOFHish.

for files of more than transient value (say, checkpoints, outputs) the
user should write to another filesystem. we provide /home (but very
small, and discouraged for IO), /work (Lustre, bigger), /scratch
(Lustre, no quota, month or two expiry) and /tmp (disk, not managed
other than 2-week expire).

I'm not really sure how well we get users to go with the purpose and
tuning of these filesystems. we've never tried to do serious profiling
of user IO (strace, I suppose, not sure how much overhead that would
impose. a kernel module that hooked into VFS could be less intrusive.)

(for context, we're an academic HPC consortium, 21 institutions, > 30
clusters, 3800 user accounts).
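
To make the cleanup ideas in this thread concrete, here is a minimal sketch of the per-job approach (Reuti's $TMPDIR suggestion combined with Rahul's epilogue idea), written as a Torque epilogue script. The /scratch/$JOBID layout is an assumed site convention used only for illustration, not something Torque creates on its own, and the argument handling follows the usual pbs_mom prologue/epilogue interface in which the first argument is the job id; check the Torque documentation for your version before relying on it.

#!/bin/sh
# Torque epilogue sketch: remove a per-job scratch directory when the job ends.
# Assumes jobs were pointed at /scratch/<jobid>, which is a site convention
# for this example and not a Torque default.  pbs_mom passes the job id to
# the epilogue as its first argument.
JOBID="$1"
SCRATCH="/scratch/$JOBID"

# Refuse to act on an empty or suspicious job id so a misfire can never
# expand into removing /scratch itself or anything outside it.
case "$JOBID" in
    ""|*/*|*..*) exit 0 ;;
esac

[ -d "$SCRATCH" ] && rm -rf "$SCRATCH"
exit 0

For the cron-driven sweep that Jon and Michael describe, the tmpwatch shipped with Red Hat style distributions already does the age-based removal; a weekly-retention example for /etc/crontab (168 hours, an arbitrary value mirroring Jon's one-week policy, and the path to tmpwatch may differ per distribution) could be:

0 4 * * * root /usr/sbin/tmpwatch --mtime 168 /tmp

Mark's point about capping a tmpfs-backed /tmp is likewise just a mount option, for example "mount -t tmpfs -o size=2g tmpfs /tmp", where the 2g is again only an illustrative figure.
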
From hahn at mcmaster.ca Fri Aug 20 23:06:28 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 21 Aug 2010 02:06:28 -0400 (EDT) Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: > I think measuring a clusters success based on the number of jobs run > or cpu's used is a bad measure of true success. I would be more > inclined to consider a cluster a success by speaking with the people > who use it and find out not only whether they can use it effectively > and/or what new science having cluster is being enabled by them. now try that with a large user-base ;) I think there are two broad categories of cluster: dedicated and shared. dedicated clusters are easy: limited number of codes, users, etc. straightforward metrics are appropriate, such as pend time (perhaps as a fraction of wallclock), job fail rates, fraction-of-peak measures. past this, things are harder and fuzzier. we try pretty hard to get research outcomes from our users (lit citations, grants, grad student and postdoc counts.) we try other metrics too: trying to find researchers who get and account, generate minimal usage, then stop ("frustrated"). for bigger, shared facilities, the simple metrics become less useful - for instance, pend:wallclock is meaningful as long as cluster contention doesn't "shape" user behavior. once users start reacting to contention (by submitting fewer jobs, or maybe more), the metric's spoiled. > then only thing i find most of the below metrics overly useful is > figuring out whether or not we need a bigger cluster. which i guess it's a little hard to imagine a case where metrics wouldn't call for a larger cluster - does anyone really have persistently underutilized clusters? > I also think you need to ask the "business" people what measure they > would consider a cluster as a worthwhile investment, it doesn't sound > as if you have that from your email. my guess is that suits should be talked to about opportunity cost, and not given a bunch of stats about utilization. that means you need to get some info from users about what they're doing. but also to figure out whether there's more they could do. and really, talking to the users is important to do anyway. >> clusters? ?Upper management is asking for us to define and provide >> some sort of "numbers" which can be used to gage the success of our >> cluster project. take a look at your cluster stats: do you have different groups with bursty activity, but which interleaves on the cluster? that's obviously better than multiple groups each having (probably smaller) clusters with lower utilization over time... >> - 90/95th percentile wait time for jobs in various queues. ?Is smaller >> better meaning the jobs don't wait long and users are happy? ?Is wait time is kind of tricky. if you have low wait, then either the cluster is underutilized, or it's magically rightsized (perhaps a perfectly steady, predictable workload). once you have contention, the question is why - is there a user who queues 10k jobs every monday? do users submit chained (dependent) jobs, where the second is counted as waiting. do you have fairshare turned on, or any kind of static limits or partitioning? >> - Availability during scheduled hours (ignoring scheduled maintenance >> times). ?Common metric, but how do people actually measure/compute >> this? ?What about down nodes? ?Some scheduled percentage (5%?) assumed >> down? 
I don't think it makes sense to obsess about this - yes, it's an easy number, but it doesn't tell you much from the user's perspective. From Bill.Rankin at sas.com Mon Aug 23 06:57:01 2010 From: Bill.Rankin at sas.com (Bill Rankin) Date: Mon, 23 Aug 2010 13:57:01 +0000 Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: <76097BB0C025054786EFAB631C4A2E3C0939C320@MERCMBX02D.na.SAS.com> Michael Di Domenico wrote: > I think measuring a clusters success based on the number of jobs run > or cpu's used is a bad measure of true success. I would be more > inclined to consider a cluster a success by speaking with the people > who use it and find out not only whether they can use it effectively > and/or what new science having cluster is being enabled by them. Bingo. In a former life I was director of an fairly large academic cluster facility. One of the things I always dreaded were writing up the monthly and quarterly (and annual) reports. These reports were basically used to justify our existence to the upper university administration. Here are some of the things I included: First section was a summary of the computational usage of the cluster, broken down by research group. In our case this data was dumped from the SGE report system. Include monthly usage and year to date. If you haven't already, break down your users into groups based upon research area and/or application. Use Unix group IDs to identify them and create access groups within SGE or whatever workload manager you are using. A lot of this can be scripted and run under cron. Put your pretty graphs in this section. :-) Include a summary/analysis section where you explain the data. Second section was an overview of any new research groups that had started using the cluster. In the annual report this section covered all the research groups that had used the cluster. Here is where you want to make the case that the cluster is an important part of your organization's research infrastructure. Include budget/grant amount for the new groups. List the PI's and their CVs as well as any new applications that you are supporting. Third section was a summary of any cluster administration issues. Include outages (past and future), hardware/software installs and updates and any other issues. Finally, the last section covered future growth. I included any meetings or presentations we had done, potentially new research groups we were talking to, and any new hardware or software we were procuring. The quarterly and annual reports were basically concatenations of the monthlies (literally cut-n-paste). The annual report also tended to include other things like budgets, but that was a separate process. As others have mentioned, the contents of the report really depend on your organization and what you are trying to show as well as your target audience. In my case, the reader list for the monthlies was fairly limited, with the quarterly and annual reports being more widely distributed. So the former tended to be short and terse while the latter were more detailed and complete. Last piece of advice - for the raw data, script as much as you can. You'll be doing this often so it's worth the investment to automate. Also do not do like I often did and leave all this until the last few days before it is due. Collect the information throughout the month and then it's just a matter of an afternoon's worth of editing rather than scrambling around to get, for example, all the PIs CV and grant information at the last moment. 
(1/2 :-) Good luck, -bill From stuartb at 4gh.net Mon Aug 23 07:39:04 2010 From: stuartb at 4gh.net (Stuart Barkley) Date: Mon, 23 Aug 2010 10:39:04 -0400 (EDT) Subject: [Beowulf] Cluster Metrics? (Upper management view) In-Reply-To: References: Message-ID: Thanks for the various comments. There are some good ideas suggested. To somewhat clarify: The management metric request is being made of all parts of the entire agency. They probably don't know (or care) what HPC really means. We are lucky that we do get to define the metrics to measure success. They don't want a whole lots of statistics, just a couple of "simple numbers". I'm just not sure how best to do that. I've more been thinking about the possible/useful metrics needed to manage the systems. These are large general purpose shared clusters. One is primarily for MPI (with infiniband) and the other for serial jobs of one node threaded jobs. Now that some results are being seen, other groups are starting to fund expansions. This will further complicate things since the groups doing the funding will want some guarantee of access (qos, dedicated nodes, enhanced fairshare) and reporting on their usage share. I'm working on that now and think most of it can be accomplished. This will be where the pretty graphs and quarterly reports will occur. Bill Rankin's advice sounds very helpful there. Stuart Barkley -- I've never been lost; I was once bewildered for three days, but never lost! -- Daniel Boone From deadline at eadline.org Mon Aug 23 10:44:40 2010 From: deadline at eadline.org (Douglas Eadline) Date: Mon, 23 Aug 2010 13:44:40 -0400 (EDT) Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> Message-ID: <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> Is there someone on the list who is following the Oracle/Sun Grid Engine situation that can explain what the plan is? I have read that it is no longer freely available. http://insidehpc.com/2010/08/20/sun-gridengine-now-100-less-free/ Although previous version up until 6.2u5 are under the SISSL http://www.opensource.org/licenses/sisslpl.php -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From landman at scalableinformatics.com Mon Aug 23 11:11:47 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 23 Aug 2010 14:11:47 -0400 Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> Message-ID: <4C72B9E3.5080305@scalableinformatics.com> Douglas Eadline wrote: > Is there someone on the list who is following the Oracle/Sun Grid Engine > situation that can explain what the plan is? > > I have read that it is no longer freely available. > > http://insidehpc.com/2010/08/20/sun-gridengine-now-100-less-free/ > > Although previous version up until 6.2u5 are under the SISSL > > http://www.opensource.org/licenses/sisslpl.php Short version as I understand it 1) binaries are 60 day evaluation limited by license. You want different options you either a) pay Oracle (a reasonable thing to do, and no, I am not in their employ), or b) build your own. 2) Source is SISSL. This goes copyleft if things don't work out the way the "standards board" decides. 
Not sure what this means relative to patching. 3) Patches won't be publicly available before the paid support has them (again, this is reasonable ... you want the people paying for support to have first crack at them). The license change actually happened in 2009. Note: this hit the gridengine lists on Monday-ish, and I didn't notice until Thursday, when I started seeing talk of the fork. I am not sure the issues is as problematic as it seemed at first glance, it depends upon the license cost, terms, etc. for commercial support. Some on the list have noted that the libdrmaa isn't GPL, so you can't link it against GPL based code. I think (sadly) this might make the Perl DRMAA modules problematic. I'll go back and look. I think there may be issues with the Python modules as well. In short, ugh. Its a little bit of a mess, but largely because the community wasn't prepared for the changes. Some folks are re-evaluating their usage over this, some are taking a wait and see. I'd advise the latter. I don't think its the end of the world. Its SISSL, so in theory anyway, its open. You just can't admix it with other good open tools, the vast majority of which are decidedly non SISSL. At least not easily admixed. > > > > -- > Doug > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From bernard at vanhpc.org Mon Aug 23 11:19:22 2010 From: bernard at vanhpc.org (Bernard Li) Date: Mon, 23 Aug 2010 11:19:22 -0700 Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> Message-ID: Hi Doug: On Mon, Aug 23, 2010 at 10:44 AM, Douglas Eadline wrote: > Is there someone on the list who is following the Oracle/Sun Grid Engine > situation that can explain what the plan is? > > I have read that it is no longer freely available. > > ?http://insidehpc.com/2010/08/20/sun-gridengine-now-100-less-free/ > > Although previous version up until 6.2u5 are under the SISSL > > ?http://www.opensource.org/licenses/sisslpl.php See the following discussion at the GE mailing-list: http://gridengine.sunsource.net/ds/viewMessage.do?dsForumId=38&dsMessageId=276287 >From the gist of it, the "courtesy binaries" will continue to be free for use, but don't quote me on that :-) I'd follow that thread to see if there's any official word on it. Cheers, Bernard From gus at ldeo.columbia.edu Mon Aug 23 11:19:20 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Mon, 23 Aug 2010 14:19:20 -0400 Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> Message-ID: <4C72BBA8.4070908@ldeo.columbia.edu> Douglas Eadline wrote: > Is there someone on the list who is following the Oracle/Sun Grid Engine > situation that can explain what the plan is? > I have read that it is no longer freely available. 
> http://insidehpc.com/2010/08/20/sun-gridengine-now-100-less-free/ > Although previous version up until 6.2u5 are under the SISSL > http://www.opensource.org/licenses/sisslpl.php > -- > Doug Hi Doug, list There has been some discussion of these events in the Rocks mailing list. SGE (now christianized Open Grid Scheduler, SGO ?) is part of the Rocks standard distribution. See these threads: http://marc.info/?l=npaci-rocks-discussion&m=128223161802709&w=2 http://marc.info/?l=npaci-rocks-discussion&m=128222982532237&w=2 Exactly one year ago there was some apprehension that this would happen to Torque, but Cluster Resources / Adaptive Computing seems to have kept it open. See these threads on the Torque and Maui user lists: http://www.supercluster.org/pipermail/torqueusers/2009-August/009349.html http://www.supercluster.org/pipermail/mauiusers/2009-August/003936.html http://www.supercluster.org/pipermail/mauiusers/2009-August/003938.html Gus Correa (a Torque/PBS user) --------------------------------------------------------------------- Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- From jlb17 at duke.edu Mon Aug 23 11:35:12 2010 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Mon, 23 Aug 2010 14:35:12 -0400 (EDT) Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <4C72B9E3.5080305@scalableinformatics.com> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> <4C72B9E3.5080305@scalableinformatics.com> Message-ID: On Mon, 23 Aug 2010 at 2:11pm, Joe Landman wrote > Douglas Eadline wrote: >> Is there someone on the list who is following the Oracle/Sun Grid Engine >> situation that can explain what the plan is? >> >> I have read that it is no longer freely available. >> >> http://insidehpc.com/2010/08/20/sun-gridengine-now-100-less-free/ >> >> Although previous version up until 6.2u5 are under the SISSL >> >> http://www.opensource.org/licenses/sisslpl.php > > Short version as I understand it > > 1) binaries are 60 day evaluation limited by license. You want different > options you either a) pay Oracle (a reasonable thing to do, and no, I am not > in their employ), or b) build your own. To clarify a few minor points here (again, as *I* understand it), the eval period is actually 90 days and does *not* apply to the "courtesy binaries", which are available up to and including 6.2u5 and are missing a couple of modules. The concerns on the SGE list seem to have arisen because a) there are no courtesy binaries for 6.2u6, b) no post-6.2u5 patches have made it to the source repository, and c) there has been little to no word from Oracle as to future plans for source code access and courtesy binaries. > I am not sure the issues is as problematic as it seemed at first glance, it > depends upon the license cost, terms, etc. for commercial support. I think it's rather problematic for some users. As I mentioned on the SGE list, IME academic clusters keep their licensing costs as low as possible (PIs *hate* spending money on anything other than cores). Whatever Oracle is charging (and especially if it's based on core/socket/node count), I practically guarantee you it's more than most academic clusters would pay. > In short, ugh. Its a little bit of a mess, but largely because the community > wasn't prepared for the changes. 
Some folks are re-evaluating their usage > over this, some are taking a wait and see. I'd advise the latter. I don't > think its the end of the world. Its SISSL, so in theory anyway, its open. > You just can't admix it with other good open tools, the vast majority of > which are decidedly non SISSL. At least not easily admixed. I agree with the wait and see stance, although if the already-started "fork" can work out some of the bugs in 6.2u5 (as they seem highly motivated to do), it will probably gain traction pretty quickly (barring more open movements from Oracle). -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From samuel at unimelb.edu.au Mon Aug 23 23:51:14 2010 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 24 Aug 2010 16:51:14 +1000 Subject: [Beowulf] The end of Sun Grid Engine? In-Reply-To: <4C72BBA8.4070908@ldeo.columbia.edu> References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> <50817.192.168.93.213.1282585480.squirrel@mail.eadline.org> <4C72BBA8.4070908@ldeo.columbia.edu> Message-ID: <4C736BE2.8090309@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 24/08/10 04:19, Gus Correa wrote: > Exactly one year ago there was some apprehension that > this would happen to Torque, but Cluster Resources / > Adaptive Computing seems to have kept it open. There is no copyright assignment for patches and contributions to Torque, plus Torque is itself a fork of OpenPBS so IMHO there's no way SC/CR/AC could relicense it without getting permission from all the copyright holders. cheers, Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computational Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxza+EACgkQO2KABBYQAh8gsgCeOR87XkqCSSbIvk2wGRL9/zkd Y7EAniUyF0sVmJ8XLQ20afhPMQM8/y+D =UT8B -----END PGP SIGNATURE----- From samuel at unimelb.edu.au Mon Aug 23 23:53:28 2010 From: samuel at unimelb.edu.au (Christopher Samuel) Date: Tue, 24 Aug 2010 16:53:28 +1000 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? In-Reply-To: References: <853ADBCB-E4BD-41A6-AF78-8017DEE0B1EF@staff.uni-marburg.de> Message-ID: <4C736C68.30004@unimelb.edu.au> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 21/08/10 08:53, Rahul Nabar wrote: > Yes. I'm using Torque. That's an interesting feature! > I'll check it out. You want to check out the $tmpdir option for the pbs_mom configuration file. cheers! Chris - -- Christopher Samuel - Senior Systems Administrator VLSCI - Victorian Life Sciences Computational Initiative Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545 http://www.vlsci.unimelb.edu.au/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkxzbGgACgkQO2KABBYQAh9ywACdFd8KUm2Rq2m3z9xuFVFL8pkt a5QAnjTXZZHqKD30NUgW9CRdQ2dT04QK =EDQy -----END PGP SIGNATURE----- From rpnabar at gmail.com Tue Aug 24 10:17:21 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 24 Aug 2010 12:17:21 -0500 Subject: [Beowulf] typical protocol for cleanup of /tmp: on reboot? cron job? tmpfs? In-Reply-To: References: Message-ID: On Fri, Aug 20, 2010 at 10:32 PM, Mark Hahn wrote: >> or at intervals with a cron (sounds a bad idea). > > why? 
> Just because I thought if a long lifetime process had some useful data in /tmp I didn't want to inadvertently delete it. -- Rahul From rpnabar at gmail.com Tue Aug 31 07:51:20 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 31 Aug 2010 09:51:20 -0500 Subject: [Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script. Message-ID: My scheduler, Torque flags compute-nodes as "busy" when the load gets above a threshold "ideal load". My settings on 8-core compute nodes have this ideal_load set to 8 but I am wondering if this is appropriate or not? $max_load 9.0 $ideal_load 8.0 I do understand the"ideal load = # of cores" heuristic but in at least 30% of our jobs ( if not more ) I find the load average greater than 8. Sometimes even in the 9-10 range. But does this mean there is something wrong or do I take this to be the "happy" scenario for HPC: i.e. not only are all CPU's busy but the pipeline of processes waiting for their CPU slice is also relatively full. After all, a "under-loaded" HPC node is a waste of an expensive resource? On the other hand, if there truly were something wrong with a node[*] and I was to use a high load avearage as one of the signs of impending trouble what would be a good threshold? Above what load-average on a compute node do people get actually worried? It makes sense to set PBS's default "busy" warning to that limit instead of just "8". I'm ignoring the 5/10/15 min load average distinction. I'm assuming Torque is using the most appropriate one! *e.g. runaway process, infinite loop in user code, multiple jobs accidentally assigned to some node etc. -- Rahul From reuti at Staff.Uni-Marburg.DE Tue Aug 31 08:58:37 2010 From: reuti at Staff.Uni-Marburg.DE (Reuti) Date: Tue, 31 Aug 2010 17:58:37 +0200 Subject: [Beowulf] When is compute-node load-average "high" in the HPC context? Setting correct thresholds on a warning script. In-Reply-To: References: Message-ID: <29E4598B-4AFA-43ED-A5A8-B241CACCF217@staff.uni-marburg.de> Am 31.08.2010 um 16:51 schrieb Rahul Nabar: > My scheduler, Torque flags compute-nodes as "busy" when the load gets > above a threshold "ideal load". My settings on 8-core compute nodes > have this ideal_load set to 8 but I am wondering if this is > appropriate or not? > > $max_load 9.0 > $ideal_load 8.0 > > I do understand the"ideal load = # of cores" heuristic but in at least Yep. > 30% of our jobs ( if not more ) I find the load average greater than > 8. Sometimes even in the 9-10 range. But does this mean there is > something wrong or do I take this to be the "happy" scenario for HPC: > i.e. not only are all CPU's busy but the pipeline of processes waiting > for their CPU slice is also relatively full. After all, a > "under-loaded" HPC node is a waste of an expensive resource? With recent kernels also (kernel) processes in D state count as running. Hence the load appears higher than the running processes would imply when only these are added up. -- Reuti > On the other hand, if there truly were something wrong with a node[*] > and I was to use a high load avearage as one of the signs of > impending trouble what would be a good threshold? Above what > load-average on a compute node do people get actually worried? It > makes sense to set PBS's default "busy" warning to that limit instead > of just "8". > > I'm ignoring the 5/10/15 min load average distinction. I'm assuming > Torque is using the most appropriate one! > > *e.g. 
runaway process, infinite loop in user code, multiple jobs > accidentally assigned to some node etc. > > -- > Rahul > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
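
Since the thread ends without settling on a number, one way to turn the "ideal load is about the number of cores" heuristic into an actual warning is a small check run from cron (or from whatever node health-check hook a site already uses). The sketch below is illustrative only: the 15-minute average and the 1.5 times cores margin are arbitrary site choices, not established thresholds, but it shows the mechanics, including a count of D-state processes, which (as Reuti notes) are included in the load average.

#!/bin/sh
# load-check.sh: warn when a node's load average is well above its core count.
# Sketch only; the 1.5x margin and the 15-minute average are arbitrary choices.

HOST=$(hostname -s)
CORES=$(grep -c '^processor' /proc/cpuinfo)

# The first three fields of /proc/loadavg are the 1-, 5- and 15-minute averages.
LOAD15=$(awk '{print $3}' /proc/loadavg)

# Compare as floating point via awk, since plain sh arithmetic is integer-only.
if awk -v l="$LOAD15" -v c="$CORES" 'BEGIN { exit !(l > 1.5 * c) }'; then
    # Processes in uninterruptible sleep (state D) count toward the load
    # average, so report how many there are as a hint about stuck I/O.
    DSTATE=$(ps -eo stat= | awk '$1 ~ /^D/ { n++ } END { print n + 0 }')
    echo "$HOST: 15-min load $LOAD15 exceeds 1.5 x $CORES cores (D-state processes: $DSTATE)"
fi

Feeding the same sort of threshold into pbs_mom's $max_load, as in the fragment Rahul quotes, only marks the node busy so the scheduler stops placing work there; a check like the above is more useful when the goal is an operator warning rather than a scheduling decision.
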