From eagles051387 at gmail.com Tue Jul 1 00:26:15 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:23 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <6D1C4C9B-432F-4547-93F4-391B0847951D@xs4all.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <6D1C4C9B-432F-4547-93F4-391B0847951D@xs4all.nl> Message-ID: Not sure if this applies to all the kinds of scenarios that clusters are used in, but isn't it true that the more RAM you have, the better? On 6/30/08, Vincent Diepeveen wrote: > > Toon, > > Can you drop a line on how important RAM is for weather forecasting in > latest type of calculations you're performing? > > Thanks, > Vincent > > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: > > Jim Lux wrote: >> >> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's >>> scale of operations. We face this all the time with NASA work. >>> Semiconductor manufacturers have no real reason to produce special purpose >>> or customized versions of their products for space use, because they can >>> sell all they can make to the consumer market. More than once, I've had a >>> phone call along the lines of this: >>> "Jim: I'm interested in your new ABC321 part." >>> "Rep: Great. I'll just send the NDA over and we can talk about it." >>> "Jim: Great, you have my email and my fax # is..." >>> "Rep: By the way, what sort of volume are you going to be using?" >>> "Jim: Oh, 10-12.." >>> "Rep: thousand per week, excellent..." >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every >>> year." >>> "Rep: Oh..." >>> {Well, to be fair, it's not that bad, they don't hang up on you.. 
>>> >> >> Since about a year, it's been clear to me that weather forecasting (i.e., >> running a more or less sophisticated atmospheric model to provide weather >> predictions) is going to be "mainstream" in the sense that every business >> that needs such forecasts for its operations can simply run them in-house. >> >> Case in point: I bought a $1100 HP box (the obvious target group being >> teenage downloaders) which performs the HIRLAM limited area model *on the >> grid that we used until October 2006* in December last year. >> >> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K. >> >> I wonder what effect this will have on CPU developments ... >> >> -- >> Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290 >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >> At home: http://moene.indiv.nluug.nl/~toon/ >> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html >> > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080701/3a4124c6/attachment.html From andrew at moonet.co.uk Tue Jul 1 01:20:56 2008 From: andrew at moonet.co.uk (andrew holway) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> Message-ID: Hi Jon, We have our own stack which we stick on top of the customers favourite red hat clone. Usually Scientific Linux. Here is a bit more about it. http://www.clustervision.com/products_os.php We sell as a standalone product and it does quite well. I could even go so far to say that it is 'stack of choice' in many European institutions. We have done a couple of M$ installations too. 
Ta Andy On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina wrote: > congrats. just wondering what distro is being used on your clusters? > > On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman > wrote: >> >> andrew holway wrote: >>> >>> http://www.clustervision.com/pr_top500_uk.php >> >> cool ... congratulations to ClusterVision! >> >> -- >> Joseph Landman, Ph.D >> Founder and CEO >> Scalable Informatics LLC, >> email: landman@scalableinformatics.com >> web : http://www.scalableinformatics.com >> http://jackrabbit.scalableinformatics.com >> phone: +1 734 786 8423 >> fax : +1 866 888 3112 >> cell : +1 734 612 4615 >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > > > -- > Jonathan Aquilina From Dan.Kidger at quadrics.com Tue Jul 1 01:42:59 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> >Hi Jon, >We have our own stack which we stick on top of the customers favourite >red hat clone. Usually Scientific Linux. > >Here is a bit more about it. > >http://www.clustervision.com/products_os.php > >We sell as a standalone product and it does quite well. I could even >go so far to say that it is 'stack of choice' in many European >institutions. Ever thought of getting a job in Sales and Marketing? :-) Daniel. On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina wrote: > congrats. just wondering what distro is being used on your clusters? > > On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman > wrote: >> >> andrew holway wrote: >>> >>> http://www.clustervision.com/pr_top500_uk.php >> >> cool ... congratulations to ClusterVision! 
>> >> -- >> Joseph Landman, Ph.D >> Founder and CEO >> Scalable Informatics LLC, From Dan.Kidger at quadrics.com Tue Jul 1 01:46:14 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <1214864562.6912.29.camel@Vigor13> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AD@quadbrsex1.quadrics.com> John is correct here. It is one thing to do long-range climate prediction yourself, using distributed computing and tweaking the stochastics based on a set of starting conditions, and another to try to work out whether it will be sunny next Tuesday. Weather modelling is a different animal to CP: you need a supply of fresh input data, and a sophisticated infrastructure to harvest, collate, sanitise and feed these numbers into your computer model. Also, with CP you typically run many instances concurrently, which take weeks or months to complete, but with WM you have maybe 6 hours to run the whole job from start to finish, which implies a closely coupled cluster. Daniel ------------------------------------------------------------- Dr. Daniel Kidger, Quadrics Ltd. 
daniel.kidger@quadrics.com One Bridewell St., Mobile: +44 (0)779 209 1851 Bristol, BS1 2AA, UK Office: +44 (0)117 915 5519 ----------------------- www.quadrics.com -------------------- -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of John Hearns Sent: 30 June 2008 23:23 To: beowulf@beowulf.org Subject: Re: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? On Mon, 2008-06-30 at 20:20 +0200, Toon Moene wrote: > > Since about a year, it's been clear to me that weather forecasting > (i.e., running a more or less sophisticated atmospheric model to provide > weather predictions) is going to be "mainstream" in the sense that every > business that needs such forecasts for its operations can simply run > them in-house. Garbage in, garbage out. By that I mean that the CPU horsepower may be more and more readily affordable for businesses like that - let's say it is an ice-cream wholesaler who would like to have a three day forecast to allow stocking of their outlets with ice cream. However, the models depend on input from sensor networks - not my area of expertise, but I should imagine manned and unmanned weather stations, ocean buoys to measure wave height, satellite sensors. Do we see such data sources being made freely available, and in real time (ie not archived data sets)?? 
From eagles051387 at gmail.com Tue Jul 1 02:28:59 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] A press release In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> Message-ID: >We have our own stack which we stick on top of the customers favourite >red hat clone. Usually Scientific Linux. Does it necessarily have to be a Red Hat clone? Can it also be a Debian-based distribution? On 7/1/08, Dan.Kidger@quadrics.com wrote: > > >Hi Jon, > > >We have our own stack which we stick on top of the customers favourite > >red hat clone. Usually Scientific Linux. > > > >Here is a bit more about it. > > > >http://www.clustervision.com/products_os.php > > > >We sell as a standalone product and it does quite well. I could even > >go so far to say that it is 'stack of choice' in many European > >institutions. > > Every throught of getting a job in Sales and Marketing? :-) > > > Daniel. > > > On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina > wrote: > > congrats. just wondering what distro is being used on your clusters? > > > > On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman > > wrote: > >> > >> andrew holway wrote: > >>> > >>> http://www.clustervision.com/pr_top500_uk.php > >> > >> cool ... congratulations to ClusterVision! > >> > >> -- > >> Joseph Landman, Ph.D > >> Founder and CEO > >> Scalable Informatics LLC, > -- Jonathan Aquilina 
From henning.fehrmann at aei.mpg.de Tue Jul 1 02:36:43 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] automount on high ports Message-ID: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> Hello, we need to automount NFS directories on high ports to increase the number of possible mounts. Currently, we are limited to about 360 mounts. The NFS server exports with the option 'insecure', but the mounts still end up on ports <1024 on the client side. Is there a way to enable automounts on higher ports? How can it be done manually: mount -t nfs -o ....? We are using autofs version 5. Thank you, Henning From steve_heaton at exemail.com.au Tue Jul 1 03:28:40 2008 From: steve_heaton at exemail.com.au (Particle Boy) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf], Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <200807010728.m617S3Uc011226@bluewest.scyld.com> References: <200807010728.m617S3Uc011226@bluewest.scyld.com> Message-ID: <486A06D8.2050705@exemail.com.au> Date: Mon, 30 Jun 2008 23:22:32 +0100 From: John Hearns > However, the models depend on input from sensor networks - not my area > of expertise, but I should imagine manned and unmanned weather >stations, >ocean buoys to measure wave height, satellite sensors. >Do we see such data sources being made freely available, and in real >time (ie not archived data sets)?? G'day John and all In a nutshell, yes: you can get sets of initial conditions from various agencies around the globe. The NCEP at NOAA is a great resource. 
SOO/STRC at UCAR packages WRF EMS with the pointers built right in for the various feeds :) Cheers Stevo From eagles051387 at gmail.com Tue Jul 1 03:38:52 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] open mosix alternative Message-ID: Does anyone know an alternative to openMosix? Would it be worth reviving the development of the kernel? -- Jonathan Aquilina From eagles051387 at gmail.com Tue Jul 1 03:39:48 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster Message-ID: Does anyone know of any rendering software that will work with a cluster? -- Jonathan Aquilina From geoff at galitz.org Tue Jul 1 04:04:48 2008 From: geoff at galitz.org (Geoff Galitz) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: Message-ID: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> I know people who use Houdini for this: http://www.sidefx.com/index.php I cannot vouch for how well it works or what is involved, though. Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jon Aquilina Sent: Tuesday, 1 July 2008 12:40 To: Beowulf Mailing List Subject: [Beowulf] software for compatible with a cluster does anyone know of any rendering software that will work with a cluster? -- Jonathan Aquilina 
From eagles051387 at gmail.com Tue Jul 1 04:26:38 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> Message-ID: The reason I am asking is that I would like to set up a rendering cluster and provide rendering services. Does this also work for 3D animated movies that require rendering, or does one need something entirely different for that? On 7/1/08, Geoff Galitz wrote: > > > > > > I know people who use Houdini for this: > > > > http://www.sidefx.com/index.php > > > > I cannot vouch for how well it works or what is involved, though. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] *On > Behalf Of *Jon Aquilina > *Sent:* Tuesday, 1 July 2008 12:40 > *To:* Beowulf Mailing List > *Subject:* [Beowulf] software for compatible with a cluster > > > > does anyone know of any rendering software that will work with a cluster? > > -- > Jonathan Aquilina > -- Jonathan Aquilina From gerry.creager at tamu.edu Tue Jul 1 04:59:03 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <48694BD5.5090303@moene.indiv.nluug.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <48693DCA.3010903@tamu.edu> <48694BD5.5090303@moene.indiv.nluug.nl> Message-ID: <486A1C07.9050208@tamu.edu> Toon Moene wrote: > Gerry Creager wrote: > >> I'm running WRF on ranger, the 580 TF Sun cluster at utexas.edu. I >> can complete the WRF single domain run, using 384 cores in ~30 min >> wall clock time. At the WRF Users Conference last week, the number of >> folks I talked to running WRF on workstations or "operationally" on >> 16-64 core clusters was impressive. I suspect a lot of desktop >> weather forecasting will, as you suggest, become the norm. The >> question, then, is: Are we looking at an enterprise where everyone >> with a gaming machine thinks they understand the model well enough to >> try predicting the weather, or are some still in awe of Lorenz' >> hypothesis about its complexity? > > This is where I think the pluses of the established meteorological > society will be: We know how to establish the quality of meteorological > models, how to compare them, how to dive into their parametrizations to > figure out the relevant differences and to solve the problems. > > Because we know this, we will be sought after. However, we will be > working inside the industry that needs this knowlegde, and outside > academia or institutionalized weather centres. This is already starting to happen. However, what I continue to see is managers wanting/expecting an absolute answer be generated numerically, and they're paying less attention to the modelers' concerns about the "goodness" of the model in certain settings. As an example, for our evening news programs, we've someone purporting to be a meteorologist. 
Over the last 10 years, the proportion of folks actually trained in meteorology has grown significantly, and talking to them one-on-one, they tend to recognize the limitations of the models they present. Yet, rather than saying the temperature tomorrow will be in a range from 93-98 deg F (with apologies to our brothers across the Pond) they're generally required to say, "96F" because their managers believe the public requires an absolute number. Perhaps, in some industries where statistical analysis is more integral, we'll see appropriate use of the data... gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From gerry.creager at tamu.edu Tue Jul 1 05:13:47 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <1214864562.6912.29.camel@Vigor13> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13> Message-ID: <486A1F7B.9080408@tamu.edu> John Hearns wrote: > On Mon, 2008-06-30 at 20:20 +0200, Toon Moene wrote: > >> Since about a year, it's been clear to me that weather forecasting >> (i.e., running a more or less sophisticated atmospheric model to provide >> weather predictions) is going to be "mainstream" in the sense that every >> business that needs such forecasts for its operations can simply run >> them in-house. > > Garbage in, garbage out. 
> > By that I mean that the CPU horsepower may be more and more readily > affordable for businesses like that - let's say it is an ice-cream > wholesaler who would like to have a three day forecast to allow stocking > of their outlets with ice cream. > However, the models depend on input from sensor networks - not my area > of expertise, but I should imagine manned and unmanned weather stations, > ocean buoys to measure wave height, satellite sensors. > Do we see such data sources being made freely available, and in real > time (ie not archived data sets)?? In the US, at least for academic institutions and hobbyists, surface and upper air observations of the sort you describe are generally available for incorporation into models for data assimilation. Models are generally forced and bounded using model data from other atmospheric models, also available. As I understand it from colleagues in Europe, getting similar data over there is more problematic. > Hopefully on topic the Manchester Guardian newspaper (you all know me > now for a Guardian reader) is running a "Free Our Data" campaign - to > pressurise Government to make freely available GIS type data and census > data which the Government has. I'm personally unconvinced of the > overwhelming justification for (say) the Ordnance Survey to give all of > its mapping data away for free. > http://www.freeourdata.org.uk/ Last summer, in Paris, I had a discussion on this subject with the Ordnance Survey's chief cartographer. It is their intent to free the data, save for reasonable costs of reproduction/maintenance, as soon as they can establish these. In the US, this is the norm. In Texas, where I live, there's a site with State basemap data, highly accurate roadway data, land-use/land-cover, census, etc. that's just an FTP call away, or, if you want to pay roughly $10 per DVD, they'll burn a copy for you (cost of personnel for reproduction of the DVD). Some states have deemed their data proprietary. 
A lot have locked their data down somewhat since 9/11, as our Department of Homeland Security has called for restricting access to Critical Infrastructure data. Note that the last listing of Critical Infrastructure for Texas listed some 268 pages of delineation, description and justification. I fear it's been updated/expanded since then. It included banks, cemeteries, schools, bridges, water and sewer plants, shopping malls, high-traffic motor-ways, refrigerated facilities, supermarkets, gas stations, bridges, power transformer and generation sites, power transmission lines, petroleum pipelines, and gas stations, to name a few. There was discussion of adding individual residences to the list. As you can see, restricting access to "critical infrastructure" could result in a blank map. -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From m.janssens at opencfd.co.uk Tue Jul 1 05:48:39 2008 From: m.janssens at opencfd.co.uk (Mattijs Janssens) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: References: Message-ID: <200807011348.39343.m.janssens@opencfd.co.uk> On Tuesday 01 July 2008 11:38, Jon Aquilina wrote: > does anyone know an altenative to openmosix?? would it be worth reviving > the development of the kernel? maybe http://www.kerrighed.org (and that is all I know about it) Regards, Mattijs From geoff at galitz.org Tue Jul 1 05:50:33 2008 From: geoff at galitz.org (Geoff Galitz) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: References: Message-ID: It seems that much of the effort that was going into openMOSIX is now going into KVM. http://kvm.qumranet.com/kvmwiki I think the idea is that MOSIX functionality is more easily developed and deployed in the form of virtual machines than directly at the kernel level. 
There are some trade-offs, of course, more overhead being chief among them, but the virtualization model is clearly the overall favorite. It sure does beat the heck out of having to track each kernel individually. Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jon Aquilina Sent: Tuesday, 1 July 2008 12:39 To: Beowulf Mailing List Subject: [Beowulf] open mosix alternative does anyone know an altenative to openmosix?? would it be worth reviving the development of the kernel? -- Jonathan Aquilina From mark.kosmowski at gmail.com Tue Jul 1 05:51:54 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1 In-Reply-To: <200807010728.m617S3Ub011226@bluewest.scyld.com> References: <200807010728.m617S3Ub011226@bluewest.scyld.com> Message-ID: At some point a cost-benefit analysis needs to be performed. If my cluster at peak usage only uses 4 GB RAM per CPU (I live in single-core land still and do not yet differentiate between CPU and core) and my nodes all have 16 GB per CPU, then I am wasting RAM resources and would be better off buying new machines and physically transferring the RAM to and from them, or running more jobs, each distributed across fewer CPUs. Or saving on my electricity bill and powering down some nodes. As heretical as this last sounds, I'm tempted to throw in the towel on my PhD studies because I can no longer afford the power to run my three-node cluster at home. Energy costs may end up being the straw that breaks this camel's back. Mark E. Kosmowski > From: "Jon Aquilina" > > not sure if this applies to all kinds of senarios that clusters are used in > but isnt the more ram you have the better? 
> > On 6/30/08, Vincent Diepeveen wrote: > > > > Toon, > > > > Can you drop a line on how important RAM is for weather forecasting in > > latest type of calculations you're performing? > > > > Thanks, > > Vincent > > > > > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: > > > > Jim Lux wrote: > >> > >> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's > >>> scale of operations. We face this all the time with NASA work. > >>> Semiconductor manufacturers have no real reason to produce special purpose > >>> or customized versions of their products for space use, because they can > >>> sell all they can make to the consumer market. More than once, I've had a > >>> phone call along the lines of this: > >>> "Jim: I'm interested in your new ABC321 part." > >>> "Rep: Great. I'll just send the NDA over and we can talk about it." > >>> "Jim: Great, you have my email and my fax # is..." > >>> "Rep: By the way, what sort of volume are you going to be using?" > >>> "Jim: Oh, 10-12.." > >>> "Rep: thousand per week, excellent..." > >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every > >>> year." > >>> "Rep: Oh..." > >>> {Well, to be fair, it's not that bad, they don't hang up on you.. > >>> > >> > >> Since about a year, it's been clear to me that weather forecasting (i.e., > >> running a more or less sophisticated atmospheric model to provide weather > >> predictions) is going to be "mainstream" in the sense that every business > >> that needs such forecasts for its operations can simply run them in-house. > >> > >> Case in point: I bought a $1100 HP box (the obvious target group being > >> teenage downloaders) which performs the HIRLAM limited area model *on the > >> grid that we used until October 2006* in December last year. > >> > >> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K. > >> > >> I wonder what effect this will have on CPU developments ... 
> >> > >> -- > >> Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290 > >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > >> At home: http://moene.indiv.nluug.nl/~toon/ > >> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html > >> > > > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > -- > Jonathan Aquilina From mark.kosmowski at gmail.com Tue Jul 1 05:53:35 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Message-ID: And I forgot to change the subject. Apologies. On 7/1/08, Mark Kosmowski wrote: > At some point there a cost-benefit analysis needs to be performed. If > my cluster at peak usage only uses 4 Gb RAM per CPU (I live in > single-core land still and do not yet differentiate between CPU and > core) and my nodes all have 16 Gb per CPU then I am wasting RAM > resources and would be better off buying new machines and physically > transferring the RAM to and from them or running more jobs each > distributed across fewer CPUs. Or saving on my electricity bill and > powering down some nodes. > > As heretical as this last sounds, I'm tempted to throw in the towel on > my PhD studies because I can no longer afford the power to run my > three node cluster at home. Energy costs may end up being the straw > that breaks this camel's back. > > Mark E. Kosmowski > > > From: "Jon Aquilina" > > > > > not sure if this applies to all kinds of senarios that clusters are used in > > but isnt the more ram you have the better? > > > > On 6/30/08, Vincent Diepeveen wrote: > > > > > > Toon, > > > > > > Can you drop a line on how important RAM is for weather forecasting in > > > latest type of calculations you're performing? 
> > > > > > Thanks, > > > Vincent > > > > > > > > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: > > > > > > Jim Lux wrote: > > >> > > >> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's > > >>> scale of operations. We face this all the time with NASA work. > > >>> Semiconductor manufacturers have no real reason to produce special purpose > > >>> or customized versions of their products for space use, because they can > > >>> sell all they can make to the consumer market. More than once, I've had a > > >>> phone call along the lines of this: > > >>> "Jim: I'm interested in your new ABC321 part." > > >>> "Rep: Great. I'll just send the NDA over and we can talk about it." > > >>> "Jim: Great, you have my email and my fax # is..." > > >>> "Rep: By the way, what sort of volume are you going to be using?" > > >>> "Jim: Oh, 10-12.." > > >>> "Rep: thousand per week, excellent..." > > >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every > > >>> year." > > >>> "Rep: Oh..." > > >>> {Well, to be fair, it's not that bad, they don't hang up on you.. > > >>> > > >> > > >> Since about a year, it's been clear to me that weather forecasting (i.e., > > >> running a more or less sophisticated atmospheric model to provide weather > > >> predictions) is going to be "mainstream" in the sense that every business > > >> that needs such forecasts for its operations can simply run them in-house. > > >> > > >> Case in point: I bought a $1100 HP box (the obvious target group being > > >> teenage downloaders) which performs the HIRLAM limited area model *on the > > >> grid that we used until October 2006* in December last year. > > >> > > >> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K. > > >> > > >> I wonder what effect this will have on CPU developments ... 
> > >> > > >> -- > > >> Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290 > > >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > >> At home: http://moene.indiv.nluug.nl/~toon/ > > >> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html > > >> > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf@beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > -- > > Jonathan Aquilina > From geoff at galitz.org Tue Jul 1 05:54:26 2008 From: geoff at galitz.org (Geoff Galitz) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> Message-ID: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> That is out of my field of expertise. Sounds like a question for professional digital artists. I can put you in touch with some folks who most likely know the answer to your questions, if you like. Anybody know of any current approaches to this? Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: Jon Aquilina [mailto:eagles051387@gmail.com] Sent: Tuesday, 1 July 2008 13:27 To: Geoff Galitz Cc: Beowulf Mailing List Subject: Re: [Beowulf] software for compatible with a cluster reason i am asking is because i would like to setup a rendering cluster and provide rendering services. does this also work for 3d animated movies that require rendering or does one need somethin entierly different for that? On 7/1/08, Geoff Galitz wrote: I know people who use Houdini for this: http://www.sidefx.com/index.php I cannot vouch for how well it works or what is involved, though. Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jon Aquilina Sent: Tuesday, 1 
July 2008 12:40 To: Beowulf Mailing List Subject: [Beowulf] software for compatible with a cluster does anyone know of any rendering software that will work with a cluster? -- Jonathan Aquilina -- Jonathan Aquilina From ajt at rri.sari.ac.uk Tue Jul 1 06:14:38 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: References: Message-ID: <486A2DBE.10302@rri.sari.ac.uk> Jon Aquilina wrote: > does anyone know an altenative to openmosix?? would it be worth reviving > the development of the kernel? Hello, Jonathan. I'm still running openMosix (linux-2.4.26-om1) and I did have an attempt at porting it to the 2.4.32 kernel so I could use SATA disks, but I couldn't get process migration to work. My debs for rebuilding the openMosix kernel under Ubuntu 6.06.1 LTS are at: http://bioinformatics.rri.sari.ac.uk/openmosix We are currently evaluating Kerrighed as an alternative: http://www.kerrighed.org Kerrighed also forms the basis of 'XtreemOS': http://www.xtreemos.eu/ Although Kerrighed looks very promising, it is also quite fragile in our hands. If one node crashes, you lose the entire cluster. That said, the Kerrighed project is extremely well supported and I believe it will be a good alternative in the near future. We will continue to run openMosix in the short term, but I may evaluate MOSIX2: http://www.mosix.org/ I was previously opposed to Mosix on ideological grounds and loyal to Moshe Bar but, to be fair, Mosix is now free for non-profit use and the source code is available (but not GPL). Please let me know if you are seriously considering reviving openMosix! Tony. -- Dr. 
A.J.Travis, | mailto:ajt@rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687

From landman at scalableinformatics.com Tue Jul 1 07:00:06 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To:
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID: <486A3866.7030302@scalableinformatics.com>

Mark Kosmowski wrote:

> At some point a cost-benefit analysis needs to be performed. If
> my cluster at peak usage only uses 4 GB RAM per CPU (I live in
> single-core land still and do not yet differentiate between CPU and
> core) and my nodes all have 16 GB per CPU then I am wasting RAM
> resources and would be better off buying new machines and physically
> transferring the RAM to and from them or running more jobs each
> distributed across fewer CPUs. Or saving on my electricity bill and
> powering down some nodes.

Possible, though if you do heavy IO, even with single-core chips, and you are running a 64-bit OS, the extra buffer cache is not to be rejected lightly.

>
> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.

Which country are you in? You may be able to apply for "free" computing resources: TeraGrid in the US, or other similar resources elsewhere. Mark Hahn might give you pointers for Canada, and the folks at Streamline/Clustervision/... might be able to give you pointers for UK/EU.
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From eagles051387 at gmail.com Tue Jul 1 07:18:50 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> Message-ID: that would be greatly appreciated On 7/1/08, Geoff Galitz wrote: > > > > That is out of my field of expertise. Sounds like a question for > professional digital artists. I can put you in touch some folks that most > likely know the answer to your questions, if you like. > > > > Anybody know of any current approaches to this? > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* Jon Aquilina [mailto:eagles051387@gmail.com] > *Sent:* Dienstag, 1. Juli 2008 13:27 > *To:* Geoff Galitz > *Cc:* Beowulf Mailing List > *Subject:* Re: [Beowulf] software for compatible with a cluster > > > > reason i am asking is because i would like to setup a rendering cluster and > provide rendering services. does this also work for 3d animated movies that > require rendering or does one need somethin entierly different for that? > > On 7/1/08, *Geoff Galitz* wrote: > > > > > > I know people who use Houdini for this: > > > > http://www.sidefx.com/index.php > > > > I cannot vouch for how well it works or what is involved, though. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] *On > Behalf Of *Jon Aquilina > *Sent:* Dienstag, 1. 
Juli 2008 12:40 > *To:* Beowulf Mailing List > *Subject:* [Beowulf] software for compatible with a cluster > > > > does anyone know of any rendering software that will work with a cluster? > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080701/00aa8d76/attachment.html From vanallsburg at hope.edu Tue Jul 1 07:43:34 2008 From: vanallsburg at hope.edu (Paul Van Allsburg) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> Message-ID: <486A4296.4050501@hope.edu> I'd like to do the same, as a project for a group of students... Please keep me in the loop? Thanks! Paul -- Paul Van Allsburg Computational Science & Modeling Facilitator Natural Sciences Division, Hope College 35 East 12th Street Holland, Michigan 49423 616-395-7292 http://www.hope.edu/academic/csm/ Jon Aquilina wrote: > that would be greatly appreciated > > On 7/1/08, *Geoff Galitz* > > wrote: > > > > That is out of my field of expertise. Sounds like a question for > professional digital artists. I can put you in touch some folks > that most likely know the answer to your questions, if you like. > > > > Anybody know of any current approaches to this? > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > > * From: * Jon Aquilina [mailto:eagles051387@gmail.com > ] > *Sent:* Dienstag, 1. Juli 2008 13:27 > *To:* Geoff Galitz > *Cc:* Beowulf Mailing List > *Subject:* Re: [Beowulf] software for compatible with a cluster > > > > reason i am asking is because i would like to setup a rendering > cluster and provide rendering services. does this also work for 3d > animated movies that require rendering or does one need somethin > entierly different for that? 
> > On 7/1/08, *Geoff Galitz* > wrote: > > > > > > I know people who use Houdini for this: > > > > http://www.sidefx.com/index.php > > > > I cannot vouch for how well it works or what is involved, though. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > > * From: * beowulf-bounces@beowulf.org > > [mailto:beowulf-bounces@beowulf.org > ] *On Behalf Of *Jon Aquilina > *Sent:* Dienstag, 1. Juli 2008 12:40 > *To:* Beowulf Mailing List > *Subject:* [Beowulf] software for compatible with a cluster > > > > does anyone know of any rendering software that will work with a > cluster? > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina From perry at piermont.com Tue Jul 1 07:44:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:24 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48694432.4020608@scalableinformatics.com> (Joe Landman's message of "Mon\, 30 Jun 2008 16\:38\:10 -0400") References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <48693DCA.3010903@tamu.edu> <48694432.4020608@scalableinformatics.com> Message-ID: <87d4lxk6jj.fsf@snark.cb.piermont.com> Joe Landman writes: > I see a curious phenomenon going on in crash simulation and NVH. We > see an increasing "decoupling" if you will, between the detailed > issues of simulation and coding, and the end user using the simulation > system. That is, the users may know the engineering side, but don't > seem to grasp the finer aspects of the simulation ... what to take as > reasonably accurate, and what to grasp might not be. 
>
> I don't see this in chemistry, in large part due to many of the users
> also writing their own software.

On the contrary. I know computational chemistry specialists who worry about users of the common commercial software (Gaussian, Jaguar, etc.) not knowing what to believe and what not to believe in the output. Since I've seen people in synthetic organic labs running the simulation software to design possible synthetic pathways without understanding the software, I think this worry is perfectly valid. The overwhelming majority of users are not computational chemists at all -- they're ordinary organic chemists, and they don't have a good gut feel for what the limitations of the tools are.

I know of very few users of computational chemistry software who roll their own. Try reading the computational chemistry mailing lists for a little while, or reading the journals, and you'll get a feel for what the average user is like. There might be a lot of people writing software out there, but there are vastly more who just want to get answers and don't understand how the programs work at all.

Perry
--
Perry E. Metzger perry@piermont.com

From perry at piermont.com Tue Jul 1 07:53:06 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Tue, 1 Jul 2008 11:36:43 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
Message-ID: <878wwlk65p.fsf@snark.cb.piermont.com>

Henning Fehrmann writes:
> we need to automount NFS directories on high ports to increase the
> number of possible mounts. Currently, we are limited up to ca 360 mounts.

A TCP socket is a 4-tuple of localhost:localport:remotehost:remoteport

A given localhost:localport pair can speak to an unlimited array of remotehost:remoteport sets.
For example, in theory, your SMTP port can get connections from up to 2^32 different hosts, each using up to 2^16 different source ports, for a total space of 2^48 connections to a single local socket number. This in no way restricts how many connections can come in to another port, either, because a given socket is again the full 4-tuple -- if you have an SSH port, it too can get 2^48 connections.

Now, there is this (odd) convention that only root can open a socket below 1024, so hosts "trust" (what a bad idea) sockets under that number. You can still, however, get up to 1023 connections from any given remote host to a given local host's port.

Thus, your problem sounds rather odd. There is no obvious reason you should be limited to 360 connections.

Perhaps your problem is not what you think it is at all. Could you explain it in more detail?

--
Perry E. Metzger perry@piermont.com

From ajt at rri.sari.ac.uk Tue Jul 1 08:31:48 2008
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <486A4DE4.1090807@rri.sari.ac.uk>

Geoff Galitz wrote:
> [...]
> I think the idea is that MOSIX functionality is more easily developed
> and deployed in the form of virtual machines than directly at the kernel
> level. There are some trade-offs, of course... more overhead being
> chief among them but the virtualization model is clearly the overall
> favorite. It sure does beat the heck out of having to track each kernel
> individually.

Hello, Geoff.

MOSIX functionality is mainly about load-balancing between independent kernels, and avoiding severe memory depletion by migrating processes between kernels. In fact (open)MOSIX implements an SMP-like model, but with a high-latency interconnect (usually Gbit Ethernet). There is no need to 'track' kernels, because the oM HPC extension does it for you.

The principal objective of SSI computing is to use many small machines as if they are one big one.
This is the opposite of virtualisation, which uses one (or a few) BIG machines like a lot of small ones. It does this by virtually separating the kernels. There is some confusion about this because it *is* very convenient to teach about or develop and test SSI software on virtual compute nodes if you don't have a lot of real nodes, but it defeats the purpose of SSI to use this approach in production.

You might be interested to know that one reason Moshe Bar gave when he announced the end of the openMosix project was that SMP is now so cheap that SSI clustering is less of a factor in computing:

http://sourceforge.net/forum/forum.php?forum_id=715406

I'm not sure I agree - I still find openMosix useful, and I'll continue using it on our Beowulf here until I find a better alternative.

Tony.
--
Dr. A.J.Travis, | mailto:ajt@rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687

From tjrc at sanger.ac.uk Tue Jul 1 08:40:27 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <878wwlk65p.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com>
Message-ID:

On 1 Jul 2008, at 3:53 pm, Perry E. Metzger wrote:

>
> Henning Fehrmann writes:
>> we need to automount NFS directories on high ports to increase the
>> number of possible mounts. Currently, we are limited up to ca 360
>> mounts.
>
> A TCP socket is a 4-tuple of localhost:localport:remotehost:remoteport
>
> A given localhost:localport pair can speak to an unlimited array of
> remotehost:remoteport sets. For example, in theory, your SMTP port can
> get connections from up to 2^32 different hosts on each of 2^16
> different sockets from each, for a total space of 2^48 connections to
> a single local socket number.
This in no way restricts how many > connections can come in to another port, either, because a given > socket is again the full 4-tuple -- if you have an SSH port, it too > can get 2^48 connections. > > Now, there is this (odd) convention that only root can open a socket > below 1024, so hosts "trust" (what a bad idea) sockets under that > number. You can still, however, get up to 1023 connections from any > given remote host to a given local host's port. > > Thus, your problem sounds rather odd. There is no obvious reason you > should be limited to 360 connections. > > Perhaps your problem is not what you think it is at all. Could you > explain it in more detail? Certainly on my systems where I use the am-utils automounter, I find the limit on the number of simultaneously mounted filesystems is more in the region of 1500. I've been desperately trying to reduce the number of NFS filesystems we have though. Currently our automount map has about 600 entries, I think. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From perry at piermont.com Tue Jul 1 08:48:47 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] automount on high ports In-Reply-To: (Tim Cutts's message of "Tue\, 1 Jul 2008 16\:40\:27 +0100") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> Message-ID: <87od5hip0g.fsf@snark.cb.piermont.com> Tim Cutts writes: > Certainly on my systems where I use the am-utils automounter, I find > the limit on the number of simultaneously mounted filesystems is more > in the region of 1500. And that's doubtless not from TCP port issues but because of other kinds of resources being limited. 
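
As an aside on the port arithmetic discussed earlier in this thread, the counting can be sketched in a few lines of Python. This is only an illustration under the thread's own assumptions (IPv4 addresses, 16-bit port numbers, reserved ports 1 to 1023); the constant and function names are made up for the example and do not come from any NFS or automounter tool.

```python
# Illustrative sketch only: counts the 4-tuple space described in the thread.
# Assumptions (labeled, not taken from any tool): IPv4, 16-bit ports,
# and the convention that ports 1..1023 are reserved for root.

IPV4_ADDRESSES = 2 ** 32   # possible remote IPv4 hosts
PORT_NUMBERS = 2 ** 16     # possible port numbers on each host
RESERVED_PORTS = 1023      # ports 1..1023, conventionally root-only


def peers_per_local_socket() -> int:
    """Distinct (remote host, remote port) pairs that a single
    local host:port pair can be connected to, in theory."""
    return IPV4_ADDRESSES * PORT_NUMBERS


def reserved_connections_to_one_server_port() -> int:
    """Connections one client host can open to one server port
    if it may only use reserved source ports."""
    return RESERVED_PORTS


if __name__ == "__main__":
    print(peers_per_local_socket() == 2 ** 48)
    print(reserved_connections_to_one_server_port())
```

The point of the sketch is only that neither number is anywhere near 360, which is why the limit reported at the start of the thread is expected to come from somewhere other than the TCP port space itself.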
> I've been desperately trying to reduce the number of NFS filesystems > we have though. Currently our automount map has about 600 entries, > I think. Sometimes that's reasonable. I've seen large sites where everyone has a workstation in front of them and all of the thousands of users get their home dir automounted when they sit in front of a box and log in. However, one notes that in such a situation, the automount maps have thousands or tens of thousands of entries, but any given machine generally only is mounting a few file systems. -- Perry E. Metzger perry@piermont.com From kilian at stanford.edu Tue Jul 1 08:49:42 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: References: Message-ID: <200807010849.42415.kilian@stanford.edu> Hi Jon, On Tuesday 01 July 2008 03:38:52 am Jon Aquilina wrote: > does anyone know an altenative to openmosix?? You may want to check out OpenSSI: http://www.openssi.org As its name says, that's a SSI clustering solution, with unified process namespace, full process migration, load-balancing, single root filesystem, etc. A complete list of features is available at: http://wiki.openssi.org/go/Features Cheers, -- Kilian From henning.fehrmann at aei.mpg.de Tue Jul 1 09:47:47 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <878wwlk65p.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> Message-ID: <20080701164747.GA15901@gretchen.aei.uni-hannover.de> On Tue, Jul 01, 2008 at 10:53:06AM -0400, Perry E. Metzger wrote: > > Henning Fehrmann writes: > > we need to automount NFS directories on high ports to increase the > > number of possible mounts. Currently, we are limited up to ca 360 mounts. > > > Thus, your problem sounds rather odd. 
There is no obvious reason you
> should be limited to 360 connections.
>
> Perhaps your problem is not what you think it is at all. Could you
> explain it in more detail?

I guess it also has something to do with the automounter. I am not able to increase this number. But even if the automounter could handle more, we need to be able to use higher ports: netstat always shows ports below 1024.

tcp 0 0 client:941 server:nfs

We need to mount up to 1400 NFS exports.

Cheers
Henning

From hahn at mcmaster.ca Tue Jul 1 09:51:32 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID:

>> We have our own stack which we stick on top of the customers favourite
>> red hat clone. Usually Scientific Linux.
>
> does it necessarily have to be a redhat clone. can it also be a debian based
> clone?

But why? Is there some concrete advantage to using Debian? I've never understood why Debian users tend to be such True Believers, or what it is that hooks them.

From prentice at ias.edu Tue Jul 1 10:20:32 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID: <486A6760.5010006@ias.edu>

Mark Hahn wrote:
>>> We have our own stack which we stick on top of the customers favourite
>>> red hat clone. Usually Scientific Linux.
>>
>> does it necessarily have to be a redhat clone. can it also be a debian
>> based
>> clone?
>
> But why? Is there some concrete advantage to using Debian?
> I've never understood why Debian users tend to be such True Believers,
> or what it is that hooks them.

And the Debian users can say the same thing about Red Hat users. Or SUSE users.
And if any still exist, the Slackware users could say the same thing about both of them. But then the Slackware users could also point out that the first Linux distro was Slackware, so they are using the one true Linux distro...

If you want to have a religious war about which distro to use, go somewhere else. I'm sure there are plenty of mailing lists and newsgroups where that happens every day.

This is a mailing list about Beowulf clusters, and the last time I checked, you can create clusters using any Linux distribution you like, or even non-Linux operating systems, such as IRIX, Solaris, etc.

--
Prentice

From landman at scalableinformatics.com Tue Jul 1 10:46:01 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] A press release
In-Reply-To: <486A6760.5010006@ias.edu>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu>
Message-ID: <486A6D59.7020704@scalableinformatics.com>

Prentice Bisbal wrote:
> Mark Hahn wrote:

[...]

> If you want to have a religious war about which distro to use, go
> somewhere else. I'm sure there are plenty of mailing lists and
> newsgroups where I'm sure that happens every day.

Hmmm.... for me, it's all about the kernel. That's 90+% of the battle. Some distros use good kernels, some do not. I won't mention who I think is in the latter category.

FWIW: we tend to build systems and place our own kernel on them. Basically we want them to work, and we don't want to be surprised by bad things, like crashes due to 4k stacks or backported (mis)features. We also want them to have updated drivers, and NFS/file system bits.

> This is a mailing list about beowulf clusters, and the last time I
> checked, you can create clusters using any Linux distribution you like,
> or even non-Linux operating systems, such as IRIX, Solaris, etc.

With all due respect, I think Mark knows what this list is about.
There are lots of folks out there using Fedora, RHEL, Ubuntu, Debian, SuSE, ...

We generally don't care which distro is used, only that the kernel is reasonable, stable under load, and supports updated file systems/network capability.

Beowulf depends upon good kernels at the end of the day. You need high performance and stability throughout.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615

From thpierce at gmail.com Tue Jul 1 05:07:22 2008
From: thpierce at gmail.com (Tom Pierce)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] June New York/Jersey HPC users meeting
Message-ID: <25e9e5ad0807010507s74ea33e7p42abeff3d275b5a2@mail.gmail.com>

Dear Dan,

First, you missed an enjoyable meeting with lively discussion, good pub food and beer. I hope we meet there again in July.

I attended most of the meeting. Here is how I remember it:

Sun Grid Engine users were the majority at the meeting (60% SGE users, and 40% Torque/Maui users).

The installations of the two systems are different experiences. With SGE, you are about "half-done" after you install the system. The installation of Torque/Maui is more functional right out of the box. Both seem to have similar functionality when set up.

SGE has Sun developers actively working on it, so the newest versions have more options, e.g. a FLEXlm link for license management. Torque/Maui is open source, and has not been modified as often as SGE has, although cpusets, similar to SGE cpusets, have recently been added. Torque/Maui has commercial upgrades to Torque/Moab for large sites, or for people who want paid support (and Moab supports FLEXlm license management).

There seem to be more installations of Torque/Maui than there are of SGE, but that was just a discussion of perceptions.
However, the history of PBS, up through Torque, means that there are a great many PBS scripts on the internet for job submissions of HPC applications.

The discussion of MPI interfaces was ongoing. Neither system seemed to have an advantage. Torque has the OSC mpiexec script and SGE has some built-in hooks for MPI. The discussions mentioned Open MPI, LAM, MPICH and GM, with no obvious resolution that one system was more functional or easier than the other for MPI codes.

At the end, I would call it a "draw": Torque/Maui is easier to set up and has lots of examples, vs SGE's flexibility and FLEXlm license management.

Tom

Daniel.Roberts@sanofi-aventis.com wrote:

Anyone have minutes or conclusions to offer from this scheduler smack down?
Thanks
Dan

-----Original Message-----
From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org]

If you live or work in the New York/North Jersey Metropolitan area, mark your calendar for this Thursday, June 19th. The NYCA-HUG (New York City Area HPC Users Group) will be trying to answer the ultimate question: Torque or Sun Grid Engine? We will be discussing the pros/cons of each scheduler for HPC clusters. Come and add your experiences, wants, and rants. Then you decide.

From merc4krugger at gmail.com Tue Jul 1 06:27:44 2008
From: merc4krugger at gmail.com (Krugger)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
Message-ID:

Hi,

Am I understanding it correctly? You want to have more than 360 mounts in a single NFS client? And you want that client to be run on a non-privileged port?
What you are doing doesn't make much sense to me, but you can try adding the option "lockd.udpport=32768 lockd.tcpport=32768" to your kernel flags so that, on the client side, the kernel puts the lockd daemon that handles NFS locks at the port you selected. I don't understand how changing the port will help you get more mounts in. I would actually suggest you review the maximum allowed filehandles for each process.

You will also need to start services manually, something like:

statd -p 32765 -o 32766
mountd -p 32767

If you use modules, you need to reconfigure your modules by adding "options lockd nlm_udpport=32768 nlm_tcpport=32768" to your /etc/modules.conf

If I am misunderstanding and you are seeing a maximum of 360 clients for your NFS server, then maybe you are having a network problem, because with NFS3 your clients will lose connection to the server when UDP starts losing packets due to heavy I/O from the calculations, if both happen on the same network. Maybe NFS v4 might help with TCP connections, and/or some sort of shaping to make sure there is enough bandwidth reserved for NFS to operate properly.

Notice that all have different ports: 32765, 32766, 32767, 32768

Krugger

On Tue, Jul 1, 2008 at 10:36 AM, Henning Fehrmann wrote:
> Hello,
>
> we need to automount NFS directories on high ports to increase the number of possible mounts.
> Currently, we are limited up to ca 360 mounts.
>
> The NFS-server exports with the option 'insecure' but the mounts still end up on ports <1024 on the client side.
>
> Is there a way to enable automounts on higher ports? How can it be done manually:
> mount -t nfs -o ....?
>
> We are using autofs version 5.
>
> Thank you,
> Henning
>

From vernard at venger.net Tue Jul 1 08:19:15 2008
From: vernard at venger.net (Vernard Martin)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To:
References:
Message-ID: <486A4AF3.9040108@venger.net>

Jon Aquilina wrote:
> does anyone know of any rendering software that will work with a cluster?

The Big Daddy of them all, Pixar's RenderMan Pro Server, is supported under Linux and is used by nearly everybody in Hollywood who does graphic rendering for movies. It ain't cheap, but it's pretty much the best there is. Check out https://renderman.pixar.com/products/techspecs/index.htm for more info.

From gregory.warnes at rochester.edu Tue Jul 1 09:39:38 2008
From: gregory.warnes at rochester.edu (Gregory Warnes)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] open mosix alternative
In-Reply-To: <200807010849.42415.kilian@stanford.edu>
Message-ID:

Or, of course, the original Mosix project. Amnon Barak is very amiable and willing to work with folks.

http://www.mosix.org

-Greg

On 7/1/08 11:49AM , "Kilian CAVALOTTI" wrote:

> Hi Jon,
>
> On Tuesday 01 July 2008 03:38:52 am Jon Aquilina wrote:
>> > does anyone know an alternative to openmosix??
>
> You may want to check out OpenSSI: http://www.openssi.org
>
> As its name says, that's an SSI clustering solution, with unified process
> namespace, full process migration, load-balancing, single root
> filesystem, etc. A complete list of features is available at:
> http://wiki.openssi.org/go/Features
>
> Cheers,
> --
> Kilian
>

--
Gregory R. Warnes, Ph.D
Program Director
Center for Computational Arts, Sciences, and Engineering
University of Rochester
Tel: 585-273-2794
Fax: 585-276-2097
Email: gregory.warnes@rochester.edu

From landman at scalableinformatics.com Tue Jul 1 11:06:34 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <486A722A.3000405@scalableinformatics.com>

Hi Jon

Jon Aquilina wrote:
> does anyone know an alternative to openmosix?? would it be worth reviving
> the development of the kernel?

OpenMOSIX was all about process migration between different independent OSes. You can still get some of that with Scyld, with OpenSSI, and a few others.

If you prefer more of an SMP model (simpler programming), you should look at ScaleMP DSMs. Some on this list argue that shared memory programming is not easier than distributed memory programming, though I am not one of those who make this argument. It has different challenges, costs and benefits than MPI. It has different limitations. Not so surprisingly, with the advent of many-core units, shared memory programming techniques are needed to get good performance within a single system.

Disclosure: We are looking at these units for some of our work.
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From smulcahy at aplpi.com Tue Jul 1 11:11:23 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Fri Mar 19 01:07:24 2010 Subject: [Beowulf] A press release In-Reply-To: <486A6D59.7020704@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <486A734B.3000701@aplpi.com> Joe Landman wrote: > Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. > Some distros use good kernels, some do not. I won't mention who I think > is in the latter category. > .. > We generally don't care which distro is used. Only that the kernel is > reasonable, stable under load, and supports updated file systems/network > capability. This information would be most interesting to me and surely others on the list .. can you talk about of the distributions that provide "good kernels" if not about the others (and hey, theres hundreds of Linux distributions out there - http://lwn.net/Distributions/ so we couldn't infer the bad ones from your omissions ;) -stephen -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 
289353 (5 Woodlands Avenue, Renmore, Galway)

From geoff at galitz.org Tue Jul 1 12:05:54 2008
From: geoff at galitz.org (Geoff Galitz)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <48693A89.3080605@moene.indiv.nluug.nl>
References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu><485A9520.2080508@scalableinformatics.com> <48693A89.3080605@moene.indiv.nluug.nl>
Message-ID: <128FF5A06DBD4D74B8AA8CB6E4EF4B1F@geoffPC>

Ohh... I was just waiting for the conversation to get back to this. For an inside perspective:

http://www.spiegel.de/international/europe/0,1518,562315,00.html

Does that make me on-topic?

-geoff

Geoff Galitz
Blankenheim NRW, Deutschland
http://www.galitz.org

-----Original Message-----
From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Toon Moene
Sent: Monday, 30 June 2008 21:57
To: Joe Landman
Cc: beowulf@beowulf.org
Subject: Re: [Beowulf] Re: "hobbyists"

Joe Landman wrote:
> Tactical nukes (aimed at armies) were on the table for a few of the NATO
> scenarios involving responses to Soviet invasion of western Europe
> (based upon some of the historical reading, though I am not sure how
> serious they were). The western Europeans were understandably
> un-enthusiastic about such scenarios.

You bet we were. I was in the organization of the 400,000+ protest in Amsterdam in November 1981.

Cannon-fodder at a high level ...
--
Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.indiv.nluug.nl/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From jan.heichler at gmx.net Tue Jul 1 12:08:32 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Fri Mar 19 01:07:24 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID: <66506789.20080701210832@gmx.net>

An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080701/f49a393d/attachment.html

From jan.heichler at gmx.net Tue Jul 1 12:09:08 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] A press release
In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID: <62974595.20080701210908@gmx.net>

Hello Dan,

On Tuesday, 1 July 2008, you wrote:

>>Hi Jon,
>>We have our own stack which we stick on top of the customers favourite
>>red hat clone. Usually Scientific Linux.
>>Here is a bit more about it.
>>http://www.clustervision.com/products_os.php
>>We sell as a standalone product and it does quite well. I could even
>>go so far to say that it is 'stack of choice' in many European
>>institutions.

DKqc> Ever thought of getting a job in Sales and Marketing? :-)

What makes you think that he hasn't that kind of job? ;-)

@Andy: SCNR

Regards
Jan
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080701/74653d4c/attachment.html From hahn at mcmaster.ca Tue Jul 1 12:19:24 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: <486A6760.5010006@ias.edu> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: >>> does it necessarily have to be a redhat clone. can it also be a debian >>> based >>> clone? >> >> but why? is there some concrete advantage to using Debian? >> I've never understood why Debian users tend to be very True Believer, >> or what it is that hooks them. > > And the Debian users can say the same thing about Red Hat users. Or SUSE very nice! an excellent parody of the True Believer response. but I ask again: what are the reasons one might prefer using debian? really, I'm not criticizing it - I really would like to know why it would matter whether someone (such as ClusterVisionOS (tm)) would use debian or another distro. From matt at technoronin.com Tue Jul 1 12:30:05 2008 From: matt at technoronin.com (Matt Lawrence) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1 In-Reply-To: References: <200807010728.m617S3Ub011226@bluewest.scyld.com> Message-ID: On Tue, 1 Jul 2008, Mark Kosmowski wrote: > As heretical as this last sounds, I'm tempted to throw in the towel on > my PhD studies because I can no longer afford the power to run my > three node cluster at home. Energy costs may end up being the straw > that breaks this camel's back. Perhaps you should consider getting time on someone else's cluster. For something that only requires three nodes, there should be quite a number of places to run. -- Matt It's not what I know that counts. It's what I can remember in time to use. 
From hahn at mcmaster.ca Tue Jul 1 12:25:48 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: <486A6D59.7020704@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: > Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. Some > distros use good kernels, some do not. I won't mention who I think is in the > latter category. I was hoping for some discussion of concrete issues. for instance, I have the impression debian uses something other than sysvinit - does that work out well? is it a problem getting commercial packages (pathscale/pgi/intel compilers, gaussian, etc) to run? the couple debian people I know tend to have more ideological motives (which I do NOT impugn, except that I am personally more swayed by practical, concrete reasons.) From landman at scalableinformatics.com Tue Jul 1 12:53:23 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <486A8B33.7020600@scalableinformatics.com> Mark Hahn wrote: >> Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. >> Some distros use good kernels, some do not. I won't mention who I >> think is in the latter category. > > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - does > that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? 
Hi Mark:

We have multiple Ubuntu servers up, and thus far, no major problems ... just a few "translational" gotchas. We have successfully run pgi, intel, gaussian, gamess, ... on our Ubuntu units as well as our RHEL/Centos, Fedora, ...

>
> the couple debian people I know tend to have more ideological motives

Yeah ... can't escape this. I like some of the elements of Ubuntu/Debian better than I do RHEL (the network configuration in Debian is IMO sane, while in RHEL/Centos/SuSE it is not). There are some aspects that are worse (no /etc/profile.d ... so I add that back in by hand).

> (which I do NOT impugn, except that I am personally more swayed by
> practical, concrete reasons.)

Building and deploying updated/correct kernels with Ubuntu/Debian is far easier (the build is much easier/saner) than with SuSE, RHEL, ... From a pragmatic view, this is why we have a slight preference for that.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://www.scalableinformatics.com
      http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 866 888 3112
cell : +1 734 612 4615

From lindahl at pbm.com Tue Jul 1 13:01:23 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] open mosix alternative
In-Reply-To: <486A722A.3000405@scalableinformatics.com>
References: <486A722A.3000405@scalableinformatics.com>
Message-ID: <20080701200122.GA23583@bx9.net>

On Tue, Jul 01, 2008 at 02:06:34PM -0400, Joe Landman wrote:

> If you prefer more of an SMP model (simpler programming), you should
> look at ScaleMP DSMs. Some on this list argue the shared memory
> programming is not easier than distributed memory programming,

Gee, and I thought the biggest argument about ScaleMP was that the previous 50 times the same thing was attempted, it had low performance.

I'd love to see some benchmarks (other than Stream). So if you do look at it, please share.
-- greg

From asabigue at fing.edu.uy Tue Jul 1 13:14:26 2008
From: asabigue at fing.edu.uy (ariel sabiguero yawelak)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To:
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID: <486A9022.5070109@fing.edu.uy>

Well Mark, don't give up!

I am not sure what your application domain is, but if you require 24x7 computation, then you should not be hosting that at home. On the other hand, if you are not doing real computation and you just have a testbed at home, maybe for debugging your parallel applications or something similar, you might be interested in a virtualized solution. Several years ago, I used to "debug" some neural networks at home, but training sessions (up to two weeks of training) happened at the university. I would suggest doing something like that. You can always scale down your problem in several phases and save the complete data-set / problem for THE RUN.

You are not being a heretic there, but suffering energy costs ;-) In more places than you may believe, useful computing nodes are being replaced just because of energy costs. In some application domains you can even lose computational power if you move from 4 nodes to a single quad-core (e.g. because of memory bandwidth problems).

I know it is very nice to be able to do everything at home.. but maybe before dropping your studies or working overtime to pay the electricity bill, you might want to consider collapsing your physical deployment into a single virtualized cluster (or just dispatching several threads/processes in a single system). If you collapse into a single system you have only one mainboard, one HDD, one power supply, one processor (physically speaking), .... and you can achieve almost the performance of 4 systems in one, consuming the power of.... well maybe even less than a single one.
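
To put rough numbers on that, a back-of-the-envelope sketch may help. Every figure in it is a hypothetical placeholder (the wattages and the electricity price are assumptions made up for illustration, not measurements of anyone's cluster), so substitute your own meter readings and tariff.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_cost(watts, price_per_kwh, hours=HOURS_PER_YEAR):
    """Cost of running a constant load of `watts` for `hours` hours."""
    kwh = watts * hours / 1000.0
    return kwh * price_per_kwh


# All figures below are hypothetical placeholders, not measurements.
NODE_WATTS = 250.0        # assumed draw of one older single-core node
QUAD_CORE_WATTS = 300.0   # assumed draw of one consolidated quad-core box
PRICE_PER_KWH = 0.15      # assumed electricity price per kWh

three_nodes = annual_energy_cost(3 * NODE_WATTS, PRICE_PER_KWH)
one_box = annual_energy_cost(QUAD_CORE_WATTS, PRICE_PER_KWH)

if __name__ == "__main__":
    print(f"three nodes: {three_nodes:.2f} per year")
    print(f"one box:     {one_box:.2f} per year")
```

The point is only the shape of the calculation: the cost scales linearly with the wall-socket wattage and the hours of operation, so fewer boxes running 24x7 shows up directly in the bill.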
I don't want to go into discussions about performance gain/loss due to the variation of the hardware architecture.

Invest some bucks (if you haven't done that yet) in a good power supply. The efficiency of unbranded OEM power supplies is really pathetic, maybe 45-50%, while a good power supply might be 75-80% efficient. Use the energy for computing, not for heating your house.

What I mean is that you could consider just collapsing a complete "small" cluster into a single system. If your application is CPU-bound and not I/O-bound, VMware Server could be an option, as it is free of charge (unfortunately not open source, even though some patches can be applied to the drivers). I think it is not possible to publish benchmarking data about VMware, but I can tell you that over long timescales, the performance you get in the guest OS is similar to that of the host OS. There are a lot of problems related to jitter, from crazy clocks to delays, but if your application is not sensitive to that, then you are OK.

Maybe this is not a solution, but you can provide more information regarding your problem before quitting...

my 2 cents....

ariel

Mark Kosmowski wrote:
> At some point there a cost-benefit analysis needs to be performed. If
> my cluster at peak usage only uses 4 Gb RAM per CPU (I live in
> single-core land still and do not yet differentiate between CPU and
> core) and my nodes all have 16 Gb per CPU then I am wasting RAM
> resources and would be better off buying new machines and physically
> transferring the RAM to and from them or running more jobs each
> distributed across fewer CPUs. Or saving on my electricity bill and
> powering down some nodes.
>
> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.
>
> Mark E.
Kosmowski

From perry at piermont.com Tue Jul 1 13:21:55 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701164747.GA15901@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Tue\, 1 Jul 2008 18\:47\:47 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
Message-ID: <87fxqtuzh8.fsf@snark.cb.piermont.com>

Henning Fehrmann writes:
>> Thus, your problem sounds rather odd. There is no obvious reason you
>> should be limited to 360 connections.
>>
>> Perhaps your problem is not what you think it is at all. Could you
>> explain it in more detail?
>
> I guess it has also something to do with the automounter. I am not able
> to increase this number.
> But even if the automounter would handle more we need to be able to
> use higher ports:
> netstat shows always ports below 1024.
>
> tcp 0 0 client:941 server:nfs
>
> We need to mount up to 1400 nfs exports.
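
The question of how many server ports many connections need can be tried out with a small sketch. It is not NFS and makes several assumptions: it uses plain TCP on the loopback interface, the function name is invented for this example, and because it runs unprivileged the client source ports are ordinary ephemeral ports rather than reserved ports below 1024. What it shows is only that a single listening port accepts any number of clients, each connection being told apart by the client's own source port.

```python
import socket
import threading


def count_clients_on_one_port(n_clients):
    """Accept n_clients TCP connections on a single listening port.

    Returns (server_port, set_of_client_source_ports).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))       # port 0: the OS picks one free port
    server.listen(n_clients)
    server_port = server.getsockname()[1]

    accepted = []

    def accept_all():
        for _ in range(n_clients):
            conn, addr = server.accept()
            accepted.append((conn, addr[1]))   # addr[1] is the client's source port

    acceptor = threading.Thread(target=accept_all)
    acceptor.start()

    # Every client connects to the same (address, port) pair on the server.
    clients = [
        socket.create_connection(("127.0.0.1", server_port))
        for _ in range(n_clients)
    ]
    acceptor.join()

    source_ports = {port for _, port in accepted}

    for c in clients:
        c.close()
    for conn, _ in accepted:
        conn.close()
    server.close()
    return server_port, source_ports


if __name__ == "__main__":
    port, sources = count_clients_on_one_port(50)
    print(f"1 server port ({port}), {len(sources)} distinct client source ports")
```

The server side never needs a second port; what varies per connection is the client end of the address pair.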
All NFS clients are connecting to a single port, not to a different port for every NFS export. You do not need 1400 listening TCP ports on a server to export 1400 different file systems. Only one port is needed, whether you are exporting one file system or one million, just as only one SMTP port is needed whether you are receiving mail from one client or from one million. The clients are connecting from ports below 1024 because Berkeley set up a hack in the original BSD stack so that only root could open ports below 1024. This way, you could "know" the process on the remote host was a root process, thus you could feel "secure" [sic]. It doesn't add any real security any more, but it is also not the cause of any problem you are experiencing. We can help you figure this out, but you will have to give a lot more detail about the problem. Please describe your network setup. How many servers do you have? How many clients? How many file systems are those servers exporting? How many is a typical client mounting, and why? Start there and we can try to move forward. -- Perry E. Metzger perry@piermont.com From landman at scalableinformatics.com Tue Jul 1 13:24:04 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: <20080701200122.GA23583@bx9.net> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> Message-ID: <486A9264.5090902@scalableinformatics.com> Greg Lindahl wrote: > On Tue, Jul 01, 2008 at 02:06:34PM -0400, Joe Landman wrote: > >> If you prefer more of an SMP model (simpler programming), you should >> look at ScaleMP DSMs. Some on this list argue the shared memory >> programming is not easier than distributed memory programming, > > Gee, and I thought the biggest argument about ScaleMP was that the > previous 50 times the same thing was attempted, it had low > performance. The researchy DSMs had low performance. That is known. 
This one seems not to be bad over good IB nets. You always have latency. Can't escape that. > I'd love to see some benchmarks (other than Stream). So if you do look > at it, please share. If you are serious about this, I'll bug Shai as to what is shareable. He does have benchmarks. The ones I have seen (real applications, not microbenchmarks), looked pretty good. Which is why we are looking at them. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From andrew at moonet.co.uk Tue Jul 1 13:35:29 2008 From: andrew at moonet.co.uk (andrew holway) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <66506789.20080701210832@gmx.net> Message-ID: > does it necessarily have to be a redhat clone. can it also be a debian based > clone? Not at all, If there were demand or a customer with enough cash to throw at the job then we would of course accommodate his every need. Considering that it is taking several rather expensive developers quite a long time to push out the latest incarnation, ClusterVisionOS 4 through beta this cost could be considerable to ensure a stable environment. 
I'm no expert in the subtleties of distributions but maintaining and supporting one to a high enough standard is quite enough work thanks very much :) Ta Andy From lindahl at pbm.com Tue Jul 1 13:37:14 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: <486A9264.5090902@scalableinformatics.com> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> <486A9264.5090902@scalableinformatics.com> Message-ID: <20080701203713.GB28024@bx9.net> On Tue, Jul 01, 2008 at 04:24:04PM -0400, Joe Landman wrote: > If you are serious about this, I'll bug Shai as to what is shareable. He > does have benchmarks. The ones I have seen (real applications, not > microbenchmarks), looked pretty good. Which is why we are looking at > them. If you look back on this mailing list, you'll see that I asked him for benchmarks, and he posted stream. Which isn't interesting, because it's embarrassingly parallel. -- greg From hahn at mcmaster.ca Tue Jul 1 13:44:05 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] open mosix alternative In-Reply-To: <20080701200122.GA23583@bx9.net> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> Message-ID: > I'd love to see some benchmarks (other than Stream). So if you do look > at it, please share. me too. in particular, I'd like to see "hot page" performance - where a multithreaded program bangs on a heavily write-shared page. From gerry.creager at tamu.edu Tue Jul 1 13:57:08 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Fri Mar 19 01:07:25 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <486A9822.7000902@moene.indiv.nluug.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13> <486A1F7B.9080408@tamu.edu> <486A9822.7000902@moene.indiv.nluug.nl> Message-ID: <486A9A24.9000800@tamu.edu> I was at the WRF conf. last week. A colleague from the Netherlands was lamenting that he couldn't get ECMWF data (I don't recall the annual cost/year but it was huge). NOAA/NCEP GFS data are available via FTP and regular enough to allow really simple scripting, as well as other methods. I don't understand why folks wouldn't use these data. As for competing, if our companies are not sufficiently technically astute, should we be protecting them from European companies, just because the data are free? Toon Moene wrote: > Gerry Creager wrote: > >> In the US, at least for academic institutions and hobbyists, surface >> and upper air observations of the sort you describe are generally >> available for incorporation into models for data assimilation. Models >> are generally forced and bounded using model data from other >> atmospheric models, also available. As I understand it from >> colleagues in Europe, getting similar data over there is more >> problemmatical. > > Exactly ! And what happens in Europe is that companies take the freely > available US data, use it to compete with US companies, and disregard > the (meteorological superior) ECMWF data, because it is not free. > > A colleague of mine held some very unpopular talks in Reading, England, > about this (according to his figures, 99 % of the meteorological data > used in Europe originates from the US). 
>
--
Gerry Creager -- gerry.creager@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

From eagles051387 at gmail.com Tue Jul 1 15:53:34 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To: <486A4296.4050501@hope.edu>
References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu>
Message-ID:

My idea is more for my thesis, if I am going to do anything like this. Vernard, thanks for the link. What's it like in a cluster environment?

On Tue, Jul 1, 2008 at 4:43 PM, Paul Van Allsburg wrote:

> I'd like to do the same, as a project for a group of students... Please
> keep me in the loop?
> Thanks!
> Paul
>
> --
> Paul Van Allsburg Computational Science & Modeling Facilitator
> Natural Sciences Division, Hope College
> 35 East 12th Street
> Holland, Michigan 49423
> 616-395-7292 http://www.hope.edu/academic/csm/
>
>
> Jon Aquilina wrote:
>
>> that would be greatly appreciated
>>
>> On 7/1/08, Geoff Galitz wrote:
>>
>> That is out of my field of expertise. Sounds like a question for
>> professional digital artists. I can put you in touch some folks
>> that most likely know the answer to your questions, if you like.
>>
>> Anybody know of any current approaches to this?
>>
>> Geoff Galitz
>> Blankenheim NRW, Deutschland
>> http://www.galitz.org
>>
>> From: Jon Aquilina [mailto:eagles051387@gmail.com]
>> Sent: Tuesday, 1 July 2008 13:27
>> To: Geoff Galitz
>> Cc: Beowulf Mailing List
>> Subject: Re: [Beowulf] software for compatible with a cluster
>>
>> reason i am asking is because i would like to setup a rendering
>> cluster and provide rendering services.
does this also work for 3d >> animated movies that require rendering or does one need somethin >> entierly different for that? >> >> On 7/1/08, *Geoff Galitz* > > wrote: >> >> >> >> I know people who use Houdini for this: >> >> >> http://www.sidefx.com/index.php >> >> >> I cannot vouch for how well it works or what is involved, though. >> >> >> >> Geoff Galitz >> Blankenheim NRW, Deutschland >> http://www.galitz.org >> >> * From: * beowulf-bounces@beowulf.org >> >> [mailto:beowulf-bounces@beowulf.org >> ] *On Behalf Of *Jon Aquilina >> *Sent:* Dienstag, 1. Juli 2008 12:40 >> *To:* Beowulf Mailing List >> *Subject:* [Beowulf] software for compatible with a cluster >> >> >> does anyone know of any rendering software that will work with a >> cluster? >> >> -- Jonathan Aquilina >> >> >> >> >> -- Jonathan Aquilina >> >> >> >> >> -- >> Jonathan Aquilina >> > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/e3522a07/attachment.html From perry at piermont.com Tue Jul 1 16:23:10 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: <486A6760.5010006@ias.edu> (Prentice Bisbal's message of "Tue\, 01 Jul 2008 13\:20\:32 -0400") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <87bq1hgpep.fsf@snark.cb.piermont.com> Prentice Bisbal writes: >>> does it necessarily have to be a redhat clone. can it also be a debian >>> based >>> clone? >> >> but why? is there some concrete advantage to using Debian? >> I've never understood why Debian users tend to be very True Believer, >> or what it is that hooks them. > > And the Debian users can say the same thing about Red Hat users. Or SUSE > users. And if any still exist, the Slackware users could say the same > thing about the both of them. 
But then the Slackware users could also > point out that the first Linux distro was Slackware, so they are using > the one true Linux distro... Precisely. It pays to allow people to use what they want. Fewer religious battles that way. Whether one distro or another has an advantage isn't the point -- people have their own tastes and it doesn't pay to tell them "no" without good reason. Perry -- Perry E. Metzger perry@piermont.com From perry at piermont.com Tue Jul 1 16:25:19 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: (Mark Hahn's message of "Tue\, 1 Jul 2008 15\:25\:48 -0400 \(EDT\)") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <877ic5gpb4.fsf@snark.cb.piermont.com> Mark Hahn writes: > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - > does that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? It is trivial to port init scripts between different init systems. They're just short shell scripts, they're utterly readable, and any sysadmin worth their salt can make the needed changes in a few minutes. If you have a large cluster, you need such a person anyway. Perry -- Perry E. Metzger perry@piermont.com From perry at piermont.com Tue Jul 1 16:31:50 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: (Mark Hahn's message of "Tue\, 1 Jul 2008 15\:19\:24 -0400 \(EDT\)") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <873amtgp09.fsf@snark.cb.piermont.com> Mark Hahn writes: >>> but why? 
is there some concrete advantage to using Debian? >>> I've never understood why Debian users tend to be very True Believer, >>> or what it is that hooks them. >> >> And the Debian users can say the same thing about Red Hat users. Or SUSE > > very nice! an excellent parody of the True Believer response. Actually, he was just being reasonable. > but I ask again: what are the reasons one might prefer using debian? > really, I'm not criticizing it - I really would like to know why it > would matter whether someone (such as ClusterVisionOS (tm)) would use > debian or another distro. Often it is just a question of what the people using the system are used to. I often prefer using BSD systems, largely because of certain technical advantages, but also to a great extent because my first big Unix boxes were Vaxes running 4.2BSD in the early 1980s and after 25 years with the same flavor of Unix you get used to the way things are done. It is much the same reason I use Emacs instead of vi -- I started using Emacs on Tops-20 decades ago and I'm too used to it now. If you told me I "have" to use vi, things would get ugly, even though I don't think there is anything wrong with using vi per se. Perry -- Perry E. Metzger perry@piermont.com From perry at piermont.com Tue Jul 1 16:34:17 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: (Jon Aquilina's message of "Wed\, 2 Jul 2008 00\:53\:34 +0200") References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> Message-ID: <87y74lfabq.fsf@snark.cb.piermont.com> "Jon Aquilina" writes: > my idea is more of for my thesis. If you're trying to do 3d animation on the cheap and you want something that's already cluster capable, I'd try Blender. It is open source and it has already made some reasonable length movies. 
Not being an animation type, I know nothing about how nice it is compared to commercial products, but it is hard to beat the price.

Perry
--
Perry E. Metzger perry@piermont.com

From hahn at mcmaster.ca Tue Jul 1 22:06:43 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri Mar 19 01:07:25 2010
Subject: [Beowulf] A press release
In-Reply-To:
References:
Message-ID:

>> I was hoping for some discussion of concrete issues. for instance,
>> I have the impression debian uses something other than sysvinit -
>> does that work out well?
>>
> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, ...

thanks. I guess I was assuming that mainstream debian was like ubuntu.

>> is it a problem getting commercial
>> packages (pathscale/pgi/intel compilers, gaussian, etc) to run?
>>
> I've never had any major problems. Most linux vendors supply both RPM's and
> .tar.gz installers, and I generally have better luck with the latter, even
> on RPM based systems anyway.

interesting - I wonder why. the main difference would be that the rpm format encodes dependencies...

>> the couple debian people I know tend to have more ideological motives
>> (which I do NOT impugn, except that I am personally more swayed by
>> practical, concrete reasons.)
>>
> My 'conversion' to use of Debian had little to do with ideological motives,
> and a lot more to do with minimizing the amount of time I had to take away
> from my research to support the Linux clusters I was maintaining at the
> time.

again interesting, thanks. what sorts of things in rpm-based distros consumed your time?

> Side note, one very nice thing about debian is the ability to upgrade a
> system in-place from one O/S release to another via
>
> apt-get dist-upgrade
>
> Much nicer than reinstalling the O/S as seems to be (used to be?) the norm
> with RPM-based systems

I've done major version upgrades using rpm, admittedly in the pre-fedora days.
it _is_ a nice capability - I'm a little surprised desktop-oriented distros don't emphasize it... From tjrc at sanger.ac.uk Tue Jul 1 22:37:19 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] A press release In-Reply-To: <486A8B33.7020600@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> <486A8B33.7020600@scalableinformatics.com> Message-ID: <86A30BE3-3B8E-47C8-8286-D2D7E2C74A40@sanger.ac.uk> On 1 Jul 2008, at 8:53 pm, Joe Landman wrote: >> the couple debian people I know tend to have more ideological motives > > Yeah ... can't escape this. Indeed. Ubuntu is slightly more pragmatic than Debian, as far as the ideological stuff goes. > I like some of the elements of Ubuntu/Debian better than I do RHEL > (the network configuration in Debian is IMO sane, while in RHEL/ > Centos/SuSE it is not). There are some aspects that are worse (no / > etc/profile.d ... so I add that back in by hand ). Here, our clusters all run Debian, but we also have RHAS and SLES around when support matrices demand it (Oracle, mainly). I'd agree that fundamentally it's a case of what you're used to. We stopped using Red Hat widely about four years ago, and the reasons (which are probably not valid any more) were: 1) Not all userland programs were 64-bit file aware. 2) There were certain features which we just couldn't get to work properly on RHAS - a prime example being multipath SAN access. It "just worked" on Debian. 3) Smooth upgrades from one major release to the next without having to reinstall. While this is probably not important for beowulf nodes, it is for more complex servers. I still prefer Debian's package management system, but that's probably because I'm used to it, rather than it inherently being superior. 
yast2 can do pretty much everything that aptitude does, although I think aptitude is more amenable to automation through cfengine and the like. There are some very powerful little parts of the packaging system, like dpkg-divert, which allows you to replace a file from a package with your own, in such a way that it will not be overwritten the next time the package is upgraded. For those of us that need to customise our systems that sort of thing is very useful, and saves a lot of work down the line. >> (which I do NOT impugn, except that I am personally more swayed by >> practical, concrete reasons.) > > > Building and deploying updated/correct kernels with Ubuntu/Debian is > far easier (the build is much easier/saner) than with SuSE, > RHEL, ... From a pragmatic view, this is what why we have a slight > preference for that. I'd agree with that. Using make-kpkg to build a custom kernel .deb which you can then easily deploy to all your machines is a real boon. At the end of the day, people should use what they're comfortable with. I don't necessarily buy the support argument; there are some companies (Platform, for example) who will support you whichever distro you use; all they care about is what kernel version and C library version you're running. I like this attitude and I wish it was more widespread amongst proprietary software vendors. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From eagles051387 at gmail.com Tue Jul 1 22:37:21 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:25 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <87y74lfabq.fsf@snark.cb.piermont.com> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> Message-ID: if i use blender how nicely does it work in a cluster? On Wed, Jul 2, 2008 at 1:34 AM, Perry E. Metzger wrote: > > "Jon Aquilina" writes: > > my idea is more of for my thesis. > > If you're trying to do 3d animation on the cheap and you want > something that's already cluster capable, I'd try Blender. It is open > source and it has already made some reasonable length movies. Not > being an animation type, I know nothing about how nice it is compared > to commercial products, but it is hard to beat the price. > > Perry > -- > Perry E. Metzger perry@piermont.com > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/97bf9a8b/attachment.html From carsten.aulbert at aei.mpg.de Wed Jul 2 00:26:58 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87fxqtuzh8.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> Message-ID: <486B2DC2.9010604@aei.mpg.de> Hi Perry, Perry E. Metzger wrote: > > All NFS clients are connecting to a single port, not to a different > port for every NFS export. You do not need 1400 listening TCP ports on > a server to export 1400 different file systems. 
Only one port is > needed, whether you are exporting one file system or one million, just > as only one SMTP port is needed whether you are receiving mail from > one client or from one million. > That's clear and not the problem. > The clients are connecting from ports below 1024 because Berkeley set > up a hack in the original BSD stack so that only root could open ports > below 1024. This way, you could "know" the process on the remote host > was a root process, thus you could feel "secure" [sic]. It doesn't add > any real security any more, but it is also not the cause of any > problem you are experiencing. We might run out of "secure" ports. > We can help you figure this out, but you will have to give a lot more > detail about the problem. Please describe your network setup. How many > servers do you have? How many clients? How many file systems are those > servers exporting? How many is a typical client mounting, and why? > Start there and we can try to move forward. > OK, we have 1342 nodes which act as servers as well as clients. Every node exports a single local directory and all other nodes can mount this. What we do now to optimize the available bandwidth and IOs is spread millions of files according to a hash algorithm to all nodes (multiple copies as well) and then run a few 1000 jobs opening one file from one box then one file from the other box and so on. With a short autofs timeout that ought to work. Typically it is possible that a single process opens about 10-15 files per second, i.e. making 10-15 mounts per second. With 4 parallel processes per node that's 40-60 mounts/second. With a timeout of 5 seconds we should roughly have 200-300 concurrent mounts (on average, no idea about the variance). Our tests so far have shown that sometimes a node keeps a few mounts open (autofs4 problems AFAIK) and at some point is not able to mount more shares. Usually this occurs at about 350 mounts and we are not yet 100% sure if we are running out of secure ports.
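The estimate above can be written out as a small Python sketch. It is only the arithmetic from the paragraph (mounts in flight = mount rate times how long each mount stays around), not a measurement. The function names are made up for illustration. The reserved-port range 665-1023 is an assumption: as far as I know it is the default range the Linux sunrpc client binds from, but check your own kernel settings. Counting one reserved port per mount is also a simplification.

```python
def concurrent_mounts(mounts_per_sec_per_proc, procs_per_node, timeout_s):
    """Average number of mounts in flight on one node.

    Rate (mounts/second for the whole node) multiplied by the time each
    mount stays mounted before autofs expires it.
    """
    return mounts_per_sec_per_proc * procs_per_node * timeout_s


def reserved_port_budget(lowest_port, highest_port):
    """Number of source ports available in an inclusive port range."""
    return highest_port - lowest_port + 1


# Figures quoted in the message: 10-15 mounts/s per process,
# 4 processes per node, 5 second autofs timeout.
low = concurrent_mounts(10, 4, 5)
high = concurrent_mounts(15, 4, 5)

# ASSUMPTION: the client uses reserved ports 665..1023 (hypothetical
# values for this sketch; not stated anywhere in the thread).
budget = reserved_port_budget(665, 1023)

print("average concurrent mounts: %d to %d" % (low, high))
print("reserved ports available under the assumed range: %d" % budget)
```

If the assumed range holds, the average load already sits close to the port budget, so a few stale mounts or some variance in the mount rate could be enough to exhaust it.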
All our boxes export now with "insecure" option (NFSv3), but our clients all connect from a "secure" port, anyone here who might give us a hint how to force this in Linux? Thanks a lot Carsten From tjrc at sanger.ac.uk Wed Jul 2 01:19:50 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > OK, we have 1342 nodes which act as servers as well as clients. Every > node exports a single local directory and all other nodes can mount > this. > > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) and then run a few 1000 jobs opening one file from one > box then one file from the other box and so on. With a short autofs > timeout that ought to work. Typically it is possible that a single > process opens about 10-15 files per second, i.e. making 10-15 mounts > per > second. With 4 parallel process per node that's 40-60 mounts/second. > With a timeout of 5 seconds we should roughly have 200-300 concurrent > mounts (on average, no idea abut the variance). Please tell me you're not serious! The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going all over the place. Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that each node works on the data on its own local disk, and doesn't have to talk networked storage at all? 
Or don't you know in advance which files a particular job is going to need? Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From henning.fehrmann at aei.mpg.de Wed Jul 2 01:44:58 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> Message-ID: <20080702084458.GA12879@gretchen.aei.uni-hannover.de> On Wed, Jul 02, 2008 at 09:19:50AM +0100, Tim Cutts wrote: > > On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > > >OK, we have 1342 nodes which act as servers as well as clients. Every > >node exports a single local directory and all other nodes can mount this. > > > >What we do now to optimize the available bandwidth and IOs is spread > >millions of files according to a hash algorithm to all nodes (multiple > >copies as well) and then run a few 1000 jobs opening one file from one > >box then one file from the other box and so on. With a short autofs > >timeout that ought to work. Typically it is possible that a single > >process opens about 10-15 files per second, i.e. making 10-15 mounts per > >second. With 4 parallel process per node that's 40-60 mounts/second. > >With a timeout of 5 seconds we should roughly have 200-300 concurrent > >mounts (on average, no idea abut the variance). > > Please tell me you're not serious! 
The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going > all over the place. > > Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that > each node works on the data on its own local disk, and doesn't have to talk networked storage at all? Or don't you know in advance which files a > particular job is going to need? Yes, this is the problem. The number of files is too large to store them everywhere (a few TByte and 50 million files). Mounting a few NFS servers does not provide the bandwidth. On the other hand, the core switch should be able to handle the flows without blocking. We think that NFS mounts are the fastest way to distribute the requested files to the nodes. Henning From tjrc at sanger.ac.uk Wed Jul 2 01:45:21 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: References: Message-ID: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: >>> I was hoping for some discussion of concrete issues. for instance, >>> I have the impression debian uses something other than sysvinit - >>> does that work out well? >>> >> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/ >> rc0.d, ... > > thanks. I guess I was assuming that mainstream debian was like > ubuntu. It's sort of the other way around. Remember that Ubuntu is based off a six-monthly snapshot of Debian's testing track, which is why Hardy looks a lot more like the upcoming Debian Lenny than it does like Debian Etch. > interesting - I wonder why. the main difference would be that the > rpm format encodes dependencies...
The difficulty is that many ISVs tend to do a fairly terrible job of packaging their applications as RPM's or DEB's, for example creating init scripts which don't obey the distribution's policies, or making willy-nilly modifications to configuration files all over the place, even in other packages (which in the Debian world is a *big* no-no, that's why many Debian/Ubuntu packages have now moved to the conf.d type of configuration directory, so that other packages can drop in little independent snippets of configuration) I have seen, for example, .deb packages from a Large Company With Which We Are All Familiar which essentially attempted to convert your system into a Red Hat system by moving all your init scripts around and whatnot, so once you'd installed this abomination, you'd totally wrecked the ability of many of the main distro packages to be updated ever again. Oh, and of course uninstalling the package didn't put anything back the way it had been before. Like you, I tend to use tarballs if they are available, and if I want to turn them into packages I do it myself, and make sure they are policy compliant for the distro. So this, while not a statement in favour of either flavour of distro, is definitely a warning to be very wary of what packages that have come from sources other than the distro itself might do (which of course, you'd be wary of anyway for security reasons). Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From ajt at rri.sari.ac.uk Wed Jul 2 02:23:06 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <486B48FA.7080403@rri.sari.ac.uk> Mark Hahn wrote: >[...] > but I ask again: what are the reasons one might prefer using debian? > really, I'm not criticizing it - I really would like to know why it > would matter whether someone (such as ClusterVisionOS (tm)) would use > debian or another distro. Hello, Mark. I've been on a well trodden path from trying out the 'free' version of Scyld under RH6.2, then using openMosix under all versions of RH up to RH9, Fedora up to core2, then Debian Sarge and now Ubuntu 6.06.1 LTS with an upgrade to 8.04.1 LTS imminent. As I see it, this has been a developmental journey and also a learning experience for me. As others on this thread have admitted, I'm not blind to the ideological objectives of Debian. However, I'm now using a very good commercially supported version of Linux with what is widely acknowledged to be the largest user and developer community. It's my own experience of trying to do my work under RH/Fedora that's put me off these distros and I see a BIG divide between 'real' HPC communities using BIG iron, and small Beowulf clusters like mine. I've got to admit that Tim Cutts did influence my decision to try out Debian (thanks, Tim!). I also use the (UK) NERC's Bio-Linux binary deb's and I was also influenced by their decision to change from RH to Debian for Bio-Linux. I can see that other communities use RH for similar reasons, though I should mention that our Beowulf spends a lot of time running quantum chemistry simulations (GAMESS etc.).
I've put up an Ubuntu blue-print for 'biobuntu', which consolidates the work I'm doing on several projects: https://blueprints.launchpad.net/ubuntu/+spec/biobuntu I am, of course, familiar with 'other' Biolinuxen and rpm repositories of bioinformatics software: http://en.wikipedia.org/wiki/BioLinux Having tried out many of these alternatives, I remain convinced that NEBC's Bio-Linux is most appropriate for my work. In particular, the level of support in the form of documentation and training courses provided by NEBC is very good. This means I don't have to reinvent the wheel - Always a good point for any Beowulf-related activity :-) Tony. -- Dr. A.J.Travis, | mailto:ajt@rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 02:35:57 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Carsten Aulbert wrote: > OK, we have 1342 nodes which act as servers as well as clients. Every > node exports a single local directory and all other nodes can mount this. Have you considered using a parallel file system ? > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) There have been many talks of improving performance by paying attention to the data locality on this very list. Are you not able to move the code to where the data is or move the data to where the code is ? F.e.
using a simple TCP connection (nc, rsh, rsync or even http) to transfer the file to the local disk before using it is probably more efficient than the way you use NFS if you deal with small files (as they have to be written to some local storage). The setup and tear-down costs of the NFS connection (automounter, mount, unmount) simply don't exist in this case; the transfer of data on the wire happens the same way. Or you could even get around the limitation of storing it locally by using a ramdisk to temporarily store the files (if you have the free memory...) - from what I understand they are read then used immediately and not needed again in a short time frame so it makes no sense to store them for longer, a perfect application for a tmpfs. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 02:59:47 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: On Wed, 2 Jul 2008, Tim Cutts wrote: > The difficulty is that many ISVs tend to do a fairly terrible job of > packaging their applications as RPM's or DEB's I very much agree with this. While you mentioned init scripts that don't fit the distribution, I can add init scripts that are totally missing when they should be provided - a hand-made init script would not be part of the installed package and could fail in various ways if the package is updated or... uninstalled. > Like you, I tend to use tarballs if they are available, and if I > want to turn them into packages I do it myself, and make sure they > are policy compliant for the distro.
I think that's actually more important than the distribution per-se. If you are able to package something to fit the distribution (f.e. to install a missing kernel module, add an important software package, etc.) you can more efficiently use your time later on as packaging (done properly) is normally a one-time effort. This goes into the direction that the admin should use the distribution, not fight it! -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From eagles051387 at gmail.com Wed Jul 2 04:16:43 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: one thing must not be forgotten though. in regards to pkging stuff for the ubuntu variation once someone like you and me you upload it for someone higher up on the chain to check and upload to the servers. so basically someone is checking what someone else has packaged. On 7/2/08, Tim Cutts wrote: > > > On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: > > I was hoping for some discussion of concrete issues. for instance, >>>> I have the impression debian uses something other than sysvinit - >>>> does that work out well? >>>> >>>> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, >>> ... >>> >> >> thanks. I guess I was assuming that mainstream debian was like ubuntu. >> > > It's sort of the other way around. Remember that Ubuntu is based off a > six-monthly snapshot of Debian's testing track, which is why Hardy looks a > lot more like the upcoming Debian Lenny than it does like Debian Etch. > > interesting - I wonder why. the main difference would be that the rpm >> format encodes dependencies... 
>> > > The difficulty is that many ISVs tend to do a fairly terrible job of > packaging their applications as RPM's or DEB's, for example creating init > scripts which don't obey the distribution's policies, or making willy-nilly > modifications to configuration files all over the place, even in other > packages (which in the Debian world is a *big* no-no, that's why many > Debian/Ubuntu packages have now moved to the conf.d type of configuration > directory, so that other packages can drop in little independent snippets of > configuration) > > I have seen, for example, .deb packages from a Large Company With Which We > Are All Familiar which essentially attempted to convert your system into a > Red Hat system by moving all your init scripts around and whatnot, so once > you'd installed this abomination, you'd totally wrecked the ability of many > of the main distro packages to be updated ever again. Oh, and of course > uninstalling the package didn't put anything back the way it had been > before. > > Like you, I tend to use tarballs if they are available, and if I want to > turn them into packages I do it myself, and make sure they are policy > compliant for the distro. > > So this, while not a statement in favour of either flavour of distro, is > definitely a warning to be very wary of what packages that have come from > sources other than the distro itself might do (which of course, you'd be > wary of anyway for security reasons). 
> > Tim > > > -- > The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, > a charity registered in England with number 1021457 and acompany registered > in England with number 2742969, whose registeredoffice is 215 Euston Road, > London, NW1 2BE._______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/5e5d59d0/attachment.html From eagles051387 at gmail.com Wed Jul 2 04:18:20 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: im also not sure what support is like in other distros but i commend the kubuntu volunteers who man that irc channel for support as well as those who help with development. are there any other distros that provide support like this? On 7/2/08, Jon Aquilina wrote: > > one thing must not be forgotten though. in regards to pkging stuff for the > ubuntu variation once someone like you and me you upload it for someone > higher up on the chain to check and upload to the servers. so basically > someone is checking what someone else has packaged. > > On 7/2/08, Tim Cutts wrote: >> >> >> On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: >> >> I was hoping for some discussion of concrete issues. for instance, >>>>> I have the impression debian uses something other than sysvinit - >>>>> does that work out well? >>>>> >>>>> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, >>>> ... >>>> >>> >>> thanks. I guess I was assuming that mainstream debian was like ubuntu. >>> >> >> It's sort of the other way around. 
Remember that Ubuntu is based off a >> six-monthly snapshot of Debian's testing track, which is why Hardy looks a >> lot more like the upcoming Debian Lenny than it does like Debian Etch. >> >> interesting - I wonder why. the main difference would be that the rpm >>> format encodes dependencies... >>> >> >> The difficulty is that many ISVs tend to do a fairly terrible job of >> packaging their applications as RPM's or DEB's, for example creating init >> scripts which don't obey the distribution's policies, or making willy-nilly >> modifications to configuration files all over the place, even in other >> packages (which in the Debian world is a *big* no-no, that's why many >> Debian/Ubuntu packages have now moved to the conf.d type of configuration >> directory, so that other packages can drop in little independent snippets of >> configuration) >> >> I have seen, for example, .deb packages from a Large Company With Which We >> Are All Familiar which essentially attempted to convert your system into a >> Red Hat system by moving all your init scripts around and whatnot, so once >> you'd installed this abomination, you'd totally wrecked the ability of many >> of the main distro packages to be updated ever again. Oh, and of course >> uninstalling the package didn't put anything back the way it had been >> before. >> >> Like you, I tend to use tarballs if they are available, and if I want to >> turn them into packages I do it myself, and make sure they are policy >> compliant for the distro. >> >> So this, while not a statement in favour of either flavour of distro, is >> definitely a warning to be very wary of what packages that have come from >> sources other than the distro itself might do (which of course, you'd be >> wary of anyway for security reasons). 
>> >> Tim >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome ResearchLimited, >> a charity registered in England with number 1021457 and acompany registered >> in England with number 2742969, whose registeredoffice is 215 Euston Road, >> London, NW1 2BE._______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > > -- > Jonathan Aquilina -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/56c63de1/attachment.html From carsten.aulbert at aei.mpg.de Wed Jul 2 04:22:41 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <486B6501.5000108@aei.mpg.de> Hi Bogdan, Bogdan Costescu wrote: > > Have you considered using a parallel file system ? We looked a bit into a few, but would love to get any input from anyone on that. What we found so far was not really convincing, e.g. glusterFS at that time was not really stable, lustre was too easy to crash - at l east at that time, ... > There have been many talks of improving performance by paying attention > to the data locality on this very list. Are you not able to move the > code to where the data is or move the data to where the code is ? In principle this *should* be possible, however then this particular user (and maybe many in the future) would need to circumvent the batch system and it's usually quite a hassle to set this up correctly beforehand. > > F.e. 
using a simple TCP connection (nc, rsh, rsync or even http) to > transfer the file to the local disk before using it is probably more > efficient than the way you use NFS is you deal with small files (as they > have to be written to some local storage). The setup and tear-down costs > of the NFS connection (automounter, mount, unmount) simply doesn't exist > in this case; the transfer of data on the wire happens the same way. Or > you could even get around the limitation of storing it locally by using > a ramdisk to temporarily store the files (if you have the free > memory...) - from what I understand they are read then used immediately > and not needed again in a short time frame so it makes no sense to store > them for longer, a perfect application for a tmpfs. The interesting bit is: Even with the data on a remote disk the overhead is not really that much more. The files are typically less than 100k in size, even doing an rsync or nc|tar from one box to another is REALLY slow with that many small files. tmpfs et al: The job usually reads the data once directly from the NFS share and processes it, it's not going back to this file again (well at least not this process). So I do think NFS would not be that bad although it won't be optimal, but it's usually the easiest for the user to use and quite generic in the approach. Of course one could devise other and much better schemes, but you always have to find a good compromise between usability and the man-power needed to tailor a specific scheme. Thanks! Carsten From tjrc at sanger.ac.uk Wed Jul 2 04:29:03 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: On 2 Jul 2008, at 12:16 pm, Jon Aquilina wrote: > one thing must not be forgotten though.
in regards to pkging stuff > for the > ubuntu variation once someone like you and me you upload it for > someone > higher up on the chain to check and upload to the servers. so > basically > someone is checking what someone else has packaged. For maintainers that aren't Debian Developers (or the Ubuntu equivalent), yes, that's true. In my case, I am formally a Debian Developer (have been for more than 10 years), so my GPG signature on a binary upload is considered good enough, and it's not checked further, other than for really serious failures like a failure of the package to build from source on one of the autobuilders. I do check them myself fairly thoroughly though - lintian is a very useful tool for checking that packages comply with policy. Besides, the packages I maintain for Debian are things I use heavily in my day job, so it's in my own interest to make sure they work properly! I suspect the amount of checking that goes on in the universe and multiverse parts of Ubuntu is pretty minimal - I believe the packages are basically straight rebuilds of the Debian source packages using the Ubuntu autobuilder network, so that the library dependencies are correct. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From perry at piermont.com Wed Jul 2 04:32:55 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: (Jon Aquilina's message of "Wed\, 2 Jul 2008 07\:37\:21 +0200") References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> Message-ID: <87wsk4ed20.fsf@snark.cb.piermont.com> "Jon Aquilina" writes: > if i use blender how nicely does it work in a cluster? 
I believe it works quite well. Perry From perry at piermont.com Wed Jul 2 04:50:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> (Tim Cutts's message of "Wed\, 2 Jul 2008 09\:19\:50 +0100") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> Message-ID: <87od5gec87.fsf@snark.cb.piermont.com> Tim Cutts writes: > On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > >> OK, we have 1342 nodes which act as servers as well as clients. Every >> node exports a single local directory and all other nodes can mount >> this. >> >> What we do now to optimize the available bandwidth and IOs is spread >> millions of files according to a hash algorithm to all nodes (multiple >> copies as well) and then run a few 1000 jobs opening one file from one >> box then one file from the other box and so on. With a short autofs >> timeout that ought to work. Typically it is possible that a single >> process opens about 10-15 files per second, i.e. making 10-15 mounts >> per >> second. With 4 parallel process per node that's 40-60 mounts/second. >> With a timeout of 5 seconds we should roughly have 200-300 concurrent >> mounts (on average, no idea abut the variance). > > Please tell me you're not serious! The overheads of just performing > the NFS mounts are going to kill you, never mind all the network > traffic going all over the place. > > Since you've distributed the files to the local disks of the nodes, > surely the right way to perform this work is to schedule the > computations so that each node works on the data on its own local > disk, and doesn't have to talk networked storage at all? 
Or don't you > know in advance which files a particular job is going to need? Perhaps it makes sense given their job load. Perhaps it doesn't. If they need access to far more storage than a single node can hold, it might make sense. If individual nodes need lots of I/O but only on a very rare basis, so the disk bandwidth would be unused on most nodes most of the time if they were doing everything locally, perhaps it might make sense. I'll agree that it isn't an obviously good solution for most workloads, but we don't really know what their workload is like so we can't say that this is a bad move ab initio. Perry From atchley at myri.com Wed Jul 2 05:07:27 2008 From: atchley at myri.com (Scott Atchley) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B6501.5000108@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> Message-ID: <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote: > Bogdan Costescu wrote: >> >> Have you considered using a parallel file system? > > We looked a bit into a few, but would love to get any input from > anyone > on that. What we found so far was not really convincing, e.g. > glusterFS > at that time was not really stable, lustre was too easy to crash - > at least at that time, ... Hi Carsten, I have not looked at GlusterFS at all. I have worked with Lustre and PVFS2 (I wrote the shims to allow them to run on MX). Although I believe Lustre's robustness is very good these days, I do not believe that it will work in your setting. I think that they currently do not recommend mounting a client on a node that is also working as a server, as you are doing with NFS. I believe it is due to memory contention leading to deadlock.
PVFS2 does, however, support your scenario where each node is a server and can be mounted locally as well. PVFS2 servers run in userspace and can be easily debugged. If you are using MPI-IO, it integrates nicely as well. Even so, keep in mind that using each node as a server will consume network resources and will compete with MPI communications. Scott From perry at piermont.com Wed Jul 2 05:28:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> (Carsten Aulbert's message of "Wed\, 02 Jul 2008 09\:26\:58 +0200") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <87d4lweagv.fsf@snark.cb.piermont.com> Carsten Aulbert writes: >> The clients are connecting from ports below 1024 because Berkeley set >> up a hack in the original BSD stack so that only root could open ports >> below 1024. This way, you could "know" the process on the remote host >> was a root process, thus you could feel "secure" [sic]. It doesn't add >> any real security any more, but it is also not the cause of any >> problem you are experiencing. > > We might run out of "secure" ports. A given client would need to be forming over 1000 connections to a given server NFS port for that to be a problem. This is not going to happen. The protocol doesn't work in such a way as to cause that to occur. >> We can help you figure this out, but you will have to give a lot more >> detail about the problem. Please describe your network setup. How many >> servers do you have? How many clients? How many file systems are those >> servers exporting? How many is a typical client mounting, and why? >> Start there and we can try to move forward. > > OK, we have 1342 nodes which act as servers as well as clients. 
Every > node exports a single local directory and all other nodes can mount this. Okay. In this instance, you're not going to run out of ports. Every machine might get 1341 connections from clients, and every machine might make 1341 client connections going out to other machines. None of this should cause you to run out of ports, period. If you don't understand that, refer back to my original message. A TCP connection is identified by a unique 4-tuple (source host, source port, destination host, destination port). The host:port 2-tuples are NOT unique and not an exhaustible resource. There is no way that your case is going to even remotely exhaust the 4-tuple space. > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) and then run a few 1000 jobs opening one file from one > box then one file from the other box and so on. With a short autofs > timeout that ought to work. I think there is no point in having a short autofs timeout, and you're likely to radically increase the overhead when you open files. > Our tests so far have shown that sometimes a node keeps a few mounts > open (autofs4 problems AFAIK) and at some point is not able to mount > more shares. Usually this occurs at about 350 mounts and we are not yet > 100% sure if we are running out of secure ports. You probably aren't running out of ports per se. You may be running out of OS resources, like file descriptors or something similar. -- Perry E.
Metzger perry@piermont.com From landman at scalableinformatics.com Wed Jul 2 05:31:15 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <486B7513.1020509@scalableinformatics.com> Carsten Aulbert wrote: >> The clients are connecting from ports below 1024 because Berkeley set >> up a hack in the original BSD stack so that only root could open ports >> below 1024. This way, you could "know" the process on the remote host >> was a root process, thus you could feel "secure" [sic]. It doesn't add >> any real security any more, but it is also not the cause of any >> problem you are experiencing. > > We might run out of "secure" ports. But you can force NFS to connect from the ports above 1024 so this shouldn't be an issue. [...] > OK, we have 1342 nodes which act as servers as well as clients. Every There is a short writeup on this with quotes from Bruce Allen in HPCwire. Too bad you didn't opt for JackRabbits there :) > node exports a single local directory and all other nodes can mount this. Fine, nothing terrible. > > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) and then run a few 1000 jobs opening one file from one > box then one file from the other box and so on. With a short autofs Hmmm.... So you want to "track" spatial metadata (e.g. where the file is) according to some hash function that each node can execute, and then once this is known, perform IO. So, for example (as a relatively naive/simple minded version) some quick Perl pseudo-code ... # .... 
use Digest::MD5 qw(md5_hex);
# Use the first 8 hex digits of the MD5 digest as an integer
my $hash = hex(substr(md5_hex($filename), 0, 8));
my $machine = $hash % $Number_of_machines;
my $machine_name = $name[$machine];
my $full_path = sprintf("/%s/%s", $machine_name, $filename);
# Opened for reading here, on the assumption that the jobs only read these files
open(my $fh, "<", $full_path) or die "FATAL ERROR: unable to open $full_path: $!\n";
# ....

Is this about right?

> timeout that ought to work. Typically it is possible that a single > process opens about 10-15 files per second, i.e. making 10-15 mounts per > second. With 4 parallel process per node that's 40-60 mounts/second.

Hmmm ... mount latency we have seen is ~0.1 seconds or so, so I can believe 10-14/second. Note that due to strange latency effects in larger machines, we have also seen an automount take 0.5 seconds and more. Some of the delay is due to name resolution. Never fully traced it, but this was on a 32 node cluster. You are talking a little bigger.

> With a timeout of 5 seconds we should roughly have 200-300 concurrent > mounts (on average, no idea about the variance).

200-300 mounts across 1342 nodes, sure. 200-300 mounts of one file system on one server from 200-300 client machines? I have some doubts ...

> Our tests so far have shown that sometimes a node keeps a few mounts > open (autofs4 problems AFAIK) and at some point is not able to mount > more shares. Usually this occurs at about 350 mounts and we are not yet > 100% sure if we are running out of secure ports.

Older kernels couldn't do more than 256 mounts. Not sure when/if this limit has been raised. This is a different problem though. If you have N machines mounting a file system, then you get N requests on port 2049 or similar (the inbound NFS port). You don't run out of secure ports. If the issue is that you are running 200+ outgoing mount requests from one machine, you will likely have a delay issue as you cross the 256 mount number (if your kernel hasn't been patched ... not sure if/when this has/will change).
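For anyone who wants to play with the numbers, here is a minimal sketch of the two ideas in this exchange: picking a node from a hash of the file name, and the rough "mount rate times timeout" estimate of concurrent mounts. It is written in Python rather than Perl, and the function names, the node naming scheme and the example file name are all made up for illustration; the real hash function and path layout used on the cluster are not stated in the thread.

```python
import hashlib
from typing import Sequence


def pick_node(filename: str, nodes: Sequence[str]) -> str:
    """Map a file name to one node by hashing it (hypothetical helper)."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and reduce it modulo the node count.
    return nodes[int(digest, 16) % len(nodes)]


def full_path(filename: str, nodes: Sequence[str]) -> str:
    """Build an automount-style path of the form /<node>/<file>."""
    return "/{}/{}".format(pick_node(filename, nodes), filename)


def concurrent_mounts(mounts_per_second: float, timeout_seconds: float) -> float:
    """Steady-state estimate: mount rate times how long each mount stays alive.

    This assumes every open triggers a new mount, so it is an upper bound.
    """
    return mounts_per_second * timeout_seconds


if __name__ == "__main__":
    # 1342 invented node names; the real host names are not given in the thread.
    nodes = ["n{:04d}".format(i) for i in range(1, 1343)]
    print(full_path("frame-000123.bin", nodes))
    # 40-60 mounts/second with a 5 second timeout
    print(concurrent_mounts(40, 5), concurrent_mounts(60, 5))
```

With 40-60 mounts per second and a 5 second timeout, the estimate gives the 200-300 concurrent mounts quoted above. Because the node is derived from the file name alone, every client computes the same location without any shared lookup table.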
> All our boxes export now with "insecure" option (NFSv3), but our clients > all connect from a "secure" port, anyone here who might give us a hint > how to force this in Linux? See if you can get less than 256 mounts working well. If so, and it only starts falling off above 256 mounts, this would be important to know. Joe > > Thanks a lot > > Carsten > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From carsten.aulbert at aei.mpg.de Wed Jul 2 05:55:21 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87d4lweagv.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> Message-ID: <486B7AB9.9050202@aei.mpg.de> Hi Perry, Perry E. Metzger wrote: > > Okay. In this instance, you're not going to run out of ports. Every > machine might get 1341 connections from clients, and every machine > might make 1341 client connections going out to other machines. None > of this should cause you to run out of ports, period. If you don't > understand that, refer back to my original message. A TCP socket is a > unique 4-tuple. The host:port 2-tuples are NOT unique and not an > exhaustible resource. There is is no way that your case is going to > even remotely exhaust the 4-tuple space. 
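As a small illustration of the 4-tuple argument in the quoted paragraph, the sketch below models connections as (source address, source port, destination address, destination port) tuples. The peer addresses are invented and the helper names are hypothetical. It only models how the protocol identifies a connection; whether a given client implementation is actually able to reuse one local port for several destinations depends on how it binds its sockets, which this sketch does not address.

```python
NFS_PORT = 2049


def peer_addresses(count):
    """Yield `count` made-up IPv4 addresses in 10.10.0.0/16 (illustrative only)."""
    for i in range(count):
        yield "10.10.{}.{}".format(i // 250, i % 250 + 1)


def connections_from(local_ip, local_port, peers):
    """Return one 4-tuple per peer, all sharing the same source address and port."""
    return {(local_ip, local_port, peer, NFS_PORT) for peer in peers}


if __name__ == "__main__":
    # One client, one source port, 1341 different servers.
    conns = connections_from("10.10.13.41", 1023, peer_addresses(1341))
    print(len(conns))  # 1341 distinct 4-tuples
```

At the protocol level, a collision only arises when the same source port is used twice towards the same server address and port.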
Well, I understand your reasoning, but it is contradicted by what we actually see.

netstat -an|awk '/2049/ {print $4}'|sed 's/10.10.13.41://'|sort -n

shows us the following:

665 666 667 668 669 670 671 672 673 674 675 676 677 [...] 1017 1018 1019 1020 1021 1022 1023

Which corresponds exactly to the maximum achievable mounts of 358 right now. Besides, I'm far from being an expert on TCP/IP, but is it possible for a local process to bind to a port that is already in use for a connection to another host? I don't think so, but I may be wrong. Cheers Carsten From eagles051387 at gmail.com Wed Jul 2 06:05:09 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <20080702125625.GE47386@gby2.aoes.com> References: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> <20080702125625.GE47386@gby2.aoes.com> Message-ID: Like you said in regards to Maya, money is a factor for me. If I do decide to set up a rendering cluster, my problem is going to be finding someone who can make a small video in Blender for me so I can render it. On 7/2/08, Greg Byshenk wrote: > > On Wed, Jul 02, 2008 at 07:32:55AM -0400, Perry E. Metzger wrote: > > "Jon Aquilina" writes: > > > > if i use blender how nicely does it work in a cluster? > > > I believe it works quite well. > > > The "Helmer" minicluster uses blender, and appears > to perform well. > > Also, Maya's 'muster' engine runs under Linux, and quite successfully. We > use it in a mixed environment, where the render pool consists of both > Windows workstations and Linux cluster nodes. > > Note, though, that like other commercial 3D products, Maya is expensive, > and may not be suitable for a student project. > > -- > Greg Byshenk > > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/1525345d/attachment.html From mark.kosmowski at gmail.com Wed Jul 2 06:11:46 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] Re: energy costs and poor grad students Message-ID: I'm in the US. I'm almost, but not quite, ready for production runs - still learning the software / computational theory. I'm the first person in the research group (physical chemistry) to try to learn plane wave methods of solid state calculation as opposed to isolated atom-centered approximations and periodic atom-centered calculations. It is turning out that the package I have spent the most time learning is perhaps not the best one for what we are doing. For a variety of reasons, many of which are more off-topic than tac nukes and energy efficient washing machines ;) , I'm doing my studies part-time while working full-time in industry. I think I have come to a compromise that can keep me in business. Until I have a better understanding of the software and am ready for production runs, I'll stick to a small system that can be run on one node and leave the other two powered down. I've also applied for an adjunct instructor position at a local college for some extra cash and good experience. When I'm ready for production runs I can either just bite the bullet and pay the electricity bill or seek computer time elsewhere. Thanks for the encouragement, Mark E. Kosmowski On 7/1/08, ariel sabiguero yawelak wrote: > Well Mark, don't give up! > I am not sure which one is your application domain, but if you require 24x7 > computation, then you should not be hosting that at home. > On the other hand, if you are not doing real computation and you just have a > testbed at home, maybe for debugging your parallel applications or something > similar, you might be interested in a virtualized solution.
Several years > ago, I used to "debug" some neural networks at home, but training sessions > (up to two weeks of training) happened at the university. > I would suggest to do something like that. > You can always scale-down your problem in several phases and save the > complete data-set / problem for THE RUN. > > You are not being a heretic there, but suffering energy costs ;-) > In more places that you may believe, useful computing nodes are being > replaced just because of energy costs. Even in some application domains you > can even loose computational power if you move from 4 nodes into a single > quad-core (i.e. memory bandwidth problems). I know it is very nice to be > able to do everything at home.. but maybe before dropping your studies or > working overtime to pay the electricity bill, you might want to reconsider > the fact of collapsing your phisical deploy into a single virtualized > cluster. (or just dispatch several threads/processes in a single system). > If you collapse into a single system you have only 1 mainboard, one HDD, one > power source, one processor (physically speaking), .... and you can achieve > almost the performance of 4 systems in one, consuming the power of.... well > maybe even less than a single one. I don't want to go into discussions about > performance gain/loose due to the variation of the hardware architecture. > Invest some bucks (if you haven't done that yet) in a good power source. > Efficiency of OEM unbranded power sources is realy pathetic. may be 45-50% > efficiency, while a good power source might be 75-80% efficient. Use the > energy for computing, not for heating your house. > What I mean is that you could consider just collapsing a complete "small" > cluster into single system. If your application is CPU-bound and not I/O > bound, VMware Server could be an option, as it is free software > (unfortunately not open, even tough some patches can be done on the > drivers). 
I think it is not possible to publish benchmarking data about > VMware, but I can tell you that in long timescales, the performance you get > in the host OS is similar than the one of the guest OS. There are a lot of > problems related to jitter, from crazy clocks to delays, but if your > application is not sensitive to that, then you are Ok. > Maybe this is not a solution, but you can provide more information regarding > your problem before quitting... > > my 2 cents.... > > ariel > > Mark Kosmowski wrote: > > > At some point there a cost-benefit analysis needs to be performed. If > > my cluster at peak usage only uses 4 Gb RAM per CPU (I live in > > single-core land still and do not yet differentiate between CPU and > > core) and my nodes all have 16 Gb per CPU then I am wasting RAM > > resources and would be better off buying new machines and physically > > transferring the RAM to and from them or running more jobs each > > distributed across fewer CPUs. Or saving on my electricity bill and > > powering down some nodes. > > > > As heretical as this last sounds, I'm tempted to throw in the towel on > > my PhD studies because I can no longer afford the power to run my > > three node cluster at home. Energy costs may end up being the straw > > that breaks this camel's back. > > > > Mark E. Kosmowski
From landman at scalableinformatics.com Wed Jul 2 06:44:20 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: References: Message-ID: <486B8634.6020309@scalableinformatics.com> Hi Mark Mark Kosmowski wrote: > I'm in the US. I'm almost, but not quite ready for production runs - > still learning the software / computational theory. I'm the first > person in the research group (physical chemistry) to try to learn > plane wave methods of solid state calculation as opposed to isolated > atom-centered approximations and periodic atom centered calculations. Heh... my research group in grad school went through that transition in the mid 90s. Went from an LCAO-type simulation to CP like methods. We needed a t3e to run those (then). Love to compare notes and see which code you are using someday. On-list/off-list is fine. > It is turning out that the package I have spent the most time learning > is perhaps not the best one for what we are doing.
For a variety of > reasons, many of which more off-topic than tac nukes and energy > efficient washing machines ;) , I'm doing my studies part-time while > working full-time in industry. More power to ya! I did mine that way too ... the writing was the hardest part. Just don't lose focus, or stop believing you can do it. When the light starts getting visible at the end of the process, it is quite satisfying. I have other words to describe this, but they require a beer lever to get them out of me ... > I think I have come to a compromise that can keep me in business. > Until I have a better understanding of the software and am ready for > production runs, I'll stick to a small system that can be run on one > node and leave the other two powered down. I've also applied for an > adjunt instructor position at a local college for some extra cash and > good experience. When I'm ready for production runs I can either just > bite the bullet and pay the electricity bill or seek computer time > elsewhere. Give us a shout when you want to try the time on a shared resource. Some folks here may be able to make good suggestions. RGB is a physics guy at Duke, doing lots of simulations, and might know of resources. Others here might as well. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From henning.fehrmann at aei.mpg.de Wed Jul 2 06:42:28 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B7AB9.9050202@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: <20080702134228.GA5152@gretchen.aei.uni-hannover.de> > Which corresponds exactly to the maximum achievable mounts of 358 right 359 ;) If the number of mounts is smaller the
ports are randomly used in this range.
It would be convenient to enter the insecure area.
Using the option insecure for the NFS exports is apparently not
sufficient.

Also every nfs server is connected from a distinct port on the client side.
Two mounts to a single server might end up on the same port.

Cheers
Henning

From gerry.creager at tamu.edu Wed Jul 2 07:09:34 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com>
Message-ID: <486B8C1E.2090007@tamu.edu>

Scott Atchley wrote:
> On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote:
>
>> Bogdan Costescu wrote:
>>>
>>> Have you considered using a parallel file system ?
>>
>> We looked a bit into a few, but would love to get any input from anyone
>> on that. What we found so far was not really convincing, e.g. glusterFS
>> at that time was not really stable, lustre was too easy to crash - at
>> least at that time, ...
>
> Hi Carsten,
>
> I have not looked at GlusterFS at all. I have worked with Lustre and
> PVFS2 (I wrote the shims to allow them to run on MX).
>
> Although I believe Lustre's robustness is very good these days, I do not
> believe that it will not work in your setting. I think that they
> currently do not recommend mounting a client on a node that is also
> working as a server as you are doing with NFS. I believe it is due to
> memory contention leading to deadlock.

Lustre is good enough that it's the parallel FS at TACC for the Ranger
cluster. And, I've had no real problems as a user thereof. We're bringing
up glustre on our new cluster here (CentOS/RHEL5, not debian).
We looked at zfs but didn't have sufficient experience to go down that path.

> PVFS2 does, however, support your scenario where each node is a server
> and can be mounted locally as well. PVFS2 servers run in userspace and
> can be easily debugged. If you are using MPI-IO, it integrates nicely as
> well. Even so, keep in mind that using each node as a server will
> consume network resources and will compete with MPI communications.

Someone at NCAR recently suggested we review PVFS2. I'm gonna do it as
soon as I get a free moment on vacation.

-- 
Gerry Creager -- gerry.creager@tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 07:12:09 2008
From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <87d4lweagv.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com>
Message-ID:

On Wed, 2 Jul 2008, Perry E. Metzger wrote:

> A given client would need to be forming over 1000 connections to a
> given server NFS port for that to be a problem.

Not quite. The reserved ports that are free for use (512 and up) are
not all free to be taken by NFS as it pleases - there are many daemons
that have to use those well-known ports. For example, some years ago a
common complaint was that the CUPS daemon (port 631) was often
conflicting with NFS client mounts; I think that what was chosen by
various distributions was the easy way out - make the NFS client only
allocate ports starting at 650 or so.
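
The port arithmetic behind this can be checked with a few lines of Python. This is only a sketch: the function name `usable_ports` is made up for illustration, and the lower bound of 665 is an assumption, tried here because it happens to reproduce the 359 mounts reported earlier in the thread. I believe the Linux sunrpc layer has tunables along the lines of min_resvport/max_resvport with such a default, but please verify that on your own kernel rather than taking it from here.

```python
# Rough budget of reserved (privileged) client ports available to NFS mounts.
# Assumption, not stated in the thread: the client allocates one port per
# mount from the inclusive range [min_port, 1023].

RESERVED_MAX = 1023  # highest privileged port


def usable_ports(min_port, max_port=RESERVED_MAX, taken=()):
    """Count ports in [min_port, max_port] not already claimed by daemons."""
    claimed = {p for p in taken if min_port <= p <= max_port}
    return (max_port - min_port + 1) - len(claimed)


# Whole 512..1023 range, minus the CUPS port mentioned above.
print(usable_ports(512, taken=[631]))
# Starting "at 650 or so".
print(usable_ports(650))
# Hypothetical lower bound of 665, which matches the observed limit.
print(usable_ports(665))
```

With one reserved port per mount, the last figure is the ceiling on simultaneous NFS v2/v3 mounts from one client under that assumed range.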
> Every machine might get 1341 connections from clients, and every
> machine might make 1341 client connections going out to other
> machines. None of this should cause you to run out of ports, period.

With all due respect, I think that you are not quite familiar with the
NFS implementation on Linux (and maybe other NFS implementations).
What you describe is the theoretical use of TCP connections; the way
NFS on Linux uses TCP is not quite as you imagine: there is one port
taken on the client for each NFS mount and that port is not reused.
Also mounting 2 different mount points from the same NFS server to the
same NFS client uses 2 TCP ports on the client side - at least with
NFS v2 and v3; for v4 I think that there is only one connection
between a client and a server independent of the number of mount
points.

I do encourage you to subscribe to the Linux NFS list if you want to
learn more; I've been there for a long time (unfortunately not
anymore...) and the people, especially the developers, were very
helpful.

-- 
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu@iwr.uni-heidelberg.de

From ntmoore at gmail.com Wed Jul 2 07:22:37 2008
From: ntmoore at gmail.com (Nathan Moore)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] Re: energy costs and poor grad students
In-Reply-To: <486B8634.6020309@scalableinformatics.com>
References: <486B8634.6020309@scalableinformatics.com>
Message-ID: <6009416b0807020722l56f05affs878b762d285bba9d@mail.gmail.com>

Does your university have public computer labs? Do the computers run some
variant of Unix? At UMN, where I did my grad work in physics, there were a
number of semi-public "Scientific Visualization" or "Large Data Analysis"
labs that were hosted in the local supercomputer center.
The center there has a number of large machines that you had to apply and
give a really good rationale to use, but the smaller development labs (with
2-way to 10-way sunfires, similar sized sgi's, linux machines, etc)
basically sat vacant 5-6 days per week. Some of the labs had a pbs queue,
some had a condor queue, and some just required that background jobs be
"nice +19 ./a.out".

My graduate work required several large parametric studies which
computationally looked like lots of monte-carlo-ish runs which could be done
in parallel. The beauty of this was that no message passing was required,
so, if there were 23 cores open one evening at 6pm, and assuming no one
would be doing work overnight (for the next 14 hours), I could start 23
14-hour jobs at 6pm and have a little less than 2 weeks of cpu work done by
8am the next morning.

I used (and mentioned) the technique in the paper,
http://www.pnas.org/cgi/content/full/101/37/13431 (search for
"computational impotence").

This only works though if your university's computer labs run a unix-ish os,
and if the sysadmins are progressive. At the school where I presently teach,
similar endeavors have been much harder to start up.

Nathan Moore

On Wed, Jul 2, 2008 at 8:44 AM, Joe Landman wrote:

> Hi Mark
>
> Mark Kosmowski wrote:
>
>> I'm in the US. I'm almost, but not quite ready for production runs -
>> still learning the software / computational theory. I'm the first
>> person in the research group (physical chemistry) to try to learn
>> plane wave methods of solid state calculation as opposed to isolated
>> atom-centered approximations and periodic atom centered calculations.
>>
>
> Heh... my research group in grad school went through that transition in the
> mid 90s. Went from an LCAO-type simulation to CP like methods. We needed a
> t3e to run those (then).
>
> Love to compare notes and see which code you are using someday.
> On-list/off-list is fine.
From perry at piermont.com Wed Jul 2 07:26:13 2008
From: perry at piermont.com (Perry E.
Metzger)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <486B7AB9.9050202@aei.mpg.de> (Carsten Aulbert's message of "Wed\, 02 Jul 2008 14\:55\:21 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de>
Message-ID: <874p78e516.fsf@snark.cb.piermont.com>

Skip to the bottom for advice on how to make NFS only use non-prived
ports. My guess is still that it isn't priv ports that are causing
trouble, but I describe at the bottom what you need to do to get rid
of that issue entirely. I'd advise reading the rest, but the part
about how to disable the stuff is after the --- near the bottom.

Carsten Aulbert writes:
> Well, I understand your reasoning, but that's contradicted to what we do see
>
> netstat -an|awk '/2049/ {print $4}'|sed 's/10.10.13.41://'|sort -n
>
> shows us the following:

Are those all mounts to ONE HOST? Because if they are, you're going to
run out of ports. If you're connecting to multiple hosts you should be
okay, but you certainly could run out of ports between two hosts --
you only have 1023 prived connections from a given host to a single
port on another box.

Of course, one might validly ask why the other 650 odd ports aren't
usable -- clearly they should be, right? The limit is 1023, not 358.
It might be that there is some Linux oddness here.

Anyway, this shouldn't be a problem if you're connecting to MANY
servers, but maybe there's some linux weirdness here. See below.

> Which corresponds exactly to the maximum achievable mounts of 358 right
> now. Besides, I'm far from being an expert on TCP/IP, but is it possible
> for a local process to bind to a port which is already in use but to
> another host?

Of course!
You can use the same local port number with connections to different
remote hosts. You can even use the same local port number with multiple
connections to the same remote host provided the remote host is using
different port numbers on its end. Every open socket is a 4-tuple of

    localip:localport:remoteip:remoteport

Provided two sockets don't share that 4-tuple, you can have both.

Now, a given OS may screw up how they handle this, but the *protocol*
certainly permits it. Perhaps you're right and Linux isn't dealing
with this gracefully. We can check that.

> I don't think so, but may be wrong.

Then how does an SMTP server handle thousands of simultaneous
connections all coming to port 25? :)

In any case, this is what the NFS FAQ says. It does mention the priv
port problem, but only in a context which makes me think it is talking
about two given hosts and not one client and many hosts. However, I
might be wrong. See below:

From http://nfs.sourceforge.net/

B3. Why can't I mount more than 255 NFS file systems on my client?
Why is it sometimes even less than 255?

A. On Linux, each mounted file system is assigned a major number,
which indicates what file system type it is (eg. ext3, nfs, isofs);
and a minor number, which makes it unique among the file systems of
the same type. In kernels prior to 2.6, Linux major and minor numbers
have only 8 bits, so they may range numerically from zero to 255.
Because a minor number has only 8 bits, a system can mount only 255
file systems of the same type. So a system can mount up to 255 NFS
file systems, another 255 ext3 file system, 255 more isofs file
systems, and so on. Kernels after 2.6 have 20-bit wide minor numbers,
which alleviate this restriction.

For the Linux NFS client, however, the problem is somewhat worse
because it is an anonymous file system. Local disk-based file systems
have a block device associated with them, but anonymous file systems do not.
/proc, for example, is an anonymous file system, and so are other network file systems like AFS. All anonymous file systems share the same major number, so there can be a maximum of only 255 anonymous file systems mounted on a single host. Usually you won't need more than ten or twenty total NFS mounts on any given client. In some large enterprises, though, your work and users might be spread across hundreds of NFS file servers. To work around the limitation on the number of NFS file systems you can mount on a single host, we recommend that you set up and run one of the automounter daemons for Linux. An automounter finds and mounts file systems as they are needed, and unmounts any that it finds are inactive. You can find more information on Linux automounters here. You may also run into a limit on the number of privileged network ports on your system. The NFS client uses a unique socket with its own port number for each NFS mount point. Using an automounter helps address the limited number of available ports by automatically unmounting file systems that are not in use, thus freeing their network ports. NFS version 4 support in the Linux NFS client uses a single socket per client-server pair, which also helps increase the allowable number of NFS mount points on a client. Now, until you brought this up, I would have guessed that this meant you could run out of priv ports between host A and host B -- i.e. host B is the client, is connecting to one port on host A, and is trying to mount more than 1023 file systems on host A and fails because it runs out of priv ports. However, if your test is not between two hosts but is rather between multiple hosts, perhaps for whatever reason Linux is braindead and is not allowing you to re-use the same local socket ports. We can diagnose that later. 
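
The 4-tuple point (and the SMTP question) can be tried out on any machine without root. The following is a minimal sketch using Python's standard socket module on the loopback interface; the function name `demo` and the choice of three clients are arbitrary. It only shows the server side of the argument, namely that many simultaneous connections share one local port because the remote port differs. It says nothing about how the Linux NFS client picks its own source ports, which is the open question in this thread.

```python
import socket


def demo(n_clients=3):
    """Open several TCP connections to one listening port on localhost.

    Returns (listening port, server-side local ports, client source ports).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_clients)
    port = listener.getsockname()[1]

    clients, accepted = [], []
    for _ in range(n_clients):
        client = socket.create_connection(("127.0.0.1", port))
        conn, _peer = listener.accept()
        clients.append(client)
        accepted.append(conn)

    # Local port of each accepted connection, as seen by the server.
    server_local_ports = [s.getsockname()[1] for s in accepted]
    # Remote (client) port of each accepted connection.
    client_source_ports = [s.getpeername()[1] for s in accepted]

    for s in clients + accepted + [listener]:
        s.close()
    return port, server_local_ports, client_source_ports


port, server_ports, source_ports = demo()
print("listening port:     ", port)
print("server local ports: ", server_ports)
print("client source ports:", source_ports)
```

Every accepted connection reports the listening port as its local port, while the client source ports differ, so each connection is still a distinct 4-tuple.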
---

So, here are the things you need to do to totally remove the priv
ports thing from the situation:

1) On the server, in your exports file you have to put the "insecure"
   option onto every exported file system. Otherwise the mountd will
   demand that the remote side use a "secure" mount. You've already
   done this according to the initial mail message. However, that only
   tells the server not to care if the client comes in from a port
   above 1024.

2) The client side is where the action is -- the client picks the port
   it opens after all. Unfortunately, Linux DOES NOT have an option to
   do this. BSD, Solaris, etc. do, but not Linux. You need to hack the
   source to make it happen.

   On a reasonably current source tree, go to:

       /usr/src/linux/fs/nfs/mount_clnt.c

   and look for the argument structure being built for rpc_create.
   You need to or-in RPC_CLNT_CREATE_NONPRIVPORT to the .flags member,
   as in (for example, depending on your version, this is 2.6.24):

       .flags = RPC_CLNT_CREATE_INTR,

   to

       .flags = RPC_CLNT_CREATE_INTR | RPC_CLNT_CREATE_NONPRIVPORT,

   This is a bloody ugly hack that will make ALL connections unprived,
   so you might have trouble with "normal" mounts. This can be done
   more cleanly, but it would require more than a one line patch.
   However, it would get you through testing. If it works for you and
   you really need it, a clean mount option could be added.

My guess is that this is not your problem! However, you can check and
see if I'm wrong, and if I am, then we can move on to fixing it
better.

Perry
-- 
Perry E. Metzger perry@piermont.com

From perry at piermont.com Wed Jul 2 07:30:14 2008
From: perry at piermont.com (Perry E.
Metzger)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080702134228.GA5152@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Wed\, 2 Jul 2008 15\:42\:28 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <20080702134228.GA5152@gretchen.aei.uni-hannover.de>
Message-ID: <87wsk4cqa1.fsf@snark.cb.piermont.com>

Henning Fehrmann writes:
>> Which corresponds exactly to the maximum achievable mounts of 358 right
>
> 359 ;)
>
> If the number of mounts is smaller the ports are randomly used in this range.
> It would be convenient to enter the insecure area.
> Using the option insecure for the NFS exports is apparently not
> sufficient.

Well, no, it isn't. The server doesn't control what the client
does. The "insecure" option only says the server will accept such
connections -- you have to tell the client to make them. On BSD and
Solaris that's easy, but on Linux you need to hack the kernel. I have
just sent a message explaining how to do that.

Note that I still don't think this is your problem, but you might as
well check.

-- 
Perry E. Metzger perry@piermont.com

From perry at piermont.com Wed Jul 2 07:35:45 2008
From: perry at piermont.com (Perry E.
Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: (Bogdan Costescu's message of "Wed\, 2 Jul 2008 16\:12\:09 +0200 \(CEST\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> Message-ID: <87skuscq0u.fsf@snark.cb.piermont.com> Bogdan Costescu writes: >> Every machine might get 1341 connections from clients, and every >> machine might make 1341 client connections going out to other >> machines None of this should cause you to run out of ports, period. > > With all due respect, I think that you are not quite familiar with the > NFS implementation on Linux (and maybe other NFS > implementations). I'm plenty familiar with the implementations on other OSes. I only looked at the code on Linux this morning for the first time (never had call before)... > What you describe is the theoretical use of TCP > connections; the way NFS on Linux uses TCP is not quite as you > imagine: there is one port taken on the client for each NFS mount and > that port is not reused. That's not an NFS implementation issue. It is a TCP implementation issue. (Actually, I'm currently looking at the code and it may be an issue in the rpc code, but never mind that.) In general, the OS should let you use a given port to connect to as many remote hosts as you like. The only thing it should prevent is having you talk to a single remote host/port combination from one local port (because you can't -- that would be the same 4-tuple.) > Also mounting 2 different mount points from > the same NFS server to the same NFS client uses 2 TCP ports on the > client side - at least with NFS v2 and v3; for v4 I think that there > is only one connection between a client and a server independent on > the number of mount points. That is indeed correct. 
(Actually, linux can burn more than 2 ports, depending.) -- Perry E. Metzger perry@piermont.com From rgb at phy.duke.edu Wed Jul 2 07:50:40 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] A press release In-Reply-To: <87bq1hgpep.fsf@snark.cb.piermont.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <87bq1hgpep.fsf@snark.cb.piermont.com> Message-ID: On Tue, 1 Jul 2008, Perry E. Metzger wrote: > > Prentice Bisbal writes: >>>> does it necessarily have to be a redhat clone. can it also be a debian >>>> based >>>> clone? >>> >>> but why? is there some concrete advantage to using Debian? >>> I've never understood why Debian users tend to be very True Believer, >>> or what it is that hooks them. >> >> And the Debian users can say the same thing about Red Hat users. Or SUSE >> users. And if any still exist, the Slackware users could say the same >> thing about the both of them. But then the Slackware users could also >> point out that the first Linux distro was Slackware, so they are using >> the one true Linux distro... Or rather, one of two or three contemporary "firsts", in the guise of SLS which became Slackware. I actually started with SLS and then transitioned to Slackware, all 20 or 30 little floppies of it. The problem (for me) was getting an install on a 4 MB system, which is all that I had at the time. > Precisely. It pays to allow people to use what they want. Fewer > religious battles that way. Whether one distro or another has an > advantage isn't the point -- people have their own tastes and it > doesn't pay to tell them "no" without good reason. It isn't all about religion. There are two "real" problems with Slackware. One is its packaging system, the other (related) is maintenance. 
Its packaging system doesn't really manage dependencies or automated
updates, and dependency resolution is a major pain in the ass when one is
installing a large sheaf of applications all at once.

I was once a passionate, fervent, nay, religious user -- it has/had a
very SunOS/BSD-like etc layout that was quite painless for me to work,
moving over from administrating a mostly-SunOS network, where RH had a
much more SysV-like interface that I had to learn. The sources for most
of its apps were visibly ports of the same software I regularly built
for the Suns -- remember that right up to linux, Sun workstations were
"the" unix boxes for people that wrote and adopted Linux. Maintaining
all the open source packages was "easy" on Suns because that is what the
open source writers were using and was usually the makefile default, but
it was a PITA (or more practically, "expensive" in human time and
duplicated effort) there as well.

Beyond automated install/updates and dependencies (that now can be
sort-of-managed with add-ons basically derived from apt tools or rpm
tools) Slackware's other major problem is simply its up-to-dateness. I
don't know numbers, but I think it is way, way behind in number of users
these days to both Debian and RH-derived distros, not to mention all the
rest. I'd be surprised if it were as high as fifth in user base. This
basically means that there is a time lag between package developments
and releases in the other distros where the user (and hence DEVELOPER)
base reside. Then there is a further delay in getting builds in that
work with the existing dependencies, because there is no dependency
system to speak of.

Time lags of this sort are windows of opportunity when security exploits
are discovered. They also annoy users, who ask "why is X available in
distro Y but not here?"
I think of Slackware as being a great hacker distro, a good distro for
somebody who wants to work close to the metal (and very hard) to manage
their sources, but not the best distro for trouble-free, scalable
maintenance of a large network of systems OR for individual users
installing a personal standalone workstation.

These two points aren't (I think) "religion" -- they are practical costs
associated with using the distro for clusters or workstation LANs or
personal workstations that need to be considered when picking a distro
for any of those purposes. When I considered them, I switched. The
human costs are real; people pay money for them or they come out of a
fixed opportunity cost time budget. One person can manage a
staggeringly large, surprisingly heterogeneous network of RH-derived
systems with kickstart with very little effort -- what effort one
expends scales up to the entire network. Debian is reportedly similarly
manageable at scale, although I have less experience there. I have
never heard anyone say "Yeah, Slackware, that's the best distro to use
if you have just one person and she has to manage four hundred systems
in a mix of cluster, lab and desktop LAN settings."

   rgb

>
> Perry
>

-- 
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From perry at piermont.com Wed Jul 2 07:57:02 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] A press release
In-Reply-To: (Robert G. Brown's message of "Wed\, 2 Jul 2008 10\:50\:40 -0400 \(EDT\)")
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <87bq1hgpep.fsf@snark.cb.piermont.com>
Message-ID: <87skusbagx.fsf@snark.cb.piermont.com>

"Robert G.
Brown" writes: >> Precisely. It pays to allow people to use what they want. Fewer >> religious battles that way. Whether one distro or another has an >> advantage isn't the point -- people have their own tastes and it >> doesn't pay to tell them "no" without good reason. > > It isn't all about religion. There are two "real" problems with > Slackware. One is its packaging system, the other (related) is > maintenance. I wasn't mentioning Slackware. The "major" distros are all pretty similar in features, but I wouldn't count Slackware that way. Perry From rgb at phy.duke.edu Wed Jul 2 08:12:05 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B7AB9.9050202@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Carsten Aulbert wrote: > Which corresponds exactly to the maximum achievable mounts of 358 right > now. Besides, I'm far from being an expert on TCP/IP, but is it possible > for a local process to bind to a port which is already in use but to > another host? I don't think so, but may be wrong. AFAIK, no they don't. The way TCP daemons that listen on a well-known/privileged port work is that they accept a connection on that port, then fork a connection on a higher unprivileged (>1023) port on both ends so that the daemon can listen once again. You can see this by running e.g. netstat -a. Many daemons have a limit that can be set on the number of simultaneous connections they can manage. However, this is for TCP ports that maintain a persistent connection. UDP ports are "connectionless" and hence somewhat different. 
They tend to make a connection, receive a command/request for some
service, immediately deliver the result, and end the connection. NFS
used to be built on top of UDP, and honestly I don't know what it does
and how it (NFSv3) does it on TCP and am too lazy to look it up, but the
RFCs are there to be read.

   rgb

>
> Cheers
>
> Carsten
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Robert G. Brown                            Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From mark.kosmowski at gmail.com Wed Jul 2 08:19:42 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] Re: energy costs and poor grad students
In-Reply-To: <486B8634.6020309@scalableinformatics.com>
References: <486B8634.6020309@scalableinformatics.com>
Message-ID:

On 7/2/08, Joe Landman wrote:
> Hi Mark
>
> Mark Kosmowski wrote:
> > I'm in the US. I'm almost, but not quite ready for production runs -
> > still learning the software / computational theory. I'm the first
> > person in the research group (physical chemistry) to try to learn
> > plane wave methods of solid state calculation as opposed to isolated
> > atom-centered approximations and periodic atom centered calculations.
>
> Heh... my research group in grad school went through that transition in
> the mid 90s. Went from an LCAO-type simulation to CP like methods. We
> needed a t3e to run those (then).
>
> Love to compare notes and see which code you are using someday.
> On-list/off-list is fine.

Right now I'm using CPMD. It is the first package I looked at, and I
wrestled with the 32-bit limits on memory allocation before the Opterons
debuted. I was on the cusp of buying UltraSparc hardware at student
pricing when the Opterons were released to market, so I decided to stay
with the PC hardware I was already familiar with.

We're comparing calculations to inelastic neutron scattering (INS)
experiments, and it looks like abinit or quantum espresso might be a
better choice for vibrational analysis at points in q-space other than
the gamma point.

Speaking of which, I have only an eighth of a clue about k-points (and,
by extension, q-space). If anyone can suggest some reading on this topic
that even a part-time chemistry student can understand, it would be
greatly appreciated.

> > It is turning out that the package I have spent the most time learning
> > is perhaps not the best one for what we are doing. For a variety of
> > reasons, many of which more off-topic than tac nukes and energy
> > efficient washing machines ;) , I'm doing my studies part-time while
> > working full-time in industry.
>
> More power to ya! I did mine that way too ... the writing was the
> hardest part. Just don't lose focus, or stop believing you can do it.
> When the light starts getting visible at the end of the process, it is
> quite satisfying.
>
> I have other words to describe this, but they require a beer lever to
> get them out of me ...

I make mead on occasion - if you're ever in central NY (Syracuse, Rome,
Utica area)...

Speaking of satisfaction, I did teach myself enough Fortran to extend
the CPMD code with an output format natively readable by aClimax (used
to calculate harmonics from fundamental frequencies for INS). This
is/will be included in the recently/soon to be released version of CPMD.
Heck, there's one or two pages of dissertation right there. :)

> > I think I have come to a compromise that can keep me in business.
> > Until I have a better understanding of the software and am ready for
> > production runs, I'll stick to a small system that can be run on one
> > node and leave the other two powered down. I've also applied for an
> > adjunct instructor position at a local college for some extra cash and
> > good experience. When I'm ready for production runs I can either just
> > bite the bullet and pay the electricity bill or seek computer time
> > elsewhere.
>
> Give us a shout when you want to try the time on a shared resource.
> Some folks here may be able to make good suggestions. RGB is a physics
> guy at Duke, doing lots of simulations, and might know of resources.
> Others here might as well.
>
> Joe

Sounds good. The big thing is getting a bit better understanding of the
theory, especially DFT dispersion correction to account for hydrogen
bonding. I'm thinking that I will learn about DFT dispersion correction
with CPMD to at least get a reasonable understanding and then consider
learning one of the other packages to do q-space calculations.

> > Thanks for the encouragement,
> >
> > Mark E. Kosmowski
> >
> > On 7/1/08, ariel sabiguero yawelak wrote:
> > > Well Mark, don't give up!
> > > I am not sure which one is your application domain, but if you
> > > require 24x7 computation, then you should not be hosting that at
> > > home.
> > > On the other hand, if you are not doing real computation and you
> > > just have a testbed at home, maybe for debugging your parallel
> > > applications or something similar, you might be interested in a
> > > virtualized solution. Several years ago, I used to "debug" some
> > > neural networks at home, but training sessions (up to two weeks of
> > > training) happened at the university.
> > > I would suggest doing something like that.
> > > You can always scale down your problem in several phases and save
> > > the complete data-set / problem for THE RUN.
> > >
> > > You are not being a heretic there, but suffering energy costs ;-)
> > > In more places than you may believe, useful computing nodes are
> > > being replaced just because of energy costs. In some application
> > > domains you can even lose computational power if you move from 4
> > > nodes into a single quad-core (i.e. memory bandwidth problems). I
> > > know it is very nice to be able to do everything at home.. but
> > > maybe before dropping your studies or working overtime to pay the
> > > electricity bill, you might want to consider collapsing your
> > > physical deployment into a single virtualized cluster (or just
> > > dispatch several threads/processes in a single system).
> > > If you collapse into a single system you have only 1 mainboard, one
> > > HDD, one power source, one processor (physically speaking), .... and
> > > you can achieve almost the performance of 4 systems in one,
> > > consuming the power of.... well maybe even less than a single one.
> > > I don't want to go into discussions about performance gain/loss due
> > > to the variation of the hardware architecture.
> > > Invest some bucks (if you haven't done that yet) in a good power
> > > source. Efficiency of OEM unbranded power sources is really
> > > pathetic, maybe 45-50%, while a good power source might be 75-80%
> > > efficient. Use the energy for computing, not for heating your
> > > house.
> > > What I mean is that you could consider just collapsing a complete
> > > "small" cluster into a single system. If your application is
> > > CPU-bound and not I/O bound, VMware Server could be an option, as it
> > > is free software (unfortunately not open, even though some patches
> > > can be done on the drivers). I think it is not possible to publish
> > > benchmarking data about VMware, but I can tell you that in long
> > > timescales, the performance you get in the host OS is similar to
> > > that of the guest OS.
> > > There are a lot of problems related to jitter, from crazy clocks to
> > > delays, but if your application is not sensitive to that, then you
> > > are OK.
> > > Maybe this is not a solution, but you can provide more information
> > > regarding your problem before quitting...
> > >
> > > my 2 cents....
> > >
> > > ariel
> > >
> > > Mark Kosmowski wrote:
> > > > At some point a cost-benefit analysis needs to be performed. If
> > > > my cluster at peak usage only uses 4 GB RAM per CPU (I live in
> > > > single-core land still and do not yet differentiate between CPU and
> > > > core) and my nodes all have 16 GB per CPU then I am wasting RAM
> > > > resources and would be better off buying new machines and physically
> > > > transferring the RAM to and from them or running more jobs each
> > > > distributed across fewer CPUs. Or saving on my electricity bill and
> > > > powering down some nodes.
> > > >
> > > > As heretical as this last sounds, I'm tempted to throw in the towel on
> > > > my PhD studies because I can no longer afford the power to run my
> > > > three node cluster at home. Energy costs may end up being the straw
> > > > that breaks this camel's back.
> > > >
> > > > Mark E. Kosmowski
> > > >
> > > > > From: "Jon Aquilina"
> > > > >
> > > > > not sure if this applies to all kinds of senarios that clusters are
> > > > > used in but isnt the more ram you have the better?
> > > > >
> > > > > On 6/30/08, Vincent Diepeveen wrote:
> > > > > > Toon,
> > > > > >
> > > > > > Can you drop a line on how important RAM is for weather
> > > > > > forecasting in latest type of calculations you're performing?
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman@scalableinformatics.com
> web  : http://www.scalableinformatics.com
>        http://jackrabbit.scalableinformatics.com
> phone: +1 734 786 8423
> fax  : +1 866 888 3112
> cell : +1 734 612 4615
>

From prentice at ias.edu Wed Jul 2 08:22:54 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com>
    <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
    <486A6760.5010006@ias.edu>
Message-ID: <486B9D4E.80405@ias.edu>

Mark Hahn wrote:
>>>> does it necessarily have to be a redhat clone. can it also be a
>>>> debian based clone?
>>>
>>> but why? is there some concrete advantage to using Debian?
>>> I've never understood why Debian users tend to be very True Believer,
>>> or what it is that hooks them.
>>
>> And the Debian users can say the same thing about Red Hat users. Or SUSE
>
> very nice! an excellent parody of the True Believer response.
>
> but I ask again: what are the reasons one might prefer using debian?
> really, I'm not criticizing it - I really would like to know why it
> would matter whether someone (such as ClusterVisionOS (tm)) would use
> debian or another distro.
>

From my interactions with others re: Debian, it's usually about true
opensourceness, since Debian claims that every package distributed by
them is GPLed, or somehow meets some open source legal criteria. Also,
I don't think there's any plan for Debian to go corporate, release an
enterprise version, and effectively bite the hand that feeds it, like
Red Hat and SUSE did. Those are not technical issues, but
philosophical/legal/political issues.

Me? I use RH and its derivatives for a couple of reasons. Here they are
in historical order:

1. When I started learning Linux on my own, all the Linux authorities
   (websites, LJ, etc) recommended RH b/c RPM made it easy to install
   software, and if you bought a boxed version, you got the Metro-X
   X-server, which supported much more video hardware than XFree86 did
   at the time, and had an easy to use GUI to configure X.

2. Now that I'm a professional system admin who often has to support
   commercial apps, I find I have to use a RH-based distro for two
   reasons:

   A. Most commercial software "supports" only Red Hat. Some go so far
      as to refuse to install if RH is not detected. The most extreme
      case of this is EMC PowerPath, whose kernel modules won't install
      if it's not a RH (or SUSE) kernel.

   B. Red Hat has done such a good job of spreading FUD about the other
      Linux distros, management has a cow if you tell them you're
      installing something other than RH. This is why I consider Red Hat
      the Microsoft of Linux.

None of those are technical issues, either. Since the term "Linux"
applies to the kernel only in the strictest sense, there should be no
technical reasons to choose one distro over another. Issues like nice
GUI management tools are human issues, not technical issues.

--
Prentice

From perry at piermont.com Wed Jul 2 08:23:27 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: (Robert G. Brown's message of "Wed\, 2 Jul 2008 11\:12\:05 -0400 \(EDT\)")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de>
Message-ID: <87hcb8b98w.fsf@snark.cb.piermont.com>

"Robert G. Brown" writes:
> On Wed, 2 Jul 2008, Carsten Aulbert wrote:
>> Which corresponds exactly to the maximum achievable mounts of 358 right
>> now. Besides, I'm far from being an expert on TCP/IP, but is it possible
>> for a local process to bind to a port which is already in use but to
>> another host? I don't think so, but may be wrong.
>
> AFAIK, no they don't. The way TCP daemons that listen on a
> well-known/privileged port work is that they accept a connection on that
> port, then fork a connection on a higher unprivileged (>1023) port on
> both ends so that the daemon can listen once again.

Try netstat on a heavily loaded SMTP box. You'll see all these
connections from some random foreign port to port 25 locally -- lots of
connections to port 25 at the same time.

You don't switch to a different port number after the connection comes
in, you stay on it.
You can in theory talk to up to (nearly) 2^48 different foreign
host/port combos off of local port 25, because every remote host/remote
port pair makes for a different 4-tuple.

> Many daemons have a limit that can be set on the number of
> simultaneous connections they can manage.

That's a resource issue, not a TCP architecture issue per se. You might
not have enough memory, CPU, etc. to handle more than a certain number
of connections.

By the way, you can now design daemons to handle tens of thousands of
simultaneous connections with clean event driven design on a modern
multiprocessor with plenty of memory. This is way off topic, though.

> However, this is for TCP ports that maintain a persistent connection.
> UDP ports are "connectionless" and hence somewhat different.

I'm assuming they're doing NFS over TCP. If they're using UDP, things
are somewhat different because of the existence of "connectionless"
UDP. However, they *should* use TCP for performance. (I know people
used to claim the opposite, but it turns out you really want TCP so
you get proper congestion control.)

Perry
--
Perry E. Metzger                perry@piermont.com

From peter.st.john at gmail.com Wed Jul 2 08:25:19 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] Re: energy costs and poor grad students
In-Reply-To:
References:
Message-ID:

Mark,
Would it be feasible to downclock your three nodes? All you physicists
know better than I that power draw and heat production are not linear in
GHz. A 1 GHz processor costs less than half as much per tick as a 2 GHz
one, so if power budget is more urgent for you than time to completion
then that might help; continue running all of your nodes, but slower.

But I've never done this myself. OTOH as a mathematician I don't have to
:-) See http://xkcd.com/435/ ("Purity")

Peter

On 7/2/08, Mark Kosmowski wrote:
>
> I'm in the US. I'm almost, but not quite ready for production runs -
> still learning the software / computational theory. I'm the first
> person in the research group (physical chemistry) to try to learn
> plane wave methods of solid state calculation as opposed to isolated
> atom-centered approximations and periodic atom centered calculations.
>
> It is turning out that the package I have spent the most time learning
> is perhaps not the best one for what we are doing. For a variety of
> reasons, many of which more off-topic than tac nukes and energy
> efficient washing machines ;) , I'm doing my studies part-time while
> working full-time in industry.
>
> I think I have come to a compromise that can keep me in business.
> Until I have a better understanding of the software and am ready for
> production runs, I'll stick to a small system that can be run on one
> node and leave the other two powered down. I've also applied for an
> adjunct instructor position at a local college for some extra cash and
> good experience. When I'm ready for production runs I can either just
> bite the bullet and pay the electricity bill or seek computer time
> elsewhere.
>
> Thanks for the encouragement,
>
> Mark E. Kosmowski
>
> On 7/1/08, ariel sabiguero yawelak wrote:
> > Well Mark, don't give up!
> > I am not sure which one is your application domain, but if you
> > require 24x7 computation, then you should not be hosting that at
> > home.
> > On the other hand, if you are not doing real computation and you just
> > have a testbed at home, maybe for debugging your parallel
> > applications or something similar, you might be interested in a
> > virtualized solution. Several years ago, I used to "debug" some
> > neural networks at home, but training sessions (up to two weeks of
> > training) happened at the university.
> > I would suggest doing something like that.
> > You can always scale down your problem in several phases and save the
> > complete data-set / problem for THE RUN.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080702/ec8b64c0/attachment.html

From prentice at ias.edu Wed Jul 2 08:28:53 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com>
    <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
    <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com>
Message-ID: <486B9EB5.6020906@ias.edu>

Mark Hahn wrote:
>> Hmmm.... for me, it's all about the kernel. That's 90+% of the battle.
>> Some distros use good kernels, some do not. I won't mention who I
>> think is in the latter category.
>
> I was hoping for some discussion of concrete issues.
for instance, > I have the impression debian uses something other than sysvinit - does > that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? > > the couple debian people I know tend to have more ideological motives > (which I do NOT impugn, except that I am personally more swayed by > practical, concrete reasons.) I agree. I follow the same pragmatic rational paradigm. -- Prentice From atchley at myri.com Wed Jul 2 08:32:54 2008 From: atchley at myri.com (Scott Atchley) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486B8C1E.2090007@tamu.edu> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> <486B8C1E.2090007@tamu.edu> Message-ID: On Jul 2, 2008, at 10:09 AM, Gerry Creager wrote: >> Although I believe Lustre's robustness is very good these days, I >> do not believe that it will not work in your setting. I think that >> they currently do not recommend mounting a client on a node that is >> also working as a server as you are doing with NFS. I believe it is >> due to memory contention leading to deadlock. > > Lustre is good enough that it's the parallel FS at TACC for the > Ranger cluster. And, I've had no real problems as a user thereof. > We're brining up glustre on our new cluster here ( > CentOS/RHEL5, not debian ). We looked at zfs but didn't > have sufficient experience to go that path. I believe that all the large DOE labs are using Lustre and would not if it were not reliable. My only concern was Carsten not having dedicated server nodes and mounting directly on those nodes. I may be off-base and hopefully one of the Lustre/SUN people might correct me if so. 
:-) Scott From rgb at phy.duke.edu Wed Jul 2 08:46:51 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87hcb8b98w.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> Message-ID: On Wed, 2 Jul 2008, Perry E. Metzger wrote: > You don't switch to a different port number after the connection comes > in, you stay on it. You can in theory talk to up to (nearly) 2^48 > different foreign host/port combos off of local port 25, because every > remote host/remote port pair makes for a different 4-tuple. Ah. I should have known that. >> Many daemons have a limit that can be set on the number of >> simultaneous connections they can manage. > > That's a resource issue, not a TCP architecture issue per se. You > might not have enough memory, CPU, etc. to handle more than a certain > number of connections. > > By the way, you can now design daemons to handle tens of thousands of > simultaneous connections with clean event driven design on a modern > multiprocessor with plenty of memory. This is way off topic, though. Not on a cluster list. Networking in a very real sense IS the topic. I've written forking daemons (which is why I should have known, or remembered, about the four-tuple thing:-) because they are an essential component of IPCs in a network-based cluster or cluster distributed apps. Even though PVM and MPI make it easy to write portable code (and may well provide you with better performance than you can easily get on your own) there may well be occasions for cluster software writers to need to write their own networking, in band or out of band. 
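The 4-tuple point quoted above can be checked on a single machine. What follows is only a minimal sketch, not code from this thread: the helper names local_port() and four_tuple_demo() are made up for illustration, and it assumes IPv4 loopback is available. It listens on a kernel-chosen port, connects two clients, and records the local port at each end. Both accepted sockets report the listener's port, and only the client side differs.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Local port of a socket in host byte order, or -1 on error. */
static int local_port(int fd)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);

    if (getsockname(fd, (struct sockaddr *)&a, &len) < 0)
        return -1;
    return ntohs(a.sin_port);
}

/*
 * Hypothetical demo: listen on a kernel-chosen loopback port, connect two
 * clients, accept both, and record the local port at each end.
 * Returns 0 on success, -1 on any socket error.
 */
static int four_tuple_demo(int *listen_port, int accepted_port[2],
                           int client_port[2])
{
    struct sockaddr_in addr;
    int srv, cli[2] = { -1, -1 }, acc[2] = { -1, -1 };
    int i, rc = -1;

    assert(listen_port && accepted_port && client_port);

    srv = socket(AF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                        /* let the kernel pick a port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 2) < 0)
        goto out;

    *listen_port = local_port(srv);
    if (*listen_port < 0)
        goto out;
    addr.sin_port = htons((unsigned short)*listen_port);

    for (i = 0; i < 2; i++) {
        cli[i] = socket(AF_INET, SOCK_STREAM, 0);
        if (cli[i] < 0 ||
            connect(cli[i], (struct sockaddr *)&addr, sizeof(addr)) < 0)
            goto out;
        acc[i] = accept(srv, NULL, NULL);
        if (acc[i] < 0)
            goto out;
        accepted_port[i] = local_port(acc[i]);  /* server end of connection */
        client_port[i] = local_port(cli[i]);    /* client end of connection */
    }
    rc = 0;
out:
    for (i = 0; i < 2; i++) {
        if (cli[i] >= 0)
            close(cli[i]);
        if (acc[i] >= 0)
            close(acc[i]);
    }
    close(srv);
    return rc;
}
```

Calling four_tuple_demo() and printing the three sets of ports should show the same number for the listener and for both accepted sockets, with two different kernel-chosen ports on the client side. That is the sense in which the local port is not "used up" by a connection.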
>> However, this is for TCP ports that maintain a persistent connection. >> UDP ports are "connectionless" and hence somewhat different. > > I'm assuming they're doing NFS over TCP. If they're using UDP, things > are somewhat different because of the existence of "connectionless" > UDP. However, they *should* use TCP for performance. (I know people > used to claim the opposite, but it turns out you really want TCP so > you get proper congestion control.) Yah. To make UDP reliable, you have to load it down with most of the stuff in TCP anyway; it isn't clear that it was ever a great choice. IIRC PVM was originally built on UDP for similar reasons, but I think -- am not sure but think -- it is TCP today because it wasn't worth the hassle. I'm too lazy to crank up a PVM app to find out, though...;-) rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 08:53:31 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Robert G. Brown wrote: > The way TCP daemons that listen on a well-known/privileged port work > is that they accept a connection on that port, then fork a > connection on a higher unprivileged (>1023) port on both ends so > that the daemon can listen once again. 'man 7 socket' and look up SO_REUSEADDR. 
I don't quite know what you mean by 'forking a connection'; when the daemon encounters a fork() all open file descriptors (including sockets) are being kept in both the parent and the child. The child (usually the part of the daemon that processes the content that comes on that connection) gets the same 4-tuple as the parent. The parent closes its file handle so that only the child is then active on that connection. > You can see this by running e.g. netstat -a. I seriously doubt that you have seen such a behaviour. Empirical evidence which might pass easier than theoretical one: on the e-mail server that I admin, there is an iptable rule to only allow incoming connections to port 25 - if connections would suddenly be migrated to different ports they would be blocked and I would not receive any e-mails from this list. But I do, especially during the past few days... (not that I complain :-)) -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From perry at piermont.com Wed Jul 2 09:33:06 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: (Robert G. Brown's message of "Wed\, 2 Jul 2008 11\:46\:51 -0400 \(EDT\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> Message-ID: <87vdzo9rgd.fsf@snark.cb.piermont.com> "Robert G. Brown" writes: > On Wed, 2 Jul 2008, Perry E. Metzger wrote: >> By the way, you can now design daemons to handle tens of thousands of >> simultaneous connections with clean event driven design on a modern >> multiprocessor with plenty of memory. 
This is way off topic, though. > > Not on a cluster list. Well, it actually kind of is. Typically, a box in an HPC cluster is running stuff that's compute bound and who's primary job isn't serving vast numbers of teeny high latency requests. That's much more what a web server does. However... > I've written forking daemons (which is why I should have known, or > remembered, about the four-tuple thing:-) because they are an essential > component of IPCs in a network-based cluster or cluster distributed > apps. One is best off *not* forking, actually. There's a good site on concurrency management for high performance servers. It is a bit old now but covers the topic well: http://www.kegel.com/c10k.html Myself, I'm a believer in event driven code. One thread, one core. All other concurrency management should be handled by events, not by multiple threads. Thread context switching is very very expensive, and threads are very expensive. Doing event driven programming wins overwhelmingly in such contexts. It is hard to impossible, on a modern machine, to handle tens of thousands of connections with forking or threads, but it is easy with events. I'm a fan of Niels Provos' "libevent" for such purposes. There are a lot of other libraries that plug in to it well, too. -- Perry E. Metzger perry@piermont.com From perry at piermont.com Wed Jul 2 09:37:55 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: (Bogdan Costescu's message of "Wed\, 2 Jul 2008 17\:53\:31 +0200 \(CEST\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: <87iqvo9r8c.fsf@snark.cb.piermont.com> Bogdan Costescu writes: > 'man 7 socket' and look up SO_REUSEADDR. 

Incidentally, I believe this may be part of the problem for the NFS
client code in Linux.

-- 
Perry E. Metzger		perry@piermont.com

From rgb at phy.duke.edu  Wed Jul  2 10:54:37 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Fri Mar 19 01:07:26 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: 
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
	<878wwlk65p.fsf@snark.cb.piermont.com>
	<20080701164747.GA15901@gretchen.aei.uni-hannover.de>
	<87fxqtuzh8.fsf@snark.cb.piermont.com>
	<486B2DC2.9010604@aei.mpg.de>
	<87d4lweagv.fsf@snark.cb.piermont.com>
	<486B7AB9.9050202@aei.mpg.de>
Message-ID: 

On Wed, 2 Jul 2008, Bogdan Costescu wrote:

> On Wed, 2 Jul 2008, Robert G. Brown wrote:
>
>> The way TCP daemons that listen on a well-known/privileged port work is
>> that they accept a connection on that port, then fork a connection on a
>> higher unprivileged (>1023) port on both ends so that the daemon can listen
>> once again.
>
> 'man 7 socket' and look up SO_REUSEADDR. I don't quite know what you mean by
> 'forking a connection'; when the daemon encounters a fork() all open file
> descriptors (including sockets) are being kept in both the parent and the
> child. The child (usually the part of the daemon that processes the content
> that comes on that connection) gets the same 4-tuple as the parent. The
> parent closes its file handle so that only the child is then active on that
> connection.

I'm stating it badly and incorrectly, confusing port with socket. See
the following code. Server listens, bound to a specific port. When a
connection is initiated by a (possibly remote) client, it accepts it
(creating a socket with its own FD), leaving the original server socket
FD unaffected. It then forks and the child CLOSES the original socket
lest there be trouble. The server/parent similarly closes the client fd.
The client typically got a "random" (kernel chosen) port on ITS side
from the list of available unprotected ports when it formed its original
socket, and it forms one side of the stream connection, with the server
"accept" socket being the other.

What I was trying to convey remarkably poorly is that once you've
created a daemon and bound it to a port, if you try to start up a second
daemon on that port you'll get an EADDRINUSE on the bind (and fail the
loop that checks below), and so if you DON'T fork off the sockets with
listen/accept you'll usually block the port indefinitely while handling
each connection.

I haven't tried (at least, not deliberately:-) not going through the
asymmetric close so that the two processes both have all the FDs, but
I'd guess bad things would happen if I did, a crap shoot race condition
as to which process gets the data or worse.

OTOH, some applications (esp nfsd and httpd) DO fork several child
processes with the original open socket fd so that if incoming requests
for a connection come while one of them is "busy" with the creation of a
child of its own to handle the connection, another will pick it up round
robin. Unless I'm misunderstanding how they work this or why.

mail is even more interesting, as imapd has to stick around to manage
each persistent imap connection, so an imapd server has umpty zillion
instances of imapd. I don't know exactly what smtp daemons do --
postfix or sendmail.

Anyway, some generic forking daemon code, adapted IIRC from Stevens
originally and hacked around some to avoid TIME_WAIT and so on:

  server_fd = socket(AF_INET,SOCK_STREAM,0);
  if (server_fd < 0){
    fprintf(stderr,"socket: %.100s", strerror(errno));
    exit(1);
  }

  /*
   * Set socket options.  We try to make the port reusable and have it
   * close as fast as possible without waiting in unnecessary wait states
   * on close.
   */
  setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, (void *)&on, sizeof(on));
  linger.l_onoff = 1;   /* Linger for just a bit */
  linger.l_linger = 0;  /* do NOT linger -- exit and discard data. */
  setsockopt(server_fd, SOL_SOCKET, SO_LINGER, (void *)&linger, sizeof(linger));

  serverlen = sizeof(serverINETaddress);
  bzero( (char*) &serverINETaddress,serverlen);           /* clear structure */
  serverINETaddress.sin_family = AF_INET;                 /* Internet domain */
  serverINETaddress.sin_addr.s_addr = htonl(INADDR_ANY);  /* Accept all */
  serverINETaddress.sin_port = htons(port);               /* Server port number */
  serverSockAddrPtr = (struct sockaddr*) &serverINETaddress;

  /*
   * Bind the socket to the desired port.  Try up to six times (30sec) IF the
   * port is in use
   */
  retries = 6;
  errno = 0;  /* To zero any possible garbage value */
  while(retries--){
    if(bind(server_fd,serverSockAddrPtr,serverlen) < 0) {
      if(errno != EADDRINUSE){
        close(server_fd);
        fprintf(stderr,"bind: %.100s\n", strerror(errno));
        fprintf(stderr,"socket bind to port %d failed: %d.\n", port,errno);
        exit(255);
      }
    } else {
      /* bind() does not clear errno on success, so clear a stale
       * EADDRINUSE left by an earlier try before the check below. */
      errno = 0;
      break;
    }
    /* printf("Got no port: %s\n",strerror(errno)); */
    sleep(5);
  }
  if(errno){
    if(errno == EADDRINUSE){
      fprintf(stderr,"Timeout (tried to bind six times five seconds apart)\n");
    }
    close(server_fd);
    fprintf(stderr,"bind to port %d failed: %.100s\n",port,strerror(errno));
    exit(255);
  }

  /*
   * Socket exists.  Service it.  Queue up to nconnxns incoming connections
   * or die.  Default 10 matches the limits in the default xinetd.
   */
  if(listen(server_fd,nconnxns) < 0){
    fprintf(stderr,"listen: %.100s", strerror(errno));
    exit(255);
  }

  /* Arrange SIGCHLD to be caught. */
  signal(SIGCHLD, sigchld_handler);

  /*
   * Initialize client structures.
   */
  clientlen = sizeof(clientINETaddress);
  clientSockAddrPtr = (struct sockaddr*) &clientINETaddress;

  /*
   * Loop "forever", or until daemon crashes or is killed with a signal.
   */
  while(1){

    /* Accept a client connection */
    if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: Accepting Client connection...\n");
    }

    /*
     * Wait in select until there is a connection.  Presumably this is
     * more efficient than just blocking on the accept
     */
    FD_ZERO(&fdset);
    FD_SET(server_fd, &fdset);
    ret = select(server_fd + 1, &fdset, NULL, NULL, NULL);
    if (ret < 0 || !FD_ISSET(server_fd, &fdset)) {
      if (errno == EINTR) continue;
      fprintf(stderr,"select: %.100s", strerror(errno));
      continue;
    }

    /*
     * A call is waiting.  Accept it.
     */
    client_fd = accept(server_fd,clientSockAddrPtr,&clientlen);
    if (client_fd < 0){
      if (errno == EINTR) continue;
      fprintf(stderr,"accept: %.100s", strerror(errno));
      continue;
    }
    if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: ...client connection made.\n");
    }

    /*
     * IF I GET HERE...
     * ...I'm a real daemon.  I therefore fork and have the child process
     * the connection.  The parent continues listening and can service
     * multiple connections in parallel.
     */

    /*
     * CHILD.  Close the listening (server) socket, and start using the
     * accepted (client) socket.  We break out of the (infinite) loop to
     * handle the connection.
     */
    if ((pid = fork()) == 0){
      close(server_fd);
      break;
    }

    /*
     * PARENT.  Stay in the loop.  Close the client socket (it's the child's)
     * but leave the server socket open.
     */
    if (pid < 0)
      fprintf(stderr,"fork: %.100s", strerror(errno));
    else if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: Forked child %d to handle socket %d.\n", pid,client_fd);
    }
    close(client_fd);

  }

  /* No need to wait for children -- I'm the child */
  signal(SIGCHLD, SIG_DFL);

  /* Dissociate from calling process group and control terminal */
  setsid();

>
>> You can see this by running e.g. netstat -a.
>
> I seriously doubt that you have seen such a behaviour.
Empirical evidence > which might pass easier than theoretical one: on the e-mail server that I > admin, there is an iptable rule to only allow incoming connections to port 25 > - if connections would suddenly be migrated to different ports they would be > blocked and I would not receive any e-mails from this list. But I do, > especially during the past few days... (not that I complain :-)) > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From rgb at phy.duke.edu Wed Jul 2 11:03:31 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87vdzo9rgd.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> Message-ID: On Wed, 2 Jul 2008, Perry E. Metzger wrote: > > "Robert G. Brown" writes: >> On Wed, 2 Jul 2008, Perry E. Metzger wrote: >>> By the way, you can now design daemons to handle tens of thousands of >>> simultaneous connections with clean event driven design on a modern >>> multiprocessor with plenty of memory. This is way off topic, though. >> >> Not on a cluster list. > > Well, it actually kind of is. Typically, a box in an HPC cluster is > running stuff that's compute bound and who's primary job isn't serving > vast numbers of teeny high latency requests. That's much more what a > web server does. However... I'd have to disagree. On some clusters, that is quite true. 
On others, it is very much not true, and whole markets of specialized network hardware that can manage vast numbers of teeny communications requests with acceptably low latency have come into being. And in between, there is, well, between, and TCP/IP at gigabit speeds is at least a contender for ways to fill it. >> I've written forking daemons (which is why I should have known, or >> remembered, about the four-tuple thing:-) because they are an essential >> component of IPCs in a network-based cluster or cluster distributed >> apps. > > One is best off *not* forking, actually. There's a good site on > concurrency management for high performance servers. It is a bit old > now but covers the topic well: http://www.kegel.com/c10k.html > > Myself, I'm a believer in event driven code. One thread, one core. All > other concurrency management should be handled by events, not by > multiple threads. Thread context switching is very very expensive, and > threads are very expensive. Doing event driven programming wins > overwhelmingly in such contexts. It is hard to impossible, on a > modern machine, to handle tens of thousands of connections with > forking or threads, but it is easy with events. > > I'm a fan of Niels Provos' "libevent" for such purposes. There are a > lot of other libraries that plug in to it well, too. Interesting. Makes sense, but a lot of boilerplate code for daemons has always used the fork approach. Of course, things were "smaller" back when the approach was dominant. The forking approach is easy to program and reminiscent of pipe code and so on. rgb -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From perry at piermont.com Wed Jul 2 11:37:41 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Fri Mar 19 01:07:26 2010 Subject: dealing with lots of sockets (was Re: [Beowulf] automount on high ports) In-Reply-To: (Robert G. Brown's message of "Wed\, 2 Jul 2008 14\:03\:31 -0400 \(EDT\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> Message-ID: <87skus874a.fsf_-_@snark.cb.piermont.com> "Robert G. Brown" writes: >> Well, it actually kind of is. Typically, a box in an HPC cluster is >> running stuff that's compute bound and who's primary job isn't serving >> vast numbers of teeny high latency requests. That's much more what a >> web server does. However... > > I'd have to disagree. On some clusters, that is quite true. On others, > it is very much not true, and whole markets of specialized network > hardware that can manage vast numbers of teeny communications requests > with acceptably low latency have come into being. And in between, there > is, well, between, and TCP/IP at gigabit speeds is at least a contender > for ways to fill it. I have to admit my experience here is limited. I'll take your word for it that there are systems where huge numbers of small, high latency requests are processed. (I thought that teeny stuff in HPC land was almost always where you brought in the low latency fabric and used specialized protocols, but...) >> Myself, I'm a believer in event driven code. One thread, one core. All >> other concurrency management should be handled by events, not by >> multiple threads.[....] > Interesting. Makes sense, but a lot of boilerplate code for daemons has > always used the fork approach. Of course, things were "smaller" back > when the approach was dominant. 
The forking approach is easy to program > and reminiscent of pipe code and so on. Sure, but it is way inefficient. Every single process you fork means another data segment, another stack segment, which means lots of memory. Every process you fork also means that concurrency is achieved only by context switching, which means loads of expense on changing MMU state and more. Even thread switching is orders of magnitude worse than a procedure call. Invoking an event is essentially just a procedure call, so that wins big time. Event driven systems can also avoid locking if you keep global data structures to a minimum, in a way you really can't manage well with threaded systems. That makes it easier to write correct code. The price you pay is that you have to think in terms of events, and few programmers have been trained that way. Perry -- Perry E. Metzger perry@piermont.com From cousins at umit.maine.edu Wed Jul 2 11:50:17 2008 From: cousins at umit.maine.edu (Steve Cousins) Date: Fri Mar 19 01:07:26 2010 Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? In-Reply-To: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> References: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> Message-ID: > Just under 60MB/sec seems to be the maximum tape transport read/write > limit. Pretty reliably the first write from the beginning of tape was a > bit slower than writes started further into the tape. I believe LTO-3 is rated at 80 MB/sec without compression. Testing it on our HP unit in an Overland library I get: WRITE: dd if=/dev/zero of=/dev/nst0 bs=512k count=10k 10240+0 records in 10240+0 records out 5368709120 bytes (5.4 GB) copied, 71.8723 seconds, 74.7 MB/s READ: dd of=/dev/null if=/dev/nst0 bs=512k count=10k 10240+0 records in 10240+0 records out 5368709120 bytes (5.4 GB) copied, 69.2487 seconds, 77.5 MB/s I used a 512K block size because that is what I use with our backups and it has given optimal performance since the DLT-7000 days. 
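The figures dd prints above can be reproduced by hand. The helpers below are a minimal sketch and not anything from the thread: dd_bytes() and mb_per_sec() are hypothetical names, and the only assumption is that dd counts a megabyte as 10^6 bytes, which is consistent with it calling 5368709120 bytes "5.4 GB".

```c
#include <assert.h>

/* Total bytes moved by dd: `count` blocks of `block_size` bytes each. */
static long long dd_bytes(long long count, long long block_size)
{
    return count * block_size;
}

/* Throughput in decimal megabytes per second (1 MB = 10^6 bytes). */
static double mb_per_sec(long long bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / seconds / 1e6;
}
```

With count=10k (10240) and bs=512k (524288 bytes), dd_bytes() gives 5368709120 bytes. Dividing by 71.8723 s and 69.2487 s gives roughly 74.7 and 77.5 MB/s, matching the write and read lines above and sitting a little under the 80 MB/s rating quoted for LTO-3.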
Good luck, Steve ______________________________________________________________________ Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 From bernard at vanhpc.org Wed Jul 2 12:24:03 2008 From: bernard at vanhpc.org (Bernard Li) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <87wsk4ed20.fsf@snark.cb.piermont.com> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> Message-ID: On Wed, Jul 2, 2008 at 4:32 AM, Perry E. Metzger wrote: > "Jon Aquilina" writes: >> if i use blender how nicely does it work in a cluster? > > I believe it works quite well. As far as I know blender does not have any built-in "clustering" capabilities. But what you do is render different frames on different cores (embarrassingly parallel) using a queuing/scheduling system. DrQueue seems to be quite popular with the rendering folks: http://drqueue.org/cwebsite/ Cheers, Bernard From coutinho at dcc.ufmg.br Wed Jul 2 12:34:48 2008 From: coutinho at dcc.ufmg.br (Bruno Coutinho) Date: Fri Mar 19 01:07:27 2010 Subject: dealing with lots of sockets (was Re: [Beowulf] automount on high ports) In-Reply-To: <87skus874a.fsf_-_@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> Message-ID: 2008/7/2 Perry E. Metzger : > > "Robert G. Brown" writes: > >> Well, it actually kind of is. 
> >> Typically, a box in an HPC cluster is
> >> running stuff that's compute bound and who's primary job isn't serving
> >> vast numbers of teeny high latency requests. That's much more what a
> >> web server does. However...
> >
> > I'd have to disagree. On some clusters, that is quite true. On others,
> > it is very much not true, and whole markets of specialized network
> > hardware that can manage vast numbers of teeny communications requests
> > with acceptably low latency have come into being. And in between, there
> > is, well, between, and TCP/IP at gigabit speeds is at least a contender
> > for ways to fill it.
>
> I have to admit my experience here is limited. I'll take your word for
> it that there are systems where huge numbers of small, high latency
> requests are processed. (I thought that teeny stuff in HPC land was
> almost always where you brought in the low latency fabric and used
> specialized protocols, but...)
>
> >> Myself, I'm a believer in event driven code. One thread, one core. All
> >> other concurrency management should be handled by events, not by
> >> multiple threads.[....]

libevent can be used for event-based servers.
http://www.monkey.org/~provos/libevent/

> > Interesting. Makes sense, but a lot of boilerplate code for daemons has
> > always used the fork approach. Of course, things were "smaller" back
> > when the approach was dominant. The forking approach is easy to program
> > and reminiscent of pipe code and so on.

This site describes several approaches to solving this problem:
http://www.kegel.com/c10k.html
Look for Chromium's X15. It can handle thousands of simultaneous
connections and can saturate gigabit networks even with lots of slow
clients.

> Sure, but it is way inefficient. Every single process you fork means
> another data segment, another stack segment, which means lots of
> memory. Every process you fork also means that concurrency is achieved
> only by context switching, which means loads of expense on changing
> MMU state and more. Even thread switching is orders of magnitude worse
> than a procedure call. Invoking an event is essentially just a
> procedure call, so that wins big time.

As far as I know, process creation can take up to 1,000,000 cycles.

> Event driven systems can also avoid locking if you keep global data
> structures to a minimum, in a way you really can't manage well with
> threaded systems. That makes it easier to write correct code.
>
> The price you pay is that you have to think in terms of events, and
> few programmers have been trained that way.
>
> Perry
> --
> Perry E. Metzger		perry@piermont.com

From lindahl at pbm.com  Wed Jul  2 13:04:49 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <87d4lweagv.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
	<878wwlk65p.fsf@snark.cb.piermont.com>
	<20080701164747.GA15901@gretchen.aei.uni-hannover.de>
	<87fxqtuzh8.fsf@snark.cb.piermont.com>
	<486B2DC2.9010604@aei.mpg.de>
	<87d4lweagv.fsf@snark.cb.piermont.com>
Message-ID: <20080702200448.GA17424@bx9.net>

On Wed, Jul 02, 2008 at 08:28:48AM -0400, Perry E. Metzger wrote:

> None
> of this should cause you to run out of ports, period. If you don't
> understand that, refer back to my original message. A TCP socket is a
> unique 4-tuple. The host:port 2-tuples are NOT unique and not an
> exhaustible resource.
> There is no way that your case is going to
> even remotely exhaust the 4-tuple space.

Perry,

Go look at code that actually uses priv ports to connect out. Normally
the port is picked in the connect() call, and that means you can have
all the 4-tuples. But for priv ports, you have to loop trying specific
candidate ports under 1024 until you get one, and then connect out
from it. (Here's where Linux doesn't try all 1024, because it doesn't
want to use ports that are someone else's fixed port.) The kernel
doesn't know at assignment time who you are connecting out to. In the
end, this means that the port numbers are reused slowly, and you have
to wait a TIME_WAIT time before reusing them.

Now I'm weak on the details today, but this was an issue that I dealt
with long ago with PBS, which insists on using priv ports. So I ended
up hacking the kernel on the PBS master to have a reduced TIME_WAIT
time. Problem solved, yukko.

-- greg

From mathog at caltech.edu  Wed Jul  2 13:54:29 2008
From: mathog at caltech.edu (David Mathog)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
Message-ID: 

Steve Cousins wrote
> David Mathog wrote:
> > Just under 60MB/sec seems to be the maximum tape transport read/write
> > limit. Pretty reliably the first write from the beginning of tape was a
> > bit slower than writes started further into the tape.
>
> I believe LTO-3 is rated at 80 MB/sec without compression. Testing it on
> our HP unit in an Overland library I get:
>
> WRITE:
>
> dd if=/dev/zero of=/dev/nst0 bs=512k count=10k
> 10240+0 records in
> 10240+0 records out
> 5368709120 bytes (5.4 GB) copied, 71.8723 seconds, 74.7 MB/s
>
> READ:
>
> dd of=/dev/null if=/dev/nst0 bs=512k count=10k
> 10240+0 records in
> 10240+0 records out
> 5368709120 bytes (5.4 GB) copied, 69.2487 seconds, 77.5 MB/s

Rats. I wonder what the difference is now? If you don't already have
it, please grab a copy of Exabyte's ltoTool from here:

http://www.exabyte.com/support/online/downloads/downloads.cfm?did=1344&prod_id=581

% /usr/local/src/ltotool/ltoTool -C 1 /dev/nst0
ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp.
Tape Drive identified as LTO3(HP)
Enabling compression...OK
Done
% /usr/local/src/ltotool/ltoTool -i /dev/nst0
ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp.
Tape Drive identified as LTO3(HP)
/dev/nst0 - Vendor : HP
/dev/nst0 - Product ID: Ultrium 3-SCSI
/dev/nst0 - Firmware : D21D
/dev/nst0 - Serialnum : HU10708TGG
% dd if=/dev/zero of=/dev/nst0 bs=512k count=10k
10240+0 records in
10240+0 records out
5368709120 bytes (5.4 GB) copied, 38.6474 s, 139 MB/s
% /usr/local/src/ltotool/ltoTool -C 0 /dev/nst0
ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp.
Tape Drive identified as LTO3(HP)
Disabling compression...OK
Done
% dd if=/dev/zero of=/dev/nst0 bs=512k count=10k
10240+0 records in
10240+0 records out
5368709120 bytes (5.4 GB) copied, 91.9329 s, 58.4 MB/s
Done

So mine is not as fast as yours in the exact same test. HP's LTT tool
shows an LTO 3 cartridge in the drive. (Does this drive even work with
an LTO 2 or LTO 4?)

% ulimit
unlimited
% uname -a
2.6.24-19-generic #1 SMP Wed Jun 4 15:10:52 UTC 2008 x86_64 GNU/Linux
% cat /etc/issue
Ubuntu 8.04 \n \l

The system has 24 GB of RAM, dual Opteron 2218, and no cpufreq
adjustment running (the BIOS on this one does not support power
adjustment).

The relevant SCSI messages from the last boot in /var/log/messages are:

scsi6 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0
aic7901: Ultra320 Wide Channel A, SCSI Id=7,PCI-X 101-133Mhz, 512 SCBs
scsi 6:0:4:0: Sequential-Access HP Ultrium 3-SCSI D21D PQ: 0 ANSI: 3
target6:0:4: asynchronous
scsi6:A:4:0: Tagged Queuing enabled. Depth 32
target6:0:4: Beginning Domain Validation
target6:0:4: wide asynchronous
target6:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, offset 64)
target6:0:4: Domain Validation skipping write tests
target6:0:4: Ending Domain Validation

The module driving the Adaptec is "aic79xx", apparently with no
special options configured anywhere for when it loads. Not sure which
kernel parameters are relevant (if any).

This is really unlikely to be relevant, but...

% dd --version
dd (coreutils) 6.10
Copyright (etc.)
% ldd `which dd`
linux-vdso.so.1 => (0x00007fff08bfe000)
librt.so.1 => /lib/librt.so.1 (0x00007fdd00779000)
libc.so.6 => /lib/libc.so.6 (0x00007fdd00417000)
libpthread.so.0 => /lib/libpthread.so.0 (0x00007fdd001fb000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdd00982000)

Thanks,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From mathog at caltech.edu  Wed Jul  2 14:13:53 2008
From: mathog at caltech.edu (David Mathog)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? (Steve Cousins)
Message-ID: 

Steve Cousins wrote:

> > Just under 60MB/sec seems to be the maximum tape transport read/write
> > limit. Pretty reliably the first write from the beginning of tape was a
> > bit slower than writes started further into the tape.
>
> I believe LTO-3 is rated at 80 MB/sec without compression.

I just checked that. The spec page here:

http://h18006.www1.hp.com/products/storageworks/ultrium920/index.html

says:

  Higher performance with dynamic data rate matching - Ultrium 920
  Tape Drive 120MB/sec compressed data transfer using 2:1 compression,

Or with compression off 60MB/sec, assuming that the 120MB/s was rate
limited by the physical tape write speed and not at all by the
compression.

On the other hand, the specs for the CARTRIDGE here:

http://www.hboutlet.com.au/catalog/product_info.php?products_id=181385

list 80MB/s uncompressed, which is the number you cited.
Also here: http://en.wikipedia.org/wiki/Linear_Tape-Open they cite 80MB/s. Do different LTO-3 drives have different maximum tape write speeds? Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From perry at piermont.com Wed Jul 2 15:31:51 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <20080702200448.GA17424@bx9.net> (Greg Lindahl's message of "Wed\, 2 Jul 2008 13\:04\:49 -0700") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> Message-ID: <87hcb76hpk.fsf@snark.cb.piermont.com> Greg Lindahl writes: > Go look at code that actually uses priv ports to connect out. Normally > the port is picked in the connect() call, and that means you can have > all the 4-tuples. But for priv ports, you have to loop trying specific > candidate ports under 1024 until you get one, and then connect out > from it. (Here's where Linux doesn't try all 1024, because it doesn't > want to use ports that are someone else's fixed port.) The kernel > doesn't know at assignment time who you are connecting out to. In the > end, this means that the port numbers are reused slowly, and you have > to wait a TIME_WAIT time before reusing them. It isn't quite that bad. You can use one of the SO_REUSE* calls in the code to make things less dire. Apparently the kernel doesn't do that for NFS client connection establishment, though. There is probably some code to fix here. Anyway, you may notice that I handed the original poster a hacky patch that will let him use unprivileged ports. I still don't know if it is necessary, but it may make his life less bad, we'll see. 
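For readers following along, the loop Greg describes can be sketched roughly as below. This is only an illustration in Python, not the code that PBS or the Linux NFS client actually uses. The function name bind_reserved_port and the 512-1023 default range are assumptions made for the example, and binding below 1024 normally needs root. The point it shows is that the local port is fixed at bind() time, before the destination is known, so the choice is made on the local address and port alone.

```python
import errno
import socket


def bind_reserved_port(sock, host="", low=512, high=1023):
    """Bind sock to the first free local port in [low, high], scanning downward.

    Hypothetical helper for illustration only. The default range is an
    assumption, not what any particular kernel or library uses.
    Returns the port number that was bound.
    """
    for port in range(high, low - 1, -1):
        try:
            # The port is chosen here, before connect(), so the kernel
            # cannot yet know which remote address will be used.
            sock.bind((host, port))
            return port
        except OSError as exc:
            # Only "address in use" means "try the next candidate".
            # Anything else (e.g. EACCES when not root) is passed up.
            if exc.errno != errno.EADDRINUSE:
                raise
    raise OSError(errno.EADDRINUSE, "no free port in candidate range")
```

A caller would create a TCP socket, call bind_reserved_port(sock), and then connect() as usual. Ports that are still tied up by earlier connections simply show up as "address in use" and the loop moves on, which is why a small candidate range can run dry.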
> Now I'm weak on the details today, but this was an issue that I dealt
> with long ago with PBS, which insists on using priv ports. So I ended
> up hacking the kernel on the PBS master to have a reduced TIME_WAIT
> time. Problem solved, yukko.

--
Perry E. Metzger perry@piermont.com

From lindahl at pbm.com Wed Jul 2 15:37:14 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <87hcb76hpk.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com>
Message-ID: <20080702223714.GA5908@bx9.net>

On Wed, Jul 02, 2008 at 06:31:51PM -0400, Perry E. Metzger wrote:

> It isn't quite that bad. You can use one of the SO_REUSE* calls in the
> code to make things less dire. Apparently the kernel doesn't do that
> for NFS client connection establishment, though. There is probably
> some code to fix here.

That's what I thought at first, too. But since you only have a 2-tuple and not a 4-tuple when it comes time to pick the port number, SO_REUSEADDR doesn't do anything.

-- greg

From cousins at umit.maine.edu Wed Jul 2 15:37:14 2008
From: cousins at umit.maine.edu (Steve Cousins)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To:
References:
Message-ID:

On Wed, 2 Jul 2008, David Mathog wrote:

> Rats.
>
> I wonder what the difference is now? If you don't already have it,
> please grab a copy of Exabyte's ltoTool from here:
>
> http://www.exabyte.com/support/online/downloads/downloads.cfm?did=1344&prod_id=581
>
> % /usr/local/src/ltotool/ltoTool -C 1 /dev/nst0
> ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp.
> > Tape Drive identified as LTO3(HP) > Enabling compression...OK > > Done > > % /usr/local/src/ltotool/ltoTool -i /dev/nst0 > ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp. > > Tape Drive identified as LTO3(HP) > /dev/nst0 - Vendor : HP > /dev/nst0 - Product ID: Ultrium 3-SCSI > /dev/nst0 - Firmware : D21D > /dev/nst0 - Serialnum : HU10708TGG Hi David, I get: ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp. Tape Drive identified as LTO3(HP) /dev/nst0 - Vendor : HP /dev/nst0 - Product ID: Ultrium 3-SCSI /dev/nst0 - Firmware : G24H /dev/nst0 - Serialnum : HU105278YC So, based on the serial number it looks like yours is newer than mine but mine possibly has a newer firmware. It's hard to tell though since there are probably different firmwares depending on what vendor/library it is in. For the other information I have: ulimit: unlimited sh-3.1# uname -a Linux triton 2.6.20-1.2320.fc5.asl.1 #1 SMP Thu Aug 9 13:21:16 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux (yes it is an old distribution (FC5) and kernel but it is stable. uptime of 295 days until boot drive had a problem this week and I had to switch it out) dmesg: scsi4 : ioc0: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=28 scsi 4:0:1:0: Sequential-Access HP Ultrium 3-SCSI G24H PQ: 0 ANSI: 3 target4:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, offset 64) scsi 4:0:6:0: Medium Changer OVERLAND NEO Series 0507 PQ: 0 ANSI: 2 target4:0:6: FAST-10 WIDE SCSI 20.0 MB/s ST (100 ns, offset 15) scsi5 : ioc1: LSI53C1030, FwRev=01030a00h, Ports=1, MaxQ=222, IRQ=29 It is a dual Opteron 252 machine with 4GB of RAM and an LSI PCI-X two channel controller. So we are both running with 2.6 Ghz Opterons but you have twice as many cores and probably higher bandwidth. My motherboard is a Tyan S2882 I believe. dd --version shows: dd (coreutils) 5.97 Copyright (C) 2006 Free Software Foundation, Inc. This is free software. 
You may redistribute copies of it under the terms of the GNU General Public License . There is NO WARRANTY, to the extent permitted by law. Written by Paul Rubin, David MacKenzie, and Stuart Kemp. What happens if you turn off CTQ. I don't think CTQ will get you anything on a tape drive. Am I mistaken? Steve > % dd if=/dev/zero of=/dev/nst0 bs=512k count=10k > 10240+0 records in > 10240+0 records out > 5368709120 bytes (5.4 GB) copied, 38.6474 s, 139 MB/s > > % /usr/local/src/ltotool/ltoTool -C 0 /dev/nst0 > ltoTool V4.63 -- Copyright (c) 1996-2006, Exabyte Corp. > > Tape Drive identified as LTO3(HP) > Disabling compression...OK > > Done > % dd if=/dev/zero of=/dev/nst0 bs=512k count=10k > 10240+0 records in > 10240+0 records out > 5368709120 bytes (5.4 GB) copied, 91.9329 s, 58.4 MB/s > > Done > > > So mine is not as fast as yours in the exact same test. HP's LTT > tool shows an LTO 3 cartridge in the drive. (Does this drive even > work with an LTO 2 or LTO 4?) > > % ulimit > unlimited > % uname -a > 2.6.24-19-generic #1 SMP Wed Jun 4 15:10:52 UTC 2008 x86_64 GNU/Linux > % cat /etc/issue > Ubuntu 8.04 \n \l > > The system has 24 GB of RAM, dual Opteron 2218, and no cpufreq > adjustment running (the BIOS on this one does not support power > adjustment). The relevant SCSI messages from the last boot > in /var/log/messages are: > > scsi6 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0 > > aic7901: Ultra320 Wide Channel A, SCSI Id=7,PCI-X 101-133Mhz, 512 SCBs > scsi 6:0:4:0: Sequential-Access HP Ultrium 3-SCSI D21D PQ: 0 ANSI: 3 > target6:0:4: asynchronous > scsi6:A:4:0: Tagged Queuing enabled. 
Depth 32 > target6:0:4: Beginning Domain Validation > target6:0:4: wide asynchronous > target6:0:4: FAST-160 WIDE SCSI 320.0 MB/s DT IU RTI PCOMP (6.25 ns, > offset 64) > target6:0:4: Domain Validation skipping write tests > target6:0:4: Ending Domain Validation > > The module driving the Adaptec is "aic79xx", apparently with no > special options configured anywhere for when it loads. > > Not sure which kernel parameters are relevant (if any). > > This is really unlikely to be relevant, but... > > % dd --version > dd (coreutils) 6.10 > Copyright (etc.) > % ldd `which dd` > linux-vdso.so.1 => (0x00007fff08bfe000) > librt.so.1 => /lib/librt.so.1 (0x00007fdd00779000) > libc.so.6 => /lib/libc.so.6 (0x00007fdd00417000) > libpthread.so.0 => /lib/libpthread.so.0 (0x00007fdd001fb000) > /lib64/ld-linux-x86-64.so.2 (0x00007fdd00982000) > > Thanks, > > David Mathog > mathog@caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > From cousins at umit.maine.edu Wed Jul 2 15:43:17 2008 From: cousins at umit.maine.edu (Steve Cousins) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? (Steve Cousins) In-Reply-To: References: Message-ID: On Wed, 2 Jul 2008, David Mathog wrote: > Steve Cousins wrote: > > Do different LTO-3 drives have different maximum tape write speeds? I don't know. I've always heard 80 MB/sec. lto.org shows: http://www.lto.org/technology/ugen.php?section=0&subsec=ugen for lto-3 "up to 160 MB/sec" which of course is with 2:1 compression and therefore 80 MB/sec uncompressed. Steve From rgb at phy.duke.edu Wed Jul 2 16:44:58 2008 From: rgb at phy.duke.edu (Robert G. 
Brown)
Date: Fri Mar 19 01:07:27 2010
Subject: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
In-Reply-To: <87skus874a.fsf_-_@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com>
Message-ID:

On Wed, 2 Jul 2008, Perry E. Metzger wrote:

>
> "Robert G. Brown" writes:
>>> Well, it actually kind of is. Typically, a box in an HPC cluster is
>>> running stuff that's compute bound and whose primary job isn't serving
>>> vast numbers of teeny high latency requests. That's much more what a
>>> web server does. However...
>>
>> I'd have to disagree. On some clusters, that is quite true. On others,
>> it is very much not true, and whole markets of specialized network
>> hardware that can manage vast numbers of teeny communications requests
>> with acceptably low latency have come into being. And in between, there
>> is, well, between, and TCP/IP at gigabit speeds is at least a contender
>> for ways to fill it.
>
> I have to admit my experience here is limited. I'll take your word for
> it that there are systems where huge numbers of small, high latency
> requests are processed. (I thought that teeny stuff in HPC land was
> almost always where you brought in the low latency fabric and used
> specialized protocols, but...)

Not so much high latency, but there are many different messaging patterns. Some are BW dominated with large messages, some are many small latency dominated messages and use specialized networks, and some are in between -- medium sized messages, medium sized amounts. YMMV is a standard watchword.
Some people do fine with TCP/IP over ethernet for whatever message size. I'm not quite sure what you mean by "vast numbers of teeny high latency requests" so I'm not sure if we really are disagreeing or agreeing in different words. If you mean that the problem of HA computing on a human timescale is different than the typical HPC problem then we agree very much, but then I don't see the point in the context of the current discussion. > Sure, but it is way inefficient. Every single process you fork means > another data segment, another stack segment, which means lots of > memory. Every process you fork also means that concurrency is achieved > only by context switching, which means loads of expense on changing > MMU state and more. Even thread switching is orders of magnitude worse > than a procedure call. Invoking an event is essentially just a > procedure call, so that wins big time. Sure, but for a lot of applications, one doesn't have a single server with umpty zillion connections -- which may be what you may mean with your "high latency teensy message" point above. If the connection is persistent, the overhead associated with task switching is just part of the normal multitasking of the OS. In cluster computing, one may have only a small set of these connections to any particular host, or one may have lots -- many to many communications, master-slave communications. Similarly, many daemon-driven tasks tend to be quite bounded. If a server load average is down under 0.1 nearly all the time, nobody cares, if the overhead of communication in a parallel application is down in the sub-1% range, people don't care much. But then, few cluster applications are built on forking daemons...;-) Still, it is important to understand why there are a lot of applications that are. In the old days, there were limits on how many processes, and open connections, and open files, and nearly any other related thing you could have at the same time, because memory was limited. 
Kernel resources (if nothing else) have to be allocated for each one, and kernel overhead associated with all of the connections, files, etc could scale up to where it more or less shut down a system. Nowadays, with my LAPTOP having 4 GB, multiple cores, far more scalable MP kernels, the limits are a lot more flexible, and it may well be better to maintain many persistent connections within a single application and make it essentially an extension of the kernel with the kernel managing the "multitasking" overhead of message reception per connection and then avoiding the additional multitasking associated with farming the information out per connection to a forked copy of a server process. As I said, very interesting and a good idea -- I'm learning from you -- but a good idea for certain applications, possibly more trouble than it's worth for others? Or maybe not. If you make writing event driven network code as easy, and as well documented, as writing standard socket code and standard daemon code, the forking daemon may become obsolete. Maybe it IS obsolete. So, what do you think? Should one "never" write a forking daemon, or inetd? [Incidentally, does this mean that you are similarly negative about forking applications in general, since similar resource constraints apply to ALL forks, right? Or should one use event driven servers only for big servers with no particular hurry on returning messages for any given connection? I'm guessing that when writing such a server, one has to do some of the work that the kernel would do for you for forked processes -- ensure that no connection is starved for timeslices or network slices, manage priorities if necessary, smoothly multitask any underlying computation associated with providing the data. After all, the MOST efficient server is one with the server code built into the kernel -- DOS plus an application, as it were. 
Why bother with the overhead of a general purpose multitasking operating system when you can handle all the multitasking native within your one monolithic application? Ditto networking -- why replicate general purpose features of the network stack in the kernel and network structs when you'll never need them for your ONE application? Usually one trades off the ease of programming and use in a general purpose environment against some penalty, as general purpose environments require more state information and overhead to maintain and operate. So are you arguing that there are no tradeoffs, and one should "always" write server network code (or code in a suitably segmented application) on an event model, or that it is a better one for some class of client applications, some pattern of use? This still is (I think) OT, as master-slave parallel applications are fairly common, with a toplevel master doling out units of work to the slaves and then collecting the results. I think that it is probably more usual to write the code for this as a non-forking application anyway, but I can still imagine exceptions. IIRC, some of these things are the motivation for e.g. Scyld and bproc. If anyone else on list is bored with this, let me know we can take it offline. > Event driven systems can also avoid locking if you keep global data > structures to a minimum, in a way you really can't manage well with > threaded systems. That makes it easier to write correct code. > > The price you pay is that you have to think in terms of events, and > few programmers have been trained that way. What do you mean by events? Things picked out with a select statement, e.g. I/O waiting to happen on a file descriptor? Signals? I think the bigger problem is that a lot of the events in question are basically (fundamentally) kernel interrupts, I/O being driven by one or more asynchronous processes, and you're right, a lot of programmers never learn to manage this because it is actually pretty difficult. 
One has to handle blocking vs non-blocking issues, raw I/O (in many cases), a scheduler of sorts to ensure that connections aren't starved (unless you are content to process events in FIFO order, letting an event piggy or buggy/crashed process hang the entire pending queue). Forking provides you with a certain amount of "automatic" robust parallelism. Without it, one has to make the code a lot more robust; if a forked connection crashes, it crashes just one connection, not the server or any of the rest of the existing connections. The kernel DOES do a lot of things for you on a forked process that you have to do for yourself in event driven code, and it isn't exactly trivial to provide it either well, efficiently, or robustly (where the kernel is perforce all three, within the limits imposed by its general purpose design). As I said, people wrote lots of applications on UDP because they thought "hmmm, I don't need ALL the overhead associated with making TCP robust, I'll use lightweight UDP instead and write my own packet sequencer, my own retransmit, etc." Then they discovered that by the time they ended up with something that was reliable, they hadn't really saved much -- or may well have ended up with something even slower than TCP. People work(ed) HARD on making TCP fairly efficient and making it handle edge cases. Doing it on your own is unlikely to match either one, unless you are an uberprogrammer. You sound like you probably are, but I'm not sure everyone is...;-) I'm not arguing, mind you -- I already believe that writing an event driven server (or client, or both in a more symmetric model) makes sense for a certain class of applications, including many/most of the ones relevant to cluster computing. 
I'm asking if one should NEVER write a forking daemon because the libraries you mention above provide schedulers and can manage dropped connections or hung resources or because you think that the programmer should always be able to add them as needed, or if there is a problem scale and server type for which it makes sense, and others for which it is overkill or for which the services provided by the kernel for forked processes (or threads, a rose by any other name...) are worth their cost. An event driven application IS basically a kernel, in a manner of speaking. Should every daemon be a kernel, or can some use the existing kernel for kernel-like functionality and focus on just provisioning a single connection well? rgb > > Perry > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From larry.stewart at sicortex.com Wed Jul 2 17:48:32 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Fri Mar 19 01:07:27 2010 Subject: dealing with lots of sockets (was Re: [Beowulf] automount on high ports) In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> Message-ID: <127E7A7E-F115-4349-8DB8-568FC882EB7B@sicortex.com> > >> Sure, but it is way inefficient. Every single process you fork means >> another data segment, another stack segment, which means lots of >> memory. 
Every process you fork also means that concurrency is >> achieved >> only by context switching, which means loads of expense on changing >> MMU state and more. Even thread switching is orders of magnitude >> worse >> than a procedure call. Invoking an event is essentially just a >> procedure call, so that wins big time. My experience is likely (a) dated or (b) inapplicable, but what's the point of a group if you can't toss it out? Back in 1994, with 90 MHz pentiums, NCSA's httpd was the leading webserver with a design that forked a new process for every request. This works, and provides nice isolation for those cases where your application is buggy. It is also a poor-man's threading system in that it lets the application not worry about blocking behavior of network sockets and so forth. It was a trifle slow however, being limited to 40 or so requests per second. My obligatory internet startup wrote a new single-threaded single- process web server based on select(2) with careful attention to the blocking or not nature of the kernel calls and were able to handle some hundreds of connections per second on the same hardware and over 1000 open connections before breaking the stack. Alas it was never made open source and the company is gone. More recently at SiCortex, We've been using libevent to write single threaded applications that do multithreaded things. On our 16 megabyte 70 MHz freescale embedded boot processors, this is very handy for reducing the memory footprint. On the x86 front end, a single process has no difficulty multiplexing 1000 streams of console data this way. I'd hate to have a process for each one of those! We're also using conserver for console access and that is also written with a single linux process multiplexing 50 or so consoles. I don't know whether conserver's internals are threads or events. So if anyone wants to try an easy to use event library, I can recommend libevent. The learning curve is modest. 
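The single-process, select(2)-style pattern described above can be sketched in a few lines. The example below uses Python's standard selectors module rather than libevent itself, and the names make_echo_server, on_accept, on_readable and run_once are invented for illustration. It is a minimal sketch of the idea (one loop, non-blocking sockets, a callback per ready descriptor), not a description of how any of the servers mentioned here were written.

```python
import selectors
import socket


def on_readable(sel, conn):
    """Callback for a client socket that has data (or EOF) waiting."""
    data = conn.recv(4096)
    if data:
        # Sketch only: a real server would buffer whatever send() could
        # not write and wait for the socket to become writable again.
        conn.send(data)
    else:
        # Empty read means the peer closed; forget this connection.
        sel.unregister(conn)
        conn.close()


def on_accept(sel, lsock):
    """Callback for the listening socket: accept and register the client."""
    conn, _addr = lsock.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, on_readable)


def make_echo_server(sel, host="127.0.0.1", port=0):
    """Create a non-blocking listening socket and register it with sel."""
    lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    lsock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    lsock.bind((host, port))
    lsock.listen()
    lsock.setblocking(False)
    sel.register(lsock, selectors.EVENT_READ, on_accept)
    return lsock


def run_once(sel, timeout=1.0):
    """One turn of the event loop: dispatch a callback per ready socket."""
    for key, _mask in sel.select(timeout):
        key.data(sel, key.fileobj)
```

A server would create a selectors.DefaultSelector(), call make_echo_server(sel), and then call run_once(sel) in a loop. Every connection costs one registered descriptor and a callback, not a process or a thread stack, which is where the memory savings come from.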
libevent does require a little turning inside out to do things like have a tftp client as a libevent task, but it's not bad.

-Larry

From lindahl at pbm.com Wed Jul 2 18:05:37 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:27 2010
Subject: dealing with lots of sockets (was Re: [Beowulf] automount on high ports)
In-Reply-To: <127E7A7E-F115-4349-8DB8-568FC882EB7B@sicortex.com>
References: <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <127E7A7E-F115-4349-8DB8-568FC882EB7B@sicortex.com>
Message-ID: <20080703010537.GA14390@bx9.net>

On Wed, Jul 02, 2008 at 08:48:32PM -0400, Lawrence Stewart wrote:

> Back in 1994, with 90 MHz pentiums, NCSA's httpd was the leading
> webserver with a design that forked a new process for every request.

Apache eventually moved to a model where forked processes handled several requests serially before eventually dying and being re-forked. This reduces the fork overhead per request to something reasonable. Recently there's a threaded version, but that's not the default.

Our web crawler at Blekko is event-driven: the work is divided up into short subroutines which do non-blocking things, and when blocking is needed, you return to the "system" indicating what code to execute when the answer you're waiting for comes back. This is just event-driven programming inside-out. Works great, too, because the code is prettier than your typical event-driven code.

Now Legion had pretty code, but the fact that all of the contexts shared a single stack meant that only the guy at the top of the stack could execute. But I digress.

-- greg

From perry at piermont.com Wed Jul 2 18:06:55 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: dealing with lots of sockets
In-Reply-To: (Robert G.
Brown's message of "Wed\, 2 Jul 2008 19\:44\:58 -0400 \(EDT\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> Message-ID: <87mykz4vyo.fsf@snark.cb.piermont.com> "Robert G. Brown" writes: > I'm not quite sure what you mean by "vast numbers of teeny high > latency requests" so I'm not sure if we really are disagreeing or > agreeing in different words. I mostly have worried about such schemes in the case of, say, 10,000 people connecting to a web server, sending an 80 byte request, and getting back a few k several hundred ms later. (I've also dealt a bit with transaction systems with more stringent throughput requirements, but rarely with things that require an ack really, really fast.) That said, I'm pretty sure event systems win over threads if you're coordinating pretty much anything... >> Sure, but it is way inefficient. Every single process you fork means >> another data segment, another stack segment, which means lots of >> memory. Every process you fork also means that concurrency is achieved >> only by context switching, which means loads of expense on changing >> MMU state and more. Even thread switching is orders of magnitude worse >> than a procedure call. Invoking an event is essentially just a >> procedure call, so that wins big time. > > Sure, but for a lot of applications, one doesn't have a single server > with umpty zillion connections Well, often one doesn't build things that way, but that's sort of a choice, isn't it. 
Your machine has only one or two or eight processors, and any other processes/threads above that which you create are not actually operating in parallel but are just a programming abstraction. It is perfectly possible to structure almost any application so there is just the one thread per core and you otherwise handle the programming abstraction with events instead of additional threads, processes or what have you. > If the connection is persistent, the overhead associated with task > switching is just part of the normal multitasking of the OS. That overhead is VERY high. Incredibly high. Most people don't really understand how high it is. If you compare the performance of an http server that manages 10,000 simultaneous connections with events, versus one that handles it with threads, you'll see there is no comparison -- events always beat threads into the ground, because you can't get away from threads requiring a new stack for each thread, and you can't get away from the fact that context switching is far more expensive than a procedure dispatch. > Similarly, many daemon-driven tasks tend to be quite bounded. If a > server load average is down under 0.1 nearly all the time, nobody cares, That implies almost nothing is in the run queue. For an HPC system, one hopes that the load is hovering around 1. Less means you're wasting processor, more means you're spending too much time context switching. But I digress.. > Still, it is important to understand why there are a lot of applications > that are. In the old days, there were limits on how many processes, and > open connections, and open files, and nearly any other related thing you > could have at the same time, because memory was limited. Believe it or not, memory is still limited, and context switch time is still pretty bad. Changing MMU contexts is unpleasant. Even if you don't have to do that, because you're using another thread in the same MMU context rather than a process, the overhead is still quite painful. 
Seeing is believing. There are lots of good papers out there on concurrency strategies for systems with vast numbers of sockets to manage, and there is no doubt what the answer is -- threads suck compared to events, full stop. Event systems scale linearly for far longer. > Or maybe not. If you make writing event driven network code as easy, > and as well documented, as writing standard socket code and standard > daemon code, the forking daemon may become obsolete. Maybe it IS > obsolete. It is pretty easy. The only problem is getting your mind wrapped around it and getting experience with it. Most people have been writing fully linear programs for a whole career. If you tell them to try events, or try functional programming, or other things they're not used to, they almost always scream in agony for weeks until they get used to it. "Weeks" is often more overhead than people are willing to suffer. That said, I am comfortable with both of those paradigms... > So, what do you think? Should one "never" write a forking daemon, or > inetd? It depends. If you're doing something where there is going to be one socket talking to the system a tiny percentage of the time, why would you bother building an event driven server? If you're building something to serve files to 20,000 client machines over persistent TCP connections and the network interface is going to be saturated, hell yes, you should never use 20,000 threads for that, write the thing event driven or you'll die. It is all about the right tool for the job. Apps that are all about massive concurrent communication need events. Apps that are about very little concurrent communication probably don't need them. >> Event driven systems can also avoid locking if you keep global data >> structures to a minimum, in a way you really can't manage well with >> threaded systems. That makes it easier to write correct code. 
>> >> The price you pay is that you have to think in terms of events, and >> few programmers have been trained that way. > > What do you mean by events? Things picked out with a select statement, > e.g. I/O waiting to happen on a file descriptor? Signals? More the former, not the latter. Event driven programming typically uses registered callbacks that are triggered by a central "Event Loop" when events happen. In such a system, one never blocks for anything -- all activity is performed in callbacks, and one simply returns from a callback if one can't proceed further. The programming paradigm is quite alien to most people. I'd read the libevent man page to get a vague introduction. -- Perry E. Metzger perry@piermont.com From carsten.aulbert at aei.mpg.de Wed Jul 2 22:40:48 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <20080702223714.GA5908@bx9.net> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> Message-ID: <486C6660.5070705@aei.mpg.de> Hi all, Greg Lindahl wrote: > On Wed, Jul 02, 2008 at 06:31:51PM -0400, Perry E. Metzger wrote: > >> It isn't quite that bad. You can use one of the SO_REUSE* calls in the >> code to make things less dire. Apparently the kernel doesn't do that >> for NFS client connection establishment, though. There is probably >> some code to fix here. > > That's what I thought at first, too. But since you only have a 2-tuple > and not a 4-tuple when it comes time to pick the port number, > SO_REUSEADDR doesn't do anything. 
> A solution proposed by the nfs guys is pretty simple: Change the values of /proc/sys/sunrpc/{min,max}_resvport appropriately. But they don't know which ceiling will be next. But we will test it. Thanks for now and I'll read through the other side thread about forking vs. threading vs. serialization :) Carsten From tjrc at sanger.ac.uk Thu Jul 3 01:34:19 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <486B9D4E.80405@ias.edu> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> Message-ID: <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote: > 2. Now that I'm a professional system admin who often has to support > commercial apps, I find I have to use a RH-based distro for two > reasons: > A. Most commercial software "supports" only Red Hat. Some go so far as > to refuse to install if RH is not detected. The most extreme case of > this is EMC PowerPath, whose kernel modules won't install if it's > not a > RH (or SUSE) kernel. We have that problem as well, with HP SFS. The way we get around is simply that we run our older Lustre clients using Debian Sarge with a SuSE 9 kernel, which is perfectly possible, if a bit icky. > > B. Red Hat has done such a good job of spreading FUD about the other > Linux distros, management has a cow if you tell them you're installing > something other than RH. Fortunately, our management doesn't have that fear. In fact their fear is usually the other way around: Following the reams of broken promises from certain large UNIX vendors in particular about the future of certain products and features, which required us to copy petabytes of storage to new filesystems, which took more than six months. 
Our IT management now has a completely rational fear of buying commercial UNIX products, because the company might be bought out, change its focus, charge you a fortune for continued support, or all of the above. Consequently, we go for open source stuff whenever possible, and as far as the distribution was concerned, Debian was the obvious choice -- and pretty much the only choice at the time, since Fedora did not exist, and I'm still not sure how separate from Red Hat Fedora and CentOS really are, but that's probably just my ignorance. The fact that we could easily demonstrate that Debian did everything we needed of it on a technical level made the decision a very comfortable one for the management. Another aspect to our choice of Debian is that pretty much all of the bioinformatics software written here is itself open-sourced and given away, and if you, as a small genomics lab, want to run mirrors of various chunks of our stuff, you can, without worrying that any part of it might have licensing issues. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From eagles051387 at gmail.com Thu Jul 3 04:26:13 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> Message-ID: now what happens if someone comes to me for rendering services and they used maya will the maya file be able to use the software mentioned above or would i need some other software for that? On 7/2/08, Bernard Li wrote: > > On Wed, Jul 2, 2008 at 4:32 AM, Perry E. 
Metzger > wrote: > > > "Jon Aquilina" writes: > >> if i use blender how nicely does it work in a cluster? > > > > I believe it works quite well. > > As far as I know blender does not have any built-in "clustering" > capabilities. But what you do is render different frames on different > cores (embarrassingly parallel) using a queuing/scheduling system. > DrQueue seems to be quite popular with the rendering folks: > > http://drqueue.org/cwebsite/ > > Cheers, > > Bernard > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080703/e76e7673/attachment.html From eagles051387 at gmail.com Thu Jul 3 04:32:22 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:27 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] In-Reply-To: References: Message-ID: dont throw in the towel just for that. try and see if you can get research funding through the university you are attending On 7/1/08, Mark Kosmowski wrote: > > And I forgot to change the subject. Apologies. > > On 7/1/08, Mark Kosmowski wrote: > > At some point there a cost-benefit analysis needs to be performed. If > > my cluster at peak usage only uses 4 Gb RAM per CPU (I live in > > single-core land still and do not yet differentiate between CPU and > > core) and my nodes all have 16 Gb per CPU then I am wasting RAM > > resources and would be better off buying new machines and physically > > transferring the RAM to and from them or running more jobs each > > distributed across fewer CPUs. Or saving on my electricity bill and > > powering down some nodes. > > > > As heretical as this last sounds, I'm tempted to throw in the towel on > > my PhD studies because I can no longer afford the power to run my > > three node cluster at home. Energy costs may end up being the straw > > that breaks this camel's back. > > > > Mark E. 
Kosmowski > > > > > From: "Jon Aquilina" > > > > > > > > not sure if this applies to all kinds of senarios that clusters are > used in > > > but isnt the more ram you have the better? > > > > > > On 6/30/08, Vincent Diepeveen wrote: > > > > > > > > Toon, > > > > > > > > Can you drop a line on how important RAM is for weather forecasting > in > > > > latest type of calculations you're performing? > > > > > > > > Thanks, > > > > Vincent > > > > > > > > > > > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: > > > > > > > > Jim Lux wrote: > > > >> > > > >> Yep. And for good reason. Even a big DoD job is still tiny in > Nvidia's > > > >>> scale of operations. We face this all the time with NASA work. > > > >>> Semiconductor manufacturers have no real reason to produce special > purpose > > > >>> or customized versions of their products for space use, because > they can > > > >>> sell all they can make to the consumer market. More than once, I've > had a > > > >>> phone call along the lines of this: > > > >>> "Jim: I'm interested in your new ABC321 part." > > > >>> "Rep: Great. I'll just send the NDA over and we can talk about it." > > > >>> "Jim: Great, you have my email and my fax # is..." > > > >>> "Rep: By the way, what sort of volume are you going to be using?" > > > >>> "Jim: Oh, 10-12.." > > > >>> "Rep: thousand per week, excellent..." > > > >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe > every > > > >>> year." > > > >>> "Rep: Oh..." > > > >>> {Well, to be fair, it's not that bad, they don't hang up on you.. > > > >>> > > > >> > > > >> Since about a year, it's been clear to me that weather forecasting > (i.e., > > > >> running a more or less sophisticated atmospheric model to provide > weather > > > >> predictions) is going to be "mainstream" in the sense that every > business > > > >> that needs such forecasts for its operations can simply run them > in-house. 
> > > >> > > > >> Case in point: I bought a $1100 HP box (the obvious target group > being > > > >> teenage downloaders) which performs the HIRLAM limited area model > *on the > > > >> grid that we used until October 2006* in December last year. > > > >> > > > >> It's about twice as slow as our then-operational 50-CPU Sun Fire > 15K. > > > >> > > > >> I wonder what effect this will have on CPU developments ... > > > >> > > > >> -- > > > >> Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 > 214290 > > > >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > > >> At home: http://moene.indiv.nluug.nl/~toon/ > > > >> Progress of GNU Fortran: > http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html > > > >> > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf@beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit > > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > > > > > > -- > > > Jonathan Aquilina > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080703/28ed5052/attachment.html From hvidal at tesseract-tech.com Thu Jul 3 05:43:35 2008 From: hvidal at tesseract-tech.com (H.Vidal, Jr.) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> Message-ID: <486CC977.3000800@tesseract-tech.com> In my experience with 3D, the renderer is of particular note. 
For example, it is entirely possible to model with one piece of software, export geometry, and render with another piece of software. And the production team will probably have very specific ideas of which renderer is what they want, especially since tools such as RenderMan are themselves programmable, and cannot just be 'switched' with another rendering tool. And so, based on your questions, it sounds like you really need to study, understand, and educate yourself on your proposed product or service before thinking about batch rendering. You are, as Americans would say, putting the cart before the horse. So, as suggested, go study the 'front end' of the problem in much greater detail, talk to potential customers, understand more what you are intending to do, then come back and find out about clusters. The Beowulf mailing list has lots of archived comments and questions on this topic. That's a hint..... If you are just acting as a hobbyist or moving along a learning curve, go look at open source rendering tools. If you are going to sell services on open source rendering tools, see comments above. In any case, google and plain reading are your friends.... Good luck. hv Jon Aquilina wrote: > now what happens if someone comes to me for rendering services and they > used maya will the maya file be able to use the software mentioned above > or would i need some other software for that? > > On 7/2/08, *Bernard Li* > > wrote: > > On Wed, Jul 2, 2008 at 4:32 AM, Perry E. Metzger > wrote: > > > "Jon Aquilina" > writes: > >> if i use blender how nicely does it work in a cluster? > > > > I believe it works quite well. > > As far as I know blender does not have any built-in "clustering" > capabilities. But what you do is render different frames on different > cores (embarrassingly parallel) using a queuing/scheduling system. 
> DrQueue seems to be quite popular with the rendering folks: > > http://drqueue.org/cwebsite/ > > Cheers, > > Bernard > > > > > -- > Jonathan Aquilina > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From prentice at ias.edu Thu Jul 3 06:09:27 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <87y74lfabq.fsf@snark.cb.piermont.com> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> Message-ID: <486CCF87.70206@ias.edu> Perry E. Metzger wrote: > "Jon Aquilina" writes: >> my idea is more of for my thesis. > > If you're trying to do 3d animation on the cheap and you want > something that's already cluster capable, I'd try Blender. It is open > source and it has already made some reasonable length movies. Not > being an animation type, I know nothing about how nice it is compared > to commercial products, but it is hard to beat the price. > > Perry And it's been around for a while, so it should be very mature. I don't know anything about rendering, but I downloaded blender after reading an article in Linux Journal in 1999, and it was mature back then. I only knew enough to run the demo. It took HOURS to render on my 486! 
I guess a link would be helpful, so here's one: http://www.blender.org/ -- Prentice From prentice at ias.edu Thu Jul 3 06:19:31 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <20080702084458.GA12879@gretchen.aei.uni-hannover.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> <20080702084458.GA12879@gretchen.aei.uni-hannover.de> Message-ID: <486CD1E3.4080204@ias.edu> Henning Fehrmann wrote: > On Wed, Jul 02, 2008 at 09:19:50AM +0100, Tim Cutts wrote: >> On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: >> >>> OK, we have 1342 nodes which act as servers as well as clients. Every >>> node exports a single local directory and all other nodes can mount this. >>> >>> What we do now to optimize the available bandwidth and IOs is spread >>> millions of files according to a hash algorithm to all nodes (multiple >>> copies as well) and then run a few 1000 jobs opening one file from one >>> box then one file from the other box and so on. With a short autofs >>> timeout that ought to work. Typically it is possible that a single >>> process opens about 10-15 files per second, i.e. making 10-15 mounts per >>> second. With 4 parallel process per node that's 40-60 mounts/second. >>> With a timeout of 5 seconds we should roughly have 200-300 concurrent >>> mounts (on average, no idea abut the variance). >> Please tell me you're not serious! The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going >> all over the place. 
>> >> Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that >> each node works on the data on its own local disk, and doesn't have to talk networked storage at all? Or don't you know in advance which files a >> particular job is going to need? > > Yes, this is the problem. The amount of files is too big to store it > everywhere (few TByte and 50 million files). Mounting a view NFS server does not provide > the bandwidth. > On the other hand, the coreswitch should be able to handle the flows non > blocking. We think that nfs mounts are the fastest possibility to > distribute the demanded files to the nodes. > > Henning Sounds like you need a parallel filesystem of some sort. Have you looked at that option? I know, they cost $$$$. -- Prentice From prentice at ias.edu Thu Jul 3 06:38:11 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: <486CD643.1050904@ias.edu> Tim Cutts wrote: > > On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: > >>>> I was hoping for some discussion of concrete issues. for instance, >>>> I have the impression debian uses something other than sysvinit - >>>> does that work out well? >>>> >>> Debian uses standard sysvinit-style scripts in /etc/init.d, >>> /etc/rc0.d, ... >> >> thanks. I guess I was assuming that mainstream debian was like ubuntu. > > It's sort of the other way around. Remember that Ubuntu is based off a > six-monthly snapshot of Debian's testing track, which is why Hardy looks > a lot more like the upcoming Debian Lenny than it does like Debian Etch. > >> interesting - I wonder why. the main difference would be that the rpm >> format encodes dependencies... 
> > The difficulty is that many ISVs tend to do a fairly terrible job of > packaging their applications as RPM's or DEB's, for example creating > init scripts which don't obey the distribution's policies, or making > willy-nilly modifications to configuration files all over the place, > even in other packages (which in the Debian world is a *big* no-no, > that's why many Debian/Ubuntu packages have now moved to the conf.d type > of configuration directory, so that other packages can drop in little > independent snippets of configuration) > > I have seen, for example, .deb packages from a Large Company With Which > We Are All Familiar which essentially attempted to convert your system > into a Red Hat system by moving all your init scripts around and > whatnot, so once you'd installed this abomination, you'd totally wrecked > the ability of many of the main distro packages to be updated ever > again. Oh, and of course uninstalling the package didn't put anything > back the way it had been before. > > Like you, I tend to use tarballs if they are available, and if I want to > turn them into packages I do it myself, and make sure they are policy > compliant for the distro. > > So this, while not a statement in favour of either flavour of distro, is > definitely a warning to be very wary of what packages that have come > from sources other than the distro itself might do (which of course, > you'd be wary of anyway for security reasons). > > Tim > > Here's another reason to use tarballs: I have /usr/local shared to all my systems with with NFS. If want to install the lastest version of firefox, you can just do this: cd /usr/local tar zxvf firefox-x.xxx.tar.gz cd /usr/local/bin ln -s ../firefox-x.xxx/firefox . Now all users can use the latest version of firefox (/usr/local/bin is in their path, and comes before /usr/bin, usr/X11R6/bin, etc.) 
With RPM, deb, or whatever, I'd have to use func or ssh and a shell script w/ a loop to install on all systems (assuming nightly 'yum update' cron job won't work in this case) This is incredibly helpful with Python, Perl, R, and other languages which have additional modules or libraries. Installing additional modules can be very easy (CPAN module for perl, for example). These modules aren't included in RPM format (that I know of), and when you upgrade, perl or python, the RPMs clobber whatever modules you installed in /usr. Compiling Perl can be time consuming vs. just installing the RPM, but once installed, if I run '/usr/local/bin/perl -MCPAN -e shell' as root, I can install all the perl modules needed just once, and they won't be clobbered by an RPM update. In the end, this is much more efficient. And CPAN manages dependencies automatically, too. We use RT where I work, which requires a few Perl modules. On Friday, June 13 (Yes, Friday the 13! - it figures. ). I had to stop and restart our web server that provides RT. The perl packages had recently been updated. When apache restarted, it couldn't find the necessary perl modules, and RT wouldn't function. It took me HOURS to track the problem down to a couple missing perl modules. -- Prentice From prentice at ias.edu Thu Jul 3 07:01:04 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> <20080702125625.GE47386@gby2.aoes.com> Message-ID: <486CDBA0.7000403@ias.edu> Jon Aquilina wrote: > like you said in regards to maya money is a factor for me. if i do > descide to setup a rendering cluster my problem is going to be finding > someone who can make a small video in blender for me so i can render it. 
Blender should come with a few small scene files you can render. It did about 10 years ago when I tinkered with it. If not, I'm sure someone in the Blender community would be willing to share with you. -- Prentice From perry at piermont.com Thu Jul 3 07:04:19 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: dealing with lots of sockets In-Reply-To: <127E7A7E-F115-4349-8DB8-568FC882EB7B@sicortex.com> (Lawrence Stewart's message of "Wed\, 2 Jul 2008 20\:48\:32 -0400") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <127E7A7E-F115-4349-8DB8-568FC882EB7B@sicortex.com> Message-ID: <87vdzncbdo.fsf@snark.cb.piermont.com> Lawrence Stewart writes: > My obligatory internet startup wrote a new single-threaded single- > process web server based on select(2) with careful attention to the > blocking or not nature of the kernel calls and were able to handle > some hundreds of connections per second on the same hardware and > over 1000 open connections before breaking the stack. Alas it was > never made open source and the company is gone. There are others out there now, so one can find a reasonable event driven http server pretty easily. > More recently at SiCortex, We've been using libevent to write single > threaded applications that do multithreaded things. On our 16 > megabyte 70 MHz freescale embedded boot processors, this is very > handy for reducing the memory footprint. On the x86 front end, a > single process has no difficulty multiplexing 1000 streams of > console data this way. I'd hate to have a process for each one of > those!
You couldn't possibly manage a process for every one of those -- the only way to get the sort of performance you're talking about is with events. You picked right with events. > So if anyone wants to try an easy to use event library, I can > recommend libevent. The learning curve is modest. It does require > a little turning inside out to do things like have a tftp client as > a libevent task but its not bad. Libevent was the result of my describing to Niels Provos the way we wrote ticker plants and trading software at a particular hedge fund I was at in the early 1990s. We used libXt, the X toolkit library, as our event driven programming environment on SERVERS. It turned out to work rather well. I was at the Atlanta Linux Showcase many years ago when Niels was a grad student and he presented a paper showing how much better events were than threads or other methods for managing large loads. I explained to him afterwards what we had done, and he proceeded to write a pretty amazing piece of open source software. I don't take any credit for it at all, but I am happy that something came out of my experiences, because the software we built at that hedge fund also got lost to history, just as your http server was. It is good that the ideas survived, at least. Perry -- Perry E. Metzger perry@piermont.com From prentice at ias.edu Thu Jul 3 07:05:48 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: References: Message-ID: <486CDCBC.8030706@ias.edu> Mark Kosmowski wrote: > I think I have come to a compromise that can keep me in business. > Until I have a better understanding of the software and am ready for > production runs, I'll stick to a small system that can be run on one > node and leave the other two powered down. I've also applied for an > adjunt instructor position at a local college for some extra cash and > good experience. 
When I'm ready for production runs I can either just > bite the bullet and pay the electricity bill or seek computer time > elsewhere. Mark, For MPI testing/debugging, you can create a few virtual machines on one node using VMware or Xen. VMware is free, unless you want all the bells and whistles. This would give lousy performance for production runs, but would be great for debugging MPI problems in your code, and save you energy. Of course, this wouldn't help with hardware optimizations. -- Prentice From landman at scalableinformatics.com Thu Jul 3 07:20:45 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <486CD643.1050904@ias.edu> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> Message-ID: <486CE03D.40901@scalableinformatics.com> Prentice Bisbal wrote: > Here's another reason to use tarballs: I have /usr/local shared to all eeek!! something named local is shared??? FWIW: we do the same thing, but put everything into /apps, and all nodes have mounted /apps ... requires a little ./configure -prefix=/apps/... magic, but it works well. > my systems with with NFS. If want to install the lastest version of > firefox, you can just do this: > > cd /usr/local > > tar zxvf firefox-x.xxx.tar.gz > > cd /usr/local/bin > > ln -s ../firefox-x.xxx/firefox . > > Now all users can use the latest version of firefox (/usr/local/bin is > in their path, and comes before /usr/bin, usr/X11R6/bin, etc.) Oddly enough, I am not a huge fan of dumping lots of binaries into one path. Part of the reason is the package management one ... all you need is one renegade package and a packager that thinks [s]he is smart, and ... > With RPM, deb, or whatever, I'd have to use func or ssh and a shell > script w/ a loop to install on all systems (assuming nightly 'yum > update' cron job won't work in this case) If you don't know about pdsh ... it will increase your karma.
We install it on every cluster we build. Makes life soooo much easier. > This is incredibly helpful with Python, Perl, R, and other languages > which have additional modules or libraries. Installing additional > modules can be very easy (CPAN module for perl, for example). These > modules aren't included in RPM format (that I know of), and when you > upgrade, perl or python, the RPMs clobber whatever modules you installed > in /usr. Yup. Bioperl is a great example. Of course to install that shared, you need perl/modules installed shared. > > Compiling Perl can be time consuming vs. just installing the RPM, but > once installed, if I run '/usr/local/bin/perl -MCPAN -e shell' as root, > I can install all the perl modules needed just once, and they won't be > clobbered by an RPM update. In the end, this is much more efficient. And > CPAN manages dependencies automatically, too. We usually build our own Perl these days. Having had some interesting ... experiences ... with vendor compiled versions, we decided to forgo their "assistance" and do it ourselves. Seems to work much better. And we have the process (including the module builds) automated quite nicely now. We use it for SICE ... er ... DragonFly. > > We use RT where I work, which requires a few Perl modules. On Friday, > June 13 (Yes, Friday the 13! - it figures. ). I had to stop and restart > our web server that provides RT. The perl packages had recently been > updated. When apache restarted, it couldn't find the necessary perl > modules, and RT wouldn't function. It took me HOURS to track the problem > down to a couple missing perl modules. Yup. This is why we use a different tree than the vendor supplied ones. We can tie back into the vendor supplied web server .... or use our own (usually do the latter). Upgrades can be a crap shoot, even on the best of intentioned systems. 
Right now we have run head first into an Ubuntu problem with OpenVPN on a system, where we upgraded the server after the OpenSSL fiasco, and suddenly CRLs no longer worked. Fixed a config file by hand. Next upgrade? Same bug. Sigh. Package management is good ... for ... um ... er ... what was that again? Making sure your stuff doesn't break when you update/upgrade? Oh. Maybe they will get around to making sure that actually is the case? FWIW: we have seen the *same* sorts of problem with RPM, apt, yum, suse's monstrosities (zmd and others), ... they are all broken in subtle ways that most folks don't run into. It's only when you have a system you needed to make specific changes to, that these changes get lost on the next "up"grade. We need more of a change management system. Mercurial for systems config. Grrr.... Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From laytonjb at charter.net Thu Jul 3 07:26:28 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: <486CDCBC.8030706@ias.edu> References: <486CDCBC.8030706@ias.edu> Message-ID: <486CE194.2060503@charter.net> Prentice Bisbal wrote: > Mark Kosmowski wrote: > > >> I think I have come to a compromise that can keep me in business. >> Until I have a better understanding of the software and am ready for >> production runs, I'll stick to a small system that can be run on one >> node and leave the other two powered down. I've also applied for an >> adjunt instructor position at a local college for some extra cash and >> good experience. When I'm ready for production runs I can either just >> bite the bullet and pay the electricity bill or seek computer time >> elsewhere.
>> > > Mark, > > For MPI testing/debugging, you can create a few virtual machines on one > node using VMware or Xen. VMware is free, unless you want all the bells > and whistles. > You don't need to go this far. Just set up the hostfile to use the same host name several times. Just make sure you don't start swapping :) Jeff From tjrc at sanger.ac.uk Thu Jul 3 07:31:53 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <486CD643.1050904@ias.edu> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> Message-ID: <3BC700F2-A6AA-4659-B8E9-5E53398FCB55@sanger.ac.uk> On 3 Jul 2008, at 2:38 pm, Prentice Bisbal wrote: > Here's another reason to use tarballs: I have /usr/local shared to all > my systems with NFS. Heh. Your view of local is different from mine. On my systems /usr/local is local to the individual system. We do have NFS mounted software of the kind you describe, but we stopped putting it in /usr/local because users got confused thinking it was really local to the machine. We now have a separate automounted /software directory for all that stuff. > With RPM, deb, or whatever, I'd have to use func or ssh and a shell > script w/ a loop to install on all systems (assuming nightly 'yum > update' cron job won't work in this case) Well, you don't, actually. You can maintain a local repository of your custom packages, and then use something like cfengine or puppet to make sure everything is kept up to date. I need the cfengine stuff anyway to keep various configuration files in sync, so extending it to package management was a no-brainer. > > This is incredibly helpful with Python, Perl, R, and other languages > which have additional modules or libraries. Installing additional > modules can be very easy (CPAN module for perl, for example).
These > modules aren't included in RPM format (that I know of), and when you > upgrade, perl or python, the RPMs clobber whatever modules you > installed > in /usr. Yes, I agree, and that's what we do here /software/bin/perl is our supported perl version. But you're not correct about the CPAN modules being clobbered, at least on Debian. Debian's perl packages are configured such that locally installed CPAN modules go into a different tree from the package's own versions of the modules, so yours don't get clobbered on upgrade. And if you really do insist on changing a file which belongs to a package, you can still tell Debian to leave it alone on package upgrade by marking it as diverted with 'dpkg-divert'. The debian guys really did put a lot of thought into how dpkg works. > > Compiling Perl can be time consuming vs. just installing the RPM, but > once installed, if I run '/usr/local/bin/perl -MCPAN -e shell' as > root, > I can install all the perl modules needed just once, and they won't be > clobbered by an RPM update. In the end, this is much more efficient. > And > CPAN manages dependencies automatically, too. I agree. In the case of perl, you're absolutely right. > > We use RT where I work, which requires a few Perl modules. On Friday, > June 13 (Yes, Friday the 13! - it figures. ). I had to stop and > restart > our web server that provides RT. The perl packages had recently been > updated. When apache restarted, it couldn't find the necessary perl > modules, and RT wouldn't function. It took me HOURS to track the > problem > down to a couple missing perl modules. *shrug* I use RT as well, but it's pre-packaged for Debian, so I just use their version and don't have to worry about the dependencies. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From prentice at ias.edu Thu Jul 3 07:55:39 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] /usr/local over NFS is okay, Joe
In-Reply-To: <486CE03D.40901@scalableinformatics.com>
References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <486CE03D.40901@scalableinformatics.com>
Message-ID: <486CE86B.90104@ias.edu>

Joe Landman wrote:
>
>
> Prentice Bisbal wrote:
>
>> Here's another reason to use tarballs: I have /usr/local shared to all
>
> eeek!! something named local is shared???

Nothing wrong with that. "local" doesn't necessarily mean local to the physical machine. It can mean local to the site. I do everything according to standards, and adhere to them strictly. Sharing /usr/local is actually codified in the FHS:

4.8.2. /usr/local : Local hierarchy

4.8.2.1. Purpose

The /usr/local hierarchy is for use by the system administrator when installing software locally. It needs to be safe from being overwritten when the system software is updated. It may be used for programs and data that are shareable amongst a group of hosts, but not found in /usr. Locally installed software must be placed within /usr/local rather than /usr unless it is being installed to replace or upgrade software in /usr.

You can share out /opt, too, but I use /opt for software that's installed locally on a machine and not exported to others. I find this is easier than putting it in /usr/local, since most packaged third-party software insists on going in /opt anyway.

>
> FWIW: we do the same thing, but put everything into /apps, and all nodes
> have mounted /apps ...
>
> requires a little ./configure -prefix=/apps/...

My configure kung-fu is very strong. I usually do this, so I can install multiple versions of the same software:

    ./configure --prefix=/usr/local/foo-xx.yy --exec-prefix=/usr/local/foo-xx.yy/x86

If compiling for x86_64, then --exec-prefix=/usr/local/foo-xx.yy/x86_64.
I have dozens of applications compiled for 32-bit and 64-bit on the same /usr/local. I just put 64-bit binaries (actually symlinks) in /usr/local/bin64, and make sure that comes first in the path on 64-bit systems (ditto for lib64, etc.). I do lots of other hocus-pocus, but I'm digressing enough already.

> magic, but it works well.
>
>> my systems with NFS. If you want to install the latest version of
>> firefox, you can just do this:
>>
>> cd /usr/local
>>
>> tar zxvf firefox-x.xxx.tar.gz
>>
>> cd /usr/local/bin
>>
>> ln -s ../firefox-x.xxx/firefox .
>>
>> Now all users can use the latest version of firefox (/usr/local/bin is
>> in their path, and comes before /usr/bin, /usr/X11R6/bin, etc.)
>
> Oddly enough, I am not a huge fan of dumping lots of binaries into one
> path. Part of the reason is the package management one ... all you need
> is one renegade package and a packager that thinks [s]he is smart, and ...

I don't dump the binaries into one path. I put symlinks into /usr/local/bin{,64}. All the binaries go into /usr/local/foo-xx.yy and stay there:

    ./configure --prefix=/usr/local/foo-xx.yy
    make
    make install
    cd /usr/local
    ln -s foo-xx.yy foo    # this makes the non-versioned dir the default
    # just follow me here, okay?
    cd /usr/local/foo/bin
    for file in *; do ln -s ../foo/bin/$file /usr/local/bin/$file; done
    # do the same for lib, include, man,...

    # If I want to have multiple versions of foo available:
    cd /usr/local/foo-xx.yy/bin
    for file in *; do ln -s ../foo-xx.yy/bin/${file} \
        /usr/local/bin/${file}-xx.yy; done

Users can call the latest or default version by calling 'foo'. If they want an earlier version, they call foo-. If I want to delete foo-xx.yy, I just do rm -rf /usr/local/foo-xx.yy, and then delete the broken links in /usr/local/{bin,lib,include,man}. This can easily be done with scripts before deleting the install dir. If the links are left around, they take up little disk space, since they are only inodes.
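A minimal sketch of the symlink scheme described above, written in Python rather than shell so it can be tried safely in a scratch directory. The helper names (publish, broken_links, demo_symlinks) and the temporary prefix standing in for /usr/local are assumptions made for illustration; they are not taken from the scripts mentioned in this message.

```python
import os
import shutil
import tempfile


def publish(prefix, package, version):
    """Make <package>-<version> the default and link its programs into <prefix>/bin."""
    versioned = f"{package}-{version}"
    bindir = os.path.join(prefix, "bin")
    os.makedirs(bindir, exist_ok=True)

    # foo -> foo-xx.yy, so the unversioned directory names the default version.
    links = {os.path.join(prefix, package): versioned}
    for program in os.listdir(os.path.join(prefix, versioned, "bin")):
        # bin/foo goes through the unversioned directory and follows the default.
        links[os.path.join(bindir, program)] = os.path.join(
            "..", package, "bin", program)
        # bin/foo-xx.yy is pinned to one specific version.
        links[os.path.join(bindir, f"{program}-{version}")] = os.path.join(
            "..", versioned, "bin", program)

    for link, target in links.items():
        if os.path.islink(link):
            os.remove(link)          # replace a link left by an earlier version
        os.symlink(target, link)


def broken_links(directory):
    """Return the names of symlinks in directory whose targets no longer exist."""
    found = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.islink(path) and not os.path.exists(path):
            found.append(name)
    return found


def demo_symlinks():
    """Install two fake versions, remove the older one, report what is left."""
    prefix = tempfile.mkdtemp()      # hypothetical stand-in for /usr/local
    try:
        for version in ("1.0", "2.0"):
            bin_path = os.path.join(prefix, f"foo-{version}", "bin")
            os.makedirs(bin_path)
            with open(os.path.join(bin_path, "foo"), "w") as handle:
                handle.write(f"foo {version}\n")
            publish(prefix, "foo", version)
        with open(os.path.join(prefix, "bin", "foo")) as handle:
            default = handle.read().strip()
        shutil.rmtree(os.path.join(prefix, "foo-1.0"))
        return default, broken_links(os.path.join(prefix, "bin"))
    finally:
        shutil.rmtree(prefix)
```

Because bin/foo points through the unversioned foo directory, changing the default only means repointing that one link; the versioned names stay valid until the install tree itself is removed, at which point broken_links finds what needs cleaning up.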
If I keep an earlier version of foo around, I change /usr/local/foo to point to it. From prentice at ias.edu Thu Jul 3 07:57:52 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: <486CE194.2060503@charter.net> References: <486CDCBC.8030706@ias.edu> <486CE194.2060503@charter.net> Message-ID: <486CE8F0.7010602@ias.edu> > You don't need to go this far. Just set up the hostfile to use the same > host name several times. Just make sure you don't start swapping :) > > Jeff > Unless the problem is configuring interhost communications correctly. -- Prentice From prentice at ias.edu Thu Jul 3 08:10:50 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <3BC700F2-A6AA-4659-B8E9-5E53398FCB55@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <3BC700F2-A6AA-4659-B8E9-5E53398FCB55@sanger.ac.uk> Message-ID: <486CEBFA.7080902@ias.edu> Tim Cutts wrote: > > On 3 Jul 2008, at 2:38 pm, Prentice Bisbal wrote: > >> Here's another reason to use tarballs: I have /usr/local shared to all >> my systems with NFS. > > Heh. Your view of local is different from mine. On my systems > /usr/local is local to the individual system. We do have NFS mounted > software of the kind you describe, but we stopped putting it in > /usr/local because users got confused thinking it was really local to > the machine. We now have a separate automounted /software directory for > all that stuff. See my other post. The FHS says it's okay for both /opt and /usr/local to be shared over NFS, but I wouldn't do both. For me /usr/local = NFS share, /opt = local to machine. Why do users need to know what's local and what isn't? All that matters is that they know the path to a file (its logical location, not its physical location).
That's the beauty of the Unix filesystem hierarchy: everything is arranged logically, not physically. No drive letters, etc. In a properly configured environment, things should just work for the users. I'm speaking in general terms; for HPC, where disk or network I/O can be significant factors, physical location is important. But that's usually for *data*, not the binary running, which is usually read once, and stays in memory for the remainder of its execution. -- Prentice From landman at scalableinformatics.com Thu Jul 3 08:24:23 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] /usr/local over NFS is okay, Joe In-Reply-To: <486CE86B.90104@ias.edu> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <486CE03D.40901@scalableinformatics.com> <486CE86B.90104@ias.edu> Message-ID: <486CEF27.8090507@scalableinformatics.com> Prentice Bisbal wrote: > Joe Landman wrote: >> >> Prentice Bisbal wrote: >> >>> Here's another reason to use tarballs: I have /usr/local shared to all >> eeek!! something named local is shared??? > > Nothing wrong with that. "local" doesn't necessarily mean local to the > physical machine. It can mean for the local site. I do everything Yeah, it is ambiguous to a degree, but I figure that something named /local is actually going to be physically local. It helps tremendously when a user calls up with a problem, say that they can't see a file they placed in /local/... on all nodes. Usually they get quiet for a moment after saying that aloud, and then say "oh, never mind". :) [...] > I don't dump the binaries into one path. I put symlinks into > /usr/local/bin{,64}. All the binaries go into /usr/local/foo-xx.yy and > stay there: We used to do this, but things kept getting overwritten by zealous package management tools. So we started using modules and showing people how to add paths by hand if they were adamant about not using modules ...
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From eagles051387 at gmail.com Thu Jul 3 08:32:16 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <486CC977.3000800@tesseract-tech.com> References: <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> <486CC977.3000800@tesseract-tech.com> Message-ID: i have a little bit of clustering experience.if anything im still contemplating making a kubuntu derivative that is geared towards rendering clusters with the software and what not. i just dont have any experience with whats available for the rendering cluster market and linux that is why im asking all these questions. i did do some googling the other day only to find a list of commercial products. On Thu, Jul 3, 2008 at 2:43 PM, H.Vidal, Jr. wrote: > In my experience with 3D, the renderer is of particular note. > For example, it is entirely possible to model with one piece > of software, export geometry, and render with another piece > of software. And the production team will probably have very > specific ideas of which renderer is what they want, especially > since tools such as RenderMan are themselves programmable, and > cannot just be 'switched' with another rendering tool. > > And so, based on your questions, it sounds like you really need > to study, understand, and educate yourself on your proposed product > or service before thinking about batch rendering. You are, > as Americans would say, putting the cart before the horse. > > So, as suggested, go study the 'front end' of the problem in much > greater detail, talk to potential customers, understand more > what you are intending to do, then come back and find out about > clusters. 
The Beowulf mailing list has lots of archived comments > and questions on this topic. That's a hint..... > > If you are just acting as a hobbyist or moving along a learning > curve, go look at open source rendering tools. If you are going > to sell services on open source rendering tools, see comments > above. In any case, google and plain reading are your friends.... > > Good luck. > > hv > > Jon Aquilina wrote: > >> now what happens if someone comes to me for rendering services and they >> used maya will the maya file be able to use the software mentioned above or >> would i need some other software for that? >> >> On 7/2/08, *Bernard Li* > >> wrote: >> >> On Wed, Jul 2, 2008 at 4:32 AM, Perry E. Metzger > > wrote: >> >> > "Jon Aquilina" > > writes: >> >> if i use blender how nicely does it work in a cluster? >> > >> > I believe it works quite well. >> >> As far as I know blender does not have any built-in "clustering" >> capabilities. But what you do is render different frames on different >> cores (embarrassingly parallel) using a queuing/scheduling system. >> DrQueue seems to be quite popular with the rendering folks: >> >> http://drqueue.org/cwebsite/ >> >> Cheers, >> >> Bernard >> >> >> >> >> -- >> Jonathan Aquilina >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080703/ff172f51/attachment.html From kus at free.net Thu Jul 3 08:53:03 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] MPI: over OFED and over IBGD Message-ID: Is there some MPI implementation/version which may be installed on some nodes to work over the Mellanox IBGD 1.8.0 (Gold Distribution) IB stack, and on other nodes to work with OFED-1.2? Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry Moscow From jlb17 at duke.edu Thu Jul 3 09:09:49 2008 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> Message-ID: On Thu, 3 Jul 2008 at 9:34am, Tim Cutts wrote > On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote: >> B. Red Hat has done such a good job of spreading FUD about the other >> Linux distros, management has a cow if you tell them you're installing >> something other than RH. Erm, do you have any examples of that? All I see is RH a) trying to sell their product (nothing wrong with that) and b) in general, being a pretty good member of the OSS community. > Fedora did not exist, and I'm still not sure how separate from Red Hat Fedora > and CentOS really are, but that's probably just my ignorance. The fact is that CentOS is in no way officially associated with Red Hat. At all. They use the freely available RHEL SRPMs to build the distribution, and they report bugs upstream when they find them. But that's it. As for Fedora, see .
-- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From Shainer at mellanox.com Thu Jul 3 09:41:01 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] MPI: over OFED and over IBGD In-Reply-To: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F013422B0@mtiexch01.mti.com> Mikhail Kuzminsky wrote: > Is there some MPI implementation/version which may be installed > on some nodes to work over the Mellanox IBGD 1.8.0 (Gold > Distribution) IB stack, and on other nodes to work with OFED-1.2? IBGD is out of date, and AFAIK none of the latest versions of the various MPIs were tested against it. I would recommend updating the install from IBGD to OFED, and if you need some help let me know. If you must keep it, then MVAPICH 0.9.6 might work. Gilad. From Bogdan.Costescu at iwr.uni-heidelberg.de Thu Jul 3 09:45:31 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Robert G. Brown wrote: > if you try to start up a second daemon on that port you'll get a > EADDRINUSE on the bind While we talk about theoretical possibilities, this statement is not always true. You could specify something other than INADDR_ANY here: > serverINETaddress.sin_addr.s_addr = htonl(INADDR_ANY); /* Accept all */ or bind it to a specific network interface (SO_BINDTODEVICE). Then you can bind a second daemon to the same port, but with a different (and again not INADDR_ANY) local address or network interface. Many daemons can do this nowadays (named, ntpd, etc.).
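To make the point about specific local addresses concrete, here is a small sketch in Python. It assumes a Linux host, where every address in 127.0.0.0/8 is a usable loopback address; the helper names bind_tcp and demo_bind are made up for this illustration, and SO_BINDTODEVICE (which usually needs root) is not shown.

```python
import errno
import socket


def bind_tcp(address, port):
    """Return a TCP socket bound to (address, port), without SO_REUSEADDR."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
    except OSError:
        sock.close()
        raise
    return sock


def demo_bind():
    """Bind one port on two local addresses, then provoke a real clash."""
    first = bind_tcp("127.0.0.1", 0)        # port 0: the kernel picks a free port
    port = first.getsockname()[1]
    second = bind_tcp("127.0.0.2", port)    # same port, different local address
    clash = None
    try:
        third = bind_tcp("127.0.0.1", port)  # same address AND port as `first`
        third.close()
    except OSError as exc:
        clash = exc.errno
    finally:
        second_port = second.getsockname()[1]
        first.close()
        second.close()
    return port, second_port, clash
```

On such a host the second bind is expected to succeed, and only the third, which repeats an address and port already in use, should come back with EADDRINUSE. A daemon bound to INADDR_ANY would instead conflict with both.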
-- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From rgb at phy.duke.edu Thu Jul 3 09:44:41 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: dealing with lots of sockets In-Reply-To: <87mykz4vyo.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <87mykz4vyo.fsf@snark.cb.piermont.com> Message-ID: On Wed, 2 Jul 2008, Perry E. Metzger wrote: > Seeing is believing. There are lots of good papers out there on > concurrency strategies for systems with vast numbers of sockets to > manage, and there is no doubt what the answer is -- threads suck > compared to events, full stop. Event systems scale linearly for far > longer. Sure, but: > It depends. If you're doing something where there is going to be one > socket talking to the system a tiny percentage of the time, why would > you bother building an event driven server? If you're building Or many sockets, but with a task granularity that makes your possibly megaclock/millisecond task switching overhead irrelevant. I'd have to rebuild and rerun lmbench to get an accurate measure of the current context switch time, but milliseconds seems far too long. That sounds like the inverse of the timeslice, not the actual CS time, which I'm pretty sure has been MICROseconds, not milliseconds, since back in the 2.0 kernel on e.g. 200 MHz hardware. 
My laptop is clocking over 2000 CS's/second sitting nearly idle -- that's just the "noise" of the system's normal interactive single user function, and this when its clock is at its idling 800 MHz (out of 2.5 GHz). On the physics department home directory (NFS) server I'm clocking only 7000-7500 CS/sec at a load average of 0.3 (dual core opteron at 2.8 GHz). Since nfsstat doesn't seem to do rates yet (and has int counters instead of long long uints, grrr) it is a bit difficult to see exactly what this is derived from in real time as far as load goes, but almost all of it seems to be the processing of interrupts, as in the interrupt count and the context switch count go in close parallel. Now, I'm trying to understand the advantage you describe, so bear with me. See what you think of the following: The kernel processes interrupts as efficiently as possible, with upper half and lower half handlers, but either one requires that the CPU stop userspace tasks and load up the kernel's interrupt handler, which requires moving in and out of kernel mode. We aren't going to quibble about a factor of two here and I think that this is well within a factor of two of a context switch time as most of the state of the CPU still has to be saved so I'm calling it a CS even if it is maybe 80% of a CS as far as time goes, depending on how badly one is thrashing the caches. Network based requests are associated with packets from different sources; packets require interrupts to process, interrupts from distinct sources require context switches to process (do they not? -- I'm not really sure here but I recall seeing context switch counts generally rise roughly linearly with interrupt rates even on single threaded network apps) so I would EXPECT the context switch load imposed by a network app to be within a LINEAR factor of two to four independent of whether it was run as a fork or run via events with one exception, described below. 
I'm estimating the load as: packet arrives (I,CS), app gets CPU (CS), select on FD triggers its "second stage interrupt/packet handler" which computes a result and writes to the network (I,CS), causes the process to block on select, kernel does next task (CS). So I count something like two interrupts and four context switches per network transaction for a single, single-threaded network server application handling a single, single-packet request with a single, single-packet reply. If it has to go to disk, add at least one interrupt/context switch. So it is four or five context switches, some of which may be lighter weight than others, and presumes that the process handling the connection already exists. If the process was created by a fork or as a thread, there is technical stuff about whether it is a kernel thread (expensive) or user thread (much lighter weight, in fact, much more like a procedure call since the forked processes share a memory space and presumably do NOT need to actually save state when switching between them). They still might require a large chunk of a CS time to switch between them because I don't know how the kernel task switcher manages select statements on the entire cluster of children associated with the toplevel parent -- if it sweeps through them without changing context there is one level of overhead, if it fully changes context there is another, but again I think we're talking 10-20% differences. Now, if it is a SINGLE process with umpty open file descriptors and an event loop that uses SOME system call -- and I don't quite understand how a userspace library can do anything but use the same system calls I could implement in my own code to check for data to be read on a FD, e.g. select or some sort of non-blocking IO poll -- each preexisting, persistent connection (open FD) requires at least 2-3 of the I/CS pairs in order to handle a request.
In other words, it saves the CS associated with switching the toplevel task (and permits the kernel to allocate timeslices to its task list with a larger granularity, saving the cost of going in/out of kernel mode to initiate the task/context switch). This saves around two CS out of four or five, a factor of around two improvement. Not that exciting, really. The place where you get really nailed by forks or threads is in their creation. Creating a process is very expensive, a couple of orders of magnitude more expensive than switching processes, and yeah, even order ms. So if you are handling NON-persistent connections -- typical webserver behavior: make a connection, send a request for a file, receive the file requested, break the connection -- handling this with a unique fork or thread per request is absolute suicide. So there a sensible strategy is to pre-initiate enough threads to be able to handle the incoming request stream round-robin, so that as each thread takes a request, processes it, and resumes listening for the next request. This requires the overhead of creating a FD (inevitable for each connection), dealing with the interrupt/CSs required to process the request and deliver the data, then close/free the FD. If the number of daemons required to process connections at incoming saturation is small enough that the overhead associated with processing the task queue doesn't get out of hand, this should scale very nearly as well as event processing, especially if the daemons are a common fork and share an address space. The last question is just how efficiently the kernel processes many blocked processes. Here I don't know the answer, and before looking it up I'll post the question here where probably, somebody does;-) If the connections are PERSISTENT -- e.g. 
imap connections forked by a mail server for mail clients that connect and stay connected for hours or days -- then as a general rule there will be no I/O waiting on the connections because humans type or click slowly and erratically, mostly a poissonian load distribution. If the kernel has a way of flagging applications that are blocked on a select on an FD without doing an actual context switch into the application, the scheduler can rip through all the blocked tasks without a CS per task, at an overhead rate within a CS or two per ACTIVE task (one where there IS I/O waiting) of the hyperefficient event-driven server that basically stays on CPU except for when the CPU goes back to the kernel anyway to handle the packet stream during interrupts and to advance the timer and so on. I don't KNOW if the kernel manages runnable or blocked at quite this level -- it does seem that there are fields in the task process table that flag it, though, so I'd guess that it does. It seems pretty natural to skip blocked processes without an actual CS in and out just to determine that they are blocked, since many processes spend a lot of time waiting on I/O that can take ms to unblock (e.g. non-cached disk I/O). So I'm not certain that having a large number of idle, blocked processes (waiting on I/O on a FD with a select, for example) is a problem with context switches per se. > something to serve files to 20,000 client machines over persistent TCP > connections and the network interface is going to be saturated, hell > yes, you should never use 20,000 threads for that, write the thing > event driven or you'll die. Here there are a couple of things. One is that 20K processes MIGHT take 20K context switches just to see if they are blocked on I/O. If they do, then you are definitely dead. 
20K processes also at the very least require 20K entries in the kernel process table, and even looping over them in the scheduler to check for an I/O flag with no CS is going to start to take time and eat a rather large block of memory and maybe even thrash the cache. So I absolutely agree, 20K mostly-idle processes on a running system -- even a multicore with lots of memory -- is a bad idea even if they are NOT processing network requests. Fortunately, this is so obvious that I don't think anybody sane would ever try to do this. Second, 20K NON-persistent connections on an e.g. webserver would be absolute insanity, as it adds the thread creation/destruction overhead to the cost of processing the single-message-per-connection interrupts. It just wouldn't work, so people wouldn't do that. IIRC there were a few daemons that did that back in the 80's (when I was managing Suns) and there were rules of thumb on running them. "Don't" is the one I recall, at least if one had more than a handful of hosts connecting. Running 10-20 parallel daemons might work, and people do that -- httpd, nfsd. Running an event driven server daemon (or parallel/network application) would work, and people do that -- pvmd does that, I believe. Which one works the best? I'm perfectly happy to believe that the event driven server could manage roughly twice as many make/break single message connections as a pile of daemons, if the processes aren't bottlenecked somewhere other than at interrupt/context switches. If we assume that a CS takes on the order of 1-10 usec on a modern system, and it takes a SMALLER amount of time to do the processing associated with a request, then you'll get the advantage. If each request takes (say) order of 100 usec to handle, then you'll be bottlenecked at less than 10,000 requests per second anyway, and I don't think that you'd see any advantage at all, although this depends strongly on whether or not one can block all the daemons somewhere OTHER than the network.
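The estimate above can be put into a few lines of arithmetic. This is only a back-of-envelope sketch in Python under a deliberately crude assumption: one core, requests handled strictly one after another, and a per-request cost equal to the useful work plus a fixed number of context switches. The function names are invented here; the counts of 5 switches for a daemon and 3 for an event loop are taken from the "four or five" versus "two fewer" figures in this message.

```python
def requests_per_second(work_us, switches, switch_us):
    """Serial throughput when each request costs work plus context switches."""
    return 1_000_000 / (work_us + switches * switch_us)


def event_advantage(work_us, switch_us, daemon_switches=5, event_switches=3):
    """Ratio of event-loop throughput to per-daemon throughput in this model."""
    return (requests_per_second(work_us, event_switches, switch_us)
            / requests_per_second(work_us, daemon_switches, switch_us))


if __name__ == "__main__":
    for work_us in (1, 10, 100):
        print(work_us, round(event_advantage(work_us, switch_us=5), 3))
```

With a 5 usec switch, the model gives the event loop about a 1.6x edge when the work per request is 1 usec, but under 1.1x when the work is 100 usec, which is the shape of the argument being made: the saving only matters when the work per request is smaller than the switching cost.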
The question then is -- what kind of traffic is e.g. an NFS server or a mail server as opposed to a web server? NFS service requires (typically) at least an fstat per file, and may or may not require physical disk access with millisecond scale latencies. Caching reduces this by orders of magnitude, but some patterns of access (especially write access or a nasty mix of many small requests -- latency bound disk accesses) don't necessarily cache well. It is not at all clear, then, that an event driven NFS server would ultimately scale out better than a small pile of NFS daemons as the bottleneck could easily end up being the disk, not the context switch or interrupt burden associated with the network. Mail servers ditto, as they too are basically file servers, but ones for which caching is of little or no advantage. Event driven servers might get you the ability to support as much as a factor of two more connections without dying, but it is more likely that other bottlenecks would kill your performance at about the same number of connections either way. To bring the whole thing around OT again, a very reasonable question is what kind of application one is likely to encounter in parallel computing and which of the three models discussed (forking per connection, forking a pile of daemons to handle connections round robin, single server/daemon handling a table of FDs) is likely to be best. I'd argue that in the case of parallel computing it is ALMOST completely irrelevant -- all three would work well. If one starts up a single e.g. 
pvmd or lamd, which forks off connected parallel applications on
request, then typically there will a) only be roughly 1 such fork per
core per system, because the system will run maximally efficiently if it
can just keep memory streaming in and out of L1 and L2; b) they will
have a long lifetime, so the cost of the fork itself is irrelevant -- a
ms out of hours to days of computing; c) internally the applications are
already written to be event driven, in the sense that they maintain
their own tables of FDs and manage I/O either at the level of the
toplevel daemons (which then provide it as streams to the applications)
or within the applications themselves via library calls and structures.
I THINK PVM is more the former model and MPI the latter, but there are
many MPIs.

For other associated cluster stuff -- a scheduler daemon, an information
daemon such as xmlsysd in wulfstat -- forking vs non-forking for
persistent connections (ones likely to last longer than minutes) is
likely to be irrelevantly different. Again, pay the ms to create the
fork, pay 6 interrupt/context switches instead of 4 or 5 per requested
service with a marginal cost of maybe 10 usec, and unless one is
absolutely hammering the daemon and the work done by the daemon has
absolutely terrible granularity (so it is only DOING on the order of 10
or 100 usec of work per return) it is pretty ignorable, especially on a
system that is PRESUMABLY spending 99% of its time computing and where
the daemon is basically handling out of band task monitoring or control
services.

> It is all about the right tool for the job. Apps that are all about
> massive concurrent communication need events. Apps that are about very
> little concurrent communication probably don't need them.

Absolutely, but do they need libevent, or do they simply need to be
sensibly written to manage a table of fds and selects or nonblocking
polls?
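A minimal sketch of the "table of fds plus select" approach, written in Python for brevity rather than C. The name `run_echo_loop` and its `max_events` and `timeout` parameters are hypothetical, the sockets are left blocking for simplicity, and this is not meant to represent how libevent, pvmd, or any daemon mentioned in this thread is actually implemented.

```python
import select
import socket


def run_echo_loop(listener, max_events, timeout=1.0):
    """Echo data on many connections from one process using select().

    listener is a bound, listening TCP socket. The loop stops after
    max_events accepts/reads, or when nothing happens within timeout.
    Returns the number of events handled.
    """
    connections = {}  # fd -> socket: the "table of fds"
    handled = 0
    while handled < max_events:
        watched = [listener, *connections.values()]
        readable, _, _ = select.select(watched, [], [], timeout)
        if not readable:
            break  # timed out with nothing to do
        for sock in readable:
            if sock is listener:
                # New connection: add it to the table, no fork needed.
                conn, _addr = listener.accept()
                connections[conn.fileno()] = conn
            else:
                data = sock.recv(4096)
                if data:
                    sock.sendall(data)  # the per-request "work"
                else:
                    # Peer closed: drop the entry and release the fd.
                    del connections[sock.fileno()]
                    sock.close()
            handled += 1
    for conn in connections.values():
        conn.close()
    return handled
```

One process services every connection, so the per-connection cost is one table entry rather than one process; the price is that a slow or buggy handler stalls everybody.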
I've grabbed the source for libevent and am looking through it, but
again, it seems to me that it is limited to using the system calls the
kernel provides to handle I/O on open FDs, and if so the main reason to
use a library rather than the calls directly is ease of coding, not
necessarily efficiency. Usually the code would be more efficient if you
did the same thing(s) inline, would it not?

The one thing I completely agree with is that one absolutely must remain
aware of the high cost of creating and destroying threads/processes.
Forking is expensive, and forking to handle a high-volume stream of
transient connections is dumb. So dumb that it doesn't work, so nobody
does this, I think. At least, not for long.

> More the former, not the latter. Event driven programming typically
> uses registered callbacks that are triggered by a central "Event Loop"
> when events happen. In such a system, one never blocks for anything --
> all activity is performed in callbacks, and one simply returns from a
> callback if one can't proceed further. The programming paradigm is
> quite alien to most people.

Fair enough, because most people don't write heavily parallel
applications (which includes applications with many parallel I/O
streams, not just HPC). But people who do fairly quickly learn to work
out the scaling and overhead, do they not, at least "well enough" to
achieve some level of performance? Otherwise the applications just fail
to work and people don't use them. Evolution in action...;-)

This has been a very informative discussion so far, at least for me.
Even if my estimates above are all completely out of line and ignore
some key thing, all that means is I'll learn even more.

The one thing that I wish were written with some sort of internal
scheduler/kernel and event mechanism from the beginning is X. It has its
own event loop, but event-driven callbacks all block -- there is no
internal task parallelism.
It is a complete PITA to write an X application that runs a thread
continuously but doesn't block the operation of the GUI -- one has to
handle state information, use a separate thread, or invert all sorts of
things from the normal X paradigm of click and callback. That is, most X
apps are written to be serial and X itself is designed to support serial
operation, but INTERESTING X apps are parallel, where the UI-linked I/O
channels have to be processed "independently" within the X event loop
while a separate thread is doing a task loop of interesting work. AFAIK,
X only supports its own internal event loop and has horrible kludges to
get the illusion of task parallelism unless one just forks a separate
thread for the running "work" process and establishes shared state
structures and so on so that the UI callbacks can safely affect work
going on in the work-loop thread without blocking it.

> I'd read the libevent man page to get a vague introduction.

There doesn't seem to be one in the source tarball I downloaded. Only
event.3 and evdns.3, neither of which is terribly informative. In fact,
the documentation sucks. There is more space on the website devoted to
pictures of good vs terrible scaling with/without libevent than there is
documentation of how it works or how to use it, and of course it is
difficult to know if the figures are straw men or fair comparisons.

There are a few chunks of sample code in samples. I'll take a look and
see what I can see when I have time. I'm working on an X-based
GUI-controlled application and do have a forking daemon (xmlsysd) that
so far seems to work fine at the level of traffic it was designed for
and is likely bottlenecked someplace other than CSs long before they
become a problem, but this conversation has convinced me that I could
rewrite at least the latter in a way that is more efficient even if I do
leave it forking per connection (or using xinetd, as it usually does
now:-).
It is a monitoring daemon, and is fairly lightweight now because one
doesn't want to spend resources watching a cluster spend resources. If I
redesigned it along lines suggested by the analysis above, I could
permit it to manage many more connections with one part of its work
accomplished with roughly constant overhead, where now the overhead
associated with that work scales linearly with the number of
connections.

   rgb

-- 
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From landman at scalableinformatics.com Thu Jul 3 09:49:00 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk>
Message-ID: <486D02FC.9040109@scalableinformatics.com>

Joshua Baker-LePain wrote:
> On Thu, 3 Jul 2008 at 9:34am, Tim Cutts wrote
>> On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote:
>
>>> B. Red Hat has done such a good job of spreading FUD about the other
>>> Linux distros, management has a cow if you tell them you're installing
>>> something other than RH.
>
> Erm, do you have any examples of that? All I see is RH a) trying to
> sell their product (nothing wrong with that) and b) in general, being a
> pretty good member of the OSS community.

The only bad things I have seen RH do are their refusal to support good
file systems (as the size of disks hit 2TB at the end of the year, this
is going to bite them harder than it is now), and some of the choices
they have made in their kernel. Other than that, they have been a good
OSS citizen.
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615

From kus at free.net Thu Jul 3 10:01:43 2008
From: kus at free.net (Mikhail Kuzminsky)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] MPI: over OFED and over IBGD
In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F013422B0@mtiexch01.mti.com>
Message-ID:

In message from "Gilad Shainer" (Thu, 3 Jul 2008 09:41:01 -0700):
>Mikhail Kuzminsky wrote:
>
>> Is there some MPI implementation/version which may be installed
>> on some nodes to work over the Mellanox IBGD 1.8.0 (Gold
>> Distribution) IB stack and on other nodes to work with OFED-1.2?

>IBGD is out of date, and AFAIK none of the latest versions of the
>various MPI were tested against it.

That is clear, but I didn't ask about the *LATEST* MPI versions ;-)

>I would recommend to update the
>install to OFED from IBGD, and if you need some help let me know.

Thank you very much for your help!

> If you
>must keep it

Yes. There is a Russian romance with the words: "You can't understand,
you can't understand, you can't understand my sorrows" :-))

>, then MVAPICH 0.9.6 might work.

Eh, I used 0.9.5 and 0.9.9 :-) Now I will look at the mvapich archives.

Thanks!
Mikhail

>
>Gilad.

From mark.kosmowski at gmail.com Thu Jul 3 10:13:12 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: energy costs and poor grad students
Message-ID:

> Prentice Bisbal wrote:
> > Mark Kosmowski wrote:
> >
> >
> >> I think I have come to a compromise that can keep me in business.
> >> Until I have a better understanding of the software and am ready for
> >> production runs, I'll stick to a small system that can be run on one
> >> node and leave the other two powered down. I've also applied for an
> >> adjunct instructor position at a local college for some extra cash and
> >> good experience. When I'm ready for production runs I can either just
> >> bite the bullet and pay the electricity bill or seek computer time
> >> elsewhere.
> >>
> >
> > Mark,
> >
> > For MPI testing/debugging, you can create a few virtual machines on one
> > node using VMware or Xen. VMware is free, unless you want all the bells
> > and whistles.
> >
>
> You don't need to go this far. Just set up the hostfile to use the same
> host name several times. Just make sure you don't start swapping :)
>
> Jeff
>

My problem is RAM. I'm using stable codes and not doing much programming
of my own, other than to tweak output formats to suit my needs.

I've come up with some solutions. First, I'll spend some time this
weekend moving files around and physically swapping DIMMs (I'm gonna
have sore thumbs again :( ) to get one machine with somewhere between 8
and 16 Gb. After I do the file transferring I can then run just one
workstation with a big amount of RAM. This amount of RAM should keep me
in business for even most of my production runs until I get to a certain
size of system to be studied.

Next, tomorrow I am going to install some laminate flooring for my
parents and will endeavor to extort a new laptop out of them - mine no
longer communicates with the LCD screen - tearing it apart to see what
is wrong was going to be my first unemployment project but they went and
extended my employment contract another 6 months.

Step three - time to upgrade the entertainment machine to a 64-bit dual
core system from the 32-bit ancient chip it has.
I'll try to get this to 6 or 8 Gb RAM - then if needed I can use it as a half-node as needed to supplement the workstation without firing up more machines. This will leave two machines powered down and hopefully half the computing power usage. Maybe it would be a better idea to just buy more RAM for the two HDAMA opteron systems and use one of those as a part-time entertainment / vmware windows machine by just getting a PCI-X video card (I'm asking on the OpenSUSE forum whether the HDAMA PCI-X slots will run a PCI-X video card - feel free to comment on this here too). I'll make a RAM inventory this weekend and post the results. If I can get one of these systems to 12 - 16 Gb and the other to 8 - 12 Gb, this may be the best choice. Time to learn vmware and wine. Right now, my CoW (cluster of workstations from RGB's book) uses the oldest 64-bit machine as a "head node". This machine is the slowest and only has 4 DIMM slots, one of which I'm having difficulties with, so this machine is definitely going down. The other two nodes use an HDAMA mother board, each with 8 DIMM slots - I have 2 Gb and 1 Gb DIMMs on hand. I'm thinking that perhaps the best thing to do is just physically move the data drive to the machine slated to be a full-time calculator (the one with the most RAM) and then fix paths as needed. Someone suggested downclocking - would downclocking a step or two in BIOS find a sweet spot as far as speed per energy unit similar to driving in the 45 - 55 mile per hour range? I hope no one is too upset I'm doing this planning on list. 
From Bogdan.Costescu at iwr.uni-heidelberg.de Thu Jul 3 10:13:33 2008
From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: dealing with lots of sockets
In-Reply-To: <87mykz4vyo.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <87mykz4vyo.fsf@snark.cb.piermont.com>
Message-ID:

On Wed, 2 Jul 2008, Perry E. Metzger wrote:

> Event driven programming typically uses registered callbacks that
> are triggered by a central "Event Loop" when events happen. In such
> a system, one never blocks for anything -- all activity is performed
> in callbacks, and one simply returns from a callback if one can't
> proceed further.

And here is one of the problems that event driven programming can't
really solve: separation between the central event loop and the code to
run when events happen. fork() allows the newly created process to
proceed at its own will and possibly make its own mistakes (like buffer
overflows) in its own address space - the parent process is not affected
in any way, and this allows e.g. daemons to run their core loop with
administrative privileges while the real work can be done as a dumb
user.
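A minimal sketch of the isolation that fork() gives, in Python for brevity. The helper name `run_isolated` and its `uid`/`gid` parameters are hypothetical; the privilege drop only works when the parent runs as root and is skipped when no ids are given, and this is not a description of how any particular daemon does it.

```python
import os


def run_isolated(work, uid=None, gid=None):
    """Run work() in a forked child and return the child's exit code.

    Whatever the child does to its memory, including crashing, stays
    in the child's own copy of the address space.
    """
    pid = os.fork()
    if pid == 0:
        # Child: optionally become a "dumb user" before doing real work.
        try:
            if gid is not None:
                os.setgid(gid)  # drop the group first, while still privileged
            if uid is not None:
                os.setuid(uid)
            work()
        except BaseException:
            os._exit(1)  # report failure without unwinding parent state
        os._exit(0)
    # Parent: the core loop would normally keep serving; here it waits.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return -os.WTERMSIG(status)
    return os.WEXITSTATUS(status)
```

A handler that corrupts its state and then fails leaves the parent's copy of that state untouched, which is the separation described above; the cost is one fork per unit of work.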
-- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From shaeffer at neuralscape.com Thu Jul 3 10:21:04 2008 From: shaeffer at neuralscape.com (Karen Shaeffer) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: References: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> Message-ID: <20080703172104.GC21836@synapse.neuralscape.com> On Thu, Jul 03, 2008 at 12:09:49PM -0400, Joshua Baker-LePain wrote: > On Thu, 3 Jul 2008 at 9:34am, Tim Cutts wrote > >On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote: > > >>B. Red Hat has done such a good job of spreading FUD about the other > >>Linux distros, management has a cow if you tell them you're installing > >>something other than RH. > > Erm, do you have any examples of that? All I see is RH a) trying to sell > their product (nothing wrong with that) and b) in general, being a pretty > good member of the OSS community. > > >Fedora did not exist, and I'm still not sure how separate from Red Hat > >Fedora and CentOS really are, but that's probably just my ignorance. The > >fact that > > CentOS is in no way officially associated with Red Hat. At all. They use > the freely available RHEL SRPMs to build the distribution, and they report > bugs upstream when they find them. But that's it. > > As for Fedora, see . Hi, Red Hat is indeed an exemplary member of the OSS community. They never violate licenses. What they do do, is they take every advantage they can to differentiate themselves. This results in very unfriendly distributions to modify and customize -- Red Hat wants service contracts that are not really compatible with such activity anyway. Its by design. And quite effective. Fedora Core is dominated by Red Hat employees. 
It is for all intents and purposes a hybrid distribution that is semi open to the public. It is definitely a beta distribution for RHEL. CentOS is not related to Red Hat in any way. These folks just use the GPL to produce a free clone of RHEL releases. And I might add, it is very well maintained. I highly recommend it to folks who are going to modify and customize RHEL, because your RHEL service contract won't permit that anyway. If you are interested in such issues, you might want to pay attention to the recent and ongoing discussion about systemtap (A Red Hat managed project.) And the consternation of other folks in the OSS community at the difficulty in working with the project independent of Red Hat. In this case, we are talking about the kernel development community. Just follow the thread from here: https://lists.linux-foundation.org/pipermail/ksummit-2008-discuss/2008-June/000149.html Actually a very interesting thread, dealing with more than systemtap. Thanks, Karen -- Karen Shaeffer Neuralscape, Palo Alto, Ca. 94306 shaeffer@neuralscape.com http://www.neuralscape.com From rgb at phy.duke.edu Thu Jul 3 10:45:44 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: On Thu, 3 Jul 2008, Bogdan Costescu wrote: > On Wed, 2 Jul 2008, Robert G. Brown wrote: > >> if you try to start up a second daemon on that port you'll get a EADDRINUSE >> on the bind > > While we talk about theoretical possibilities, this statement is not always > true. 
You could specify something else than INADDR_ANY here: > >> serverINETaddress.sin_addr.s_addr = htonl(INADDR_ANY); /* Accept all */ > > or bind it to a specific network interface (SO_BINDTODEVICE). Then you can > bind a second daemon to the same port, but with a different (and again not > INADDR_ANY) local address or network interface. Many daemons can do this > nowadays (named, ntpd, etc.). Sure. I meant on a single wire, single IP number (and included sample code), not that you couldn't offer network services on more than one network from a single machine. Ultimately, raw networking is really difficult, and I'll freely admit that even though I've WRITTEN some network apps, I'm far from expert. I code with Stevens in one hand and examples in the other, typing with my nose, and pray. So any of y'all that have written a lot of networking code will have direct experience of edges I have not yet explored. This is one reason that people use PVM and MPI and so on. It can be argued (and has been argued on this list IIRC) that raw networking code will always result in a faster parallel program, all things being equal, because encapsulating it in higher level abstractions always comes at a cost (even though in many practical cases the people who wrote those abstractions were better parallel coders than the person trying to write the code anyway, so things are not equal, so the resulting code will be MORE efficient than what one would get unless one worked really hard and learned all the tricks used in the library to where one could go beyond them). It is pretty easy to write a single task server. There is template code for it. It isn't horribly difficult to write a forking server. There is template code for it. As you go up in complexity and expected load beyond where these will work, you have to learn, and you will find it harder and harder to find good, simple, templated code to start your project with. Such is life. And it isn't easy to learn. 
Few people teach it because few people know it. There aren't a lot of good books on it that I know of (outside of Stevens). One has to learn the hardest of ways; on the job, by doing, by making mistakes. I certainly have learned the modest bit that I know on my own, and haven't had any need to go beyond it (yet) to the next level. And if God is good to me, I will never have to design a 20 Kconnection webserver, and can die in peace in my state of relative ignorance...;-) rgb > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From perry at piermont.com Thu Jul 3 10:59:06 2008 From: perry at piermont.com (Perry E. Metzger) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] Re: dealing with lots of sockets In-Reply-To: (Bogdan Costescu's message of "Thu\, 3 Jul 2008 19\:13\:33 +0200 \(CEST\)") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <87mykz4vyo.fsf@snark.cb.piermont.com> Message-ID: <8763rmn91x.fsf@snark.cb.piermont.com> Bogdan Costescu writes: > On Wed, 2 Jul 2008, Perry E. Metzger wrote: > >> Event driven programming typically uses registered callbacks that >> are triggered by a central "Event Loop" when events happen. In such >> a system, one never blocks for anything -- all activity is performed >> in callbacks, and one simply returns from a callback if one can't >> proceed further. 
> > And here is one of the problems that event driven programming can't > really solve: separation between the central event loop and the code > to run when events happen. I don't understand what you mean. > fork() allows the newly created process to proceed at its own will > and possibly doing its own mistakes (like buffer overflows) in its > own address space - the parent process is not affected in any way > and this allows f.e. daemons to run their core loop with > administrative priviledges while the real work can be done as a dumb > user. Oh, that's not an issue at all. For example, say you wanted to run an SMTP daemon as a pure event app but you don't want it to run as root. So, you're screwed because you can't open port 25 as a normal user, right? Well, you can either change privs after opening 25, or you can use fd passing to pass open file descriptors between a small rootly process and the mail processing event driven process. Anyway, yah, bugs are a problem. If you have a bug in an event driven system you bring down 10,000 connections at once instead of 1. You do indeed have to be confident your code doesn't suck. -- Perry E. Metzger perry@piermont.com From perry at piermont.com Thu Jul 3 11:01:07 2008 From: perry at piermont.com (Perry E. 
Metzger)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <486C6660.5070705@aei.mpg.de> (Carsten Aulbert's message of "Thu\, 03 Jul 2008 07\:40\:48 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> <486C6660.5070705@aei.mpg.de>
Message-ID: <87wsk2lue4.fsf@snark.cb.piermont.com>

Carsten Aulbert writes:

> A solution proposed by the nfs guys is pretty simple:
>
> Change the values of
> /proc/sys/sunrpc/{min,max}_resvport
> appropriately. But they don't know which ceiling will be next. But we
> will test it.

What about my kernel patch to use unprivileged ports? Did you try it?

Perry

From jlb17 at duke.edu Thu Jul 3 11:09:00 2008
From: jlb17 at duke.edu (Joshua Baker-LePain)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] A press release
In-Reply-To: <486D02FC.9040109@scalableinformatics.com>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> <486D02FC.9040109@scalableinformatics.com>
Message-ID:

On Thu, 3 Jul 2008 at 12:49pm, Joe Landman wrote

> The only bad things I have seen RH do are their refusal to support good file
> systems (as the size of disks hit 2TB at the end of the year, this is going
> to bite them harder than it is now), and some of the choices they have made
> in their kernel. Other than that, they have been a good OSS citizen.

I definitely agree that some of their kernel decisions are... odd. I'm
wondering, though, why you think 2TB drives in specific are going to
bite them harder than the 1TB models out now.
ext3 goes up to 16TB these days, and 2 marketing TB < 2TiB, so you'll still be able to boot off those drives. -- Joshua Baker-LePain QB3 Shared Cluster Sysadmin UCSF From prentice at ias.edu Thu Jul 3 11:09:27 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Fri Mar 19 01:07:27 2010 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk> Message-ID: <486D15D7.40006@ias.edu> Joshua Baker-LePain wrote: > On Thu, 3 Jul 2008 at 9:34am, Tim Cutts wrote >> On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote: > >>> B. Red Hat has done such a good job of spreading FUD about the other >>> Linux distros, management has a cow if you tell them you're installing >>> something other than RH. > > Erm, do you have any examples of that? All I see is RH a) trying to > sell their product (nothing wrong with that) and b) in general, being a > pretty good member of the OSS community. You say tomato, I say, uhhh, tomato. > >> Fedora did not exist, and I'm still not sure how separate from Red Hat >> Fedora and CentOS really are, but that's probably just my ignorance. >> The fact that > > CentOS is in no way officially associated with Red Hat. At all. They > use the freely available RHEL SRPMs to build the distribution, and they > report bugs upstream when they find them. But that's it. True, but CentOS is irrefutably tied to RH. 
If RHEL disappears, so will CentOS.

-- 
Prentice

From prentice at ias.edu Thu Jul 3 11:13:46 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] Re: energy costs and poor grad students
In-Reply-To: <486CFF55.7080705@charter.net>
References: <486CDCBC.8030706@ias.edu> <486CE194.2060503@charter.net> <486CE8F0.7010602@ias.edu> <486CFF55.7080705@charter.net>
Message-ID: <486D16DA.90202@ias.edu>

Jeffrey B. Layton wrote:
> Prentice Bisbal wrote:
>>> You don't need to go this far. Just set up the hostfile to use the same
>>> host name several times. Just make sure you don't start swapping :)
>>>
>>> Jeff
>>>
>>>
>>
>> Unless the problem is configuring interhost communications correctly.
>>
>
> Then how does using VMs fix this problem? I'm not sure I understand your
> comment.
>

I was thinking like a SysAdmin, not a developer. I've had plenty of
experiences where nodes aren't communicating b/c someone hosed up the
machines file, .rhosts file, and other stuff like that. I guess it's not
really relevant if you're focusing only on developing an MPI app, not
administering a cluster.

-- 
Prentice

PS - I just realized who I was talking to. Thanks for the articles on
parallel filesystems. Very good. I read them all.

From prentice at ias.edu Thu Jul 3 11:22:11 2008
From: prentice at ias.edu (Prentice Bisbal)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486B9D4E.80405@ias.edu> <05A873CF-6D66-4B3B-9E63-B74B7D36D10B@sanger.ac.uk>
Message-ID: <486D18D3.6090207@ias.edu>

Tim Cutts wrote:
>
> On 3 Jul 2008, at 5:09 pm, Joshua Baker-LePain wrote:
>
>> On Thu, 3 Jul 2008 at 9:34am, Tim Cutts wrote
>>> On 2 Jul 2008, at 4:22 pm, Prentice Bisbal wrote:
>>
>>>> B.
Red Hat has done such a good job of spreading FUD about the other
>>>> Linux distros, management has a cow if you tell them you're installing
>>>> something other than RH.
>>
>> Erm, do you have any examples of that? All I see is RH a) trying to
>> sell their product (nothing wrong with that) and b) in general, being
>> a pretty good member of the OSS community.
>
> I agree - I've never seen FUD from Red Hat, but then I don't have much
> to do with them.
>

A) I remember reading propaganda (you might call it advertising) saying
that RHEL was the only Linux stable and robust enough for enterprise
applications. I don't keep all my old Linux Journals or archives of
www.redhat.com, so I can't provide concrete examples. Surely someone
else must have read stuff like that. Anyone? Anyone... Bueller?

I started on Red Hat, and still use RH and CentOS and other RH
derivatives; I'm not a fanatic of some other distro (or any distro, for
that matter).

B) Is making a name for yourself by giving away Red Hat Linux, then
suddenly pulling it off the market and trying to force people to pay
hundreds of dollars for your "enterprise" version being a good member? I
know they give away the SRPMS (because they HAVE to), but have you ever
tried rebuilding them on your own? It's not trivial. And Fedora, well...
it has its detractors.

-- 
Prentice

From lindahl at pbm.com Thu Jul 3 13:40:13 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:27 2010
Subject: [Beowulf] A press release
In-Reply-To: <486CD643.1050904@ias.edu>
References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu>
Message-ID: <20080703204012.GA29534@bx9.net>

On Thu, Jul 03, 2008 at 09:38:11AM -0400, Prentice Bisbal wrote:

> Here's another reason to use tarballs: I have /usr/local shared to all
> my systems with NFS.
If want to install the lastest version of > firefox, you can just do this: FWIW, the "rpm way" to do this is (ok, there's more than one way): * throw the rpm into your local repo, run createrepo * pdsh yum -y update Given that 99% of your software is RPMs, having 1% different can be a pain. As long as you can get rpms, of course. And you can avoid yet another NFS filesystem -- I have none at my company, which reduces the monitoring and fixing that I need to do. I'll also note that properly-configured local perl packages are not installed in a place where RPMs smash them (/usr/lib/perl5/vendor_perl vs. /usr/lib/perl5/site_perl). And you can find many perl rpms at rpmforge and atrpms, with accurate dependency info. -- greg From lindahl at pbm.com Thu Jul 3 13:50:09 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: <20080703205008.GB29534@bx9.net> On Thu, Jul 03, 2008 at 01:45:44PM -0400, Robert G. Brown wrote: > This is one reason that people use PVM and MPI and so on. It can be > argued (and has been argued on this list IIRC) that raw networking code > will always result in a faster parallel program, all things being equal, > because encapsulating it in higher level abstractions always comes at a > cost Yes, but there is plenty of proof that MPI can be extremely low overhead -- see InfiniPath MPI. Now if you insist on using TCP for your MPI and your "bare metal" code, you may beat your MPI because it's not so good at TCP. But it's probably cheaper to fix the MPI than re-invent the wheel many times. There's plenty of good code you can study for writing good network servers -- the original irc server is a pretty good event-driven non-blocking program. 
strace is your friend, too, that's the way you can see that MPICH-1's TCP driver isn't so hot. -- greg From gdjacobs at gmail.com Thu Jul 3 13:55:14 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Re: dealing with lots of sockets In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <87hcb8b98w.fsf@snark.cb.piermont.com> <87vdzo9rgd.fsf@snark.cb.piermont.com> <87skus874a.fsf_-_@snark.cb.piermont.com> <87mykz4vyo.fsf@snark.cb.piermont.com> Message-ID: <486D3CB2.1090903@gmail.com> Bogdan Costescu wrote: > On Wed, 2 Jul 2008, Perry E. Metzger wrote: > >> Event driven programming typically uses registered callbacks that are >> triggered by a central "Event Loop" when events happen. In such a >> system, one never blocks for anything -- all activity is performed in >> callbacks, and one simply returns from a callback if one can't proceed >> further. > > And here is one of the problems that event driven programming can't > really solve: separation between the central event loop and the code to > run when events happen. fork() allows the newly created process to > proceed at its own will and possibly doing its own mistakes (like buffer > overflows) in its own address space - the parent process is not affected > in any way and this allows f.e. daemons to run their core loop with > administrative priviledges while the real work can be done as a dumb user. Which is the reason why djbdns, qmail and postfix do things the way they do. Sendmail X will be going this way. From rgb at phy.duke.edu Thu Jul 3 14:06:03 2008 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <20080703204012.GA29534@bx9.net> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <20080703204012.GA29534@bx9.net> Message-ID: On Thu, 3 Jul 2008, Greg Lindahl wrote: > On Thu, Jul 03, 2008 at 09:38:11AM -0400, Prentice Bisbal wrote: > >> Here's another reason to use tarballs: I have /usr/local shared to all >> my systems with with NFS. If want to install the lastest version of >> firefox, you can just do this: > > FWIW, the "rpm way" to do this is (ok, there's more than one way): > > * throw the rpm into your local repo, run createrepo > * pdsh yum -y update > > Given that 99% of your software is RPMs, having 1% different can be a > pain. As long as you can get rpms, of course. And you can avoid yet > another NFS filesystem -- I have none at my company, which reduces > the monitoring and fixing that I need to do. ...and it can break the hell out of the elaborate dependency system if you go installing random libraries in e.g. /usr/local, or worse, overwrite an installed rpm in /usr with a different version. Entropy is a serious enemy to scalable sysadmin. The point of package management is to avoid it, and stay on the thin edge of optimally scalable LAN administration. > I'll also note that properly-configured local perl packages are not > installed in a place where RPMs smash them (/usr/lib/perl5/vendor_perl > vs. /usr/lib/perl5/site_perl). And you can find many perl rpms at > rpmforge and atrpms, with accurate dependency info. We TRY to install "everything" in rpm format. It is pretty easy to wrap up even scripts and third party stuff as an rpm, and the extra work is repaid N times over when you drop the rpm into a repo and it is installed/updated N times automagically. 
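A rough sketch of the repo-based workflow discussed above: copy a package into a local yum repository, regenerate the metadata with createrepo, then update every node with pdsh. The repository path, the package name and the use of "pdsh -a" to select all nodes are assumptions for illustration, not details from the thread; the DRY_RUN switch is likewise a hypothetical convenience so the commands can be previewed first.

```shell
# Print the command instead of running it when DRY_RUN is set (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# publish_rpm PACKAGE.rpm REPO_DIR
# Copies the package into the repo, rebuilds the metadata, updates all nodes.
publish_rpm() {
    rpm_file=$1
    repo_dir=$2
    run cp "$rpm_file" "$repo_dir/" || return 1
    run createrepo "$repo_dir" || return 1   # regenerate the repository metadata
    run pdsh -a yum -y update                 # -a: every node pdsh knows about (assumption)
}
```

For example, "DRY_RUN=1 publish_rpm mytool-1.0-1.noarch.rpm /srv/repo/local" prints the three commands without running anything.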
rgb > > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From raysonlogin at gmail.com Thu Jul 3 09:57:29 2008 From: raysonlogin at gmail.com (Rayson Ho) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: Message-ID: <73a01bf20807030957m2c1f6c6dm4869317395dc2a06@mail.gmail.com> The whole Big Buck Bunny movie was rendering on a Grid Engine cluster (aka. network.com). Big Buck Bunny is open content, and the software used to create the film is opensource. http://en.wikipedia.org/wiki/Big_Buck_Bunny http://www.bigbuckbunny.org/ Rayson On Tue, Jul 1, 2008 at 5:39 AM, Jon Aquilina wrote: > does anyone know of any rendering software that will work with a cluster? > > -- > Jonathan Aquilina > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From toon at moene.indiv.nluug.nl Tue Jul 1 12:38:36 2008 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Re: "hobbyists" still OT In-Reply-To: <1953527533.79861214451795299.JavaMail.root@zimbra.vpac.org> References: <1953527533.79861214451795299.JavaMail.root@zimbra.vpac.org> Message-ID: <486A87BC.8040202@moene.indiv.nluug.nl> Chris Samuel wrote: > ----- "Prentice Bisbal" wrote: > >> The United States alone produces enough grain to feed the entire >> world. 
> > It is probably worth pointing out that, as a recent > New Scientist article mentioned, a major part for the > rise in grain prices is due the rising demand for meat > from around the world. One marketeer to another: "New is an old word; find a new word for it." > This is, of course, a very inefficient conversion method > of solar power into human food. The 70s called - they want their argument back. Sigh, I am too old for this world. -- Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.indiv.nluug.nl/~toon/ Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html From toon at moene.indiv.nluug.nl Tue Jul 1 13:48:34 2008 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Fri Mar 19 01:07:28 2010 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <486A1F7B.9080408@tamu.edu> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13> <486A1F7B.9080408@tamu.edu> Message-ID: <486A9822.7000902@moene.indiv.nluug.nl> Gerry Creager wrote: > In the US, at least for academic institutions and hobbyists, surface and > upper air observations of the sort you describe are generally available > for incorporation into models for data assimilation. Models are > generally forced and bounded using model data from other atmospheric > models, also available. As I understand it from colleagues in Europe, > getting similar data over there is more problemmatical. Exactly ! 
And what happens in Europe is that companies take the freely available US data, use it to compete with US companies, and disregard the (meteorological superior) ECMWF data, because it is not free. A colleague of mine held some very unpopular talks in Reading, England, about this (according to his figures, 99 % of the meteorological data used in Europe originates from the US). -- Toon Moene - e-mail: toon@moene.indiv.nluug.nl - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.indiv.nluug.nl/~toon/ Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html From gregory.warnes at rochester.edu Tue Jul 1 17:50:10 2008 From: gregory.warnes at rochester.edu (Gregory Warnes) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <20080701193721.B6843B5404D@mx2.its.rochester.edu> Message-ID: On 7/1/08 3:25PM , "Mark Hahn" wrote: >> > Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. Some >> > distros use good kernels, some do not. I won't mention who I think is in >> the >> > latter category. > > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - > does that work out well? > Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, ... > is it a problem getting commercial > packages (pathscale/pgi/intel compilers, gaussian, etc) to run? > I?ve never had any major problems. Most linux vendors supply both RPM?s and .tar.gz installers, and I generally have better luck with the latter, even on RPM based systems anyway. > > the couple debian people I know tend to have more ideological motives > (which I do NOT impugn, except that I am personally more swayed by > practical, concrete reasons.) > My ?conversion? 
to use of Debian had little to do with ideological motives, and a lot more to do with minimizing the amount of time I had to take away from my research to support the Linux clusters I was maintaining at the time. Side note: one very nice thing about Debian is the ability to upgrade a system in place from one O/S release to another via "apt-get dist-upgrade". Much nicer than reinstalling the O/S, as seems to be (used to be?) the norm with RPM-based systems. -Greg -- Gregory R. Warnes, Ph.D Program Director Center for Computational Arts, Sciences, and Engineering University of Rochester Tel: 585-273-2794 Fax: 585-276-2097 Email: gregory.warnes@rochester.edu From gregory.warnes at rochester.edu Tue Jul 1 22:29:39 2008 From: gregory.warnes at rochester.edu (Gregory Warnes) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <20080702050726.81FAA7343C2@mx6.its.rochester.edu> Message-ID: On 7/2/08 1:06AM , "Mark Hahn" wrote: >> > I've never had any major problems. Most linux vendors supply both RPMs >> and >> > .tar.gz installers, and I generally have better luck with the latter, even >> > on RPM based systems anyway. > > interesting - I wonder why. the main difference would be that the rpm > format encodes dependencies... > The basic problem is that when folks build the .tar.gz files, they usually do a good job of explaining the dependencies and how to resolve them, while the equivalent RPM installer simply lists the dependencies with no hints about what packages are needed and where to get them.
> > >>> >> the couple debian people I know tend to have more ideological motives >>> >> (which I do NOT impugn, except that I am personally more swayed by >>> >> practical, concrete reasons.) >>> >> >> > My "conversion" to use of Debian had little to do with ideological motives, >> > and a lot more to do with minimizing the amount of time I had to take away >> > from my research to support the Linux clusters I was maintaining at the >> > time. > > again interesting, thanks. what sorts of things in rpm-based distros > consumed your time? > Well, a key component was obtaining, installing, and keeping open-source software components of the system up to date. Most other tasks were pretty equivalent, although things are organized somewhat differently between Linux variants. In addition to automatically resolving dependencies for new packages, Debian's package system keeps track of the dependencies of existing packages. Thus, if one asks for package X that depends on library Y version N, but library Y version M >> > Side note, one very nice thing about debian is the ability to upgrade a >> > system in-place from one O/S release to another via >> > >> > apt-get dist-upgrade >> > >> > Much nicer than reinstalling the O/S as seems to be (used to be?) the norm >> > with RPM-based systems > > I've done major version upgrades using rpm, admittedly in the pre-fedora > days. it _is_ a nice capability - I'm a little surprised desktop-oriented > distros don't emphasize it... > One fundamental difference in philosophy explains both the fundamental differences between RPM and Debian packages, and the lack of emphasis on in-place upgrades in desktop distros: vendor income. It is not in Red Hat's financial interest to make it easy to upgrade a system in place with an automated tool. They make money by selling new O/S versions. Consequently, Red Hat explicitly designed the RPM format to discourage in-place upgrades.
The Debian community, on the other hand, was and is run fundamentally by system administrators, whose best interest centers around minimizing the amount of time necessary to keep systems up to date. Consequently, Debian's package system was designed explicitly to make installation and updating of packages as painless as possible for the admin. Of course, other pressures have forced deviations from these fundamental viewpoints, but the patterns are still clearly visible. -Greg -- Gregory R. Warnes, Ph.D Program Director Center for Computational Arts, Sciences, and Engineering University of Rochester Tel: 585-273-2794 Fax: 585-276-2097 Email: gregory.warnes@rochester.edu From steffen.grunewald at aei.mpg.de Wed Jul 2 00:01:13 2008 From: steffen.grunewald at aei.mpg.de (Steffen Grunewald) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87fxqtuzh8.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> Message-ID: <20080702070113.GU11428@casco.aei.mpg.de> On Tue, Jul 01, 2008 at 04:21:55PM -0400, Perry E. Metzger wrote: > > Henning Fehrmann writes: > >> Thus, your problem sounds rather odd. There is no obvious reason you > >> should be limited to 360 connections. > >> > >> Perhaps your problem is not what you think it is at all. Could you > >> explain it in more detail? > > > > I guess it has also something to do with the automounter. I am not able > > to increase this number. > > But even if the automounter would handle more we need to be able to > > use higher ports: > > netstat shows always ports below 1024. > > > > tcp 0 0 client:941 server:nfs > > > > We need to mount up to 1400 nfs exports.
> > All NFS clients are connecting to a single port, not to a different > port for every NFS export. You do not need 1400 listening TCP ports on > a server to export 1400 different file systems. Only one port is > needed, whether you are exporting one file system or one million, just > as only one SMTP port is needed whether you are receiving mail from > one client or from one million. That's true for the server side, but not for the client side. Each client- server connection uses another (privileged) port *on the client* which is where the problem shows up. This particular setup comprises 1400 cluster nodes which all act as distributed storage. Files would be spread over all of them, and an application would sequentially access files (time series) which are located on different servers. (Call it NUSA, non-uniform storage architecture.) I guess it's time to go ahead and try a real cluster filesystem, or wait for NFS v4.1 to settle down. I understand that with several tens of TB a re-organisation of all data into a completely new tree would be tricky if not impossible. OTOH such things like glusterfs allow for building cluster fs's without moving data - gluster would just add a set of additional layers ("translators") on top of already existing physical fs's. I have followed glusterfs development for more than a year now, and while they are still working on their redundancy features, it should be useable for "quasi read-only" access. (Note that the underlying fs would be still accessible, for feeding data in; clients could have r/o access to the glusterfs namespace.) Version 1.4 is to be out in a couple of days. See www.gluster.org BTW, since I'm facing the same issue on a somewhat smaller scale, any other suggestion is appreciated. 
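A back-of-the-envelope check of the client-side port budget described above. It assumes the Linux sunrpc defaults of min_resvport=665 and max_resvport=1023 (an assumption; check /proc/sys/sunrpc/ on your own kernel), and that every mount to a distinct server takes one reserved port on the client.

```shell
# ports_in_range LOW HIGH -> number of ports in the inclusive range LOW..HIGH
ports_in_range() {
    echo $(( $2 - $1 + 1 ))
}

ports_in_range 665 1023   # assumed default reserved-port window: prints 359
ports_in_range 1 1023     # every privileged port: prints 1023
```

Under those assumptions the default window is roughly the 360-connection ceiling mentioned earlier in the thread, and even the whole privileged range is smaller than 1400, which is why a client that really needs that many simultaneous connections runs out of reserved ports.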
Cheers, Steffen (same institute, different location :) -- Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/ * e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298} No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html From steffen.grunewald at aei.mpg.de Wed Jul 2 00:27:09 2008 From: steffen.grunewald at aei.mpg.de (Steffen Grunewald) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <486A6760.5010006@ias.edu> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <20080702072709.GX11428@casco.aei.mpg.de> On Tue, Jul 01, 2008 at 01:20:32PM -0400, Prentice Bisbal wrote: > And the Debian users can say the same thing about Red Hat users. Or SUSE > users. And if any still exist, the Slackware users could say the same > thing about the both of them. But then the Slackware users could also > point out that the first Linux distro was Slackware, so they are using > the one true Linux distro... Which isn't true. Don't you remember MCC Interim Linux, back in the old days of 0.95[abc] kernels? It didn't consist of tens of floppies (yet), but it *was* a distro. > If you want to have a religious war about which distro to use, go > somewhere else. I'm sure there are plenty of mailing lists and > newsgroups where I'm sure that happens every day. :-) > This is a mailing list about beowulf clusters, and the last time I > checked, you can create clusters using any Linux distribution you like, > or even non-Linux operating systems, such as IRIX, Solaris, etc. Even Windows...
(duck & run) Steffen From softy.lofty.ilp at btinternet.com Wed Jul 2 02:20:09 2008 From: softy.lofty.ilp at btinternet.com (Ian Pascoe) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Small Distributed Clusters In-Reply-To: Message-ID: Hi all, Before getting into the nitty-gritty of my question, a bit of background. A friend and I are looking to set up, initially, two small clusters of 4 boxes each, using old surplus commodity hardware. The main purpose of the clusters is to hold data and perform calculations upon it - the data coming in from external sources. So far we've decided on an Ubuntu Server base with NFS linking the nodes together, and we're currently looking at how to perform the calculations - i.e. write our own software or adapt existing software. However, the question I have relates to linking the two clusters together. For the majority of the time they will be run autonomously, but on occasion we believe they'll need to be run as a cohesive unit with jobs being passed between them, because we don't plan to duplicate the data across the clusters, but back up locally. Both will be connected to the Internet using ADSL, and the limitation will be the upload speed, a maximum of 512 kbit/s. How would people suggest linking the two clusters together using a secure connection? Performance at this point is not in the equation, just the ability to securely connect. BTW, any thoughts on an SQL server that would cope well in this scenario?
Thanks Ian From greg.byshenk at aoes.com Wed Jul 2 05:56:25 2008 From: greg.byshenk at aoes.com (Greg Byshenk) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <87wsk4ed20.fsf@snark.cb.piermont.com> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> Message-ID: <20080702125625.GE47386@gby2.aoes.com> On Wed, Jul 02, 2008 at 07:32:55AM -0400, Perry E. Metzger wrote: > "Jon Aquilina" writes: > > if i use blender how nicely does it work in a cluster? > I believe it works quite well. The "Helmer" minicluster uses blender, and appears to perform well. Also, Maya's 'muster' engine runs under Linux, and quite successfully. We use it in a mixed environment, where the render pool consists of both Windows workstations and Linux cluster nodes. Note, though, that like other commercial 3D products, Maya is expensive, and may not be suitable for a student project. -- Greg Byshenk From vernard at venger.net Wed Jul 2 07:57:43 2008 From: vernard at venger.net (Vernard Martin) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> Message-ID: <486B9767.1090205@venger.net> Jon Aquilina wrote: > my idea is more of for my thesis. if i am goign ot do anything like > this. vernard thanks for the link. whats it like in a cluster > environment? Ah. you are doing it for thesis work. Then money is probably very much a limiting factor. If you just need a renderer then you can try the PVMPOV which is a PVM enabled version of POVray (located at http://pvmpov.sourceforge.net) its a ray-tracer which is very computationally intensive but has one of the largest communities for help out there. 
It also has support for doing animation. There are many free animations that you can use to test it and possibly for your thesis as long as you give attribution there. There is also PVMegPOV and MPI-Povray as well although i'm not as personally familiar with them. Vernard Martin vcmarti@sph.emory.edu From eagles051387 at gmail.com Thu Jul 3 22:51:45 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <73a01bf20807030957m2c1f6c6dm4869317395dc2a06@mail.gmail.com> References: <73a01bf20807030957m2c1f6c6dm4869317395dc2a06@mail.gmail.com> Message-ID: kool ill have to try this stuff out on my old laptop which is now my testing machine. On Thu, Jul 3, 2008 at 6:57 PM, Rayson Ho wrote: > The whole Big Buck Bunny movie was rendering on a Grid Engine cluster > (aka. network.com). Big Buck Bunny is open content, and the software > used to create the film is opensource. > > http://en.wikipedia.org/wiki/Big_Buck_Bunny > http://www.bigbuckbunny.org/ > > Rayson > > > > On Tue, Jul 1, 2008 at 5:39 AM, Jon Aquilina > wrote: > > does anyone know of any rendering software that will work with a cluster? > > > > -- > > Jonathan Aquilina > > _______________________________________________ > > Beowulf mailing list, Beowulf@beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/fd094b5f/attachment.html From eagles051387 at gmail.com Thu Jul 3 23:58:31 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? In-Reply-To: References: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> Message-ID: this is slightly off topic but im just wondering why spend thousands of dollars when u can just setup another server and backup everything to a raided hard drive array? On 7/2/08, Steve Cousins wrote: > > > > Just under 60MB/sec seems to be the maximum tape transport read/write >> limit. Pretty reliably the first write from the beginning of tape was a >> bit slower than writes started further into the tape. >> > > I believe LTO-3 is rated at 80 MB/sec without compression. Testing it on > our HP unit in an Overland library I get: > > WRITE: > > dd if=/dev/zero of=/dev/nst0 bs=512k count=10k > 10240+0 records in > 10240+0 records out > 5368709120 bytes (5.4 GB) copied, 71.8723 seconds, 74.7 MB/s > > READ: > > dd of=/dev/null if=/dev/nst0 bs=512k count=10k > 10240+0 records in > 10240+0 records out > 5368709120 bytes (5.4 GB) copied, 69.2487 seconds, 77.5 MB/s > > I used a 512K block size because that is what I use with our backups and it > has given optimal performance since the DLT-7000 days. > > Good luck, > > Steve > ______________________________________________________________________ > Steve Cousins, Ocean Modeling Group Email: cousins@umit.maine.edu > Marine Sciences, 452 Aubert Hall http://rocky.umeoce.maine.edu > Univ. of Maine, Orono, ME 04469 Phone: (207) 581-4302 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/a7d510a8/attachment.html From geoff at galitz.org Fri Jul 4 00:10:36 2008 From: geoff at galitz.org (Geoff Galitz) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? In-Reply-To: References: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> Message-ID: <2BADCF250DCC4A6CAB307A8E4B0A4C32@geoffPC> Backing up to tape allows you to go back to a specific point in history. Particularly useful if you need to recover a file that has become corrupted or you need to rollback to a specific stage and you are unaware of that fact for a few days. Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Jon Aquilina Sent: Freitag, 4. Juli 2008 08:59 To: cousins@umit.maine.edu Cc: beowulf@beowulf.org Subject: Re: [Beowulf] Re: OT: LTO Ultrium (3) throughput? this is slightly off topic but im just wondering why spend thousands of dollars when u can just setup another server and backup everything to a raided hard drive array? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/8349f6ff/attachment.html From geoff at galitz.org Fri Jul 4 00:23:04 2008 From: geoff at galitz.org (Geoff Galitz) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: References: <20080701193721.B6843B5404D@mx2.its.rochester.edu> Message-ID: <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> Just a nit: Most RPM based distros allow in-place upgrades between minor point releases using "yum update" or "yum upgrade" (they follow different rules on how to resolve obsolete packages). However, moving between major releases is still recommended via a CD or other non-in-place media, though there are people that have done it in-place you seriously risk inflicting harm to your system in this manner. 
Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Gregory Warnes Sent: Mittwoch, 2. Juli 2008 02:50 To: Mark Hahn Cc: Beowulf Subject: Re: [Beowulf] A press release [stuff snipped] Side note, one very nice thing about debian is the ability to upgrade a system in-place from one O/S release to another via apt-get dist-upgrade Much nicer than reinstalling the O/S as seems to be (used to be?) the norm with RPM-based systems -Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/24e26468/attachment.html From carsten.aulbert at aei.mpg.de Fri Jul 4 00:26:04 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87wsk2lue4.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> <486C6660.5070705@aei.mpg.de> <87wsk2lue4.fsf@snark.cb.piermont.com> Message-ID: <486DD08C.9070602@aei.mpg.de> Hi Perry, Perry E. Metzger wrote: > What about my kernel patch to use unprived ports? Did you try it? No sorry, this approach with just setting the limits seems much easier than installing 1300 new kernels ;) Sorry Carsten PS: With the new limits it *just* works. -- Dr. 
Carsten Aulbert - Max Planck Institut für Gravitationsphysik Callinstraße 38, 30167 Hannover, Germany Fon: +49 511 762 17185, Fax: +49 511 762 17193 http://www.top500.org/system/9234 | http://www.top500.org/connfam/6/list/31 From eagles051387 at gmail.com Fri Jul 4 00:28:35 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput? In-Reply-To: <2BADCF250DCC4A6CAB307A8E4B0A4C32@geoffPC> References: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> <2BADCF250DCC4A6CAB307A8E4B0A4C32@geoffPC> Message-ID: Would it be possible to back up to tape as well as to a RAIDed hard disk array? On 7/4/08, Geoff Galitz wrote: > > Backing up to tape allows you to go back to a specific point in history. > Particularly useful if you need to recover a file that has become corrupted > or you need to rollback to a specific stage and you are unaware of that fact > for a few days. > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On > Behalf Of Jon Aquilina > Sent: Friday, 4 July 2008 08:59 > To: cousins@umit.maine.edu > Cc: beowulf@beowulf.org > Subject: Re: [Beowulf] Re: OT: LTO Ultrium (3) throughput? > > this is slightly off topic but im just wondering why spend thousands of > dollars when u can just setup another server and backup everything to a > raided hard drive array? > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/d97a1b98/attachment.html From tjrc at sanger.ac.uk Fri Jul 4 01:17:17 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> References: <20080701193721.B6843B5404D@mx2.its.rochester.edu> <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> Message-ID: <2C7CE2AF-E2DB-444E-8F91-559062363FF7@sanger.ac.uk> On 4 Jul 2008, at 8:23 am, Geoff Galitz wrote: > > > Just a nit: > > > > Most RPM based distros allow in-place upgrades between minor point > releases > using "yum update" or "yum upgrade" (they follow different rules on > how to > resolve obsolete packages). However, moving between major releases > is still > recommended via a CD or other non-in-place media, though there are > people > that have done it in-place you seriously risk inflicting harm to > your system > in this manner. > But that's the whole point - why is that the case? It shouldn't be. If upgrading packages wrecks the system, then the package installation scripts are broken. They should spot the upgrade in progress and take appropriate action, depending on the previously installed version. This can be quite a detailed process for Debian packages, which is probably why they have fewer problems than Red Hat in this regard. See http://www.debian.org/doc/debian-policy/ch-maintainerscripts.html#s-mscriptsinstact if you're interested in how it works for Debian packages. OB clustering: For cluster nodes, we never do dist-upgrades, though. A reinstall from scratch is actually faster, so in the context of this list, the ability to upgrade in place isn't terribly important. A FAI install of a basic debian 4.0 image on a cluster node takes about two minutes, so there's not much point in going through the upgrade process, which takes considerably longer. 
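To illustrate the point about maintainer scripts: Debian calls a package's preinst with "install" on a fresh installation and with "upgrade" plus the old version when replacing an installed package, so the script can branch on where it is coming from. The sketch below is a simplified, hypothetical handler that only reports which case it sees; a real script handles further actions (such as abort-upgrade) described in the policy chapter linked above.

```shell
# Simplified sketch of the argument handling in a Debian preinst script.
# preinst_action ACTION [OLD_VERSION]
preinst_action() {
    case "$1" in
        install)
            # Nothing installed before: no migration needed.
            echo "fresh install"
            ;;
        upgrade)
            # $2 is the version being replaced; a real script would
            # pick its migration steps based on it.
            echo "upgrade from $2"
            ;;
        *)
            echo "unhandled action: $1" >&2
            return 1
            ;;
    esac
}
```

Called as "preinst_action upgrade 1.0-3", the function reports the old version, which is the hook a package uses to migrate state from specific earlier releases rather than assuming a clean system.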
Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From ajt at rri.sari.ac.uk Fri Jul 4 04:10:26 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Small Distributed Clusters In-Reply-To: References: Message-ID: <486E0522.2090600@rri.sari.ac.uk> Ian Pascoe wrote: > Hi all, > > Firstly before getting into the nitty gritty of my question, a bit of > background. > > Myself and a friend are looking to set up initially two small clusters of 4 > boxes each, using old surplus commodity hardware. The main purpose of the > cluster is to hold data and perform calculations upon it - the data coming > in from external sources. > > So far we've decided on a Ubuntu Server base with NFS linking the nodes > together, and we're looking currently at how to perform the calculations - > ie write our own software or adapt existing. Hello, Ian. Sharing files locally via NFS using UDP is fine, but although you can do NFS via TCP it's not recommended because it's an insecure protocol. You can tunnel it, but you might as well use "sshfs", which is what I do. > However, the question I have relates to linking the two clusters together. > For the majority of the time, they will be run automonously, but on > occasions we believe they'll need to be run as a cohesive unit with jobs > being passed between them, because we don't plan to duplicate the data > across the clusters, but back up locally. I suggest you have a look at "dsh" (Dancer's distributed shell) as a simple way to run programs across local and geographically separate nodes in your cluster. This is very simple, but works remarkably well, especially if you use SSH keys for password-less authentication. 
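A minimal sketch of the dsh suggestion above, assuming ssh keys are already installed on every node (for example with ssh-keygen and ssh-copy-id). The node names are placeholders, not hosts from this thread, and the helper only prints the dsh command so it can be inspected before anything is run.

```shell
#!/bin/sh
# Sketch only: NODES holds placeholder host names (local and remote).
NODES="node01 node02 remote01"

# Build a dsh command line for the given remote command.
#   -r ssh  use ssh as the remote shell
#   -M      prefix each output line with the machine name
#   -c      run on all machines concurrently
#   -m      add one machine to the target list
dsh_cmd() {
    cmd="dsh -r ssh -M -c"
    for node in $NODES; do
        cmd="$cmd -m $node"
    done
    printf '%s -- %s\n' "$cmd" "$*"
}

# Print the command for inspection; run it with: eval "$(dsh_cmd uptime)"
dsh_cmd uptime
```

Printing first is a deliberate choice here: over a slow ADSL link it is cheaper to check the host list than to wait for a mistyped node to time out.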
> Both will be connected to the Internet using ADSL and the limitation will be > the upload speed of a maximum of 512Kbs. Another issue, apart from the 'A' (asymmetric speed) in ADSL, is that of setting up your routers to permit incoming connections on port 22, and having static IP addresses. This is straightforward, but does need to be done before your clusters can communicate. > How would people suggest linking the two clusters together using a secure > connection? Performance at this point is not in the equation, just the > ability to securely connect. I suggest linking them together via "sshfs" and "ssh/dsh", because doing things like SSI (Single System Image) or MPI (Message Passing Interface) over ADSL will require many ports to be open/tunneled and will be slow. > BTW Any thoughts too on a SQL Server that would cope well in this scenario? I've tunneled MySQL via SSH and it works fine. You would be unwise to expose the ports used by SQL Server or MySQL to the Internet because they are very insecure. Of course, you could always use OpenVPN :-( Tony. -- Dr. A.J.Travis, | mailto:ajt@rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 From ajt at rri.sari.ac.uk Fri Jul 4 04:44:41 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] /usr/local over NFS is okay, Joe In-Reply-To: <486CEF27.8090507@scalableinformatics.com> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <486CE03D.40901@scalableinformatics.com> <486CE86B.90104@ias.edu> <486CEF27.8090507@scalableinformatics.com> Message-ID: <486E0D29.8060605@rri.sari.ac.uk> Joe Landman wrote: [...] > Yeah, it is ambiguous to a degree, but I figure that something named > /local is actually going to be physically local. 
It helps tremendously > when a user calls up with a problem, say that they can't see a file they > placed in /local/... on all nodes. Usually they get quiet for a moment > after saying that aloud, and then say "oh, never mind". :) > [...] Hello, Joe. Looks like some SunOS/Solaris veterans (like me) showing their colours here! Sun, as in the network is the computer, developed quite a lot of very good strategies for sharing files via NFS on diskless/dataless clients and this has been inherited in different ways by Linux distributions. In particular, Sun went out of their way to move a lot of things from /bin into /usr/bin precisely so it could be shared by NFS. I also agree with the widely used convention that '/usr/local' means local to the site, not the particular machine. It seems intuitively obvious to me that /local (i.e. in the root filesystem) is intended to be both local to a specific machine, and local to the site, whereas /usr/local is local to the site and may be shared via NFS but is not required to be. I have a bit of a problem with /opt, which is where 'optional' software is supposed to be installed. In the same way, that it is intuitively obvious to me that /opt is where optional software is installed on a specific machine, and /usr/opt may be shared via NFS but is not required to be shared. However, I have rather contradicted myself and done this on all our servers: /usr/local -> /opt/local I did this so I could use the /opt as a mount point in an NFS automounter map: It's not possible to automount /usr/local on /usr because, if you do, you hide the rest of /usr unless you use e.g. "unionfs" and that's a bit too much like hard work for me! Another reason I did this is to keep /usr/local out of the 'system' hierarchy, which makes upgrades easier because you don't need to worry about overwriting /usr/local during an upgrade installation. 
One thing that I value from my BSD/SunOS/Solaris days is /export, which is where ALL shared (exported) filesystems should be placed on NFS servers. I'm a real supporter of Debian/Ubuntu, but it drives me bonkers that Debian policy is to put home directories in /home. I put them in: /export/home And use /home as a mount point in an automounter map. This way machines can, in the well known BSD/Sun inspired way, share home directories: /home/hostname/username -> hostname:/export/home/username On a stand-alone host, I make a symbolic link: /home -> /export/home If, in future, this host needs to share home directories and mount other host's home directories, I then remove the symbolic link, install the automounter and use /home as the NFS mount point in the automounter map. Naturally, I don't always practice what I preach and recently I've been trying to work out to use the automounter the 'Debian' way ;-) So far I've not come up with anything that beats using /export/home! Tony. -- Dr. A.J.Travis, | mailto:ajt@rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 From rgb at phy.duke.edu Fri Jul 4 05:13:06 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: References: Message-ID: On Wed, 2 Jul 2008, Gregory Warnes wrote: >> interesting - I wonder why. the main difference would be that the rpm >> format encodes dependencies... >> > The basic problem is that when folks build the .tar.gz files, they usually > do a good job of explaining the dependencies and how to resolve them, while > the equivalent RPM installer simply lists the dependencies with no hints > about what packages are needed and where to get them. Unless your RPM installer is yum, in which case it all simply works (or is no more trouble than it ever is to build a package so that it will simply work). 
> One fundamental difference in philosophy explains both the fundamental > differences between RPM and debian packages, and the reason for the lack of > emphasis of in-place upgrades of desktop distros: vendor income. It is not > in Red Hat's financial interest to make it easy to upgrade a system in-place > by an automated tool. They make money by selling new O/S versions. > Consequently, Red Hat explicitly designed the RPM format to discourage > in-place upgrades. ???? Having been around when the founders (who live down the street, so to speak:-) gave talks at some of the old linux expos and on campus and so on, and recalling the early RH books and free distribution system, I think that this last statement is just nonsense. They didn't design it to discourage in place upgrades or encourage it -- they designed it to facilitate in place updates and the creation of a consistent and tested collection of packages, one that could be automatically installed. Kickstart rocked, and continues to rock. Dependency resolution for a la carte package installation sucked, and I do mean sucked, with RPMs until first yellow dog invented yup, and then Seth took over yup, hit a wall of sorts, and transmogrified it into yum. Yum, OTOH, rocks. You want a package, you say yum install package. How hard is that? rgb > > The debian community, on the other hand, was and is run fundamentally by > system administrators, whose best interest centers around minimizing the > amount of time necessary to keep systems up to date. Consequently, debian's > package system was designed explicitly to make installation and updating of > packages as painless as possible for the admin. > > Of course, other pressures have forced deviations from these fundamental > viewpoints, but the patterns are still clearly visible. > > -Greg > > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 
27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From tjrc at sanger.ac.uk Fri Jul 4 05:40:30 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] /usr/local over NFS is okay, Joe In-Reply-To: <486E0D29.8060605@rri.sari.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <486CE03D.40901@scalableinformatics.com> <486CE86B.90104@ias.edu> <486CEF27.8090507@scalableinformatics.com> <486E0D29.8060605@rri.sari.ac.uk> Message-ID: <2B343F81-02F2-4285-997E-C94500D23FEB@sanger.ac.uk> On 4 Jul 2008, at 12:44 pm, Tony Travis wrote: > One thing that I value from my BSD/SunOS/Solaris days is /export, > which is where ALL shared (exported) filesystems should be placed on > NFS servers. I'm a real supporter of Debian/Ubuntu, but it drives me > bonkers that Debian policy is to put home directories in /home. I > put them in: > > /export/home If you want a Debian system to do that, just: sed -i -e 's:^DHOME=/home:DHOME=/export/home:' /etc/adduser.conf Job done. All users created after that will be in /export/home > Naturally, I don't always practice what I preach and recently I've > been trying to work out to use the automounter the 'Debian' way ;-) There is no automounter "Debian way", at least not in my view, and I maintain one of the automounter packages for them. :-) You're free to do whatever you like. am-utils does have an example configuration it can set up, but the package does not assume you're using it that way, and makes no demands on what you have automounted and where. I have two automount intercept points on my machines; /nfs for home directories and general data directories, and /software for the sort of common software that we've been discussing here. 
Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From tjrc at sanger.ac.uk Fri Jul 4 05:45:18 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] Small Distributed Clusters In-Reply-To: <486E0522.2090600@rri.sari.ac.uk> References: <486E0522.2090600@rri.sari.ac.uk> Message-ID: On 4 Jul 2008, at 12:10 pm, Tony Travis wrote: > I suggest you have a look at "dsh" (Dancer's distributed shell) as a > simple way to run programs across local and geographically separate > nodes in your cluster. This is very simple, but works remarkably > well, especially if you use SSH keys for password-less authentication. dsh is very good, I agree, as is clusterssh. They have overlapping, but distinct, purposes. clusterssh gives you multiple xterms but with a single input window so you type into all xterms simultaneously (although you can still put the focus in an individual xterm, and just type commands to that one machine alone). clusterssh is useful for those tasks which are more interactive. dsh is better for rapid parallel running of non-interactive commands on very large numbers of machines... clusterssh gets a little unwieldy with more than 30 or so machines at a time, even if you set the xterm font to eye-wateringly small and have a monitor the size of a football pitch. Both are available as packages in Ubuntu/Debian. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From ajt at rri.sari.ac.uk Fri Jul 4 06:05:15 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] /usr/local over NFS is okay, Joe In-Reply-To: <2B343F81-02F2-4285-997E-C94500D23FEB@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> <486CD643.1050904@ias.edu> <486CE03D.40901@scalableinformatics.com> <486CE86B.90104@ias.edu> <486CEF27.8090507@scalableinformatics.com> <486E0D29.8060605@rri.sari.ac.uk> <2B343F81-02F2-4285-997E-C94500D23FEB@sanger.ac.uk> Message-ID: <486E200B.4060403@rri.sari.ac.uk> Tim Cutts wrote: > > On 4 Jul 2008, at 12:44 pm, Tony Travis wrote: > >> One thing that I value from my BSD/SunOS/Solaris days is /export, >> which is where ALL shared (exported) filesystems should be placed on >> NFS servers. I'm a real supporter of Debian/Ubuntu, but it drives me >> bonkers that Debian policy is to put home directories in /home. I put >> them in: >> >> /export/home > > If you want a Debian system to do that, just: > > sed -i -e 's:^DHOME=/home:DHOME=/export/home:' /etc/adduser.conf > > Job done. All users created after that will be in /export/home Hello, Tim. Thanks for the jungle tip, which I already know about... My point was that if you want peer2peer sharing of home directories in the timeless tradition of 4.xBSD and SunOS/Solaris, you need to do more than just decide that home directories should go in /export/home instead of /home. The convention I adopt is that *anything* exported via NFS goes in: /export However, this is not specified in the LFH (although "/export/usr" does appear in an example: http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/ The reason I'm posting this is to remind everyone that Sun worked out some good strategies for doing this sort of thing already and it's not a bad idea to put what you export/share via NFS into /export. 
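The /export convention discussed in this thread, where /home/hostname/username maps to hostname:/export/home/username, can be written down as a small helper. This is only an illustration: the function name, host, and user are made up, and the printed mount command is just the manual equivalent of what an automounter map would arrange.

```shell
#!/bin/sh
# Sketch only: print the NFS mount that realises
#   /home/<host>/<user> -> <host>:/export/home/<user>
# home_mount_cmd is a hypothetical helper, not part of any package.
home_mount_cmd() {
    host=$1
    user=$2
    printf 'mount -t nfs %s:/export/home/%s /home/%s/%s\n' \
        "$host" "$user" "$host" "$user"
}

# Example with placeholder names; prints the command rather than running it.
home_mount_cmd fileserver alice
```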
>> Naturally, I don't always practice what I preach and recently I've >> been trying to work out to use the automounter the 'Debian' way ;-) > > There is no automounter "Debian way", at least not in my view, and I > maintain one of the automounter packages for them. :-) You're free to > do whatever you like. am-utils does have an example configuration it > can set up, but the package does not assume you're using it that way, > and makes no demands on what you have automounted and where. I have two > automount intercept points on my machines; /nfs for home directories and > general data directories, and /software for the sort of common software > that we've been discussing here. Yes, of course, we are free to put anything anywhere we want in Linux, but if you want other people to understand your conventions without long explanations then BSD/Sun have already set a pretty good example of how to go about it using /export and mount points like /home in automount maps. I'm just a little surprised that the LFH doesn't mention it :-) Tony. -- Dr. A.J.Travis, | mailto:ajt@rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 From eagles051387 at gmail.com Fri Jul 4 06:47:31 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> References: <20080701193721.B6843B5404D@mx2.its.rochester.edu> <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> Message-ID: that also applies to the k/ubuntu as well it used to be you can edit the source list and do a complete dist upgrade. now that has change and requires the alternate installation cd. the first way was not worth the time because it broke stuff more then it was worth. the time spent on that could have been used for a totally clean install. 
the new method i have yet to try but from what i gather the wiki on it nereds some improvements since its rather ambiguous as to how to go about upgrading On 7/4/08, Geoff Galitz wrote: > > > > Just a nit: > > > > Most RPM based distros allow in-place upgrades between minor point releases > using "yum update" or "yum upgrade" (they follow different rules on how to > resolve obsolete packages). However, moving between major releases is still > recommended via a CD or other non-in-place media, though there are people > that have done it in-place you seriously risk inflicting harm to your system > in this manner. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] *On > Behalf Of *Gregory Warnes > *Sent:* Mittwoch, 2. Juli 2008 02:50 > *To:* Mark Hahn > *Cc:* Beowulf > *Subject:* Re: [Beowulf] A press release > > > > [stuff snipped] > > > > Side note, one very nice thing about debian is the ability to upgrade a > system in-place from one O/S release to another via > > apt-get dist-upgrade > > Much nicer than reinstalling the O/S as seems to be (used to be?) the norm > with RPM-based systems > > -Greg > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080704/a848ba32/attachment.html From perry at piermont.com Fri Jul 4 06:54:39 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <486DD08C.9070602@aei.mpg.de> (Carsten Aulbert's message of "Fri\, 04 Jul 2008 09\:26\:04 +0200") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> <486C6660.5070705@aei.mpg.de> <87wsk2lue4.fsf@snark.cb.piermont.com> <486DD08C.9070602@aei.mpg.de> Message-ID: <87zloxiwkg.fsf@snark.cb.piermont.com> Carsten Aulbert writes: >> What about my kernel patch to use unprivileged ports? Did you try it? > > No sorry, this approach with just setting the limits seems much easier > than installing 1300 new kernels ;) Testing would be done with one machine. It would be foolish to test such a thing on your production network. What if it crashed everything in sight? Once you know you want something, though, you should be able to install it quickly. If installing kernels on your entire cluster is difficult, you are not managing your cluster properly. What if you really need to install new kernels? You should be able to replace arbitrary software on thousands of machines with a single easy command. If you can't, you aren't spending enough time on system automation. 
Perry From carsten.aulbert at aei.mpg.de Fri Jul 4 07:11:51 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <87zloxiwkg.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> <486C6660.5070705@aei.mpg.de> <87wsk2lue4.fsf@snark.cb.piermont.com> <486DD08C.9070602@aei.mpg.de> <87zloxiwkg.fsf@snark.cb.piermont.com> Message-ID: <486E2FA7.4070100@aei.mpg.de> Perry E. Metzger wrote: > Testing would be done with one machine. It would be foolish to test > such a thing on your production network. What if it crashed everything > in sight? > Sure, testing always needs to start at the count of 1, then 2, 10, .... > Once you know you want something, though, you should be able to > install it quickly. If installing kernels on your entire cluster is > difficult, you are not managing you cluster properly. What if you > really need to install new kernels? > Did it yesterday, if I had pressed it it would have been done within ~10-15 minutes (along with other updates) > You should be able to replace arbitrary software on thousands of > machines with a single easy command. If you can't, you aren't spending > enough time on system automation. 
easy with dsh and fai softupdate :)

Cheers

Carsten

From landman at scalableinformatics.com Fri Jul 4 07:41:26 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <486E2FA7.4070100@aei.mpg.de>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
 <878wwlk65p.fsf@snark.cb.piermont.com>
 <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
 <87fxqtuzh8.fsf@snark.cb.piermont.com>
 <486B2DC2.9010604@aei.mpg.de>
 <87d4lweagv.fsf@snark.cb.piermont.com>
 <20080702200448.GA17424@bx9.net>
 <87hcb76hpk.fsf@snark.cb.piermont.com>
 <20080702223714.GA5908@bx9.net>
 <486C6660.5070705@aei.mpg.de>
 <87wsk2lue4.fsf@snark.cb.piermont.com>
 <486DD08C.9070602@aei.mpg.de>
 <87zloxiwkg.fsf@snark.cb.piermont.com>
 <486E2FA7.4070100@aei.mpg.de>
Message-ID: <486E3696.7030309@scalableinformatics.com>

Carsten Aulbert wrote:

> easy with dsh and fai softupdate :)

trivial with pdsh

    pdsh apt-get install package

or

    pdsh yum install package

Clusters/systems of arbitrary size.
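As written, the pdsh lines above leave out the target hosts. A slightly fuller sketch, assuming ssh access to the nodes, might look like the following. The host range and the package name (htop) are placeholders, and the helper only prints the command so it can be checked before it is run.

```shell
#!/bin/sh
# Sketch only: HOSTS is a placeholder range; pdsh expands node[01-04] itself.
HOSTS='node[01-04]'

# Build a pdsh command that installs one package on every host in HOSTS.
#   -R ssh  use the ssh remote command module
#   -w      list of target hosts
pdsh_install_cmd() {
    printf "pdsh -R ssh -w '%s' 'apt-get -y install %s'\n" "$HOSTS" "$1"
}

# Print the command for inspection; run it with: eval "$(pdsh_install_cmd htop)"
pdsh_install_cmd htop
```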
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From tjrc at sanger.ac.uk Fri Jul 4 08:30:39 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <486E3696.7030309@scalableinformatics.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
 <878wwlk65p.fsf@snark.cb.piermont.com>
 <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
 <87fxqtuzh8.fsf@snark.cb.piermont.com>
 <486B2DC2.9010604@aei.mpg.de>
 <87d4lweagv.fsf@snark.cb.piermont.com>
 <20080702200448.GA17424@bx9.net>
 <87hcb76hpk.fsf@snark.cb.piermont.com>
 <20080702223714.GA5908@bx9.net>
 <486C6660.5070705@aei.mpg.de>
 <87wsk2lue4.fsf@snark.cb.piermont.com>
 <486DD08C.9070602@aei.mpg.de>
 <87zloxiwkg.fsf@snark.cb.piermont.com>
 <486E2FA7.4070100@aei.mpg.de>
 <486E3696.7030309@scalableinformatics.com>
Message-ID: <7CCC45A5-FF25-41A8-BD10-E52A0DA9DAD5@sanger.ac.uk>

On 4 Jul 2008, at 3:41 pm, Joe Landman wrote:

> Carsten Aulbert wrote:
>
>> easy with dsh and fai softupdate :)
>
> trivial with pdsh
>
> pdsh apt-get install package

Actually, that one could get you in a mess if the package is going to
ask questions. You might want to shut apt-get up. I actually use a
small wrapper script for apt-get -- actually, I use aptitude these days,
because its dependency handling is better -- which we call niagi:

#!/bin/sh
#
# niagi - noninteractive aptitude install
#
DEBIAN_FRONTEND=noninteractive \
/usr/bin/aptitude -R -y \
    -o Dpkg::Options::="--force-confdef" \
    -o Dpkg::Options::="--force-confold" \
    install "$@"

This forces aptitude not to ask you what to do with the configuration
files, if they've been locally modified, but also forces it to be
conservative, always use your existing configuration file, if it's
already present.
Otherwise it configures the package defaults. Combine this little
script with cfengine, dsh or whatever, and you have a winner.

You can even use it to remove things, because aptitude install accepts
suffixes to tell it to do other things. For example, say you wanted to
replace lprng with cups, you can do that in one fell swoop with:

    aptitude install lprng- cupsys

(this is another reason I switched from apt-get to aptitude, although
I'd caution against using aptitude like this in sarge - the version in
etch and later is fine, but the old sarge one can bite occasionally)

Tim

-- 
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From landman at scalableinformatics.com Fri Jul 4 08:34:54 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] automount on high ports
In-Reply-To: <7CCC45A5-FF25-41A8-BD10-E52A0DA9DAD5@sanger.ac.uk>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
 <878wwlk65p.fsf@snark.cb.piermont.com>
 <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
 <87fxqtuzh8.fsf@snark.cb.piermont.com>
 <486B2DC2.9010604@aei.mpg.de>
 <87d4lweagv.fsf@snark.cb.piermont.com>
 <20080702200448.GA17424@bx9.net>
 <87hcb76hpk.fsf@snark.cb.piermont.com>
 <20080702223714.GA5908@bx9.net>
 <486C6660.5070705@aei.mpg.de>
 <87wsk2lue4.fsf@snark.cb.piermont.com>
 <486DD08C.9070602@aei.mpg.de>
 <87zloxiwkg.fsf@snark.cb.piermont.com>
 <486E2FA7.4070100@aei.mpg.de>
 <486E3696.7030309@scalableinformatics.com>
 <7CCC45A5-FF25-41A8-BD10-E52A0DA9DAD5@sanger.ac.uk>
Message-ID: <486E431E.7040807@scalableinformatics.com>

Tim Cutts wrote:
>
> On 4 Jul 2008, at 3:41 pm, Joe Landman wrote:
>
>> Carsten Aulbert wrote:
>>
>>> easy with dsh and fai softupdate :)
>>
>> trivial with pdsh
>>
>> pdsh apt-get install package
>
> Actually, that one
could get you in a mess if the package is going to to > ask questions. You might want to shut apt-get up. I actually use a > small wrapper script for apt-get -- actually, I use aptitude these days, > because its dependency handling is better -- which we call niagi: oooohhhh apt-foo kung-fu ... wisdom received and greatly appreciated! -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From carsten.aulbert at aei.mpg.de Fri Jul 4 09:53:36 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] automount on high ports In-Reply-To: <7CCC45A5-FF25-41A8-BD10-E52A0DA9DAD5@sanger.ac.uk> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <20080702200448.GA17424@bx9.net> <87hcb76hpk.fsf@snark.cb.piermont.com> <20080702223714.GA5908@bx9.net> <486C6660.5070705@aei.mpg.de> <87wsk2lue4.fsf@snark.cb.piermont.com> <486DD08C.9070602@aei.mpg.de> <87zloxiwkg.fsf@snark.cb.piermont.com> <486E2FA7.4070100@aei.mpg.de> <486E3696.7030309@scalableinformatics.com> <7CCC45A5-FF25-41A8-BD10-E52A0DA9DAD5@sanger.ac.uk> Message-ID: <486E5590.7060401@aei.mpg.de> Hi Tim, Tim Cutts wrote: > >> trivial with pdsh >> >> pdsh apt-get install package > Well with dsh it's the same, but "our" way ensures that the nodes will have the exactly same set-up after a reinstallation ;) > DEBIAN_FRONTEND=noninteractive \ > /usr/bin/aptitude -R -y \ > -o Dpkg::Options::="--force-confdef" \ > -o Dpkg::Options::="--force-confold" \ > install "$@" > > This forces aptitude not to ask you what to do with the configuration > files, if they've been locally 
modified, but also forces it to be > conservative, always use your existing configuration file, if it's > already present. Otherwise it configures the package defaults. Combine > this little script with cfengine, dsh or whatever, and you have a winner. > Brutally you could also use "yes yes" ;) > You can even use it to remove things, because aptitude install accepts > suffixes to tell it to do other things. For example, say you wanted to > replaced lprng with cups, you can do that in one fell swoop with: > > aptitude install lprng- cupsys > > (this is another reason I switched from apt-get to aptitude, although > I'd caution against using aptitude like this in sarge - the version in > etch and later is fine, but the old sarge one can bite occasionally) yes, that's also my reason why I prefer aptitude over apt For completeness: http://www.informatik.uni-koeln.de/fai/ Cheers Carsten From rgb at phy.duke.edu Fri Jul 4 10:24:51 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Fri Mar 19 01:07:28 2010 Subject: [Beowulf] A press release In-Reply-To: <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> References: <20080701193721.B6843B5404D@mx2.its.rochester.edu> <5753D20A3C7E4233B4FAF0D8670A0423@geoffPC> Message-ID: On Fri, 4 Jul 2008, Geoff Galitz wrote: > > > Just a nit: > > > > Most RPM based distros allow in-place upgrades between minor point releases > using "yum update" or "yum upgrade" (they follow different rules on how to > resolve obsolete packages). However, moving between major releases is still > recommended via a CD or other non-in-place media, though there are people > that have done it in-place you seriously risk inflicting harm to your system > in this manner. Sure, although I've done it (actually, there are a LOT of people that have done it) and I've never heard of anybody actually screwing everything up. To some extent it depends on how the system was managed and how serious the changes are between major releases. 
If you installed a "standard" system and used only yum to install and update from a standard set of repos, then you have almost certainly avoided RPM hell and have a very high chance of succeeding with an upgrade, with of course some work likely to be required deciding what to do when packages disappear or major libraries move. That work is required for ANY system -- independent of packaging or manager -- when major libraries change and packages disappear and new tools appear. Installing from scratch simply ensures that the tools that are installed are consistent, but it still leaves one dealing with the your favorite one that has disappeared or the new one that you have to figure out or your favorite personal program that has to be rebuilt and maybe even hacked first to accomodate an new library interface. If, on the other hand, you installed your system, then built eighteen pieces of software on your own and installed them, overwriting libraries and configuration files that were installed from RPM, do a couple of rpm --force's, and manage in the process to move yourself deep into RPM hell, well, what is going to be able to safely upgrade that? I tend to reinstall upgrades most of the time instead of upgrade, but that's only because kickstart makes that so easy that it is actually faster AND safer than screwing around with a local upgrade, and sure, there is the possibility of trouble if you do it otherwise, and who likes trouble (even if you've never heard of anybody who has actually HAD trouble). rgb > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > > _____ > > From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On > Behalf Of Gregory Warnes > Sent: Mittwoch, 2. 
Juli 2008 02:50
> To: Mark Hahn
> Cc: Beowulf
> Subject: Re: [Beowulf] A press release
>
> [stuff snipped]
>
> Side note, one very nice thing about debian is the ability to upgrade a
> system in-place from one O/S release to another via
>
> apt-get dist-upgrade
>
> Much nicer than reinstalling the O/S as seems to be (used to be?) the norm
> with RPM-based systems
>
> -Greg

--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From lindahl at pbm.com Fri Jul 4 14:08:15 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To:
References: <20080702050726.81FAA7343C2@mx6.its.rochester.edu>
Message-ID: <20080704210815.GA28048@bx9.net>

On Wed, Jul 02, 2008 at 01:29:39AM -0400, Gregory Warnes wrote:

> One fundamental difference in philosophy explains both the fundamental
> differences between RPM and debian packages, and the reason for the lack of
> emphasis of in-place upgrades of desktop distros: vendor income. It is not
> in Red Hat's financial interest to make it easy to upgrade a system in-place
> by an automated tool. They make money by selling new O/S versions.
> Consequently, Red Hat explicitly designed the RPM format to discourage
> in-place upgrades.

Please take off your tin hat. Red Hat sells by subscription, so it
doesn't matter which version of RHEL you are running, just the count of
servers. See:

https://www.redhat.com/apps/store/server/

and note that there are no version numbers mentioned.
-- greg

From gdjacobs at gmail.com Fri Jul 4 16:19:54 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To:
References:
Message-ID: <486EB01A.5070209@gmail.com>

Gregory Warnes wrote:
>
> On 7/1/08 3:25PM , "Mark Hahn" wrote:
>
> > > Hmmm.... for me, it's all about the kernel. That's 90+% of the
> > > battle. Some distros use good kernels, some do not. I won't
> > > mention who I think is in the latter category.
> >
> > I was hoping for some discussion of concrete issues. for instance,
> > I have the impression debian uses something other than sysvinit -
> > does that work out well?
>
> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, ...
>
> > is it a problem getting commercial
> > packages (pathscale/pgi/intel compilers, gaussian, etc) to run?
>
> I've never had any major problems. Most linux vendors supply both RPMs
> and .tar.gz installers, and I generally have better luck with the
> latter, even on RPM based systems anyway.
>
> > the couple debian people I know tend to have more ideological motives
> > (which I do NOT impugn, except that I am personally more swayed by
> > practical, concrete reasons.)
>
> My "conversion" to use of Debian had little to do with ideological
> motives, and a lot more to do with minimizing the amount of time I had
> to take away from my research to support the Linux clusters I was
> maintaining at the time.
>
> Side note, one very nice thing about debian is the ability to upgrade a
> system in-place from one O/S release to another via
>
> apt-get dist-upgrade
>
> Much nicer than reinstalling the O/S as seems to be (used to be?) the
> norm with RPM-based systems
>
> -Greg

I did in-place upgrades of RH8 machines to RH9 using APT-RPM, back in
the day. I'm not sure about Yum, as I just haven't had cause to use
RH/FC in some time.

--
Geoffrey D.
Jacobs

From csamuel at vpac.org Sat Jul 5 02:49:36 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To: <827522005.154971215251080315.JavaMail.root@zimbra.vpac.org>
Message-ID: <152656872.154991215251376682.JavaMail.root@zimbra.vpac.org>

----- "Jon Aquilina" wrote:

> that also applies to the k/ubuntu as well it used to
> be you can edit the source list and do a complete
> dist upgrade. now that has changed and requires the
> alternate installation cd.

Er, no it doesn't.

https://help.ubuntu.com/community/HardyUpgrades/

The supported way for servers is with do-release-upgrade
(from the update-manager-core package).

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 02:54:36 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To:
Message-ID: <842288399.155021215251676766.JavaMail.root@zimbra.vpac.org>

----- "Robert G. Brown" wrote:

> ...and it can break the hell out of the elaborate
> dependency system if you go installing random libraries
> in e.g. /usr/local

Oh indeed, our current method is to use:

/usr/local/$package/$version

and then use Modules to let people set up the appropriate
environment for them.

We have also reduced our customisations of users' init
scripts to just adding:

module load vpac

and then having the vpac modules load what we recommend
as a default environment. When we change those settings
we put a new (dated) vpac module in and so let users go
back should they so wish.

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O.
Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 03:01:45 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To: <362688751.155051215251832582.JavaMail.root@zimbra.vpac.org>
Message-ID: <2045038041.155071215252105917.JavaMail.root@zimbra.vpac.org>

----- "Joe Landman" wrote:

> eeek!! something named local is shared???

No, /usr/local is local to the cluster, the compute
nodes are just drones in the Borg collective. ;-)

--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From eagles051387 at gmail.com Sat Jul 5 03:12:20 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To: <2045038041.155071215252105917.JavaMail.root@zimbra.vpac.org>
References: <362688751.155051215251832582.JavaMail.root@zimbra.vpac.org> <2045038041.155071215252105917.JavaMail.root@zimbra.vpac.org>
Message-ID:

resistance is futile :p

On Sat, Jul 5, 2008 at 12:01 PM, Chris Samuel wrote:

>
> ----- "Joe Landman" wrote:
>
> > eeek!! something named local is shared???
>
> No, /usr/local is local to the cluster, the compute
> nodes are just drones in the Borg collective. ;-)
>
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
>

--
Jonathan Aquilina

From csamuel at vpac.org Sat Jul 5 03:30:50 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To: <1501882596.155101215252266373.JavaMail.root@zimbra.vpac.org>
Message-ID: <1124767276.155141215253850248.JavaMail.root@zimbra.vpac.org>

----- "Jon Aquilina" wrote:

> one thing must not be forgotten though. in regards to pkging stuff for
> the ubuntu variation once someone like you and me you upload it for
> someone higher up on the chain to check and upload to the servers. so
> basically someone is checking what someone else has packaged.

Ubuntu has the concept of a Personal Package Archive (PPA)
which will build x86 and AMD64 packages from a source package
that you provide and builds a repo from them that you (and
others) can apt-get from.

https://help.launchpad.net/PPAQuickStart

--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 03:34:52 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To:
Message-ID: <721042444.155171215254092247.JavaMail.root@zimbra.vpac.org>

----- "Mark Hahn" wrote:

> >> I was hoping for some discussion of concrete issues. for instance,
> >> I have the impression debian uses something other than sysvinit -
> >> does that work out well?
> >>
> > Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, ...
>
> thanks. I guess I was assuming that mainstream debian was like
> ubuntu.

Fedora has also adopted Upstart, and given that RHEL is
said to be based off Fedora it'll be interesting to see
whether this gets adopted there too..
http://fedoraproject.org/wiki/Features/Upstart

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 03:36:45 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] A press release
In-Reply-To: <486A8B33.7020600@scalableinformatics.com>
Message-ID: <660319876.155201215254205322.JavaMail.root@zimbra.vpac.org>

----- "Joe Landman" wrote:

> Yeah ... can't escape this. I like some of the elements of
> Ubuntu/Debian better than I do RHEL (the network configuration
> in Debian is IMO sane, while in RHEL/Centos/SuSE it is not).
> There are some aspects that are worse (no /etc/profile.d ...
> so I add that back in by hand ).

Shouldn't be necessary on Ubuntu these days:

chris@quad:~$ cat /etc/issue
Ubuntu 8.04.1 \n \l

chris@quad:~$ dlocate /etc/profile.d
base-files: /etc/profile.d

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 03:43:35 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Small Distributed Clusters
In-Reply-To:
Message-ID: <405946662.155231215254615483.JavaMail.root@zimbra.vpac.org>

----- "Tim Cutts" wrote:

> clusterssh gets a little unwieldy with more than 30 or so
> machines at a time, even if you set the xterm font to eye-wateringly
> small and have a monitor the size of a football pitch.

This was pretty much the conclusion of the folks who were
using it to admin the 45 Linksys APs running OpenWRT for
LCA 2008; the auto-sized tiling of windows was sub-optimal
for their purposes (though it worked very well).

cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From csamuel at vpac.org Sat Jul 5 03:46:39 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To:
Message-ID: <2019408218.155261215254799082.JavaMail.root@zimbra.vpac.org>

----- "Jon Aquilina" wrote:

> would it be possible to back up to tape as well as raided hdd array?

Of course, this has been a feature of various backup systems
(free and proprietary) for many years.

cheers!
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From john.hearns at streamline-computing.com Sat Jul 5 04:16:24 2008
From: john.hearns at streamline-computing.com (John Hearns)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To: <2BADCF250DCC4A6CAB307A8E4B0A4C32@geoffPC>
References: <200806281735.m5SHZ8vS025843@bluewest.scyld.com> <2BADCF250DCC4A6CAB307A8E4B0A4C32@geoffPC>
Message-ID: <1215256594.5035.3.camel@Vigor13>

On Fri, 2008-07-04 at 09:10 +0200, Geoff Galitz wrote:

> Backing up to tape allows you to go back to a specific point in
> history. Particularly useful if you need to recover a file that has
> become corrupted or you need to rollback to a specific stage and you
> are unaware of that fact for a few days.

dirvish allows you to do exactly this on a RAID based backup system:
http://www.dirvish.org/

Dirvish has the concept of a "vault" which is defined to have a certain
lifetime (weeks, months, years...)
You make backup copies to your vault - the smart thing being that any
files which are unchanged since the last backup are links to the first
copy of the file.
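
[A minimal shell sketch of the hard-link idea described above. This is
an illustration under stated assumptions, not dirvish's actual
implementation: the `snapshot` function and every path below are
hypothetical, it only handles a flat directory of regular files, and
all snapshots must live on the same filesystem for hard links to work.]

```shell
#!/bin/sh
# Hypothetical helper, NOT part of dirvish.
# snapshot SRC PREV NEW: back up SRC into NEW; any file identical to the
# copy in the previous snapshot PREV becomes a hard link to that copy.
snapshot() {
    src=$1; prev=$2; new=$3
    mkdir -p "$new"
    for f in "$src"/*; do
        [ -f "$f" ] || continue               # flat directories only
        name=$(basename "$f")
        if [ -f "$prev/$name" ] && cmp -s "$f" "$prev/$name"; then
            ln "$prev/$name" "$new/$name"     # unchanged: share existing data
        else
            cp -p "$f" "$new/$name"           # new or changed: store a fresh copy
        fi
    done
}

# Demo in a scratch directory (all names are made up for illustration).
work=$(mktemp -d)
mkdir "$work/src"
echo "stable" > "$work/src/a.txt"
echo "v1" > "$work/src/b.txt"
snapshot "$work/src" "$work/none" "$work/snap1"    # first backup: full copies
echo "v2" > "$work/src/b.txt"
snapshot "$work/src" "$work/snap1" "$work/snap2"   # second: links a.txt, copies b.txt
```

[Each snapshot directory looks like a full backup, yet unchanged files
cost no extra data blocks, and deleting an old snapshot never breaks a
newer one because the data persists until its last link is removed.]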
So your vault size does not grow and grow endlessly. You can roll back
to any given date.

From csamuel at vpac.org Sat Jul 5 05:27:56 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To: <1215256594.5035.3.camel@Vigor13>
Message-ID: <2128671214.155311215260876971.JavaMail.root@zimbra.vpac.org>

----- "John Hearns" wrote:

> - the smart thing being that any files which are unchanged since the
> last backup are links to the first copy of the file. So your vault
> size does not grow and grow endlessly. You can roll back to any given
> date.

FWIW BackupPC claims to do the same, extending that to
duplicate copies across multiple machines. Of course
then you want to be sure that the single copy you have
on disk doesn't go bad..

http://backuppc.sourceforge.net/info.html

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From eagles051387 at gmail.com Sat Jul 5 07:22:09 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To: <2128671214.155311215260876971.JavaMail.root@zimbra.vpac.org>
References: <1215256594.5035.3.camel@Vigor13> <2128671214.155311215260876971.JavaMail.root@zimbra.vpac.org>
Message-ID:

what I don't understand is why someone would want to invest in something
that is already quite expensive instead of using a method which is not
expensive and in a way provides double redundancy.

On Sat, Jul 5, 2008 at 2:27 PM, Chris Samuel wrote:

>
> ----- "John Hearns" wrote:
>
> > - the smart thing being that any files which are unchanged since the
> > last backup are links to the first copy of the file. So your vault
> > size does not grow and grow endlessly. You can roll back to any given
> > date.
>
> FWIW BackupPC claims to do the same, extending that to
> duplicate copies across multiple machines. Of course
> then you want to be sure that the single copy you have
> on disk doesn't go bad..
>
> http://backuppc.sourceforge.net/info.html
>
> cheers,
> Chris
> --
> Christopher Samuel - (03) 9925 4751 - Systems Manager
> The Victorian Partnership for Advanced Computing
> P.O. Box 201, Carlton South, VIC 3053, Australia
> VPAC is a not-for-profit Registered Research Agency
>

--
Jonathan Aquilina

From gerry.creager at tamu.edu Sat Jul 5 08:03:26 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Fri Mar 19 01:07:28 2010
Subject: [Beowulf] Re: OT: LTO Ultrium (3) throughput?
In-Reply-To:
References: <1215256594.5035.3.camel@Vigor13> <2128671214.155311215260876971.JavaMail.root@zimbra.vpac.org>
Message-ID: <486F8D3E.5060003@tamu.edu>

In my data management exploits, I'm inclined to have first-tier (iSCSI)
disk, second-tier (AoE) disk, and third-tier (remote site) storage. If
I can manage the remote site as another storage server farm, with
rotating media, great. If I can manage it with robotic tape, great. I
still duplicate Tier 2 data to Tier 3 for disaster recovery.

A lot of this depends on how serious you are about being able to get
your data back. Even though I can tap the ultimate archival site for
the meteorological data I retain, translating it from netcdf to
database is time-consuming and requires a human to babysit at times.
Being able to respond nearly immediately to user requests for data from
the Tier 1 data makes our services more valuable (and makes my work
with data assimilation for weather models easier/faster). I retain some
90 days of data on Tier 1. Requests for data floated off to Tier 2 take
longer to fill but the data holdings are, for all intents and purposes,
permanent. Takes longer to get the data off but users know and
understand that, and a simple e-mail tells 'em it's ready. Permanent,
less-volatile Tier 3 storage is disaster-recovery stuff.

Similarly, for hurricanes making US landfall, we also store data away
on DVD to make it a (little) bit easier to locate and retrieve. We use
a database to maintain an inventory of where things are on disk, with
significant file metadata, but sometimes it's easier to go to the DVD
storage case to retrieve that stuff.

If you're not as worried about how you'll recover your data after the
inevitable storage failure (ask me about burning a RAID shelf down,
some day), then diversity in data storage/management isn't as big an
issue.

gerry

Jon Aquilina wrote:
> what I don't understand is why someone would want to invest in something
> that is already quite expensive instead of using a method which is not
> expensive and in a way provides double redundancy.
>
> On Sat, Jul 5, 2008 at 2:27 PM, Chris Samuel wrote:
>
>
> ----- "John Hearns" wrote:
>
> > - the smart thing being that any files which are unchanged since the
> > last backup are links to the first copy of the file. So your vault
> > size doe