From eagles051387 at gmail.com Tue Jul 1 00:26:15 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 1 Jul 2008 09:26:15 +0200
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <6D1C4C9B-432F-4547-93F4-391B0847951D@xs4all.nl>
References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <6D1C4C9B-432F-4547-93F4-391B0847951D@xs4all.nl>
Message-ID:

not sure if this applies to all kinds of scenarios that clusters are used in, but isn't it true that the more RAM you have, the better?

On 6/30/08, Vincent Diepeveen wrote:
>
> Toon,
>
> Can you drop a line on how important RAM is for weather forecasting in
> latest type of calculations you're performing?
>
> Thanks,
> Vincent
>
>
> On Jun 30, 2008, at 8:20 PM, Toon Moene wrote:
>
>> Jim Lux wrote:
>>
>>> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's
>>> scale of operations. We face this all the time with NASA work.
>>> Semiconductor manufacturers have no real reason to produce special purpose
>>> or customized versions of their products for space use, because they can
>>> sell all they can make to the consumer market. More than once, I've had a
>>> phone call along the lines of this:
>>> "Jim: I'm interested in your new ABC321 part."
>>> "Rep: Great. I'll just send the NDA over and we can talk about it."
>>> "Jim: Great, you have my email and my fax # is..."
>>> "Rep: By the way, what sort of volume are you going to be using?"
>>> "Jim: Oh, 10-12.."
>>> "Rep: thousand per week, excellent..."
>>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every
>>> year."
>>> "Rep: Oh..."
>>> {Well, to be fair, it's not that bad, they don't hang up on you..
>>>
>>
>> Since about a year, it's been clear to me that weather forecasting (i.e.,
>> running a more or less sophisticated atmospheric model to provide weather
>> predictions) is going to be "mainstream" in the sense that every business
>> that needs such forecasts for its operations can simply run them in-house.
>>
>> Case in point: I bought a $1100 HP box (the obvious target group being
>> teenage downloaders) which performs the HIRLAM limited area model *on the
>> grid that we used until October 2006* in December last year.
>>
>> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K.
>>
>> I wonder what effect this will have on CPU developments ...
>>
>> --
>> Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290
>> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
>> At home: http://moene.indiv.nluug.nl/~toon/
>> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html
>>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Jonathan Aquilina

From andrew at moonet.co.uk Tue Jul 1 01:20:56 2008
From: andrew at moonet.co.uk (andrew holway)
Date: Tue, 1 Jul 2008 10:20:56 +0200
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com>
Message-ID:

Hi Jon,

We have our own stack, which we stick on top of the customer's favourite Red Hat clone, usually Scientific Linux.

Here is a bit more about it:

http://www.clustervision.com/products_os.php

We sell it as a standalone product and it does quite well. I could even go so far as to say that it is the 'stack of choice' in many European institutions.

We have done a couple of M$ installations too.

Ta

Andy

On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina wrote:
> congrats.
> just wondering what distro is being used on your clusters?
>
> On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman wrote:
>>
>> andrew holway wrote:
>>>
>>> http://www.clustervision.com/pr_top500_uk.php
>>
>> cool ... congratulations to ClusterVision!
>>
>> --
>> Joseph Landman, Ph.D
>> Founder and CEO
>> Scalable Informatics LLC,
>> email: landman at scalableinformatics.com
>> web : http://www.scalableinformatics.com
>> http://jackrabbit.scalableinformatics.com
>> phone: +1 734 786 8423
>> fax : +1 866 888 3112
>> cell : +1 734 612 4615
>
> --
> Jonathan Aquilina

From Dan.Kidger at quadrics.com Tue Jul 1 01:42:59 2008
From: Dan.Kidger at quadrics.com (Dan.Kidger at quadrics.com)
Date: Tue, 1 Jul 2008 09:42:59 +0100
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com>
Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>

>Hi Jon,
>We have our own stack, which we stick on top of the customer's favourite
>Red Hat clone, usually Scientific Linux.
>
>Here is a bit more about it:
>
>http://www.clustervision.com/products_os.php
>
>We sell it as a standalone product and it does quite well. I could even
>go so far as to say that it is the 'stack of choice' in many European
>institutions.

Ever thought of getting a job in Sales and Marketing? :-)

Daniel.

On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina wrote:
> congrats. just wondering what distro is being used on your clusters?
>
> On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman wrote:
>>
>> andrew holway wrote:
>>>
>>> http://www.clustervision.com/pr_top500_uk.php
>>
>> cool ... congratulations to ClusterVision!
>>
>> --
>> Joseph Landman, Ph.D
>> Founder and CEO
>> Scalable Informatics LLC,

From Dan.Kidger at quadrics.com Tue Jul 1 01:46:14 2008
From: Dan.Kidger at quadrics.com (Dan.Kidger at quadrics.com)
Date: Tue, 1 Jul 2008 09:46:14 +0100
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <1214864562.6912.29.camel@Vigor13>
References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13>
Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AD@quadbrsex1.quadrics.com>

John is correct here. It is one thing to do long-range climate prediction (CP) yourself using distributed computing and tweaking the stochastics based on a set of starting conditions, and another to try to work out whether it will be sunny next Tuesday.

Weather modelling (WM) is a different animal from CP: you need a supply of fresh input data, and a sophisticated infrastructure to harvest, collate, sanitise and feed these numbers into your computer model.

Also, with CP you typically run many instances concurrently, which take weeks or months to complete, but with WM you have maybe 6 hours to run the whole job from start to finish, which implies a closely coupled cluster.

Daniel

-------------------------------------------------------------
Dr. Daniel Kidger, Quadrics Ltd.
daniel.kidger at quadrics.com
One Bridewell St.,      Mobile: +44 (0)779 209 1851
Bristol, BS1 2AA, UK    Office: +44 (0)117 915 5519
----------------------- www.quadrics.com --------------------

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of John Hearns
Sent: 30 June 2008 23:23
To: beowulf at beowulf.org
Subject: Re: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?

On Mon, 2008-06-30 at 20:20 +0200, Toon Moene wrote:
>
> Since about a year, it's been clear to me that weather forecasting
> (i.e., running a more or less sophisticated atmospheric model to provide
> weather predictions) is going to be "mainstream" in the sense that every
> business that needs such forecasts for its operations can simply run
> them in-house.

Garbage in, garbage out.

By that I mean that the CPU horsepower may be more and more readily
affordable for businesses like that - let's say it is an ice-cream
wholesaler who would like to have a three day forecast to allow stocking
of their outlets with ice cream.
However, the models depend on input from sensor networks - not my area
of expertise, but I should imagine manned and unmanned weather stations,
ocean buoys to measure wave height, satellite sensors.
Do we see such data sources being made freely available, and in real
time (ie not archived data sets)??
From eagles051387 at gmail.com Tue Jul 1 02:28:59 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 1 Jul 2008 11:28:59 +0200
Subject: [Beowulf] A press release
In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID:

>We have our own stack, which we stick on top of the customer's favourite
>Red Hat clone, usually Scientific Linux.

does it necessarily have to be a Red Hat clone? can it also be a Debian-based distribution?

On 7/1/08, Dan.Kidger at quadrics.com wrote:
>
> >Hi Jon,
> >We have our own stack, which we stick on top of the customer's favourite
> >Red Hat clone, usually Scientific Linux.
> >
> >Here is a bit more about it:
> >
> >http://www.clustervision.com/products_os.php
> >
> >We sell it as a standalone product and it does quite well. I could even
> >go so far as to say that it is the 'stack of choice' in many European
> >institutions.
>
> Ever thought of getting a job in Sales and Marketing? :-)
>
> Daniel.
>
> On Sat, Jun 28, 2008 at 12:09 PM, Jon Aquilina wrote:
> > congrats. just wondering what distro is being used on your clusters?
> >
> > On Thu, Jun 26, 2008 at 8:52 PM, Joe Landman wrote:
> >>
> >> andrew holway wrote:
> >>>
> >>> http://www.clustervision.com/pr_top500_uk.php
> >>
> >> cool ... congratulations to ClusterVision!
> >>
> >> --
> >> Joseph Landman, Ph.D
> >> Founder and CEO
> >> Scalable Informatics LLC,
>

--
Jonathan Aquilina

From henning.fehrmann at aei.mpg.de Tue Jul 1 02:36:43 2008
From: henning.fehrmann at aei.mpg.de (Henning Fehrmann)
Date: Tue, 1 Jul 2008 11:36:43 +0200
Subject: [Beowulf] automount on high ports
Message-ID: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>

Hello,

we need to automount NFS directories on high ports to increase the number of possible mounts. Currently, we are limited to about 360 mounts.

The NFS server exports with the option 'insecure', but the mounts still end up on ports <1024 on the client side.

Is there a way to enable automounts on higher ports? How can it be done manually: mount -t nfs -o ....?

We are using autofs version 5.

Thank you,
Henning

From steve_heaton at exemail.com.au Tue Jul 1 03:28:40 2008
From: steve_heaton at exemail.com.au (Particle Boy)
Date: Tue, 01 Jul 2008 20:28:40 +1000
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf], Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <200807010728.m617S3Uc011226@bluewest.scyld.com>
References: <200807010728.m617S3Uc011226@bluewest.scyld.com>
Message-ID: <486A06D8.2050705@exemail.com.au>

Date: Mon, 30 Jun 2008 23:22:32 +0100
From: John Hearns

> However, the models depend on input from sensor networks - not my area
> of expertise, but I should imagine manned and unmanned weather stations,
> ocean buoys to measure wave height, satellite sensors.
> Do we see such data sources being made freely available, and in real
> time (ie not archived data sets)??

G'day John and all

In a nutshell, yes: you can get sets of initial conditions from various agencies around the globe. The NCEP at NOAA is a great resource. SOO/STRC at UCAR packages WRF EMS with the pointers built right in for the various feeds :)

Cheers
Stevo

From eagles051387 at gmail.com Tue Jul 1 03:38:52 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 1 Jul 2008 12:38:52 +0200
Subject: [Beowulf] open mosix alternative
Message-ID:

does anyone know an alternative to openMosix??
would it be worth reviving the development of the kernel?

--
Jonathan Aquilina

From eagles051387 at gmail.com Tue Jul 1 03:39:48 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 1 Jul 2008 12:39:48 +0200
Subject: [Beowulf] software for compatible with a cluster
Message-ID:

does anyone know of any rendering software that will work with a cluster?

--
Jonathan Aquilina

From geoff at galitz.org Tue Jul 1 04:04:48 2008
From: geoff at galitz.org (Geoff Galitz)
Date: Tue, 1 Jul 2008 13:04:48 +0200
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To:
References:
Message-ID: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC>

I know people who use Houdini for this:

http://www.sidefx.com/index.php

I cannot vouch for how well it works or what is involved, though.

Geoff Galitz
Blankenheim NRW, Deutschland
http://www.galitz.org

_____

From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Aquilina
Sent: Tuesday, 1 July 2008 12:40
To: Beowulf Mailing List
Subject: [Beowulf] software for compatible with a cluster

does anyone know of any rendering software that will work with a cluster?

--
Jonathan Aquilina

From eagles051387 at gmail.com Tue Jul 1 04:26:38 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Tue, 1 Jul 2008 13:26:38 +0200
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC>
References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC>
Message-ID:

the reason i am asking is that i would like to set up a rendering cluster and provide rendering services. does this also work for 3D animated movies that require rendering, or does one need something entirely different for that?
On 7/1/08, Geoff Galitz wrote:
>
> I know people who use Houdini for this:
>
> http://www.sidefx.com/index.php
>
> I cannot vouch for how well it works or what is involved, though.
>
> Geoff Galitz
> Blankenheim NRW, Deutschland
> http://www.galitz.org
> ------------------------------
>
> *From:* beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] *On Behalf Of* Jon Aquilina
> *Sent:* Tuesday, 1 July 2008 12:40
> *To:* Beowulf Mailing List
> *Subject:* [Beowulf] software for compatible with a cluster
>
> does anyone know of any rendering software that will work with a cluster?
>
> --
> Jonathan Aquilina
>

--
Jonathan Aquilina

From gerry.creager at tamu.edu Tue Jul 1 04:59:03 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Tue, 01 Jul 2008 06:59:03 -0500
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <48694BD5.5090303@moene.indiv.nluug.nl>
References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <48693DCA.3010903@tamu.edu> <48694BD5.5090303@moene.indiv.nluug.nl>
Message-ID: <486A1C07.9050208@tamu.edu>

Toon Moene wrote:
> Gerry Creager wrote:
>
>> I'm running WRF on ranger, the 580 TF Sun cluster at utexas.edu. I
>> can complete the WRF single domain run, using 384 cores in ~30 min
>> wall clock time. At the WRF Users Conference last week, the number of
>> folks I talked to running WRF on workstations or "operationally" on
>> 16-64 core clusters was impressive. I suspect a lot of desktop
>> weather forecasting will, as you suggest, become the norm.
>> The question, then, is: Are we looking at an enterprise where everyone
>> with a gaming machine thinks they understand the model well enough to
>> try predicting the weather, or are some still in awe of Lorenz'
>> hypothesis about its complexity?
>
> This is where I think the pluses of the established meteorological
> society will be: We know how to establish the quality of meteorological
> models, how to compare them, how to dive into their parametrizations to
> figure out the relevant differences and to solve the problems.
>
> Because we know this, we will be sought after. However, we will be
> working inside the industry that needs this knowledge, and outside
> academia or institutionalized weather centres.

This is already starting to happen. However, what I continue to see is managers wanting/expecting an absolute answer to be generated numerically, and they're paying less attention to the modelers' concerns about the "goodness" of the model in certain settings.

As an example, for our evening news programs, we've someone purporting to be a meteorologist. Over the last 10 years, the proportion of folks actually trained in meteorology has grown significantly, and talking to them one-on-one, they tend to recognize the limitations of the models they present. Yet, rather than saying the temperature tomorrow will be in a range from 93-98 deg F (with apologies to our brothers across the Pond), they're generally required to say "96F", because their managers believe the public requires an absolute number.

Perhaps, in some industries where statistical analysis is more integral, we'll see appropriate use of the data...
gerry
--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

From gerry.creager at tamu.edu Tue Jul 1 05:13:47 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Tue, 01 Jul 2008 07:13:47 -0500
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
In-Reply-To: <1214864562.6912.29.camel@Vigor13>
References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13>
Message-ID: <486A1F7B.9080408@tamu.edu>

John Hearns wrote:
> On Mon, 2008-06-30 at 20:20 +0200, Toon Moene wrote:
>
>> Since about a year, it's been clear to me that weather forecasting
>> (i.e., running a more or less sophisticated atmospheric model to provide
>> weather predictions) is going to be "mainstream" in the sense that every
>> business that needs such forecasts for its operations can simply run
>> them in-house.
>
> Garbage in, garbage out.
>
> By that I mean that the CPU horsepower may be more and more readily
> affordable for businesses like that - let's say it is an ice-cream
> wholesaler who would like to have a three day forecast to allow stocking
> of their outlets with ice cream.
> However, the models depend on input from sensor networks - not my area
> of expertise, but I should imagine manned and unmanned weather stations,
> ocean buoys to measure wave height, satellite sensors.
> Do we see such data sources being made freely available, and in real
> time (ie not archived data sets)??
In the US, at least for academic institutions and hobbyists, surface and upper air observations of the sort you describe are generally available for incorporation into models for data assimilation. Models are generally forced and bounded using model data from other atmospheric models, also available. As I understand it from colleagues in Europe, getting similar data over there is more problematic.

> Hopefully on topic the Manchester Guardian newspaper (you all know me
> now for a Guardian reader) is running a "Free Our Data" campaign - to
> pressurise Government to make freely available GIS type data and census
> data which the Government has. I'm personally unconvinced of the
> overwhelming justification for (say) the Ordnance Survey to give all of
> its mapping data away for free.
> http://www.freeourdata.org.uk/

Last summer, in Paris, I had a discussion on this subject with the Ordnance Survey's chief cartographer. It is their intent to free the data, save reasonable costs of reproduction/maintenance, as soon as they can establish these. In the US, this is the norm. In Texas, where I live, there's a site with State basemap data, highly accurate roadway data, land-use/land-cover, census, etc. that's just an FTP call away, or, if you want to pay roughly $10 per DVD, they'll burn a copy for you (cost of personnel for reproduction of the DVD).

Some states have deemed their data proprietary. A lot have locked their data down somewhat since 9/11, as our Department of Homeland Security has called for restricting access to Critical Infrastructure data. Note that the last listing of Critical Infrastructure for Texas ran to some 268 pages of delineation, description and justification. I fear it's been updated/expanded since then.
It included banks, cemeteries, schools, bridges, water and sewer plants, shopping malls, high-traffic motorways, refrigerated facilities, supermarkets, gas stations, power transformer and generation sites, power transmission lines, and petroleum pipelines, to name a few. There was discussion of adding individual residences to the list. As you can see, restricting access to "critical infrastructure" could result in a blank map.

--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

From m.janssens at opencfd.co.uk Tue Jul 1 05:48:39 2008
From: m.janssens at opencfd.co.uk (Mattijs Janssens)
Date: Tue, 1 Jul 2008 13:48:39 +0100
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <200807011348.39343.m.janssens@opencfd.co.uk>

On Tuesday 01 July 2008 11:38, Jon Aquilina wrote:
> does anyone know an alternative to openMosix?? would it be worth reviving
> the development of the kernel?

maybe http://www.kerrighed.org

(and that is all I know about it)

Regards,

Mattijs

From geoff at galitz.org Tue Jul 1 05:50:33 2008
From: geoff at galitz.org (Geoff Galitz)
Date: Tue, 1 Jul 2008 14:50:33 +0200
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID:

It seems that much of the effort that was going into openMosix is now going into KVM.

http://kvm.qumranet.com/kvmwiki

I think the idea is that MOSIX functionality is more easily developed and deployed in the form of virtual machines than directly at the kernel level. There are some trade-offs, of course (more overhead being chief among them), but the virtualization model is clearly the overall favorite. It sure does beat the heck out of having to track each kernel individually.
Geoff Galitz
Blankenheim NRW, Deutschland
http://www.galitz.org

_____

From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Aquilina
Sent: Tuesday, 1 July 2008 12:39
To: Beowulf Mailing List
Subject: [Beowulf] open mosix alternative

does anyone know an alternative to openMosix?? would it be worth reviving the development of the kernel?

--
Jonathan Aquilina

From mark.kosmowski at gmail.com Tue Jul 1 05:51:54 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Tue, 1 Jul 2008 08:51:54 -0400
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To: <200807010728.m617S3Ub011226@bluewest.scyld.com>
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID:

At some point a cost-benefit analysis needs to be performed. If my cluster at peak usage only uses 4 GB RAM per CPU (I live in single-core land still and do not yet differentiate between CPU and core) and my nodes all have 16 GB per CPU, then I am wasting RAM resources and would be better off buying new machines and physically transferring the RAM to and from them, or running more jobs, each distributed across fewer CPUs. Or saving on my electricity bill and powering down some nodes.

As heretical as this last sounds, I'm tempted to throw in the towel on my PhD studies because I can no longer afford the power to run my three node cluster at home. Energy costs may end up being the straw that breaks this camel's back.

Mark E. Kosmowski

> From: "Jon Aquilina"
>
> not sure if this applies to all kinds of scenarios that clusters are used in,
> but isn't it true that the more RAM you have, the better?
>
> On 6/30/08, Vincent Diepeveen wrote:
> >
> > Toon,
> >
> > Can you drop a line on how important RAM is for weather forecasting in
> > latest type of calculations you're performing?
> >
> > Thanks,
> > Vincent
> >
> >
> > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote:
> >
> >> Jim Lux wrote:
> >>
> >>> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's
> >>> scale of operations. We face this all the time with NASA work.
> >>> Semiconductor manufacturers have no real reason to produce special purpose
> >>> or customized versions of their products for space use, because they can
> >>> sell all they can make to the consumer market. More than once, I've had a
> >>> phone call along the lines of this:
> >>> "Jim: I'm interested in your new ABC321 part."
> >>> "Rep: Great. I'll just send the NDA over and we can talk about it."
> >>> "Jim: Great, you have my email and my fax # is..."
> >>> "Rep: By the way, what sort of volume are you going to be using?"
> >>> "Jim: Oh, 10-12.."
> >>> "Rep: thousand per week, excellent..."
> >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every
> >>> year."
> >>> "Rep: Oh..."
> >>> {Well, to be fair, it's not that bad, they don't hang up on you..
> >>>
> >>
> >> Since about a year, it's been clear to me that weather forecasting (i.e.,
> >> running a more or less sophisticated atmospheric model to provide weather
> >> predictions) is going to be "mainstream" in the sense that every business
> >> that needs such forecasts for its operations can simply run them in-house.
> >>
> >> Case in point: I bought a $1100 HP box (the obvious target group being
> >> teenage downloaders) which performs the HIRLAM limited area model *on the
> >> grid that we used until October 2006* in December last year.
> >>
> >> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K.
> >>
> >> I wonder what effect this will have on CPU developments ...
> >>
> >> --
> >> Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290
> >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
> >> At home: http://moene.indiv.nluug.nl/~toon/
> >> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html
> >>
>
> --
> Jonathan Aquilina

From mark.kosmowski at gmail.com Tue Jul 1 05:53:35 2008
From: mark.kosmowski at gmail.com (Mark Kosmowski)
Date: Tue, 1 Jul 2008 08:53:35 -0400
Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf]
Message-ID:

And I forgot to change the subject. Apologies.

On 7/1/08, Mark Kosmowski wrote:
> At some point a cost-benefit analysis needs to be performed. If
> my cluster at peak usage only uses 4 GB RAM per CPU (I live in
> single-core land still and do not yet differentiate between CPU and
> core) and my nodes all have 16 GB per CPU, then I am wasting RAM
> resources and would be better off buying new machines and physically
> transferring the RAM to and from them, or running more jobs, each
> distributed across fewer CPUs. Or saving on my electricity bill and
> powering down some nodes.
>
> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.
>
> Mark E. Kosmowski
>
> > From: "Jon Aquilina"
> >
> > not sure if this applies to all kinds of scenarios that clusters are used in,
> > but isn't it true that the more RAM you have, the better?
> >
> > On 6/30/08, Vincent Diepeveen wrote:
> > >
> > > Toon,
> > >
> > > Can you drop a line on how important RAM is for weather forecasting in
> > > latest type of calculations you're performing?
> > >
> > > Thanks,
> > > Vincent
> > >
> > >
> > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote:
> > >
> > >> Jim Lux wrote:
> > >>
> > >>> Yep. And for good reason. Even a big DoD job is still tiny in Nvidia's
> > >>> scale of operations. We face this all the time with NASA work.
> > >>> Semiconductor manufacturers have no real reason to produce special purpose
> > >>> or customized versions of their products for space use, because they can
> > >>> sell all they can make to the consumer market. More than once, I've had a
> > >>> phone call along the lines of this:
> > >>> "Jim: I'm interested in your new ABC321 part."
> > >>> "Rep: Great. I'll just send the NDA over and we can talk about it."
> > >>> "Jim: Great, you have my email and my fax # is..."
> > >>> "Rep: By the way, what sort of volume are you going to be using?"
> > >>> "Jim: Oh, 10-12.."
> > >>> "Rep: thousand per week, excellent..."
> > >>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe every
> > >>> year."
> > >>> "Rep: Oh..."
> > >>> {Well, to be fair, it's not that bad, they don't hang up on you..
> > >>>
> > >>
> > >> Since about a year, it's been clear to me that weather forecasting (i.e.,
> > >> running a more or less sophisticated atmospheric model to provide weather
> > >> predictions) is going to be "mainstream" in the sense that every business
> > >> that needs such forecasts for its operations can simply run them in-house.
> > >>
> > >> Case in point: I bought a $1100 HP box (the obvious target group being
> > >> teenage downloaders) which performs the HIRLAM limited area model *on the
> > >> grid that we used until October 2006* in December last year.
> > >>
> > >> It's about twice as slow as our then-operational 50-CPU Sun Fire 15K.
> > >>
> > >> I wonder what effect this will have on CPU developments ...
> > >> > > >> -- > > >> Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290 > > >> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > >> At home: http://moene.indiv.nluug.nl/~toon/ > > >> Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html > > >> > > > > > > _______________________________________________ > > > Beowulf mailing list, Beowulf at beowulf.org > > > To change your subscription (digest mode or unsubscribe) visit > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > -- > > Jonathan Aquilina > From geoff at galitz.org Tue Jul 1 05:54:26 2008 From: geoff at galitz.org (Geoff Galitz) Date: Tue, 1 Jul 2008 14:54:26 +0200 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> Message-ID: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> That is out of my field of expertise. Sounds like a question for professional digital artists. I can put you in touch some folks that most likely know the answer to your questions, if you like. Anybody know of any current approaches to this? Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: Jon Aquilina [mailto:eagles051387 at gmail.com] Sent: Dienstag, 1. Juli 2008 13:27 To: Geoff Galitz Cc: Beowulf Mailing List Subject: Re: [Beowulf] software for compatible with a cluster reason i am asking is because i would like to setup a rendering cluster and provide rendering services. does this also work for 3d animated movies that require rendering or does one need somethin entierly different for that? On 7/1/08, Geoff Galitz wrote: I know people who use Houdini for this: http://www.sidefx.com/index.php I cannot vouch for how well it works or what is involved, though. Geoff Galitz Blankenheim NRW, Deutschland http://www.galitz.org _____ From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Aquilina Sent: Dienstag, 1. 
Juli 2008 12:40
To: Beowulf Mailing List
Subject: [Beowulf] software for compatible with a cluster

does anyone know of any rendering software that will work with a cluster?

--
Jonathan Aquilina

--
Jonathan Aquilina

From ajt at rri.sari.ac.uk Tue Jul 1 06:14:38 2008
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Tue, 01 Jul 2008 14:14:38 +0100
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <486A2DBE.10302@rri.sari.ac.uk>

Jon Aquilina wrote:
> does anyone know an alternative to openmosix?? would it be worth reviving
> the development of the kernel?

Hello, Jonathan.

I'm still running openMosix (linux-2.4.26-om1) and I did attempt to port
it to the 2.4.32 kernel so I could use SATA disks, but I couldn't get
process migration to work. My debs for rebuilding the openMosix kernel
under Ubuntu 6.06.1 LTS are at:

http://bioinformatics.rri.sari.ac.uk/openmosix

We are currently evaluating Kerrighed as an alternative:

http://www.kerrighed.org

Kerrighed also forms the basis of 'XtreemOS':

http://www.xtreemos.eu/

Although Kerrighed looks very promising, it is also quite fragile in our
hands. If one node crashes, you lose the entire cluster. That said, the
Kerrighed project is extremely well supported and I believe it will be a
good alternative in the near future. We will continue to run openMosix in
the short term, but I may evaluate MOSIX2:

http://www.mosix.org/

I was previously opposed to Mosix on ideological grounds and loyal to
Moshe Bar but, to be fair, Mosix is now free for non-profit use and the
source code is available (but not GPL).

Please let me know if you are seriously considering reviving openMosix!

Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK.
| fax:+44 (0)1224 716687

From landman at scalableinformatics.com Tue Jul 1 07:00:06 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 01 Jul 2008 10:00:06 -0400
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To:
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID: <486A3866.7030302@scalableinformatics.com>

Mark Kosmowski wrote:
> At some point a cost-benefit analysis needs to be performed. If
> my cluster at peak usage only uses 4 GB RAM per CPU (I live in
> single-core land still and do not yet differentiate between CPU and
> core) and my nodes all have 16 GB per CPU then I am wasting RAM
> resources and would be better off buying new machines and physically
> transferring the RAM to and from them or running more jobs each
> distributed across fewer CPUs. Or saving on my electricity bill and
> powering down some nodes.

Possible, though if you do heavy I/O, even with single-core chips, and you
are running a 64-bit OS, the extra buffer cache is not to be rejected
lightly.

>
> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.

Which country are you in? You may be able to apply for "free" computing
resources: TeraGrid in the US, and similar resources elsewhere. Mark Hahn
might give you pointers for Canada, and the folks at
Streamline/Clustervision/... might be able to give you pointers for UK/EU.
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From eagles051387 at gmail.com Tue Jul 1 07:18:50 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Tue, 1 Jul 2008 16:18:50 +0200 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> Message-ID: that would be greatly appreciated On 7/1/08, Geoff Galitz wrote: > > > > That is out of my field of expertise. Sounds like a question for > professional digital artists. I can put you in touch some folks that most > likely know the answer to your questions, if you like. > > > > Anybody know of any current approaches to this? > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* Jon Aquilina [mailto:eagles051387 at gmail.com] > *Sent:* Dienstag, 1. Juli 2008 13:27 > *To:* Geoff Galitz > *Cc:* Beowulf Mailing List > *Subject:* Re: [Beowulf] software for compatible with a cluster > > > > reason i am asking is because i would like to setup a rendering cluster and > provide rendering services. does this also work for 3d animated movies that > require rendering or does one need somethin entierly different for that? > > On 7/1/08, *Geoff Galitz* wrote: > > > > > > I know people who use Houdini for this: > > > > http://www.sidefx.com/index.php > > > > I cannot vouch for how well it works or what is involved, though. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > ------------------------------ > > *From:* beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] *On > Behalf Of *Jon Aquilina > *Sent:* Dienstag, 1. 
Juli 2008 12:40 > *To:* Beowulf Mailing List > *Subject:* [Beowulf] software for compatible with a cluster > > > > does anyone know of any rendering software that will work with a cluster? > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: From vanallsburg at hope.edu Tue Jul 1 07:43:34 2008 From: vanallsburg at hope.edu (Paul Van Allsburg) Date: Tue, 01 Jul 2008 10:43:34 -0400 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> Message-ID: <486A4296.4050501@hope.edu> I'd like to do the same, as a project for a group of students... Please keep me in the loop? Thanks! Paul -- Paul Van Allsburg Computational Science & Modeling Facilitator Natural Sciences Division, Hope College 35 East 12th Street Holland, Michigan 49423 616-395-7292 http://www.hope.edu/academic/csm/ Jon Aquilina wrote: > that would be greatly appreciated > > On 7/1/08, *Geoff Galitz* > > wrote: > > > > That is out of my field of expertise. Sounds like a question for > professional digital artists. I can put you in touch some folks > that most likely know the answer to your questions, if you like. > > > > Anybody know of any current approaches to this? > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > > * From: * Jon Aquilina [mailto:eagles051387 at gmail.com > ] > *Sent:* Dienstag, 1. Juli 2008 13:27 > *To:* Geoff Galitz > *Cc:* Beowulf Mailing List > *Subject:* Re: [Beowulf] software for compatible with a cluster > > > > reason i am asking is because i would like to setup a rendering > cluster and provide rendering services. does this also work for 3d > animated movies that require rendering or does one need somethin > entierly different for that? 
> > On 7/1/08, *Geoff Galitz* > wrote: > > > > > > I know people who use Houdini for this: > > > > http://www.sidefx.com/index.php > > > > I cannot vouch for how well it works or what is involved, though. > > > > > > Geoff Galitz > Blankenheim NRW, Deutschland > http://www.galitz.org > > * From: * beowulf-bounces at beowulf.org > > [mailto:beowulf-bounces at beowulf.org > ] *On Behalf Of *Jon Aquilina > *Sent:* Dienstag, 1. Juli 2008 12:40 > *To:* Beowulf Mailing List > *Subject:* [Beowulf] software for compatible with a cluster > > > > does anyone know of any rendering software that will work with a > cluster? > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina > > > > > -- > Jonathan Aquilina From perry at piermont.com Tue Jul 1 07:44:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Tue, 01 Jul 2008 10:44:48 -0400 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? In-Reply-To: <48694432.4020608@scalableinformatics.com> (Joe Landman's message of "Mon\, 30 Jun 2008 16\:38\:10 -0400") References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <48693DCA.3010903@tamu.edu> <48694432.4020608@scalableinformatics.com> Message-ID: <87d4lxk6jj.fsf@snark.cb.piermont.com> Joe Landman writes: > I see a curious phenomenon going on in crash simulation and NVH. We > see an increasing "decoupling" if you will, between the detailed > issues of simulation and coding, and the end user using the simulation > system. That is, the users may know the engineering side, but don't > seem to grasp the finer aspects of the simulation ... what to take as > reasonably accurate, and what to grasp might not be. 
>
> I don't see this in chemistry, in large part due to many of the users
> also writing their own software.

On the contrary. I know computational chemistry specialists who worry
about users of the common commercial software (Gaussian, Jaguar, etc.)
not knowing what to believe and what not to believe in the output. Since
I've seen people in synthetic organic labs running the simulation
software to design possible synthetic pathways without understanding the
software, I think this worry is perfectly valid. The overwhelming
majority of users are not computational chemists at all -- they're
ordinary organic chemists, and they don't have a good gut feel for what
the limitations of the tools are.

I know of very few users of computational chemistry software who roll
their own. Try reading the computational chemistry mailing lists for a
little while, or reading the journals, and you'll get a feel for what the
average user is like. There might be a lot of people writing software out
there, but there are vastly more who just want to get answers and don't
understand how the programs work at all.

Perry
--
Perry E. Metzger perry at piermont.com

From perry at piermont.com Tue Jul 1 07:53:06 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Tue, 01 Jul 2008 10:53:06 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Tue\, 1 Jul 2008 11\:36\:43 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
Message-ID: <878wwlk65p.fsf@snark.cb.piermont.com>

Henning Fehrmann writes:
> we need to automount NFS directories on high ports to increase the
> number of possible mounts. Currently, we are limited up to ca 360 mounts.

A TCP socket is a 4-tuple of localhost:localport:remotehost:remoteport

A given localhost:localport pair can speak to an unlimited array of
remotehost:remoteport sets.
For example, in theory, your SMTP port can get connections from up to
2^32 different hosts on each of 2^16 different sockets from each, for a
total space of 2^48 connections to a single local socket number. This in
no way restricts how many connections can come in to another port,
either, because a given socket is again the full 4-tuple -- if you have
an SSH port, it too can get 2^48 connections.

Now, there is this (odd) convention that only root can open a socket
below 1024, so hosts "trust" (what a bad idea) sockets under that number.
You can still, however, get up to 1023 connections from any given remote
host to a given local host's port.

Thus, your problem sounds rather odd. There is no obvious reason you
should be limited to 360 connections.

Perhaps your problem is not what you think it is at all. Could you
explain it in more detail?

--
Perry E. Metzger perry at piermont.com

From ajt at rri.sari.ac.uk Tue Jul 1 08:31:48 2008
From: ajt at rri.sari.ac.uk (Tony Travis)
Date: Tue, 01 Jul 2008 16:31:48 +0100
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <486A4DE4.1090807@rri.sari.ac.uk>

Geoff Galitz wrote:
> [...]
> I think the idea is that MOSIX functionality is more easily developed
> and deployed in the form of virtual machines than directly at the kernel
> level. There are some trade-offs, of course... more overhead being
> chief among them but the virtualization model is clearly the overall
> favorite. It sure does beat the heck out of having to track each kernel
> individually.

Hello, Geoff.

MOSIX functionality is mainly about load-balancing between independent
kernels, and avoiding severe memory depletion by migrating processes
between kernels. In fact (open)MOSIX implements an SMP-like model, but
with a high-latency interconnect (usually Gbit Ethernet). There is no
need to 'track' kernels, because the oM HPC extension does it for you.
The principal objective of SSI computing is to use many small machines as
if they are one big one. This is the opposite of virtualisation, which
uses one (or a few) BIG machines like a lot of small ones. It does this
by virtually separating the kernels. There is some confusion about this
because it *is* very convenient to teach about or develop and test SSI
software on virtual compute nodes if you don't have a lot of real nodes,
but it defeats the purpose of SSI to use this approach in production.

You might be interested to know that one reason Moshe Bar gave when he
announced the end of the openMosix project was that SMP is now so cheap
that SSI clustering is less of a factor in computing:

http://sourceforge.net/forum/forum.php?forum_id=715406

I'm not sure I agree - I still find openMosix useful, and I'll continue
using it on our Beowulf here until I find a better alternative.

Tony.
--
Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk
Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt
Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751
Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687

From tjrc at sanger.ac.uk Tue Jul 1 08:40:27 2008
From: tjrc at sanger.ac.uk (Tim Cutts)
Date: Tue, 1 Jul 2008 16:40:27 +0100
Subject: [Beowulf] automount on high ports
In-Reply-To: <878wwlk65p.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com>
Message-ID:

On 1 Jul 2008, at 3:53 pm, Perry E. Metzger wrote:
>
> Henning Fehrmann writes:
>> we need to automount NFS directories on high ports to increase the
>> number of possible mounts. Currently, we are limited up to ca 360
>> mounts.
>
> A TCP socket is a 4-tuple of localhost:localport:remotehost:remoteport
>
> A given localhost:localport pair can speak to an unlimited array of
> remotehost:remoteport sets.
For example, in theory, your SMTP port can > get connections from up to 2^32 different hosts on each of 2^16 > different sockets from each, for a total space of 2^48 connections to > a single local socket number. This in no way restricts how many > connections can come in to another port, either, because a given > socket is again the full 4-tuple -- if you have an SSH port, it too > can get 2^48 connections. > > Now, there is this (odd) convention that only root can open a socket > below 1024, so hosts "trust" (what a bad idea) sockets under that > number. You can still, however, get up to 1023 connections from any > given remote host to a given local host's port. > > Thus, your problem sounds rather odd. There is no obvious reason you > should be limited to 360 connections. > > Perhaps your problem is not what you think it is at all. Could you > explain it in more detail? Certainly on my systems where I use the am-utils automounter, I find the limit on the number of simultaneously mounted filesystems is more in the region of 1500. I've been desperately trying to reduce the number of NFS filesystems we have though. Currently our automount map has about 600 entries, I think. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From perry at piermont.com Tue Jul 1 08:48:47 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Tue, 01 Jul 2008 11:48:47 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: (Tim Cutts's message of "Tue\, 1 Jul 2008 16\:40\:27 +0100") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> Message-ID: <87od5hip0g.fsf@snark.cb.piermont.com> Tim Cutts writes: > Certainly on my systems where I use the am-utils automounter, I find > the limit on the number of simultaneously mounted filesystems is more > in the region of 1500. And that's doubtless not from TCP port issues but because of other kinds of resources being limited. > I've been desperately trying to reduce the number of NFS filesystems > we have though. Currently our automount map has about 600 entries, > I think. Sometimes that's reasonable. I've seen large sites where everyone has a workstation in front of them and all of the thousands of users get their home dir automounted when they sit in front of a box and log in. However, one notes that in such a situation, the automount maps have thousands or tens of thousands of entries, but any given machine generally only is mounting a few file systems. -- Perry E. Metzger perry at piermont.com From kilian at stanford.edu Tue Jul 1 08:49:42 2008 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Tue, 1 Jul 2008 08:49:42 -0700 Subject: [Beowulf] open mosix alternative In-Reply-To: References: Message-ID: <200807010849.42415.kilian@stanford.edu> Hi Jon, On Tuesday 01 July 2008 03:38:52 am Jon Aquilina wrote: > does anyone know an altenative to openmosix?? You may want to check out OpenSSI: http://www.openssi.org As its name says, that's a SSI clustering solution, with unified process namespace, full process migration, load-balancing, single root filesystem, etc. 
A complete list of features is available at: http://wiki.openssi.org/go/Features Cheers, -- Kilian From henning.fehrmann at aei.mpg.de Tue Jul 1 09:47:47 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Tue, 1 Jul 2008 18:47:47 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: <878wwlk65p.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> Message-ID: <20080701164747.GA15901@gretchen.aei.uni-hannover.de> On Tue, Jul 01, 2008 at 10:53:06AM -0400, Perry E. Metzger wrote: > > Henning Fehrmann writes: > > we need to automount NFS directories on high ports to increase the > > number of possible mounts. Currently, we are limited up to ca 360 mounts. > > > Thus, your problem sounds rather odd. There is no obvious reason you > should be limited to 360 connections. > > Perhaps your problem is not what you think it is at all. Could you > explain it in more detail? I guess it has also something to do with the automounter. I am not able to increase this number. But even if the automounter would handle more we need to be able to use higher ports: netstat shows always ports below 1024. tcp 0 0 client:941 server:nfs We need to mount up to 1400 nfs exports. Cheers Henning From hahn at mcmaster.ca Tue Jul 1 09:51:32 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 1 Jul 2008 12:51:32 -0400 (EDT) Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> Message-ID: >> We have our own stack which we stick on top of the customers favourite >> red hat clone. Usually Scientific Linux. > > does it necessarily have to be a redhat clone. can it also be a debian based > clone? but why? is there some concrete advantage to using Debian? I've never understood why Debian users tend to be very True Believer, or what it is that hooks them. 
From prentice at ias.edu Tue Jul 1 10:20:32 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 01 Jul 2008 13:20:32 -0400 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> Message-ID: <486A6760.5010006@ias.edu> Mark Hahn wrote: >>> We have our own stack which we stick on top of the customers favourite >>> red hat clone. Usually Scientific Linux. >> >> does it necessarily have to be a redhat clone. can it also be a debian >> based >> clone? > > but why? is there some concrete advantage to using Debian? > I've never understood why Debian users tend to be very True Believer, > or what it is that hooks them. And the Debian users can say the same thing about Red Hat users. Or SUSE users. And if any still exist, the Slackware users could say the same thing about the both of them. But then the Slackware users could also point out that the first Linux distro was Slackware, so they are using the one true Linux distro... If you want to have a religious war about which distro to use, go somewhere else. I'm sure there are plenty of mailing lists and newsgroups where I'm sure that happens every day. This is a mailing list about beowulf clusters, and the last time I checked, you can create clusters using any Linux distribution you like, or even non-Linux operating systems, such as IRIX, Solaris, etc. -- Prentice From landman at scalableinformatics.com Tue Jul 1 10:46:01 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 01 Jul 2008 13:46:01 -0400 Subject: [Beowulf] A press release In-Reply-To: <486A6760.5010006@ias.edu> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <486A6D59.7020704@scalableinformatics.com> Prentice Bisbal wrote: > Mark Hahn wrote: [...] 
> If you want to have a religious war about which distro to use, go > somewhere else. I'm sure there are plenty of mailing lists and > newsgroups where I'm sure that happens every day. Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. Some distros use good kernels, some do not. I won't mention who I think is in the latter category. FWIW: we tend to build systems and place our own kernel on them. Basically we want them to work, and not be surprised by bad things, like crashes due to 4k stacks or backported (mis)features. We also want them to have updated drivers, and NFS/file system bits. > This is a mailing list about beowulf clusters, and the last time I > checked, you can create clusters using any Linux distribution you like, > or even non-Linux operating systems, such as IRIX, Solaris, etc. With all due respect, I think Mark knows what this list is about. There are lots of folks out there using Fedora, RHEL, Ubuntu, Debian, SuSE, ... We generally don't care which distro is used. Only that the kernel is reasonable, stable under load, and supports updated file systems/network capability. Beowulf depends upon good kernels at the end of the day. You need high performance and stability throughout. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From thpierce at gmail.com Tue Jul 1 05:07:22 2008 From: thpierce at gmail.com (Tom Pierce) Date: Tue, 1 Jul 2008 08:07:22 -0400 Subject: [Beowulf] June New York/Jersey HPC users meeting Message-ID: <25e9e5ad0807010507s74ea33e7p42abeff3d275b5a2@mail.gmail.com> Dear Dan, First, you missed a enjoyable meeting with lively discussion, good pub food and beer. I hope we meet there again in July. I attended most of the meeting. 
My memory summarized it:

Sun Grid Engine users were the majority at the meeting (60% SGE users and
40% Torque/Maui users).

The installations of the two systems are different experiences. With SGE,
you are about "half-done" after you install the system. The installation
of Torque/Maui is more functional right out of the box. Both seem to have
similar functionality when set up.

SGE has Sun developers actively working on it, so the newest versions
have more options, e.g. a FLEXlm link for license management.

Torque/Maui is open source, and has not been modified as often as SGE
has, although cpusets, similar to SGE cpusets, have recently been added.
Torque/Maui has commercial upgrades to Torque/Moab for large sites, or
for people who want paid support (and Moab supports FLEXlm license
management).

There seem to be more installations of Torque/Maui than there are of SGE,
but that was just a discussion of perceptions. However, the history of
PBS, up through Torque, means that there are a great many PBS scripts on
the internet for job submissions of HPC applications.

The discussion of MPI interfaces was ongoing. Neither system seemed to
have an advantage. Torque has the OSC mpiexec script and SGE has some
built-in hooks for MPI. The discussions mentioned Open MPI, LAM, MPICH
and GM, with no clear conclusion that one system was more functional or
easier than the other for MPI codes.

At the end, I would call it a "draw": Torque/Maui is easier to set up and
has lots of examples, versus SGE's flexibility and FLEXlm license
management.

Tom

Daniel.Roberts at sanofi-aventis.com wrote:

Anyone have minutes or conclusions to offer from this scheduler smack
down?
Thanks
Dan

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]

If you live or work in the New York/North Jersey Metropolitan area, mark
your calendar for this Thursday, June 19th. The NYCA-HUG (New York City
Area HPC Users Group) will be trying to answer the ultimate question:
Torque or Sun Grid Engine?
We will be discussing the pros/cons of each scheduler for HPC clusters.
Come and add your experiences, wants, and rants. Then you decide.

From merc4krugger at gmail.com Tue Jul 1 06:27:44 2008
From: merc4krugger at gmail.com (Krugger)
Date: Tue, 1 Jul 2008 14:27:44 +0100
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
Message-ID:

Hi,

Am I understanding it correctly? You want to have more than 360 mounts in
a single NFS client? And you want that client to run on a non-privileged
port?

What you are doing doesn't make much sense to me, but you can try adding
the option "lockd.udpport=32768 lockd.tcpport=32768" to your kernel
flags, so that on the client side the kernel puts the lockd daemon, which
handles NFS locks, on the port you selected. I don't understand how
changing the port will help you get more mounts in. I would actually
suggest you review the maximum allowed filehandles for each process.

You will also need to start the services manually, something like:

statd -p 32765 -o 32766
mountd -p 32767

If lockd is built as a module, you need to add "options lockd
nlm_udpport=32768 nlm_tcpport=32768" to your /etc/modules.conf

If I am misunderstanding and you are hitting a maximum of 360 clients for
your NFS server, then maybe you have a network problem: with NFSv3 your
clients will lose connection to the server when UDP starts losing packets
due to heavy I/O from the calculations, if both happen on the same
network. NFS v4 over TCP might help, and/or some sort of traffic shaping
to make sure there is enough bandwidth reserved for NFS to operate
properly.
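A minimal sketch of the two checks touched on above, assuming Python 3 on a Unix-like client. The helper names `is_reserved_port` and `fd_limits` are made up for illustration and are not part of any NFS tooling; the port numbers are just the ones quoted in this thread. It only reports the per-process open-file limits and whether a port falls in the privileged range below 1024, and does not by itself explain or lift the 360-mount ceiling.

```python
import resource

# By convention, only root may bind ports below this value.
RESERVED_PORT_LIMIT = 1024


def is_reserved_port(port: int) -> bool:
    """Return True if the port lies in the privileged range 1..1023."""
    return 0 < port < RESERVED_PORT_LIMIT


def fd_limits() -> tuple:
    """Return the (soft, hard) limit on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = fd_limits()
    print(f"open-file limit: soft={soft} hard={hard}")
    # 941 is a client port of the kind netstat reports for NFS mounts;
    # the others are the example daemon ports suggested above.
    for port in (941, 32765, 32766, 32767, 32768):
        kind = "reserved" if is_reserved_port(port) else "unprivileged"
        print(f"port {port}: {kind}")
```

If the soft limit turns out to be low, it can usually be raised up to the hard limit for the current shell with `ulimit -n`; whether that matters for the automounter is something to verify on the actual system.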
Notice that all of them have different ports: 32765, 32766, 32767, 32768

Krugger

On Tue, Jul 1, 2008 at 10:36 AM, Henning Fehrmann wrote:
> Hello,
>
> we need to automount NFS directories on high ports to increase the number of possible mounts.
> Currently, we are limited up to ca 360 mounts.
>
> The NFS-server exports with the option 'insecure' but the mounts still end up on ports <1024 on the client side.
>
> Is there a way to enable automounts on higher ports? How can it be done manually:
> mount -t nfs -o ....?
>
> We are using autofs version 5.
>
> Thank you,
> Henning
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
>

From vernard at venger.net Tue Jul 1 08:19:15 2008
From: vernard at venger.net (Vernard Martin)
Date: Tue, 01 Jul 2008 11:19:15 -0400
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To:
References:
Message-ID: <486A4AF3.9040108@venger.net>

Jon Aquilina wrote:
> does anyone know of any rendering software that will work with a cluster?

The Big Daddy of them all, Pixar's RenderMan Pro Server, is supported
under Linux and is used by nearly everybody in Hollywood who does graphic
rendering for movies. It ain't cheap, but it's pretty much the best there
is. Check out https://renderman.pixar.com/products/techspecs/index.htm
for more info.

From gregory.warnes at rochester.edu Tue Jul 1 09:39:38 2008
From: gregory.warnes at rochester.edu (Gregory Warnes)
Date: Tue, 01 Jul 2008 12:39:38 -0400
Subject: [Beowulf] open mosix alternative
In-Reply-To: <200807010849.42415.kilian@stanford.edu>
Message-ID:

Or, of course, the original Mosix project. Amnon Barak is very amiable
and willing to work with folks.

http://www.mosix.org

-Greg

On 7/1/08 11:49AM , "Kilian CAVALOTTI" wrote:

> Hi Jon,
>
> On Tuesday 01 July 2008 03:38:52 am Jon Aquilina wrote:
>> > does anyone know an alternative to openmosix??
>
> You may want to check out OpenSSI: http://www.openssi.org
>
> As its name says, that's a SSI clustering solution, with unified process
> namespace, full process migration, load-balancing, single root
> filesystem, etc. A complete list of features is available at:
> http://wiki.openssi.org/go/Features
>
> Cheers,
> --
> Kilian
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Gregory R. Warnes, Ph.D
Program Director
Center for Computational Arts, Sciences, and Engineering
University of Rochester
Tel: 585-273-2794
Fax: 585-276-2097
Email: gregory.warnes at rochester.edu

From landman at scalableinformatics.com Tue Jul 1 11:06:34 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue, 01 Jul 2008 14:06:34 -0400
Subject: [Beowulf] open mosix alternative
In-Reply-To:
References:
Message-ID: <486A722A.3000405@scalableinformatics.com>

Hi Jon

Jon Aquilina wrote:
> does anyone know an alternative to openmosix?? would it be worth reviving
> the development of the kernel?

OpenMOSIX was all about process migration between different independent
OSes. You can still get some of that with Scyld, with OpenSSI, and a few
others. If you prefer more of an SMP model (simpler programming), you
should look at ScaleMP DSMs.

Some on this list argue that shared memory programming is not easier than
distributed memory programming, though I am not one of those making this
argument. It has different challenges, costs and benefits than MPI. It
has different limitations. Not so surprisingly, with the advent of
many-core units, shared memory programming techniques are needed to get
good performance within a single system.

Disclosure: We are looking at these units for some of our work.
Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 cell : +1 734 612 4615 From smulcahy at aplpi.com Tue Jul 1 11:11:23 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Tue, 01 Jul 2008 19:11:23 +0100 Subject: [Beowulf] A press release In-Reply-To: <486A6D59.7020704@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <486A734B.3000701@aplpi.com> Joe Landman wrote: > Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. > Some distros use good kernels, some do not. I won't mention who I think > is in the latter category. > .. > We generally don't care which distro is used. Only that the kernel is > reasonable, stable under load, and supports updated file systems/network > capability. This information would be most interesting to me and surely others on the list .. can you talk about of the distributions that provide "good kernels" if not about the others (and hey, theres hundreds of Linux distributions out there - http://lwn.net/Distributions/ so we couldn't infer the bad ones from your omissions ;) -stephen -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 
289353 (5 Woodlands Avenue, Renmore, Galway)

From geoff at galitz.org Tue Jul 1 12:05:54 2008
From: geoff at galitz.org (Geoff Galitz)
Date: Tue, 1 Jul 2008 21:05:54 +0200
Subject: [Beowulf] Re: "hobbyists"
In-Reply-To: <48693A89.3080605@moene.indiv.nluug.nl>
References: <485920D8.2030309@ias.edu> <6.2.5.6.2.20080618164843.02b1bd30@jpl.nasa.gov> <200806190945.21604.kilian@stanford.edu><485A9520.2080508@scalableinformatics.com> <48693A89.3080605@moene.indiv.nluug.nl>
Message-ID: <128FF5A06DBD4D74B8AA8CB6E4EF4B1F@geoffPC>

Ohh... I was just waiting for the conversation to get back to this.

For an inside perspective:

http://www.spiegel.de/international/europe/0,1518,562315,00.html

Does that make me on-topic?

-geoff

Geoff Galitz
Blankenheim NRW, Deutschland
http://www.galitz.org

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Toon Moene
Sent: Monday, 30 June 2008 21:57
To: Joe Landman
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] Re: "hobbyists"

Joe Landman wrote:

> Tactical nukes (aimed at armies) were on the table for a few of the NATO
> scenarios involving responses to Soviet invasion of western Europe
> (based upon some of the historical reading, though I am not sure how
> serious they were). The western Europeans were understandably
> un-enthusiastic about such scenarios.

You bet we were. I was in the organization of the 400,000+ protest in
Amsterdam in November 1981.

Cannon-fodder at a high level ...
--
Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290
Saturnushof 14, 3738 XG Maartensdijk, The Netherlands
At home: http://moene.indiv.nluug.nl/~toon/
Progress of GNU Fortran: http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html

From jan.heichler at gmx.net Tue Jul 1 12:08:32 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Tue, 1 Jul 2008 21:08:32 +0200
Subject: [Beowulf] A press release
In-Reply-To:
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID: <66506789.20080701210832@gmx.net>

An HTML attachment was scrubbed...

From jan.heichler at gmx.net Tue Jul 1 12:09:08 2008
From: jan.heichler at gmx.net (Jan Heichler)
Date: Tue, 1 Jul 2008 21:09:08 +0200
Subject: [Beowulf] A press release
In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com>
Message-ID: <62974595.20080701210908@gmx.net>

Hello Dan,

On Tuesday, 1 July 2008, you wrote:

>>Hi Jon,
>>We have our own stack which we stick on top of the customers favourite
>>red hat clone. Usually Scientific Linux.
>>Here is a bit more about it.
>>http://www.clustervision.com/products_os.php
>>We sell as a standalone product and it does quite well. I could even
>>go so far to say that it is 'stack of choice' in many European
>>institutions.

DKqc> Ever thought of getting a job in Sales and Marketing? :-)

What makes you think that he hasn't that kind of job? ;-)

@Andy: SCNR

Regards
Jan

From hahn at mcmaster.ca Tue Jul 1 12:19:24 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue, 1 Jul 2008 15:19:24 -0400 (EDT)
Subject: [Beowulf] A press release
In-Reply-To: <486A6760.5010006@ias.edu>
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu>
Message-ID:

>>> does it necessarily have to be a redhat clone. can it also be a debian
>>> based
>>> clone?
>>
>> but why? is there some concrete advantage to using Debian?
>> I've never understood why Debian users tend to be very True Believer,
>> or what it is that hooks them.
>
> And the Debian users can say the same thing about Red Hat users. Or SUSE

very nice! an excellent parody of the True Believer response.

but I ask again: what are the reasons one might prefer using debian?
really, I'm not criticizing it - I really would like to know why it
would matter whether someone (such as ClusterVisionOS (tm)) would use
debian or another distro.

From matt at technoronin.com Tue Jul 1 12:30:05 2008
From: matt at technoronin.com (Matt Lawrence)
Date: Tue, 1 Jul 2008 14:30:05 -0500 (CDT)
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To:
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID:

On Tue, 1 Jul 2008, Mark Kosmowski wrote:

> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.

Perhaps you should consider getting time on someone else's cluster. For
something that only requires three nodes, there should be quite a number
of places to run.

-- Matt
It's not what I know that counts.
It's what I can remember in time to use.
From hahn at mcmaster.ca Tue Jul 1 12:25:48 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 1 Jul 2008 15:25:48 -0400 (EDT) Subject: [Beowulf] A press release In-Reply-To: <486A6D59.7020704@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: > Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. Some > distros use good kernels, some do not. I won't mention who I think is in the > latter category. I was hoping for some discussion of concrete issues. for instance, I have the impression debian uses something other than sysvinit - does that work out well? is it a problem getting commercial packages (pathscale/pgi/intel compilers, gaussian, etc) to run? the couple debian people I know tend to have more ideological motives (which I do NOT impugn, except that I am personally more swayed by practical, concrete reasons.) From landman at scalableinformatics.com Tue Jul 1 12:53:23 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 01 Jul 2008 15:53:23 -0400 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <486A8B33.7020600@scalableinformatics.com> Mark Hahn wrote: >> Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. >> Some distros use good kernels, some do not. I won't mention who I >> think is in the latter category. > > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - does > that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? 
Hi Mark:

We have multiple Ubuntu servers up, and thus far, no major problems
... just a few "translational" gotchas. We have successfully run pgi,
intel, gaussian, gamess, ... on our Ubuntu units as well as our
RHEL/Centos, Fedora, ...

>
> the couple debian people I know tend to have more ideological motives

Yeah ... can't escape this.

I like some of the elements of Ubuntu/Debian better than I do RHEL (the
network configuration in Debian is IMO sane, while in RHEL/Centos/SuSE
it is not). There are some aspects that are worse (no /etc/profile.d ...
so I add that back in by hand).

> (which I do NOT impugn, except that I am personally more swayed by
> practical, concrete reasons.)

Building and deploying updated/correct kernels with Ubuntu/Debian is far
easier (the build is much easier/saner) than with SuSE, RHEL, ... From a
pragmatic view, this is why we have a slight preference for it.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From lindahl at pbm.com Tue Jul 1 13:01:23 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Tue, 1 Jul 2008 13:01:23 -0700
Subject: [Beowulf] open mosix alternative
In-Reply-To: <486A722A.3000405@scalableinformatics.com>
References: <486A722A.3000405@scalableinformatics.com>
Message-ID: <20080701200122.GA23583@bx9.net>

On Tue, Jul 01, 2008 at 02:06:34PM -0400, Joe Landman wrote:

> If you prefer more of an SMP model (simpler programming), you should
> look at ScaleMP DSMs. Some on this list argue the shared memory
> programming is not easier than distributed memory programming,

Gee, and I thought the biggest argument about ScaleMP was that the
previous 50 times the same thing was attempted, it had low
performance.

I'd love to see some benchmarks (other than Stream). So if you do look
at it, please share.
-- greg

From asabigue at fing.edu.uy Tue Jul 1 13:14:26 2008
From: asabigue at fing.edu.uy (ariel sabiguero yawelak)
Date: Tue, 01 Jul 2008 17:14:26 -0300
Subject: [Beowulf] Re: Beowulf Digest, Vol 53, Issue 1
In-Reply-To:
References: <200807010728.m617S3Ub011226@bluewest.scyld.com>
Message-ID: <486A9022.5070109@fing.edu.uy>

Well Mark, don't give up!

I am not sure which one is your application domain, but if you require
24x7 computation, then you should not be hosting that at home.
On the other hand, if you are not doing real computation and you just
have a testbed at home, maybe for debugging your parallel applications
or something similar, you might be interested in a virtualized solution.
Several years ago, I used to "debug" some neural networks at home, but
training sessions (up to two weeks of training) happened at the
university.
I would suggest doing something like that.
You can always scale down your problem in several phases and save the
complete data-set / problem for THE RUN.

You are not being a heretic there, but suffering energy costs ;-)
In more places than you may believe, useful computing nodes are being
replaced just because of energy costs. In some application domains you
can even lose computational power if you move from 4 nodes into a
single quad-core (e.g. memory bandwidth problems). I know it is very
nice to be able to do everything at home.. but maybe before dropping
your studies or working overtime to pay the electricity bill, you might
want to consider collapsing your physical deployment into a single
virtualized cluster (or just dispatching several threads/processes in a
single system).
If you collapse into a single system you have only one mainboard, one
HDD, one power source, one processor (physically speaking), .... and you
can achieve almost the performance of 4 systems in one, consuming the
power of.... well maybe even less than a single one. I don't want to go
into discussions about performance gain/loss due to the variation of
the hardware architecture.
Invest some bucks (if you haven't done that yet) in a good power source.
The efficiency of unbranded OEM power sources is really pathetic, maybe
45-50%, while a good power source might be 75-80% efficient.
Use the energy for computing, not for heating your house.
What I mean is that you could consider just collapsing a complete
"small" cluster into a single system. If your application is CPU-bound
and not I/O-bound, VMware Server could be an option, as it is free of
charge (unfortunately not open source, even though some patches can be
applied to the drivers). I think it is not possible to publish
benchmarking data about VMware, but I can tell you that over long
timescales, the performance you get in the host OS is similar to that
of the guest OS. There are a lot of problems related to jitter, from
crazy clocks to delays, but if your application is not sensitive to
that, then you are OK.
Maybe this is not a solution, but you can provide more information
regarding your problem before quitting...

my 2 cents....

ariel

Mark Kosmowski wrote:
> At some point a cost-benefit analysis needs to be performed. If
> my cluster at peak usage only uses 4 GB RAM per CPU (I live in
> single-core land still and do not yet differentiate between CPU and
> core) and my nodes all have 16 GB per CPU then I am wasting RAM
> resources and would be better off buying new machines and physically
> transferring the RAM to and from them or running more jobs each
> distributed across fewer CPUs. Or saving on my electricity bill and
> powering down some nodes.
>
> As heretical as this last sounds, I'm tempted to throw in the towel on
> my PhD studies because I can no longer afford the power to run my
> three node cluster at home. Energy costs may end up being the straw
> that breaks this camel's back.
>
> Mark E. Kosmowski

From perry at piermont.com Tue Jul 1 13:21:55 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Tue, 01 Jul 2008 16:21:55 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: <20080701164747.GA15901@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Tue\, 1 Jul 2008 18\:47\:47 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
Message-ID: <87fxqtuzh8.fsf@snark.cb.piermont.com>

Henning Fehrmann writes:
>> Thus, your problem sounds rather odd. There is no obvious reason you
>> should be limited to 360 connections.
>>
>> Perhaps your problem is not what you think it is at all. Could you
>> explain it in more detail?
>
> I guess it has also something to do with the automounter. I am not able
> to increase this number.
> But even if the automounter would handle more we need to be able to
> use higher ports:
> netstat shows always ports below 1024.
>
> tcp 0 0 client:941 server:nfs
>
> We need to mount up to 1400 nfs exports.
All NFS clients are connecting to a single port, not to a different port for every NFS export. You do not need 1400 listening TCP ports on a server to export 1400 different file systems. Only one port is needed, whether you are exporting one file system or one million, just as only one SMTP port is needed whether you are receiving mail from one client or from one million. The clients are connecting from ports below 1024 because Berkeley set up a hack in the original BSD stack so that only root could open ports below 1024. This way, you could "know" the process on the remote host was a root process, thus you could feel "secure" [sic]. It doesn't add any real security any more, but it is also not the cause of any problem you are experiencing. We can help you figure this out, but you will have to give a lot more detail about the problem. Please describe your network setup. How many servers do you have? How many clients? How many file systems are those servers exporting? How many is a typical client mounting, and why? Start there and we can try to move forward. -- Perry E. Metzger perry at piermont.com From landman at scalableinformatics.com Tue Jul 1 13:24:04 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 01 Jul 2008 16:24:04 -0400 Subject: [Beowulf] open mosix alternative In-Reply-To: <20080701200122.GA23583@bx9.net> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> Message-ID: <486A9264.5090902@scalableinformatics.com> Greg Lindahl wrote: > On Tue, Jul 01, 2008 at 02:06:34PM -0400, Joe Landman wrote: > >> If you prefer more of an SMP model (simpler programming), you should >> look at ScaleMP DSMs. Some on this list argue the shared memory >> programming is not easier than distributed memory programming, > > Gee, and I thought the biggest argument about ScaleMP was that the > previous 50 times the same thing was attempted, it had low > performance. The researchy DSMs had low performance. That is known. 
This one seems not to be bad over good IB nets. You always have latency. Can't escape that. > I'd love to see some benchmarks (other than Stream). So if you do look > at it, please share. If you are serious about this, I'll bug Shai as to what is shareable. He does have benchmarks. The ones I have seen (real applications, not microbenchmarks), looked pretty good. Which is why we are looking at them. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From andrew at moonet.co.uk Tue Jul 1 13:35:29 2008 From: andrew at moonet.co.uk (andrew holway) Date: Tue, 1 Jul 2008 22:35:29 +0200 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <66506789.20080701210832@gmx.net> Message-ID: > does it necessarily have to be a redhat clone. can it also be a debian based > clone? Not at all, If there were demand or a customer with enough cash to throw at the job then we would of course accommodate his every need. Considering that it is taking several rather expensive developers quite a long time to push out the latest incarnation, ClusterVisionOS 4 through beta this cost could be considerable to ensure a stable environment. 
I'm no expert in the subtleties of distributions but maintaining and supporting one to a high enough standard is quite enough work thanks very much :) Ta Andy From lindahl at pbm.com Tue Jul 1 13:37:14 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 1 Jul 2008 13:37:14 -0700 Subject: [Beowulf] open mosix alternative In-Reply-To: <486A9264.5090902@scalableinformatics.com> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> <486A9264.5090902@scalableinformatics.com> Message-ID: <20080701203713.GB28024@bx9.net> On Tue, Jul 01, 2008 at 04:24:04PM -0400, Joe Landman wrote: > If you are serious about this, I'll bug Shai as to what is shareable. He > does have benchmarks. The ones I have seen (real applications, not > microbenchmarks), looked pretty good. Which is why we are looking at > them. If you look back on this mailing list, you'll see that I asked him for benchmarks, and he posted stream. Which isn't interesting, because it's embarrassingly parallel. -- greg From hahn at mcmaster.ca Tue Jul 1 13:44:05 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 1 Jul 2008 16:44:05 -0400 (EDT) Subject: [Beowulf] open mosix alternative In-Reply-To: <20080701200122.GA23583@bx9.net> References: <486A722A.3000405@scalableinformatics.com> <20080701200122.GA23583@bx9.net> Message-ID: > I'd love to see some benchmarks (other than Stream). So if you do look > at it, please share. me too. in particular, I'd like to see "hot page" performance - where a multithreaded program bangs on a heavily write-shared page. From gerry.creager at tamu.edu Tue Jul 1 13:57:08 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue, 01 Jul 2008 15:57:08 -0500 Subject: Commodity supercomputing, was: Re: NDAs Re: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point? 
In-Reply-To: <486A9822.7000902@moene.indiv.nluug.nl> References: <1210016466.4924.1.camel@Vigor13> <48551E70.7070507@scalableinformatics.com> <4AF41375-3A13-4691-A2A1-D5B853FEC3A4@xs4all.nl> <20080615154227.u8fwdpn08ww4c40k@webmail.jpl.nasa.gov> <6.2.5.6.2.20080616084554.02e4dd18@jpl.nasa.gov> <486923D6.8070907@moene.indiv.nluug.nl> <1214864562.6912.29.camel@Vigor13> <486A1F7B.9080408@tamu.edu> <486A9822.7000902@moene.indiv.nluug.nl> Message-ID: <486A9A24.9000800@tamu.edu> I was at the WRF conf. last week. A colleague from the Netherlands was lamenting that he couldn't get ECMWF data (I don't recall the annual cost/year but it was huge). NOAA/NCEP GFS data are available via FTP and regular enough to allow really simple scripting, as well as other methods. I don't understand why folks wouldn't use these data. As for competing, if our companies are not sufficiently technically astute, should we be protecting them from European companies, just because the data are free? Toon Moene wrote: > Gerry Creager wrote: > >> In the US, at least for academic institutions and hobbyists, surface >> and upper air observations of the sort you describe are generally >> available for incorporation into models for data assimilation. Models >> are generally forced and bounded using model data from other >> atmospheric models, also available. As I understand it from >> colleagues in Europe, getting similar data over there is more >> problemmatical. > > Exactly ! And what happens in Europe is that companies take the freely > available US data, use it to compete with US companies, and disregard > the (meteorological superior) ECMWF data, because it is not free. > > A colleague of mine held some very unpopular talks in Reading, England, > about this (according to his figures, 99 % of the meteorological data > used in Europe originates from the US). 
>

--
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843

From eagles051387 at gmail.com Tue Jul 1 15:53:34 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Wed, 2 Jul 2008 00:53:34 +0200
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To: <486A4296.4050501@hope.edu>
References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu>
Message-ID:

my idea is more for my thesis, if i am going to do anything like this.
vernard, thanks for the link. what's it like in a cluster environment?

On Tue, Jul 1, 2008 at 4:43 PM, Paul Van Allsburg wrote:

> I'd like to do the same, as a project for a group of students... Please
> keep me in the loop?
> Thanks!
> Paul
>
> --
> Paul Van Allsburg Computational Science & Modeling Facilitator
> Natural Sciences Division, Hope College
> 35 East 12th Street
> Holland, Michigan 49423
> 616-395-7292 http://www.hope.edu/academic/csm/
>
> Jon Aquilina wrote:
>
>> that would be greatly appreciated
>>
>> On 7/1/08, Geoff Galitz wrote:
>>
>> That is out of my field of expertise. Sounds like a question for
>> professional digital artists. I can put you in touch with some folks
>> who most likely know the answer to your questions, if you like.
>>
>> Anybody know of any current approaches to this?
>>
>> Geoff Galitz
>> Blankenheim NRW, Deutschland
>> http://www.galitz.org
>>
>> From: Jon Aquilina [mailto:eagles051387 at gmail.com]
>> Sent: Tuesday, 1 July 2008 13:27
>> To: Geoff Galitz
>> Cc: Beowulf Mailing List
>> Subject: Re: [Beowulf] software for compatible with a cluster
>>
>> reason i am asking is because i would like to set up a rendering
>> cluster and provide rendering services. does this also work for 3d
>> animated movies that require rendering or does one need something
>> entirely different for that?
>>
>> On 7/1/08, Geoff Galitz wrote:
>>
>> I know people who use Houdini for this:
>>
>> http://www.sidefx.com/index.php
>>
>> I cannot vouch for how well it works or what is involved, though.
>>
>> Geoff Galitz
>> Blankenheim NRW, Deutschland
>> http://www.galitz.org
>>
>> From: beowulf-bounces at beowulf.org
>> [mailto:beowulf-bounces at beowulf.org] On Behalf Of Jon Aquilina
>> Sent: Tuesday, 1 July 2008 12:40
>> To: Beowulf Mailing List
>> Subject: [Beowulf] software for compatible with a cluster
>>
>> does anyone know of any rendering software that will work with a
>> cluster?
>>
>> -- Jonathan Aquilina
>>
>> -- Jonathan Aquilina
>>
>> --
>> Jonathan Aquilina
>

--
Jonathan Aquilina

From perry at piermont.com Tue Jul 1 16:23:10 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Tue, 01 Jul 2008 19:23:10 -0400
Subject: [Beowulf] A press release
In-Reply-To: <486A6760.5010006@ias.edu> (Prentice Bisbal's message of "Tue\, 01 Jul 2008 13\:20\:32 -0400")
References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu>
Message-ID: <87bq1hgpep.fsf@snark.cb.piermont.com>

Prentice Bisbal writes:
>>> does it necessarily have to be a redhat clone. can it also be a debian
>>> based
>>> clone?
>>
>> but why? is there some concrete advantage to using Debian?
>> I've never understood why Debian users tend to be very True Believer,
>> or what it is that hooks them.
>
> And the Debian users can say the same thing about Red Hat users. Or SUSE
> users. And if any still exist, the Slackware users could say the same
> thing about the both of them.
But then the Slackware users could also > point out that the first Linux distro was Slackware, so they are using > the one true Linux distro... Precisely. It pays to allow people to use what they want. Fewer religious battles that way. Whether one distro or another has an advantage isn't the point -- people have their own tastes and it doesn't pay to tell them "no" without good reason. Perry -- Perry E. Metzger perry at piermont.com From perry at piermont.com Tue Jul 1 16:25:19 2008 From: perry at piermont.com (Perry E. Metzger) Date: Tue, 01 Jul 2008 19:25:19 -0400 Subject: [Beowulf] A press release In-Reply-To: (Mark Hahn's message of "Tue\, 1 Jul 2008 15\:25\:48 -0400 $EDT$") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <877ic5gpb4.fsf@snark.cb.piermont.com> Mark Hahn writes: > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - > does that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? It is trivial to port init scripts between different init systems. They're just short shell scripts, they're utterly readable, and any sysadmin worth their salt can make the needed changes in a few minutes. If you have a large cluster, you need such a person anyway. Perry -- Perry E. Metzger perry at piermont.com From perry at piermont.com Tue Jul 1 16:31:50 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Tue, 01 Jul 2008 19:31:50 -0400 Subject: [Beowulf] A press release In-Reply-To: (Mark Hahn's message of "Tue\, 1 Jul 2008 15\:19\:24 -0400 $EDT$") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <873amtgp09.fsf@snark.cb.piermont.com> Mark Hahn writes: >>> but why? is there some concrete advantage to using Debian? >>> I've never understood why Debian users tend to be very True Believer, >>> or what it is that hooks them. >> >> And the Debian users can say the same thing about Red Hat users. Or SUSE > > very nice! an excellent parody of the True Believer response. Actually, he was just being reasonable. > but I ask again: what are the reasons one might prefer using debian? > really, I'm not criticizing it - I really would like to know why it > would matter whether someone (such as ClusterVisionOS (tm)) would use > debian or another distro. Often it is just a question of what the people using the system are used to. I often prefer using BSD systems, largely because of certain technical advantages, but also to a great extent because my first big Unix boxes were Vaxes running 4.2BSD in the early 1980s and after 25 years with the same flavor of Unix you get used to the way things are done. It is much the same reason I use Emacs instead of vi -- I started using Emacs on Tops-20 decades ago and I'm too used to it now. If you told me I "have" to use vi, things would get ugly, even though I don't think there is anything wrong with using vi per se. Perry -- Perry E. Metzger perry at piermont.com From perry at piermont.com Tue Jul 1 16:34:17 2008 From: perry at piermont.com (Perry E. 
Metzger)
Date: Tue, 01 Jul 2008 19:34:17 -0400
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To: (Jon Aquilina's message of "Wed\, 2 Jul 2008 00\:53\:34 +0200")
References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu>
Message-ID: <87y74lfabq.fsf@snark.cb.piermont.com>

"Jon Aquilina" writes:
> my idea is more of for my thesis.

If you're trying to do 3d animation on the cheap and you want something
that's already cluster capable, I'd try Blender. It is open source and
it has already made some reasonable length movies. Not being an
animation type, I know nothing about how nice it is compared to
commercial products, but it is hard to beat the price.

Perry
--
Perry E. Metzger perry at piermont.com

From hahn at mcmaster.ca Tue Jul 1 22:06:43 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 2 Jul 2008 01:06:43 -0400 (EDT)
Subject: [Beowulf] A press release
In-Reply-To:
References:
Message-ID:

>> I was hoping for some discussion of concrete issues. for instance,
>> I have the impression debian uses something other than sysvinit -
>> does that work out well?
>>
> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, ...

thanks. I guess I was assuming that mainstream debian was like ubuntu.

>> is it a problem getting commercial
>> packages (pathscale/pgi/intel compilers, gaussian, etc) to run?
>>
> I've never had any major problems. Most linux vendors supply both RPM's and
> .tar.gz installers, and I generally have better luck with the latter, even
> on RPM based systems anyway.

interesting - I wonder why. the main difference would be that the rpm
format encodes dependencies...

>> the couple debian people I know tend to have more ideological motives
>> (which I do NOT impugn, except that I am personally more swayed by
>> practical, concrete reasons.)
>>
> My "conversion"
to use of Debian had little to do with ideological motives, > and a lot more to do with minimizing the amount of time I had to take away > from my research to support the Linux clusters I was maintaining at the > time. again interesting, thanks. what sorts of things in rpm-based distros consumed your time? > Side note, one very nice thing about debian is the ability to upgrade a > system in-place from one O/S release to another via > > apt-get dist-upgrade > > Much nicer than reinstalling the O/S as seems to be (used to be?) the norm > with RPM-based systems I've done major version upgrades using rpm, admittedly in the pre-fedora days. it _is_ a nice capability - I'm a little surprised desktop-oriented distros don't emphasize it... From tjrc at sanger.ac.uk Tue Jul 1 22:37:19 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 2 Jul 2008 06:37:19 +0100 Subject: [Beowulf] A press release In-Reply-To: <486A8B33.7020600@scalableinformatics.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> <486A8B33.7020600@scalableinformatics.com> Message-ID: <86A30BE3-3B8E-47C8-8286-D2D7E2C74A40@sanger.ac.uk> On 1 Jul 2008, at 8:53 pm, Joe Landman wrote: >> the couple debian people I know tend to have more ideological motives > > Yeah ... can't escape this. Indeed. Ubuntu is slightly more pragmatic than Debian, as far as the ideological stuff goes. > I like some of the elements of Ubuntu/Debian better than I do RHEL > (the network configuration in Debian is IMO sane, while in RHEL/ > Centos/SuSE it is not). There are some aspects that are worse (no / > etc/profile.d ... so I add that back in by hand ). Here, our clusters all run Debian, but we also have RHAS and SLES around when support matrices demand it (Oracle, mainly). I'd agree that fundamentally it's a case of what you're used to. 
We stopped using Red Hat widely about four years ago, and the reasons (which are probably not valid any more) were: 1) Not all userland programs were 64-bit file aware. 2) There were certain features which we just couldn't get to work properly on RHAS - a prime example being multipath SAN access. It "just worked" on Debian. 3) Smooth upgrades from one major release to the next without having to reinstall. While this is probably not important for beowulf nodes, it is for more complex servers. I still prefer Debian's package management system, but that's probably because I'm used to it, rather than it inherently being superior. yast2 can do pretty much everything that aptitude does, although I think aptitude is more amenable to automation through cfengine and the like. There are some very powerful little parts of the packaging system, like dpkg-divert, which allows you to replace a file from a package with your own, in such a way that it will not be overwritten the next time the package is upgraded. For those of us that need to customise our systems that sort of thing is very useful, and saves a lot of work down the line. >> (which I do NOT impugn, except that I am personally more swayed by >> practical, concrete reasons.) > > > Building and deploying updated/correct kernels with Ubuntu/Debian is > far easier (the build is much easier/saner) than with SuSE, > RHEL, ... From a pragmatic view, this is what why we have a slight > preference for that. I'd agree with that. Using make-kpkg to build a custom kernel .deb which you can then easily deploy to all your machines is a real boon. At the end of the day, people should use what they're comfortable with. I don't necessarily buy the support argument; there are some companies (Platform, for example) who will support you whichever distro you use; all they care about is what kernel version and C library version you're running. I like this attitude and I wish it was more widespread amongst proprietary software vendors. 
Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From eagles051387 at gmail.com Tue Jul 1 22:37:21 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Wed, 2 Jul 2008 07:37:21 +0200 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: <87y74lfabq.fsf@snark.cb.piermont.com> References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> Message-ID: if i use blender how nicely does it work in a cluster? On Wed, Jul 2, 2008 at 1:34 AM, Perry E. Metzger wrote: > > "Jon Aquilina" writes: > > my idea is more of for my thesis. > > If you're trying to do 3d animation on the cheap and you want > something that's already cluster capable, I'd try Blender. It is open > source and it has already made some reasonable length movies. Not > being an animation type, I know nothing about how nice it is compared > to commercial products, but it is hard to beat the price. > > Perry > -- > Perry E. Metzger perry at piermont.com > -- Jonathan Aquilina -------------- next part -------------- An HTML attachment was scrubbed... URL: From carsten.aulbert at aei.mpg.de Wed Jul 2 00:26:58 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 02 Jul 2008 09:26:58 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: <87fxqtuzh8.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> Message-ID: <486B2DC2.9010604@aei.mpg.de> Hi Perry, Perry E. Metzger wrote: > > All NFS clients are connecting to a single port, not to a different > port for every NFS export. 
You do not need 1400 listening TCP ports on > a server to export 1400 different file systems. Only one port is > needed, whether you are exporting one file system or one million, just > as only one SMTP port is needed whether you are receiving mail from > one client or from one million. > That's clear and not the problem. > The clients are connecting from ports below 1024 because Berkeley set > up a hack in the original BSD stack so that only root could open ports > below 1024. This way, you could "know" the process on the remote host > was a root process, thus you could feel "secure" [sic]. It doesn't add > any real security any more, but it is also not the cause of any > problem you are experiencing. We might run out of "secure" ports. > We can help you figure this out, but you will have to give a lot more > detail about the problem. Please describe your network setup. How many > servers do you have? How many clients? How many file systems are those > servers exporting? How many is a typical client mounting, and why? > Start there and we can try to move forward. > OK, we have 1342 nodes which act as servers as well as clients. Every node exports a single local directory and all other nodes can mount this. What we do now to optimize the available bandwidth and I/O is spread millions of files according to a hash algorithm to all nodes (multiple copies as well) and then run a few thousand jobs opening one file from one box then one file from the other box and so on. With a short autofs timeout that ought to work. Typically a single process can open about 10-15 files per second, i.e. it makes 10-15 mounts per second. With 4 parallel processes per node that's 40-60 mounts/second. With a timeout of 5 seconds we should roughly have 200-300 concurrent mounts (on average, no idea about the variance). Our tests so far have shown that sometimes a node keeps a few mounts open (autofs4 problems AFAIK) and at some point is not able to mount more shares.
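[Editor's sketch, not part of the original message.] The 200-300 figure above is an instance of Little's law: mounts in flight = mount rate x time each mount is held. The Python below reproduces that arithmetic and compares it with a reserved-port budget. The port range 665-1023 is an assumption (believed to be the Linux sunrpc default for min_resvport/max_resvport, but it should be checked under /proc/sys/sunrpc/ on the actual nodes), as is the simplification that every mount occupies exactly one reserved port for the whole autofs timeout.

```python
# Rough estimate of concurrent NFS mounts per node (Little's law:
# items in flight = arrival rate * holding time).

def concurrent_mounts(mounts_per_sec_per_process, processes_per_node, hold_time_s):
    """Average number of mounts open at once on one node."""
    return mounts_per_sec_per_process * processes_per_node * hold_time_s

def reserved_port_budget(min_resvport, max_resvport):
    """Number of source ports in an inclusive reserved-port range."""
    return max_resvport - min_resvport + 1

# Figures taken from the message above.
PROCESSES_PER_NODE = 4
AUTOFS_TIMEOUT_S = 5

low = concurrent_mounts(10, PROCESSES_PER_NODE, AUTOFS_TIMEOUT_S)
high = concurrent_mounts(15, PROCESSES_PER_NODE, AUTOFS_TIMEOUT_S)

# ASSUMPTION: these bounds are illustrative defaults, not values
# reported by the poster. Verify them on the real systems.
ASSUMED_MIN_RESVPORT = 665
ASSUMED_MAX_RESVPORT = 1023
budget = reserved_port_budget(ASSUMED_MIN_RESVPORT, ASSUMED_MAX_RESVPORT)

print(f"average concurrent mounts: {low}-{high}")
print(f"assumed reserved-port budget: {budget}")
print(f"headroom above the high estimate: {budget - high}")
```

If the assumed range holds, the budget leaves little headroom above the average, so a few stale mounts or a burst could plausibly exhaust it. This is only a consistency check under the stated assumptions, not a diagnosis.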
Usually this occurs at about 350 mounts and we are not yet 100% sure if we are running out of secure ports. All our boxes export now with "insecure" option (NFSv3), but our clients all connect from a "secure" port, anyone here who might give us a hint how to force this in Linux? Thanks a lot Carsten From tjrc at sanger.ac.uk Wed Jul 2 01:19:50 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 2 Jul 2008 09:19:50 +0100 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > OK, we have 1342 nodes which act as servers as well as clients. Every > node exports a single local directory and all other nodes can mount > this. > > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) and then run a few 1000 jobs opening one file from one > box then one file from the other box and so on. With a short autofs > timeout that ought to work. Typically it is possible that a single > process opens about 10-15 files per second, i.e. making 10-15 mounts > per > second. With 4 parallel process per node that's 40-60 mounts/second. > With a timeout of 5 seconds we should roughly have 200-300 concurrent > mounts (on average, no idea abut the variance). Please tell me you're not serious! The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going all over the place. 
Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that each node works on the data on its own local disk, and doesn't have to talk networked storage at all? Or don't you know in advance which files a particular job is going to need? Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From henning.fehrmann at aei.mpg.de Wed Jul 2 01:44:58 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Wed, 2 Jul 2008 10:44:58 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> Message-ID: <20080702084458.GA12879@gretchen.aei.uni-hannover.de> On Wed, Jul 02, 2008 at 09:19:50AM +0100, Tim Cutts wrote: > > On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > > >OK, we have 1342 nodes which act as servers as well as clients. Every > >node exports a single local directory and all other nodes can mount this. > > > >What we do now to optimize the available bandwidth and IOs is spread > >millions of files according to a hash algorithm to all nodes (multiple > >copies as well) and then run a few 1000 jobs opening one file from one > >box then one file from the other box and so on. With a short autofs > >timeout that ought to work. Typically it is possible that a single > >process opens about 10-15 files per second, i.e. making 10-15 mounts per > >second. With 4 parallel process per node that's 40-60 mounts/second. 
> >With a timeout of 5 seconds we should roughly have 200-300 concurrent > >mounts (on average, no idea about the variance). > > Please tell me you're not serious! The overheads of just performing the NFS mounts are going to kill you, never mind all the network traffic going > all over the place. > > Since you've distributed the files to the local disks of the nodes, surely the right way to perform this work is to schedule the computations so that > each node works on the data on its own local disk, and doesn't have to talk networked storage at all? Or don't you know in advance which files a > particular job is going to need? Yes, this is the problem. The data set is too big to store everywhere (a few TByte and 50 million files). Mounting a few NFS servers does not provide the bandwidth. On the other hand, the core switch should be able to handle the flows without blocking. We think that NFS mounts are the fastest way to distribute the requested files to the nodes. Henning From tjrc at sanger.ac.uk Wed Jul 2 01:45:21 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 2 Jul 2008 09:45:21 +0100 Subject: [Beowulf] A press release In-Reply-To: References: Message-ID: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: >>> I was hoping for some discussion of concrete issues. for instance, >>> I have the impression debian uses something other than sysvinit - >>> does that work out well? >>> >> Debian uses standard sysvinit-style scripts in /etc/init.d, >> /etc/rc0.d, ... > > thanks. I guess I was assuming that mainstream debian was like > ubuntu. It's sort of the other way around. Remember that Ubuntu is based off a six-monthly snapshot of Debian's testing track, which is why Hardy looks a lot more like the upcoming Debian Lenny than it does like Debian Etch. > interesting - I wonder why. the main difference would be that the > rpm format encodes dependencies...
The difficulty is that many ISVs tend to do a fairly terrible job of packaging their applications as RPM's or DEB's, for example creating init scripts which don't obey the distribution's policies, or making willy-nilly modifications to configuration files all over the place, even in other packages (which in the Debian world is a *big* no-no, that's why many Debian/Ubuntu packages have now moved to the conf.d type of configuration directory, so that other packages can drop in little independent snippets of configuration) I have seen, for example, .deb packages from a Large Company With Which We Are All Familiar which essentially attempted to convert your system into a Red Hat system by moving all your init scripts around and whatnot, so once you'd installed this abomination, you'd totally wrecked the ability of many of the main distro packages to be updated ever again. Oh, and of course uninstalling the package didn't put anything back the way it had been before. Like you, I tend to use tarballs if they are available, and if I want to turn them into packages I do it myself, and make sure they are policy compliant for the distro. So this, while not a statement in favour of either flavour of distro, is definitely a warning to be very wary of what packages that have come from sources other than the distro itself might do (which of course, you'd be wary of anyway for security reasons). Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. 
From ajt at rri.sari.ac.uk Wed Jul 2 02:23:06 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Wed, 02 Jul 2008 10:23:06 +0100 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <486B48FA.7080403@rri.sari.ac.uk> Mark Hahn wrote: >[...] > but I ask again: what are the reasons one might prefer using debian? > really, I'm not criticizing it - I really would like to know why it > would matter whether someone (such as ClusterVisionOS (tm)) would use > debian or another distro. Hello, Mark. I've been on a well-trodden path from trying out the 'free' version of Scyld under RH6.2, then using openMosix under all versions of RH up to RH9, Fedora up to core2, then Debian Sarge and now Ubuntu 6.06.1 LTS with an upgrade to 8.04.1 LTS imminent. As I see it, this has been a developmental journey and also a learning experience for me. As others on this thread have admitted, I'm not blind to the ideological objectives of Debian. However, I'm now using a very good commercially supported version of Linux with what is widely acknowledged to be the largest user and developer community. It's my own experience of trying to do my work under RH/Fedora that's put me off these distros and I see a BIG divide between 'real' HPC communities using BIG iron, and small Beowulf clusters like mine. I've got to admit that Tim Cutts did influence my decision to try out Debian (thanks, Tim!). I also use the (UK) NERC's Bio-Linux binary debs and I was also influenced by their decision to change from RH to Debian for Bio-Linux. I can see that other communities use RH for similar reasons, though I should mention that our Beowulf spends a lot of time running quantum chemistry simulations (GAMESS etc.).
I've put up an Ubuntu blueprint for 'biobuntu', which consolidates the work I'm doing on several projects: https://blueprints.launchpad.net/ubuntu/+spec/biobuntu I am, of course, familiar with 'other' Biolinuxen and rpm repositories of bioinformatics software: http://en.wikipedia.org/wiki/BioLinux Having tried out many of these alternatives, I remain convinced that NEBC's Bio-Linux is most appropriate for my work. In particular, the level of support in the form of documentation and training courses provided by NEBC is very good. This means I don't have to reinvent the wheel - always a good point for any Beowulf-related activity :-) Tony. -- Dr. A.J.Travis, | mailto:ajt at rri.sari.ac.uk Rowett Research Institute, | http://www.rri.sari.ac.uk/~ajt Greenburn Road, Bucksburn, | phone:+44 (0)1224 712751 Aberdeen AB21 9SB, Scotland, UK. | fax:+44 (0)1224 716687 From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 02:35:57 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 2 Jul 2008 11:35:57 +0200 (CEST) Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Carsten Aulbert wrote: > OK, we have 1342 nodes which act as servers as well as clients. Every > node exports a single local directory and all other nodes can mount this. Have you considered using a parallel file system? > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) There has been much talk on this very list about improving performance by paying attention to data locality. Are you not able to move the code to where the data is or move the data to where the code is? For example,
using a simple TCP connection (nc, rsh, rsync or even http) to transfer the file to the local disk before using it is probably more efficient than the way you use NFS if you deal with small files (as they have to be written to some local storage). The setup and tear-down costs of the NFS connection (automounter, mount, unmount) simply don't exist in this case; the transfer of data on the wire happens the same way. Or you could even get around the limitation of storing it locally by using a ramdisk to temporarily store the files (if you have the free memory...) - from what I understand they are read then used immediately and not needed again in a short time frame so it makes no sense to store them for longer, a perfect application for a tmpfs. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu at iwr.uni-heidelberg.de From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 02:59:47 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 2 Jul 2008 11:59:47 +0200 (CEST) Subject: [Beowulf] A press release In-Reply-To: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: On Wed, 2 Jul 2008, Tim Cutts wrote: > The difficulty is that many ISVs tend to do a fairly terrible job of > packaging their applications as RPM's or DEB's I very much agree with this. While you mentioned init scripts that don't fit the distribution, I can add init scripts that are totally missing when they should be provided - a hand-made init script would not be part of the installed package and could fail in various ways if the package is updated or... uninstalled. > Like you, I tend to use tarballs if they are available, and if I > want to turn them into packages I do it myself, and make sure they > are policy compliant for the distro.
I think that's actually more important than the distribution per se. If you are able to package something to fit the distribution (e.g. to install a missing kernel module, add an important software package, etc.) you can use your time more efficiently later on, as packaging (done properly) is normally a one-time effort. The point is that the admin should use the distribution, not fight it! -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu at iwr.uni-heidelberg.de From eagles051387 at gmail.com Wed Jul 2 04:16:43 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Wed, 2 Jul 2008 13:16:43 +0200 Subject: [Beowulf] A press release In-Reply-To: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: One thing must not be forgotten, though. When packaging for Ubuntu, once someone like you or me has packaged something, you upload it for someone higher up the chain to check and upload to the servers. So basically someone is checking what someone else has packaged. On 7/2/08, Tim Cutts wrote: > > > On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: > > I was hoping for some discussion of concrete issues. for instance, >>>> I have the impression debian uses something other than sysvinit - >>>> does that work out well? >>>> >>>> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, >>> ... >>> >> >> thanks. I guess I was assuming that mainstream debian was like ubuntu. >> > > It's sort of the other way around. Remember that Ubuntu is based off a > six-monthly snapshot of Debian's testing track, which is why Hardy looks a > lot more like the upcoming Debian Lenny than it does like Debian Etch. > > interesting - I wonder why. the main difference would be that the rpm >> format encodes dependencies...
>> > > The difficulty is that many ISVs tend to do a fairly terrible job of > packaging their applications as RPM's or DEB's, for example creating init > scripts which don't obey the distribution's policies, or making willy-nilly > modifications to configuration files all over the place, even in other > packages (which in the Debian world is a *big* no-no, that's why many > Debian/Ubuntu packages have now moved to the conf.d type of configuration > directory, so that other packages can drop in little independent snippets of > configuration) > > I have seen, for example, .deb packages from a Large Company With Which We > Are All Familiar which essentially attempted to convert your system into a > Red Hat system by moving all your init scripts around and whatnot, so once > you'd installed this abomination, you'd totally wrecked the ability of many > of the main distro packages to be updated ever again. Oh, and of course > uninstalling the package didn't put anything back the way it had been > before. > > Like you, I tend to use tarballs if they are available, and if I want to > turn them into packages I do it myself, and make sure they are policy > compliant for the distro. > > So this, while not a statement in favour of either flavour of distro, is > definitely a warning to be very wary of what packages that have come from > sources other than the distro itself might do (which of course, you'd be > wary of anyway for security reasons). 
> > Tim > > > -- > The Wellcome Trust Sanger Institute is operated by Genome Research Limited, > a charity registered in England with number 1021457 and a company registered > in England with number 2742969, whose registered office is 215 Euston Road, > London, NW1 2BE. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Jonathan Aquilina From eagles051387 at gmail.com Wed Jul 2 04:18:20 2008 From: eagles051387 at gmail.com (Jon Aquilina) Date: Wed, 2 Jul 2008 13:18:20 +0200 Subject: [Beowulf] A press release In-Reply-To: References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: I'm also not sure what support is like in other distros, but I commend the Kubuntu volunteers who man that IRC channel for support, as well as those who help with development. Are there any other distros that provide support like this? On 7/2/08, Jon Aquilina wrote: > > one thing must not be forgotten though. in regards to pkging stuff for the > ubuntu variation once someone like you and me you upload it for someone > higher up on the chain to check and upload to the servers. so basically > someone is checking what someone else has packaged. > > On 7/2/08, Tim Cutts wrote: >> >> >> On 2 Jul 2008, at 6:06 am, Mark Hahn wrote: >> >> I was hoping for some discussion of concrete issues. for instance, >>>>> I have the impression debian uses something other than sysvinit - >>>>> does that work out well? >>>>> >>>>> Debian uses standard sysvinit-style scripts in /etc/init.d, /etc/rc0.d, >>>> ... >>>> >>> >>> thanks. I guess I was assuming that mainstream debian was like ubuntu. >>> >> >> It's sort of the other way around.
Remember that Ubuntu is based off a >> six-monthly snapshot of Debian's testing track, which is why Hardy looks a >> lot more like the upcoming Debian Lenny than it does like Debian Etch. >> >> interesting - I wonder why. the main difference would be that the rpm >>> format encodes dependencies... >>> >> >> The difficulty is that many ISVs tend to do a fairly terrible job of >> packaging their applications as RPM's or DEB's, for example creating init >> scripts which don't obey the distribution's policies, or making willy-nilly >> modifications to configuration files all over the place, even in other >> packages (which in the Debian world is a *big* no-no, that's why many >> Debian/Ubuntu packages have now moved to the conf.d type of configuration >> directory, so that other packages can drop in little independent snippets of >> configuration) >> >> I have seen, for example, .deb packages from a Large Company With Which We >> Are All Familiar which essentially attempted to convert your system into a >> Red Hat system by moving all your init scripts around and whatnot, so once >> you'd installed this abomination, you'd totally wrecked the ability of many >> of the main distro packages to be updated ever again. Oh, and of course >> uninstalling the package didn't put anything back the way it had been >> before. >> >> Like you, I tend to use tarballs if they are available, and if I want to >> turn them into packages I do it myself, and make sure they are policy >> compliant for the distro. >> >> So this, while not a statement in favour of either flavour of distro, is >> definitely a warning to be very wary of what packages that have come from >> sources other than the distro itself might do (which of course, you'd be >> wary of anyway for security reasons). 
>> >> Tim >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, >> a charity registered in England with number 1021457 and a company registered >> in England with number 2742969, whose registered office is 215 Euston Road, >> London, NW1 2BE. >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > > -- > Jonathan Aquilina -- Jonathan Aquilina From carsten.aulbert at aei.mpg.de Wed Jul 2 04:22:41 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 02 Jul 2008 13:22:41 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <486B6501.5000108@aei.mpg.de> Hi Bogdan, Bogdan Costescu wrote: > > Have you considered using a parallel file system ? We looked a bit into a few, but would love to get any input from anyone on that. What we found so far was not really convincing, e.g. glusterFS at that time was not really stable, lustre was too easy to crash - at least at that time, ... > There have been many talks of improving performance by paying attention > to the data locality on this very list. Are you not able to move the > code to where the data is or move the data to where the code is ? In principle this *should* be possible; however, this particular user (and maybe many in the future) would then need to circumvent the batch system, and it's usually quite a hassle to set this up correctly beforehand. > > F.e.
using a simple TCP connection (nc, rsh, rsync or even http) to > transfer the file to the local disk before using it is probably more > efficient than the way you use NFS if you deal with small files (as they > have to be written to some local storage). The setup and tear-down costs > of the NFS connection (automounter, mount, unmount) simply don't exist > in this case; the transfer of data on the wire happens the same way. Or > you could even get around the limitation of storing it locally by using > a ramdisk to temporarily store the files (if you have the free > memory...) - from what I understand they are read then used immediately > and not needed again in a short time frame so it makes no sense to store > them for longer, a perfect application for a tmpfs. The interesting bit is: even with the data on a remote disk the overhead is not really that much more. The files are typically less than 100k in size; even doing an rsync or nc|tar from one box to another is REALLY slow with that many small files. tmpfs et al: a job usually reads the data once directly from the NFS share and processes it; it does not go back to this file again (well, at least not this process). So I do think NFS would not be that bad. It won't be optimal, but it's usually the easiest for the user to use and quite generic in the approach. Of course one could devise other and much better schemes, but you always have to find a good compromise between usability and the man-power needed to tailor a specific scheme. Thanks! Carsten From tjrc at sanger.ac.uk Wed Jul 2 04:29:03 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Wed, 2 Jul 2008 12:29:03 +0100 Subject: [Beowulf] A press release In-Reply-To: References: <011E261F-94D7-4F1C-AA69-4A008A1DA1E2@sanger.ac.uk> Message-ID: On 2 Jul 2008, at 12:16 pm, Jon Aquilina wrote: > one thing must not be forgotten though.
in regards to pkging stuff > for the > ubuntu variation once someone like you and me you upload it for > someone > higher up on the chain to check and upload to the servers. so > basically > someone is checking what someone else has packaged. For maintainers that aren't Debian Developers (or the Ubuntu equivalent), yes, that's true. In my case, I am formally a Debian Developer (have been for more than 10 years), so my GPG signature on a binary upload is considered good enough, and it's not checked further, other than for really serious failures like a failure of the package to build from source on one of the autobuilders. I do check them myself fairly thoroughly though - lintian is a very useful tool for checking that packages comply with policy. Besides, the packages I maintain for Debian are things I use heavily in my day job, so it's in my own interest to make sure they work properly! I suspect the amount of checking that goes on in the universe and multiverse parts of Ubuntu is pretty minimal - I believe the packages are basically straight rebuilds of the Debian source packages using the Ubuntu autobuilder network, so that the library dependencies are correct. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From perry at piermont.com Wed Jul 2 04:32:55 2008 From: perry at piermont.com (Perry E. 
Metzger) Date: Wed, 02 Jul 2008 07:32:55 -0400 Subject: [Beowulf] software for compatible with a cluster In-Reply-To: (Jon Aquilina's message of "Wed\, 2 Jul 2008 07\:37\:21 +0200") References: <7B82EE52C0DC4879AD6AE8FCA937C63C@geoffPC> <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> Message-ID: <87wsk4ed20.fsf@snark.cb.piermont.com> "Jon Aquilina" writes: > if i use blender how nicely does it work in a cluster? I believe it works quite well. Perry From perry at piermont.com Wed Jul 2 04:50:48 2008 From: perry at piermont.com (Perry E. Metzger) Date: Wed, 02 Jul 2008 07:50:48 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> (Tim Cutts's message of "Wed\, 2 Jul 2008 09\:19\:50 +0100") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <66EB3DC0-B281-4869-BB8E-A55E577C44FE@sanger.ac.uk> Message-ID: <87od5gec87.fsf@snark.cb.piermont.com> Tim Cutts writes: > On 2 Jul 2008, at 8:26 am, Carsten Aulbert wrote: > >> OK, we have 1342 nodes which act as servers as well as clients. Every >> node exports a single local directory and all other nodes can mount >> this. >> >> What we do now to optimize the available bandwidth and IOs is spread >> millions of files according to a hash algorithm to all nodes (multiple >> copies as well) and then run a few 1000 jobs opening one file from one >> box then one file from the other box and so on. With a short autofs >> timeout that ought to work. Typically it is possible that a single >> process opens about 10-15 files per second, i.e. making 10-15 mounts >> per >> second. With 4 parallel process per node that's 40-60 mounts/second. 
>> With a timeout of 5 seconds we should roughly have 200-300 concurrent >> mounts (on average, no idea abut the variance). > > Please tell me you're not serious! The overheads of just performing > the NFS mounts are going to kill you, never mind all the network > traffic going all over the place. > > Since you've distributed the files to the local disks of the nodes, > surely the right way to perform this work is to schedule the > computations so that each node works on the data on its own local > disk, and doesn't have to talk networked storage at all? Or don't you > know in advance which files a particular job is going to need? Perhaps it makes sense given their job load. Perhaps it doesn't. If they need access to far more storage than a single node can hold, it might make sense. If individual nodes need lots of I/O but only on a very rare basis, so the disk bandwidth would be unused on most nodes most of the time if they were doing everything locally, perhaps it might make sense. I'll agree that it isn't an obviously good solution to most workloads, but we don't really know what their workload is like so we can't say that this is a bad move ab initio. Perry From atchley at myri.com Wed Jul 2 05:07:27 2008 From: atchley at myri.com (Scott Atchley) Date: Wed, 2 Jul 2008 08:07:27 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <486B6501.5000108@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> Message-ID: <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote: > Bogdan Costescu wrote: >> >> Have you considered using a parallel file system ? > > We looked a bit into a few, but would love to get any input from > anyone > on that. What we found so far was not really convincing, e.g. 
> glusterFS
> at that time was not really stable, lustre was too easy to crash -
> at least at that time, ...

Hi Carsten,

I have not looked at GlusterFS at all. I have worked with Lustre and
PVFS2 (I wrote the shims to allow them to run on MX).

Although I believe Lustre's robustness is very good these days, I do
not believe that it will work in your setting. I think that they
currently do not recommend mounting a client on a node that is also
working as a server as you are doing with NFS. I believe it is due to
memory contention leading to deadlock.

PVFS2 does, however, support your scenario where each node is a server
and can be mounted locally as well. PVFS2 servers run in userspace and
can be easily debugged. If you are using MPI-IO, it integrates nicely
as well.

Even so, keep in mind that using each node as a server will consume
network resources and will compete with MPI communications.

Scott

From perry at piermont.com Wed Jul 2 05:28:48 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Wed, 02 Jul 2008 08:28:48 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: <486B2DC2.9010604@aei.mpg.de> (Carsten Aulbert's message of "Wed\, 02 Jul 2008 09\:26\:58 +0200")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de>
Message-ID: <87d4lweagv.fsf@snark.cb.piermont.com>

Carsten Aulbert writes:

>> The clients are connecting from ports below 1024 because Berkeley set
>> up a hack in the original BSD stack so that only root could open ports
>> below 1024. This way, you could "know" the process on the remote host
>> was a root process, thus you could feel "secure" [sic]. It doesn't add
>> any real security any more, but it is also not the cause of any
>> problem you are experiencing.
>
> We might run out of "secure" ports.
A given client would need to be forming over 1000 connections to a
given server NFS port for that to be a problem. This is not going to
happen. The protocol doesn't work in such a way as to cause that to
occur.

>> We can help you figure this out, but you will have to give a lot more
>> detail about the problem. Please describe your network setup. How many
>> servers do you have? How many clients? How many file systems are those
>> servers exporting? How many is a typical client mounting, and why?
>> Start there and we can try to move forward.
>
> OK, we have 1342 nodes which act as servers as well as clients. Every
> node exports a single local directory and all other nodes can mount this.

Okay. In this instance, you're not going to run out of ports. Every
machine might get 1341 connections from clients, and every machine
might make 1341 client connections going out to other machines. None
of this should cause you to run out of ports, period. If you don't
understand that, refer back to my original message. A TCP socket is a
unique 4-tuple. The host:port 2-tuples are NOT unique and not an
exhaustible resource. There is no way that your case is going to
even remotely exhaust the 4-tuple space.

> What we do now to optimize the available bandwidth and IOs is spread
> millions of files according to a hash algorithm to all nodes (multiple
> copies as well) and then run a few 1000 jobs opening one file from one
> box then one file from the other box and so on. With a short autofs
> timeout that ought to work.

I think there is no point in having a short autofs timeout, and you're
likely to radically increase the overhead when you open files.

> Our tests so far have shown that sometimes a node keeps a few mounts
> open (autofs4 problems AFAIK) and at some point is not able to mount
> more shares. Usually this occurs at about 350 mounts and we are not yet
> 100% sure if we are running out of secure ports.

You probably aren't running out of ports per se.
You may be running out of OS resources, like file descriptors or something similar. -- Perry E. Metzger perry at piermont.com From landman at scalableinformatics.com Wed Jul 2 05:31:15 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 02 Jul 2008 08:31:15 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <486B2DC2.9010604@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> Message-ID: <486B7513.1020509@scalableinformatics.com> Carsten Aulbert wrote: >> The clients are connecting from ports below 1024 because Berkeley set >> up a hack in the original BSD stack so that only root could open ports >> below 1024. This way, you could "know" the process on the remote host >> was a root process, thus you could feel "secure" [sic]. It doesn't add >> any real security any more, but it is also not the cause of any >> problem you are experiencing. > > We might run out of "secure" ports. But you can force NFS to connect from the ports above 1024 so this shouldn't be an issue. [...] > OK, we have 1342 nodes which act as servers as well as clients. Every There is a short writeup on this with quotes from Bruce Allen in HPCwire. Too bad you didn't opt for JackRabbits there :) > node exports a single local directory and all other nodes can mount this. Fine, nothing terrible. > > What we do now to optimize the available bandwidth and IOs is spread > millions of files according to a hash algorithm to all nodes (multiple > copies as well) and then run a few 1000 jobs opening one file from one > box then one file from the other box and so on. With a short autofs Hmmm.... So you want to "track" spatial metadata (e.g. where the file is) according to some hash function that each node can execute, and then once this is known, perform IO. 
So, for example (as a relatively naive/simple minded version) some quick
Perl pseudo-code ...

  # ....
  use Digest::MD5 qw(md5_hex);

  # use the first 32 bits of the MD5 digest of the name as an integer
  my $hash         = hex(substr(md5_hex($filename), 0, 8));
  my $machine      = $hash % $Number_of_machines;
  my $machine_name = $name[$machine];
  my $full_path    = sprintf("/%s/%s", $machine_name, $filename);
  open(my $fh, ">", $full_path)
      or die "FATAL ERROR: unable to open $full_path: $!\n";
  # ....

Is this about right?

> timeout that ought to work. Typically it is possible that a single
> process opens about 10-15 files per second, i.e. making 10-15 mounts per
> second. With 4 parallel process per node that's 40-60 mounts/second.

Hmmm ... mount latency we have seen is ~0.1 seconds or so, so I can
believe 10-14/second. Note that due to strange latency effects in larger
machines, we have also seen an automount take 0.5 seconds and more. Some
delays were due to name resolution. Never fully traced it, but this was
on a 32 node cluster. You are talking a little bigger.

> With a timeout of 5 seconds we should roughly have 200-300 concurrent
> mounts (on average, no idea about the variance).

200-300 mounts across 1342 nodes, sure. 200-300 mounts of one file
system on one server from 200-300 client machines? I have some doubts ...

> Our tests so far have shown that sometimes a node keeps a few mounts
> open (autofs4 problems AFAIK) and at some point is not able to mount
> more shares. Usually this occurs at about 350 mounts and we are not yet
> 100% sure if we are running out of secure ports.

Older kernels couldn't do more than 256 mounts. Not sure when/if this
limit has been raised.

This is a different problem though. If you have N machines mounting a
file system, then you get N requests on port 2049 or similar (the
inbound NFS port). You don't run out of secure ports. If the issue is
that you are running 200+ outgoing mount requests from one machine, you
will likely have a delay issue as you cross the 256 mount number (if
your kernel hasn't been patched ... not sure if/when this has/will
change).
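
The hash-based placement sketched in Perl earlier in this message can also be written as a small runnable example. What follows is a minimal sketch in Python, not the scheme the cluster actually uses: the node names (`n0001` and so on), the helper names `node_for` and `path_for`, and the choice of MD5 over the bare file name are all assumptions made for illustration.

```python
import hashlib
from typing import Sequence


def node_for(filename: str, nodes: Sequence[str]) -> str:
    """Pick the node responsible for `filename` by hashing its name."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Interpret the hex digest as an integer and reduce it modulo the node count.
    return nodes[int(digest, 16) % len(nodes)]


def path_for(filename: str, nodes: Sequence[str]) -> str:
    """Build an automount-style path of the form /<node>/<filename>."""
    return "/{}/{}".format(node_for(filename, nodes), filename)


# Hypothetical node names; only the count (1342) comes from the thread.
NODES = ["n{:04d}".format(i) for i in range(1, 1343)]

if __name__ == "__main__":
    # The file name is made up for the example.
    print(path_for("frame-000123.dat", NODES))
```

Every node can compute the same mapping without consulting a shared lookup table. One known drawback of plain modulo hashing is that changing the number of nodes remaps most files.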
> All our boxes export now with "insecure" option (NFSv3), but our clients > all connect from a "secure" port, anyone here who might give us a hint > how to force this in Linux? See if you can get less than 256 mounts working well. If so, and it only starts falling off above 256 mounts, this would be important to know. Joe > > Thanks a lot > > Carsten > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From carsten.aulbert at aei.mpg.de Wed Jul 2 05:55:21 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Wed, 02 Jul 2008 14:55:21 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: <87d4lweagv.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> Message-ID: <486B7AB9.9050202@aei.mpg.de> Hi Perry, Perry E. Metzger wrote: > > Okay. In this instance, you're not going to run out of ports. Every > machine might get 1341 connections from clients, and every machine > might make 1341 client connections going out to other machines. None > of this should cause you to run out of ports, period. If you don't > understand that, refer back to my original message. A TCP socket is a > unique 4-tuple. The host:port 2-tuples are NOT unique and not an > exhaustible resource. There is is no way that your case is going to > even remotely exhaust the 4-tuple space. 
Well, I understand your reasoning, but that's contradicted by what we
actually see:

netstat -an|awk '/2049/ {print $4}'|sed 's/10.10.13.41://'|sort -n

shows us the following:

665
666
667
668
669
670
671
672
673
674
675
676
677
[...]
1017
1018
1019
1020
1021
1022
1023

This corresponds exactly to the maximum achievable mounts of 358 right
now. Besides, I'm far from being an expert on TCP/IP, but is it possible
for a local process to bind to a port which is already in use but to
another host? I don't think so, but may be wrong.

Cheers

Carsten

From eagles051387 at gmail.com Wed Jul 2 06:05:09 2008
From: eagles051387 at gmail.com (Jon Aquilina)
Date: Wed, 2 Jul 2008 15:05:09 +0200
Subject: [Beowulf] software for compatible with a cluster
In-Reply-To: <20080702125625.GE47386@gby2.aoes.com>
References: <3CB66E9F377C4961B5457896137EAD1B@geoffPC> <486A4296.4050501@hope.edu> <87y74lfabq.fsf@snark.cb.piermont.com> <87wsk4ed20.fsf@snark.cb.piermont.com> <20080702125625.GE47386@gby2.aoes.com>
Message-ID:

like you said in regards to maya, money is a factor for me. if i do
decide to set up a rendering cluster my problem is going to be finding
someone who can make a small video in blender for me so i can render it.

On 7/2/08, Greg Byshenk wrote:
>
> On Wed, Jul 02, 2008 at 07:32:55AM -0400, Perry E. Metzger wrote:
> > "Jon Aquilina" writes:
>
> > > if i use blender how nicely does it work in a cluster?
>
> > I believe it works quite well.
>
>
> The "Helmer" minicluster uses blender, and appears
> to perform well.
>
> Also, Maya's 'muster' engine runs under Linux, and quite successfully. We
> use it in a mixed environment, where the render pool consists of both
> Windows workstations and Linux cluster nodes.
>
> Note, though, that like other commercial 3D products, Maya is expensive,
> and may not be suitable for a student project.
>
> --
> Greg Byshenk
>
>

--
Jonathan Aquilina
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From mark.kosmowski at gmail.com Wed Jul 2 06:11:46 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Wed, 2 Jul 2008 09:11:46 -0400 Subject: [Beowulf] Re: energy costs and poor grad students Message-ID: I'm in the US. I'm almost, but not quite ready for production runs - still learning the software / computational theory. I'm the first person in the research group (physical chemistry) to try to learn plane wave methods of solid state calculation as opposed to isolated atom-centered approximations and periodic atom centered calculations. It is turning out that the package I have spent the most time learning is perhaps not the best one for what we are doing. For a variety of reasons, many of which more off-topic than tac nukes and energy efficient washing machines ;) , I'm doing my studies part-time while working full-time in industry. I think I have come to a compromise that can keep me in business. Until I have a better understanding of the software and am ready for production runs, I'll stick to a small system that can be run on one node and leave the other two powered down. I've also applied for an adjunt instructor position at a local college for some extra cash and good experience. When I'm ready for production runs I can either just bite the bullet and pay the electricity bill or seek computer time elsewhere. Thanks for the encouragement, Mark E. Kosmowski On 7/1/08, ariel sabiguero yawelak wrote: > Well Mark, don't give up! > I am not sure which one is your application domain, but if you require 24x7 > computation, then you should not be hosting that at home. > On the other hand, if you are not doing real computation and you just have a > testbed at home, maybe for debugging your parallel applications or something > similar, you might be interested in a virtualized solution. Several years > ago, I used to "debug" some neural networks at home, but training sessions > (up to two weeks of training) happened at the university. 
> I would suggest to do something like that. > You can always scale-down your problem in several phases and save the > complete data-set / problem for THE RUN. > > You are not being a heretic there, but suffering energy costs ;-) > In more places that you may believe, useful computing nodes are being > replaced just because of energy costs. Even in some application domains you > can even loose computational power if you move from 4 nodes into a single > quad-core (i.e. memory bandwidth problems). I know it is very nice to be > able to do everything at home.. but maybe before dropping your studies or > working overtime to pay the electricity bill, you might want to reconsider > the fact of collapsing your phisical deploy into a single virtualized > cluster. (or just dispatch several threads/processes in a single system). > If you collapse into a single system you have only 1 mainboard, one HDD, one > power source, one processor (physically speaking), .... and you can achieve > almost the performance of 4 systems in one, consuming the power of.... well > maybe even less than a single one. I don't want to go into discussions about > performance gain/loose due to the variation of the hardware architecture. > Invest some bucks (if you haven't done that yet) in a good power source. > Efficiency of OEM unbranded power sources is realy pathetic. may be 45-50% > efficiency, while a good power source might be 75-80% efficient. Use the > energy for computing, not for heating your house. > What I mean is that you could consider just collapsing a complete "small" > cluster into single system. If your application is CPU-bound and not I/O > bound, VMware Server could be an option, as it is free software > (unfortunately not open, even tough some patches can be done on the > drivers). I think it is not possible to publish benchmarking data about > VMware, but I can tell you that in long timescales, the performance you get > in the host OS is similar than the one of the guest OS. 
There are a lot of > problems related to jitter, from crazy clocks to delays, but if your > application is not sensitive to that, then you are Ok. > Maybe this is not a solution, but you can provide more information regarding > your problem before quitting... > > my 2 cents.... > > ariel > > Mark Kosmowski escribi?: > > > At some point there a cost-benefit analysis needs to be performed. If > > my cluster at peak usage only uses 4 Gb RAM per CPU (I live in > > single-core land still and do not yet differentiate between CPU and > > core) and my nodes all have 16 Gb per CPU then I am wasting RAM > > resources and would be better off buying new machines and physically > > transferring the RAM to and from them or running more jobs each > > distributed across fewer CPUs. Or saving on my electricity bill and > > powering down some nodes. > > > > As heretical as this last sounds, I'm tempted to throw in the towel on > > my PhD studies because I can no longer afford the power to run my > > three node cluster at home. Energy costs may end up being the straw > > that breaks this camel's back. > > > > Mark E. Kosmowski > > > > > > > > > From: "Jon Aquilina" > > > > > > > > > > > > > > > not sure if this applies to all kinds of senarios that clusters are used > in > > > but isnt the more ram you have the better? > > > > > > On 6/30/08, Vincent Diepeveen wrote: > > > > > > > > > > Toon, > > > > > > > > Can you drop a line on how important RAM is for weather forecasting in > > > > latest type of calculations you're performing? > > > > > > > > Thanks, > > > > Vincent > > > > > > > > > > > > On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: > > > > > > > > Jim Lux wrote: > > > > > > > > > > > > > Yep. And for good reason. Even a big DoD job is still tiny in > Nvidia's > > > > > > > > > > > > > > > > scale of operations. We face this all the time with NASA work. 
> > > > > > Semiconductor manufacturers have no real reason to produce > special purpose > > > > > > or customized versions of their products for space use, because > they can > > > > > > sell all they can make to the consumer market. More than once, > I've had a > > > > > > phone call along the lines of this: > > > > > > "Jim: I'm interested in your new ABC321 part." > > > > > > "Rep: Great. I'll just send the NDA over and we can talk about > it." > > > > > > "Jim: Great, you have my email and my fax # is..." > > > > > > "Rep: By the way, what sort of volume are you going to be using?" > > > > > > "Jim: Oh, 10-12.." > > > > > > "Rep: thousand per week, excellent..." > > > > > > "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe > every > > > > > > year." > > > > > > "Rep: Oh..." > > > > > > {Well, to be fair, it's not that bad, they don't hang up on you.. > > > > > > > > > > > > > > > > > > > > > > > Since about a year, it's been clear to me that weather forecasting > (i.e., > > > > > running a more or less sophisticated atmospheric model to provide > weather > > > > > predictions) is going to be "mainstream" in the sense that every > business > > > > > that needs such forecasts for its operations can simply run them > in-house. > > > > > > > > > > Case in point: I bought a $1100 HP box (the obvious target group > being > > > > > teenage downloaders) which performs the HIRLAM limited area model > *on the > > > > > grid that we used until October 2006* in December last year. > > > > > > > > > > It's about twice as slow as our then-operational 50-CPU Sun Fire > 15K. > > > > > > > > > > I wonder what effect this will have on CPU developments ... 
> > > > > > > > > > -- > > > > > Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 > 214290 > > > > > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > > > > > At home: http://moene.indiv.nluug.nl/~toon/ > > > > > Progress of GNU Fortran: > http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Beowulf mailing list, Beowulf at beowulf.org > > > > To change your subscription (digest mode or unsubscribe) visit > > > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > > > > > > > > > > > > -- > > > Jonathan Aquilina > > > > > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > > > > From landman at scalableinformatics.com Wed Jul 2 06:44:20 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Wed, 02 Jul 2008 09:44:20 -0400 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: References: Message-ID: <486B8634.6020309@scalableinformatics.com> Hi Mark Mark Kosmowski wrote: > I'm in the US. I'm almost, but not quite ready for production runs - > still learning the software / computational theory. I'm the first > person in the research group (physical chemistry) to try to learn > plane wave methods of solid state calculation as opposed to isolated > atom-centered approximations and periodic atom centered calculations. Heh... my research group in grad school went through that transition in the mid 90s. Went from an LCAO-type simulation to CP like methods. We needed a t3e to run those (then). Love to compare notes and see which code you are using someday. On-list/off-list is fine. > It is turning out that the package I have spent the most time learning > is perhaps not the best one for what we are doing. 
For a variety of > reasons, many of which more off-topic than tac nukes and energy > efficient washing machines ;) , I'm doing my studies part-time while > working full-time in industry. More power to ya! I did mine that way too ... the writing was the hardest part. Just don't lose focus, or stop believing you can do it. When the light starts getting visible at the end of the process, it is quite satisfying. I have other words to describe this, but they require a beer lever to get them out of me ... > I think I have come to a compromise that can keep me in business. > Until I have a better understanding of the software and am ready for > production runs, I'll stick to a small system that can be run on one > node and leave the other two powered down. I've also applied for an > adjunt instructor position at a local college for some extra cash and > good experience. When I'm ready for production runs I can either just > bite the bullet and pay the electricity bill or seek computer time > elsewhere. Give us a shout when you want to try the time on a shared resource. Some folks here may be able to make good suggestions. RGB is a physics guy at Duke, doing lots of simulations, and might know of resources. Others here might as well. Joe > > Thanks for the encouragement, > > Mark E. Kosmowski > > On 7/1/08, ariel sabiguero yawelak wrote: >> Well Mark, don't give up! >> I am not sure which one is your application domain, but if you require 24x7 >> computation, then you should not be hosting that at home. >> On the other hand, if you are not doing real computation and you just have a >> testbed at home, maybe for debugging your parallel applications or something >> similar, you might be interested in a virtualized solution. Several years >> ago, I used to "debug" some neural networks at home, but training sessions >> (up to two weeks of training) happened at the university. >> I would suggest to do something like that. 
>> You can always scale-down your problem in several phases and save the >> complete data-set / problem for THE RUN. >> >> You are not being a heretic there, but suffering energy costs ;-) >> In more places that you may believe, useful computing nodes are being >> replaced just because of energy costs. Even in some application domains you >> can even loose computational power if you move from 4 nodes into a single >> quad-core (i.e. memory bandwidth problems). I know it is very nice to be >> able to do everything at home.. but maybe before dropping your studies or >> working overtime to pay the electricity bill, you might want to reconsider >> the fact of collapsing your phisical deploy into a single virtualized >> cluster. (or just dispatch several threads/processes in a single system). >> If you collapse into a single system you have only 1 mainboard, one HDD, one >> power source, one processor (physically speaking), .... and you can achieve >> almost the performance of 4 systems in one, consuming the power of.... well >> maybe even less than a single one. I don't want to go into discussions about >> performance gain/loose due to the variation of the hardware architecture. >> Invest some bucks (if you haven't done that yet) in a good power source. >> Efficiency of OEM unbranded power sources is realy pathetic. may be 45-50% >> efficiency, while a good power source might be 75-80% efficient. Use the >> energy for computing, not for heating your house. >> What I mean is that you could consider just collapsing a complete "small" >> cluster into single system. If your application is CPU-bound and not I/O >> bound, VMware Server could be an option, as it is free software >> (unfortunately not open, even tough some patches can be done on the >> drivers). I think it is not possible to publish benchmarking data about >> VMware, but I can tell you that in long timescales, the performance you get >> in the host OS is similar than the one of the guest OS. 
There are a lot of >> problems related to jitter, from crazy clocks to delays, but if your >> application is not sensitive to that, then you are Ok. >> Maybe this is not a solution, but you can provide more information regarding >> your problem before quitting... >> >> my 2 cents.... >> >> ariel >> >> Mark Kosmowski escribi?: >> >>> At some point there a cost-benefit analysis needs to be performed. If >>> my cluster at peak usage only uses 4 Gb RAM per CPU (I live in >>> single-core land still and do not yet differentiate between CPU and >>> core) and my nodes all have 16 Gb per CPU then I am wasting RAM >>> resources and would be better off buying new machines and physically >>> transferring the RAM to and from them or running more jobs each >>> distributed across fewer CPUs. Or saving on my electricity bill and >>> powering down some nodes. >>> >>> As heretical as this last sounds, I'm tempted to throw in the towel on >>> my PhD studies because I can no longer afford the power to run my >>> three node cluster at home. Energy costs may end up being the straw >>> that breaks this camel's back. >>> >>> Mark E. Kosmowski >>> >>> >>> >>>> From: "Jon Aquilina" >>>> >>>> >>> >>> >>>> not sure if this applies to all kinds of senarios that clusters are used >> in >>>> but isnt the more ram you have the better? >>>> >>>> On 6/30/08, Vincent Diepeveen wrote: >>>> >>>> >>>>> Toon, >>>>> >>>>> Can you drop a line on how important RAM is for weather forecasting in >>>>> latest type of calculations you're performing? >>>>> >>>>> Thanks, >>>>> Vincent >>>>> >>>>> >>>>> On Jun 30, 2008, at 8:20 PM, Toon Moene wrote: >>>>> >>>>> Jim Lux wrote: >>>>> >>>>> >>>>>> Yep. And for good reason. Even a big DoD job is still tiny in >> Nvidia's >>>>>> >>>>>>> scale of operations. We face this all the time with NASA work. 
>>>>>>> Semiconductor manufacturers have no real reason to produce >> special purpose >>>>>>> or customized versions of their products for space use, because >> they can >>>>>>> sell all they can make to the consumer market. More than once, >> I've had a >>>>>>> phone call along the lines of this: >>>>>>> "Jim: I'm interested in your new ABC321 part." >>>>>>> "Rep: Great. I'll just send the NDA over and we can talk about >> it." >>>>>>> "Jim: Great, you have my email and my fax # is..." >>>>>>> "Rep: By the way, what sort of volume are you going to be using?" >>>>>>> "Jim: Oh, 10-12.." >>>>>>> "Rep: thousand per week, excellent..." >>>>>>> "Jim: No, a dozen pieces, total, lifetime buy, or at best maybe >> every >>>>>>> year." >>>>>>> "Rep: Oh..." >>>>>>> {Well, to be fair, it's not that bad, they don't hang up on you.. >>>>>>> >>>>>>> >>>>>>> >>>>>> Since about a year, it's been clear to me that weather forecasting >> (i.e., >>>>>> running a more or less sophisticated atmospheric model to provide >> weather >>>>>> predictions) is going to be "mainstream" in the sense that every >> business >>>>>> that needs such forecasts for its operations can simply run them >> in-house. >>>>>> Case in point: I bought a $1100 HP box (the obvious target group >> being >>>>>> teenage downloaders) which performs the HIRLAM limited area model >> *on the >>>>>> grid that we used until October 2006* in December last year. >>>>>> >>>>>> It's about twice as slow as our then-operational 50-CPU Sun Fire >> 15K. >>>>>> I wonder what effect this will have on CPU developments ... 
>>>>>> >>>>>> -- >>>>>> Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 >> 214290 >>>>>> Saturnushof 14, 3738 XG Maartensdijk, The Netherlands >>>>>> At home: http://moene.indiv.nluug.nl/~toon/ >>>>>> Progress of GNU Fortran: >> http://gcc.gnu.org/ml/gcc/2008-01/msg00009.html >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Beowulf mailing list, Beowulf at beowulf.org >>>>> To change your subscription (digest mode or unsubscribe) visit >>>>> http://www.beowulf.org/mailman/listinfo/beowulf >>>>> >>>>> >>>>> >>>> -- >>>> Jonathan Aquilina >>>> >>>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >>> >>> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From henning.fehrmann at aei.mpg.de Wed Jul 2 06:42:28 2008 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Wed, 2 Jul 2008 15:42:28 +0200 Subject: [Beowulf] automount on high ports In-Reply-To: <486B7AB9.9050202@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: <20080702134228.GA5152@gretchen.aei.uni-hannover.de> > Which corresponds exactly to the maximum achievable mounts of 358 right 359 ;) If the number of 
mounts is smaller the ports are randomly used in this range. It would be convenient to enter the insecure area. Using the option insecure for the NFS exports is apparently not sufficient. Also every nfs server is connected from a distinct port on the client side. Two mounts to a single server might end up on the same port. Cheers Henning From gerry.creager at tamu.edu Wed Jul 2 07:09:34 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Wed, 02 Jul 2008 09:09:34 -0500 Subject: [Beowulf] automount on high ports In-Reply-To: <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> Message-ID: <486B8C1E.2090007@tamu.edu> Scott Atchley wrote: > On Jul 2, 2008, at 7:22 AM, Carsten Aulbert wrote: > >> Bogdan Costescu wrote: >>> >>> Have you considered using a parallel file system ? >> >> We looked a bit into a few, but would love to get any input from anyone >> on that. What we found so far was not really convincing, e.g. glusterFS >> at that time was not really stable, lustre was too easy to crash - at >> least at that time, ... > > Hi Carsten, > > I have not looked at GlusterFS at all. I have worked with Lustre and > PVFS2 (I wrote the shims to allow them to run on MX). > > Although I believe Lustre's robustness is very good these days, I do not > believe that it will work in your setting. I think that they > currently do not recommend mounting a client on a node that is also > working as a server as you are doing with NFS. I believe it is due to > memory contention leading to deadlock. Lustre is good enough that it's the parallel FS at TACC for the Ranger cluster. And, I've had no real problems as a user thereof.
We're bringing up glustre on our new cluster here ( CentOS/RHEL5, not debian ). We looked at zfs but didn't have sufficient experience to go down that path. > PVFS2 does, however, support your scenario where each node is a server > and can be mounted locally as well. PVFS2 servers run in userspace and > can be easily debugged. If you are using MPI-IO, it integrates nicely as > well. Even so, keep in mind that using each node as a server will > consume network resources and will compete with MPI communications. Someone at NCAR recently suggested we review PVFS2. I'm gonna do it as soon as I get a free moment on vacation. -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 07:12:09 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Wed, 2 Jul 2008 16:12:09 +0200 (CEST) Subject: [Beowulf] automount on high ports In-Reply-To: <87d4lweagv.fsf@snark.cb.piermont.com> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> Message-ID: On Wed, 2 Jul 2008, Perry E. Metzger wrote: > A given client would need to be forming over 1000 connections to a > given server NFS port for that to be a problem. Not quite. The reserved ports that are free for use (512 and up) are not all free to be taken by NFS as it pleases - there are many daemons that have to use those well-known ports. For example, some years ago a common complaint was that the CUPS daemon (port 631) was often conflicting with NFS client mounts; I think that what was chosen by various distributions was the easy way out - make the NFS client only allocate ports starting at 650 or so.
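[Editorial sketch] The mount ceiling of 359 reported earlier in this thread is consistent with a reserved-port window of 665 to 1023, one client port per NFS v2/v3 mount. The arithmetic below is only an illustration under that assumption: the bounds 665 and 1023 are assumed here to be the Linux sunrpc defaults (they should be readable on a real system from /proc/sys/sunrpc/min_resvport and max_resvport), and the function name is hypothetical.

```python
# Hypothetical helper: count the reserved source ports available to an NFS
# client, one per NFSv2/v3 mount, given an inclusive port window.
# ASSUMPTION: 665 and 1023 are the Linux sunrpc default bounds; verify them
# on a real machine via /proc/sys/sunrpc/min_resvport and max_resvport.


def usable_reserved_ports(min_port=665, max_port=1023):
    """Return how many ports lie in the inclusive range [min_port, max_port]."""
    if min_port > max_port:
        return 0
    return max_port - min_port + 1


if __name__ == "__main__":
    # Assumed default window, then the whole reserved range from 512 up.
    print(usable_reserved_ports())
    print(usable_reserved_ports(512, 1023))
```

With the assumed defaults the count comes out at 359, which would explain why the mounts stop well short of the 1023 privileged ports discussed in the thread.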
> Every machine might get 1341 connections from clients, and every > machine might make 1341 client connections going out to other > machines None of this should cause you to run out of ports, period. With all due respect, I think that you are not quite familiar with the NFS implementation on Linux (and maybe other NFS implementations). What you describe is the theoretical use of TCP connections; the way NFS on Linux uses TCP is not quite as you imagine: there is one port taken on the client for each NFS mount and that port is not reused. Also mounting 2 different mount points from the same NFS server to the same NFS client uses 2 TCP ports on the client side - at least with NFS v2 and v3; for v4 I think that there is only one connection between a client and a server independent on the number of mount points. I do encourage you to subscribe to the Linux NFS list if you want to learn more; I've been there for a long time (unfortunately not anymore...) and the people, especially the developers, were very helpful. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850 E-mail: bogdan.costescu at iwr.uni-heidelberg.de From ntmoore at gmail.com Wed Jul 2 07:22:37 2008 From: ntmoore at gmail.com (Nathan Moore) Date: Wed, 2 Jul 2008 09:22:37 -0500 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: <486B8634.6020309@scalableinformatics.com> References: <486B8634.6020309@scalableinformatics.com> Message-ID: <6009416b0807020722l56f05affs878b762d285bba9d@mail.gmail.com> Does your university have public computer labs? Do the computers run some variant of Unix? At UMN, where I did my grad work in physics, there were a number of semi-public "Scientific Visualization" or "Large Data Analysis" labs that were hosted in the local supercomputer center. 
The center there has a number of large machines that you had to apply and give a really good rationale to use, but the smaller development labs (with 2-way to 10-way sunfires, similar sized sgi's, linux machines, etc) basically sat vacant 5-6 days per week. Some of the labs had a pbs queue, some had a condor queue, and some just required that background jobs be "nice +19 ./a.out". My graduate work required several large parametric studies which computationally looked like lots of monte-carlo-ish runs which could be done in parallel. The beauty of this was that no message passing was required, so, if there were 23 cores open one evening at 6pm, and assuming no one would be doing work overnight (for the next 14 hours), I could start 23 14 hour jobs at 6pm and have a little less than 2 weeks of cpu work done by 8am the next morning. I used (and mentioned) the technique in the paper, http://www.pnas.org/cgi/content/full/101/37/13431 (search for "computational impotence"). This only works though if your university's computer labs run a unix-ish os, and if the sysadmins are progressive. At the school where I presently teach similar endeavors have been much harder to start-up. Nathan Moore On Wed, Jul 2, 2008 at 8:44 AM, Joe Landman wrote: > Hi Mark > > Mark Kosmowski wrote: > >> I'm in the US. I'm almost, but not quite ready for production runs - >> still learning the software / computational theory. I'm the first >> person in the research group (physical chemistry) to try to learn >> plane wave methods of solid state calculation as opposed to isolated >> atom-centered approximations and periodic atom centered calculations. >> > > Heh... my research group in grad school went through that transition in the > mid 90s. Went from an LCAO-type simulation to CP like methods. We needed a > t3e to run those (then). > > Love to compare notes and see which code you are using someday. > On-list/off-list is fine. 
> > It is turning out that the package I have spent the most time learning >> is perhaps not the best one for what we are doing. For a variety of >> reasons, many of which more off-topic than tac nukes and energy >> efficient washing machines ;) , I'm doing my studies part-time while >> working full-time in industry. >> > > More power to ya! I did mine that way too ... the writing was the hardest > part. Just don't lose focus, or stop believing you can do it. When the > light starts getting visible at the end of the process, it is quite > satisfying. > > I have other words to describe this, but they require a beer lever to get > them out of me ... > > I think I have come to a compromise that can keep me in business. >> Until I have a better understanding of the software and am ready for >> production runs, I'll stick to a small system that can be run on one >> node and leave the other two powered down. I've also applied for an >> adjunt instructor position at a local college for some extra cash and >> good experience. When I'm ready for production runs I can either just >> bite the bullet and pay the electricity bill or seek computer time >> elsewhere. >> > > Give us a shout when you want to try the time on a shared resource. Some > folks here may be able to make good suggestions. RGB is a physics guy at > Duke, doing lots of simulations, and might know of resources. Others here > might as well. > > Joe > > > >> Thanks for the encouragement, >> >> Mark E. Kosmowski >> >> On 7/1/08, ariel sabiguero yawelak wrote: >> >>> Well Mark, don't give up! >>> I am not sure which one is your application domain, but if you require >>> 24x7 >>> computation, then you should not be hosting that at home. >>> On the other hand, if you are not doing real computation and you just >>> have a >>> testbed at home, maybe for debugging your parallel applications or >>> something >>> similar, you might be interested in a virtualized solution. 
Several years >>> ago, I used to "debug" some neural networks at home, but training >>> sessions >>> (up to two weeks of training) happened at the university. >>> I would suggest doing something like that. >>> You can always scale-down your problem in several phases and save the >>> complete data-set / problem for THE RUN. >>> >>> You are not being a heretic there, but suffering energy costs ;-) >>> In more places than you may believe, useful computing nodes are being >>> replaced just because of energy costs. In some application domains >>> you >>> can even lose computational power if you move from 4 nodes into a single >>> quad-core (i.e. memory bandwidth problems). I know it is very nice to be >>> able to do everything at home.. but maybe before dropping your studies or >>> working overtime to pay the electricity bill, you might want to >>> consider >>> collapsing your physical deployment into a single virtualized >>> cluster. (or just dispatch several threads/processes in a single system). >>> If you collapse into a single system you have only 1 mainboard, one HDD, >>> one >>> power source, one processor (physically speaking), .... and you can >>> achieve >>> almost the performance of 4 systems in one, consuming the power of.... >>> well >>> maybe even less than a single one. I don't want to go into discussions >>> about >>> performance gain/loss due to the variation of the hardware architecture. >>> Invest some bucks (if you haven't done that yet) in a good power source. >>> Efficiency of OEM unbranded power sources is really pathetic, maybe >>> 45-50% >>> efficiency, while a good power source might be 75-80% efficient. Use the >>> energy for computing, not for heating your house. >>> What I mean is that you could consider just collapsing a complete "small" >>> cluster into a single system.
If your application is CPU-bound and not I/O >>> bound, VMware Server could be an option, as it is free software >>> (unfortunately not open, even though some patches can be done on the >>> drivers). I think it is not possible to publish benchmarking data about >>> VMware, but I can tell you that in long timescales, the performance you >>> get >>> in the host OS is similar to that of the guest OS. There are a lot >>> of >>> problems related to jitter, from crazy clocks to delays, but if your >>> application is not sensitive to that, then you are Ok. >>> Maybe this is not a solution, but you can provide more information >>> regarding >>> your problem before quitting... >>> >>> my 2 cents.... >>> >>> ariel >>> >>> Mark Kosmowski wrote: >>> >>> At some point there a cost-benefit analysis needs to be performed. If >>>> my cluster at peak usage only uses 4 Gb RAM per CPU (I live in >>>> single-core land still and do not yet differentiate between CPU and >>>> core) and my nodes all have 16 Gb per CPU then I am wasting RAM >>>> resources and would be better off buying new machines and physically >>>> transferring the RAM to and from them or running more jobs each >>>> distributed across fewer CPUs. Or saving on my electricity bill and >>>> powering down some nodes. >>>> >>>> As heretical as this last sounds, I'm tempted to throw in the towel on >>>> my PhD studies because I can no longer afford the power to run my >>>> three node cluster at home. Energy costs may end up being the straw >>>> that breaks this camel's back. >>>> >>>> Mark E. Kosmowski >>>> >>> From perry at piermont.com Wed Jul 2 07:26:13 2008 From: perry at piermont.com (Perry E.
Metzger) Date: Wed, 02 Jul 2008 10:26:13 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <486B7AB9.9050202@aei.mpg.de> (Carsten Aulbert's message of "Wed\, 02 Jul 2008 14\:55\:21 +0200") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: <874p78e516.fsf@snark.cb.piermont.com> Skip to the bottom for advice on how to make NFS only use non-prived ports. My guess is still that it isn't priv ports that are causing trouble, but I describe at the bottom what you need to do to get rid of that issue entirely. I'd advise reading the rest, but the part about how to disable the stuff is after the --- near the bottom. Carsten Aulbert writes: > Well, I understand your reasoning, but that contradicts what we see > > netstat -an|awk '/2049/ {print $4}'|sed 's/10.10.13.41://'|sort -n > > shows us the following: Are those all mounts to ONE HOST? Because if they are, you're going to run out of ports. If you're connecting to multiple hosts you should be okay, but you certainly could run out of ports between two hosts -- you only have 1023 prived connections from a given host to a single port on another box. Of course, one might validly ask why the other 650 odd ports aren't usable -- clearly they should be, right? The limit is 1023, not 358. It might be that there is some Linux oddness here. Anyway, this shouldn't be a problem if you're connecting to MANY servers, but maybe there's some linux weirdness here. See below. > Which corresponds exactly to the maximum achievable mounts of 358 right > now. Besides, I'm far from being an expert on TCP/IP, but is it possible > for a local process to bind to a port which is already in use but to > another host? Of course!
You can use the same local port number with connections to different remote hosts. You can even use the same local port number with multiple connections to the same remote host provided the remote host is using different port numbers on its end. Every open socket is a 4-tuple of localip:localport:remoteip:remoteport Provided two sockets don't share that 4-tuple, you can have both. Now, a given OS may screw up how they handle this, but the *protocol* certainly permits it. Perhaps you're right and Linux isn't dealing with this gracefully. We can check that. > I don't think so, but may be wrong. Then how does an SMTP server handle thousands of simultaneous connections all coming to port 25? :) In any case, this is what the NFS FAQ says. It does mention the priv port problem, but only in a context which makes me think it is talking about two given hosts and not one client and many hosts. However, I might be wrong. See below: From http://nfs.sourceforge.net/ B3. Why can't I mount more than 255 NFS file systems on my client? Why is it sometimes even less than 255? A. On Linux, each mounted file system is assigned a major number, which indicates what file system type it is (eg. ext3, nfs, isofs); and a minor number, which makes it unique among the file systems of the same type. In kernels prior to 2.6, Linux major and minor numbers have only 8 bits, so they may range numerically from zero to 255. Because a minor number has only 8 bits, a system can mount only 255 file systems of the same type. So a system can mount up to 255 NFS file systems, another 255 ext3 file system, 255 more isofs file systems, and so on. Kernels after 2.6 have 20-bit wide minor numbers, which alleviate this restriction. For the Linux NFS client, however, the problem is somewhat worse because it is an anonymous file system. Local disk-based file systems have a block device associated with them, but anonymous file systems do not.
/proc, for example, is an anonymous file system, and so are other network file systems like AFS. All anonymous file systems share the same major number, so there can be a maximum of only 255 anonymous file systems mounted on a single host. Usually you won't need more than ten or twenty total NFS mounts on any given client. In some large enterprises, though, your work and users might be spread across hundreds of NFS file servers. To work around the limitation on the number of NFS file systems you can mount on a single host, we recommend that you set up and run one of the automounter daemons for Linux. An automounter finds and mounts file systems as they are needed, and unmounts any that it finds are inactive. You can find more information on Linux automounters here. You may also run into a limit on the number of privileged network ports on your system. The NFS client uses a unique socket with its own port number for each NFS mount point. Using an automounter helps address the limited number of available ports by automatically unmounting file systems that are not in use, thus freeing their network ports. NFS version 4 support in the Linux NFS client uses a single socket per client-server pair, which also helps increase the allowable number of NFS mount points on a client. Now, until you brought this up, I would have guessed that this meant you could run out of priv ports between host A and host B -- i.e. host B is the client, is connecting to one port on host A, and is trying to mount more than 1023 file systems on host A and fails because it runs out of priv ports. However, if your test is not between two hosts but is rather between multiple hosts, perhaps for whatever reason Linux is braindead and is not allowing you to re-use the same local socket ports. We can diagnose that later. 
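[Editorial sketch] The 4-tuple argument earlier in this message can be checked from user space. What follows is a minimal sketch, assuming a machine with a working loopback interface; the function name accepted_tuples is made up for illustration and none of this is NFS or sunrpc code. It shows only the server-side half of the argument (many connections sharing one local port, as with SMTP on port 25), not how the Linux NFS client picks its source ports.

```python
import socket


def accepted_tuples(n_clients=2):
    """Open n_clients TCP connections to one listening port on localhost.

    Returns (listen_port, tuples), where each tuple is
    (local_ip, local_port, remote_ip, remote_port) as seen by the server.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(n_clients)
    listen_port = listener.getsockname()[1]

    clients, accepted, tuples = [], [], []
    try:
        for _ in range(n_clients):
            # Each client gets its own ephemeral source port from the OS.
            clients.append(socket.create_connection(("127.0.0.1", listen_port)))
            conn, peer = listener.accept()
            accepted.append(conn)
            local = conn.getsockname()
            tuples.append((local[0], local[1], peer[0], peer[1]))
    finally:
        for s in clients + accepted + [listener]:
            s.close()
    return listen_port, tuples


if __name__ == "__main__":
    port, tuples = accepted_tuples()
    print("listening port:", port)
    for t in tuples:
        print(t)
```

Every accepted connection reports the listening port as its local port; the connections are told apart only by the remote port, so the 4-tuples stay distinct.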
---

So, here are the things you need to do to totally remove the priv ports thing from the situation:

1) On the server, in your exports file you have to put the "insecure" option onto every exported file system. Otherwise the mountd will demand that the remote side use a "secure" mount. You've already done this according to the initial mail message. However, that only tells the server not to care if the client comes in from a port above 1024.

2) The client side is where the action is -- the client picks the port it opens after all. Unfortunately, Linux DOES NOT have an option to do this. BSD, Solaris, etc. do, but not Linux. You need to hack the source to make it happen. On a reasonably current source tree, go to:

    /usr/src/linux/fs/nfs/mount_clnt.c

and look for the argument structure being built for rpc_create. You need to or-in RPC_CLNT_CREATE_NONPRIVPORT to the .flags member, that is (for example, depending on your version, this is 2.6.24) change:

    .flags = RPC_CLNT_CREATE_INTR,

to

    .flags = RPC_CLNT_CREATE_INTR | RPC_CLNT_CREATE_NONPRIVPORT,

This is a bloody ugly hack that will make ALL connections unprived, so you might have trouble with "normal" mounts. This can be done more cleanly, but it would require more than a one line patch. However, it would get you through testing. If it works for you and you really need it, a clean mount option could be added.

My guess is that this is not your problem! However, you can check and see if I'm wrong, and if I am, then we can move on to fixing it better.

Perry -- Perry E. Metzger perry at piermont.com From perry at piermont.com Wed Jul 2 07:30:14 2008 From: perry at piermont.com (Perry E.
Metzger) Date: Wed, 02 Jul 2008 10:30:14 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <20080702134228.GA5152@gretchen.aei.uni-hannover.de> (Henning Fehrmann's message of "Wed\, 2 Jul 2008 15\:42\:28 +0200") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> <20080702134228.GA5152@gretchen.aei.uni-hannover.de> Message-ID: <87wsk4cqa1.fsf@snark.cb.piermont.com> Henning Fehrmann writes: >> Which corresponds exactly to the maximum achievable mounts of 358 right > > 359 ;) > > If the number of mounts is smaller the ports are randomly used in this range. > It would be convenient to enter the insecure area. > Using the option insecure for the NFS exports is apparently not > sufficient. Well, no, it isn't. The server doesn't control what the client does. The "insecure" option only says the server will accept such connections -- you have to tell the client to make them. On BSD and Solaris that's easy, but on Linux you need to hack the kernel. I have just sent a message explaining how to do that. Note that I still don't think this is your problem, but you might as well check. -- Perry E. Metzger perry at piermont.com From perry at piermont.com Wed Jul 2 07:35:45 2008 From: perry at piermont.com (Perry E.
Metzger) Date: Wed, 02 Jul 2008 10:35:45 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: (Bogdan Costescu's message of "Wed\, 2 Jul 2008 16\:12\:09 +0200 $CEST$") References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> Message-ID: <87skuscq0u.fsf@snark.cb.piermont.com> Bogdan Costescu writes: >> Every machine might get 1341 connections from clients, and every >> machine might make 1341 client connections going out to other >> machines None of this should cause you to run out of ports, period. > > With all due respect, I think that you are not quite familiar with the > NFS implementation on Linux (and maybe other NFS > implementations). I'm plenty familiar with the implementations on other OSes. I only looked at the code on Linux this morning for the first time (never had call before)... > What you describe is the theoretical use of TCP > connections; the way NFS on Linux uses TCP is not quite as you > imagine: there is one port taken on the client for each NFS mount and > that port is not reused. That's not an NFS implementation issue. It is a TCP implementation issue. (Actually, I'm currently looking at the code and it may be an issue in the rpc code, but never mind that.) In general, the OS should let you use a given port to connect to as many remote hosts as you like. The only thing it should prevent is having you talk to a single remote host/port combination from one local port (because you can't -- that would be the same 4-tuple.) > Also mounting 2 different mount points from > the same NFS server to the same NFS client uses 2 TCP ports on the > client side - at least with NFS v2 and v3; for v4 I think that there > is only one connection between a client and a server independent on > the number of mount points. That is indeed correct. 
(Actually, linux can burn more than 2 ports, depending.) -- Perry E. Metzger perry at piermont.com From rgb at phy.duke.edu Wed Jul 2 07:50:40 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 2 Jul 2008 10:50:40 -0400 (EDT) Subject: [Beowulf] A press release In-Reply-To: <87bq1hgpep.fsf@snark.cb.piermont.com> References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <87bq1hgpep.fsf@snark.cb.piermont.com> Message-ID: On Tue, 1 Jul 2008, Perry E. Metzger wrote: > > Prentice Bisbal writes: >>>> does it necessarily have to be a redhat clone. can it also be a debian >>>> based >>>> clone? >>> >>> but why? is there some concrete advantage to using Debian? >>> I've never understood why Debian users tend to be very True Believer, >>> or what it is that hooks them. >> >> And the Debian users can say the same thing about Red Hat users. Or SUSE >> users. And if any still exist, the Slackware users could say the same >> thing about the both of them. But then the Slackware users could also >> point out that the first Linux distro was Slackware, so they are using >> the one true Linux distro... Or rather, one of two or three contemporary "firsts", in the guise of SLS which became Slackware. I actually started with SLS and then transitioned to Slackware, all 20 or 30 little floppies of it. The problem (for me) was getting an install on a 4 MB system, which is all that I had at the time. > Precisely. It pays to allow people to use what they want. Fewer > religious battles that way. Whether one distro or another has an > advantage isn't the point -- people have their own tastes and it > doesn't pay to tell them "no" without good reason. It isn't all about religion. There are two "real" problems with Slackware. One is its packaging system, the other (related) is maintenance. 
Its packaging system doesn't really manage dependencies or automated updates, and dependency resolution is a major pain in the ass when one is installing a large sheaf of applications all at once. I was once a passionate, fervent, nay, religious user -- it has/had a very SunOS/BSD-like etc layout that was quite painless for me to work, moving over from administrating a mostly-SunOS network, where RH had a much more SysV-like interface that I had to learn. The sources for most of its apps were visibly ports of the same software I regularly built for the Suns -- remember that right up to linux, Sun workstations were "the" unix boxes for people that wrote and adopted Linux. Maintaining all the open source packages was "easy" on Suns because that is what the open source writers were using and was usually the makefile default, but it was a PITA (or more practically, "expensive" in human time and duplicated effort) there as well. Beyond automated install/updates and dependencies (that now can be sort-of-managed with add-ons basically derived from apt tools or rpm tools) Slackware's other major problem is simply its up-to-dateness. I don't know numbers, but I think it is way, way behind in number of users these days to both Debian and RH-derived distros, not to mention all the rest. I'd be surprised if it were as high as fifth in user base. This basically means that there is a time lag between package developments and releases in the other distros where the user (and hence DEVELOPER) base reside. Then there is a further delay in getting builds in that work with the existing dependencies, because there is no dependency system to speak of. Time lags of this sort are windows of opportunity when security exploits are discovered. They also annoy users, who ask "why is X available in distro Y but not here?"
I think of Slackware as being a great hacker distro, a good distro for somebody who wants to work close to the metal (and very hard) to manage their sources, but not the best distro for trouble-free, scalable maintenance of a large network of systems OR for individual users installing a personal standalone workstation. These two points aren't (I think) "religion" -- they are practical costs associated with using the distro for clusters or workstation LANs or personal workstations that need to be considered when picking a distro for any of those purposes. When I considered them, I switched. The human costs are real; people pay money for them or they come out of a fixed opportunity cost time budget. One person can manage a staggeringly large, surprisingly heterogeneous network of RH-derived systems with kickstart with very little effort -- what effort one expends scales up to the entire network. Debian is reportedly similarly manageable at scale, although I have less experience there. I have never heard anyone say "Yeah, Slackware, that's the best distro to use if you have just one person and she has to manage four hundred systems in a mix of cluster, lab and desktop LAN settings." rgb > > Perry > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From perry at piermont.com Wed Jul 2 07:57:02 2008 From: perry at piermont.com (Perry E. Metzger) Date: Wed, 02 Jul 2008 10:57:02 -0400 Subject: [Beowulf] A press release In-Reply-To: (Robert G.
Brown's message of "Wed\, 2 Jul 2008 10\:50\:40 -0400 $EDT$") References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <87bq1hgpep.fsf@snark.cb.piermont.com> Message-ID: <87skusbagx.fsf@snark.cb.piermont.com> "Robert G. Brown" writes: >> Precisely. It pays to allow people to use what they want. Fewer >> religious battles that way. Whether one distro or another has an >> advantage isn't the point -- people have their own tastes and it >> doesn't pay to tell them "no" without good reason. > > It isn't all about religion. There are two "real" problems with > Slackware. One is its packaging system, the other (related) is > maintenance. I wasn't mentioning Slackware. The "major" distros are all pretty similar in features, but I wouldn't count Slackware that way. Perry From rgb at phy.duke.edu Wed Jul 2 08:12:05 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 2 Jul 2008 11:12:05 -0400 (EDT) Subject: [Beowulf] automount on high ports In-Reply-To: <486B7AB9.9050202@aei.mpg.de> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de> Message-ID: On Wed, 2 Jul 2008, Carsten Aulbert wrote: > Which corresponds exactly to the maximum achievable mounts of 358 right > now. Besides, I'm far from being an expert on TCP/IP, but is it possible > for a local process to bind to a port which is already in use but to > another host? I don't think so, but may be wrong. AFAIK, no they don't. The way TCP daemons that listen on a well-known/privileged port work is that they accept a connection on that port, then fork a connection on a higher unprivileged (>1023) port on both ends so that the daemon can listen once again. 
You can see this by running e.g. netstat -a. Many daemons have a limit that can be set on the number of simultaneous connections they can manage. However, this is for TCP ports that maintain a persistent connection. UDP ports are "connectionless" and hence somewhat different. They tend to make a connection, receive a command/request for some service, immediately deliver the result, and end the connection. NFS used to be built on top of UDP, and honestly I don't know what it does and how it (NFSv3) does it on TCP and am too lazy to look it up, but the RFCs are there to be read. rgb > > Cheers > > Carsten > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From mark.kosmowski at gmail.com Wed Jul 2 08:19:42 2008 From: mark.kosmowski at gmail.com (Mark Kosmowski) Date: Wed, 2 Jul 2008 11:19:42 -0400 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: <486B8634.6020309@scalableinformatics.com> References: <486B8634.6020309@scalableinformatics.com> Message-ID: On 7/2/08, Joe Landman wrote: > Hi Mark > > Mark Kosmowski wrote: > > I'm in the US. I'm almost, but not quite ready for production runs - > > still learning the software / computational theory. I'm the first > > person in the research group (physical chemistry) to try to learn > > plane wave methods of solid state calculation as opposed to isolated > > atom-centered approximations and periodic atom centered calculations. > > > > Heh... my research group in grad school went through that transition in the > mid 90s. 
Went from an LCAO-type simulation to CP like methods. We needed a
> t3e to run those (then).
>
> Love to compare notes and see which code you are using someday.
> On-list/off-list is fine.

Right now I'm using CPMD. This is the first package I've looked at and wrestled with the 32-bit limitations of memory allocation prior to the debut of the Opterons. I was at the cusp of buying UltraSparc hardware at student pricing to go forward when the Opterons were released to market, so I decided to go with the PC hardware I was already familiar with. We're comparing calculations to inelastic neutron scattering experiments and it looks like abinit or quantum espresso might be a better choice for this to do vibrational analysis at q-space other than the gamma point.

Speaking of, I only have an eighth of a clue about understanding k-points (and, by extension, q-space). If anyone can suggest some reading for this topic that even a part-time chemistry student can understand it would be greatly appreciated.

>
> > It is turning out that the package I have spent the most time learning
> > is perhaps not the best one for what we are doing. For a variety of
> > reasons, many of which more off-topic than tac nukes and energy
> > efficient washing machines ;) , I'm doing my studies part-time while
> > working full-time in industry.
> >
>
> More power to ya! I did mine that way too ... the writing was the hardest
> part. Just don't lose focus, or stop believing you can do it. When the
> light starts getting visible at the end of the process, it is quite
> satisfying.
>
> I have other words to describe this, but they require a beer lever to get
> them out of me ...

I make mead on occasion - if you're ever in central NY (Syracuse - Rome - Utica area)...

Speaking of satisfaction, I did teach myself enough Fortran to add to the CPMD code to give an output format natively readable by aClimax (used to calculate harmonics from fundamental frequencies for INS).
This is/will be included in the recently/soon to be released version of CPMD. Heck, there's one or two pages of dissertation right there. :) > > > I think I have come to a compromise that can keep me in business. > > Until I have a better understanding of the software and am ready for > > production runs, I'll stick to a small system that can be run on one > > node and leave the other two powered down. I've also applied for an > > adjunt instructor position at a local college for some extra cash and > > good experience. When I'm ready for production runs I can either just > > bite the bullet and pay the electricity bill or seek computer time > > elsewhere. > > > > Give us a shout when you want to try the time on a shared resource. Some > folks here may be able to make good suggestions. RGB is a physics guy at > Duke, doing lots of simulations, and might know of resources. Others here > might as well. > > Joe > > Sounds good. The big thing is getting a bit better understanding of the theory, especially DFT dispersion correction to account for hydrogen bonding. I'm thinking that I will learn about DFT dispersion correction with CPMD to at least get a reasonable understanding and then consider learning one of the other packages to do q-space calculations. > > > > Thanks for the encouragement, > > > > Mark E. Kosmowski > > > > On 7/1/08, ariel sabiguero yawelak wrote: > > > > > Well Mark, don't give up! > > > I am not sure which one is your application domain, but if you require > 24x7 > > > computation, then you should not be hosting that at home. > > > On the other hand, if you are not doing real computation and you just > have a > > > testbed at home, maybe for debugging your parallel applications or > something > > > similar, you might be interested in a virtualized solution. Several > years > > > ago, I used to "debug" some neural networks at home, but training > sessions > > > (up to two weeks of training) happened at the university. 
> > > I would suggest to do something like that. > > > You can always scale-down your problem in several phases and save the > > > complete data-set / problem for THE RUN. > > > > > > You are not being a heretic there, but suffering energy costs ;-) > > > In more places that you may believe, useful computing nodes are being > > > replaced just because of energy costs. Even in some application domains > you > > > can even loose computational power if you move from 4 nodes into a > single > > > quad-core (i.e. memory bandwidth problems). I know it is very nice to be > > > able to do everything at home.. but maybe before dropping your studies > or > > > working overtime to pay the electricity bill, you might want to > reconsider > > > the fact of collapsing your phisical deploy into a single virtualized > > > cluster. (or just dispatch several threads/processes in a single > system). > > > If you collapse into a single system you have only 1 mainboard, one HDD, > one > > > power source, one processor (physically speaking), .... and you can > achieve > > > almost the performance of 4 systems in one, consuming the power of.... > well > > > maybe even less than a single one. I don't want to go into discussions > about > > > performance gain/loose due to the variation of the hardware > architecture. > > > Invest some bucks (if you haven't done that yet) in a good power source. > > > Efficiency of OEM unbranded power sources is realy pathetic. may be > 45-50% > > > efficiency, while a good power source might be 75-80% efficient. Use the > > > energy for computing, not for heating your house. > > > What I mean is that you could consider just collapsing a complete > "small" > > > cluster into single system. If your application is CPU-bound and not I/O > > > bound, VMware Server could be an option, as it is free software > > > (unfortunately not open, even tough some patches can be done on the > > > drivers). 
I think it is not possible to publish benchmarking data about > > > VMware, but I can tell you that in long timescales, the performance you > get > > > in the host OS is similar than the one of the guest OS. There are a lot > of > > > problems related to jitter, from crazy clocks to delays, but if your > > > application is not sensitive to that, then you are Ok. > > > Maybe this is not a solution, but you can provide more information > regarding > > > your problem before quitting... > > > > > > my 2 cents.... > > > > > > ariel > > > > > > Mark Kosmowski escribi?: > > > > > > > > > > At some point there a cost-benefit analysis needs to be performed. If > > > > my cluster at peak usage only uses 4 Gb RAM per CPU (I live in > > > > single-core land still and do not yet differentiate between CPU and > > > > core) and my nodes all have 16 Gb per CPU then I am wasting RAM > > > > resources and would be better off buying new machines and physically > > > > transferring the RAM to and from them or running more jobs each > > > > distributed across fewer CPUs. Or saving on my electricity bill and > > > > powering down some nodes. > > > > > > > > As heretical as this last sounds, I'm tempted to throw in the towel on > > > > my PhD studies because I can no longer afford the power to run my > > > > three node cluster at home. Energy costs may end up being the straw > > > > that breaks this camel's back. > > > > > > > > Mark E. Kosmowski > > > > > > > > > > > > > > > > > > > > > From: "Jon Aquilina" > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > not sure if this applies to all kinds of senarios that clusters are > used > > > > > > > > > > > > in > > > > > > > > > > > > but isnt the more ram you have the better? 
From prentice at ias.edu Wed Jul 2 08:22:54 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 02 Jul 2008 11:22:54 -0400 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com>
<0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> Message-ID: <486B9D4E.80405@ias.edu>

Mark Hahn wrote:
>>>> does it necessarily have to be a redhat clone. can it also be a debian
>>>> based
>>>> clone?
>>>
>>> but why? is there some concrete advantage to using Debian?
>>> I've never understood why Debian users tend to be very True Believer,
>>> or what it is that hooks them.
>>
>> And the Debian users can say the same thing about Red Hat users. Or SUSE
>
> very nice! an excellent parody of the True Believer response.
>
> but I ask again: what are the reasons one might prefer using debian?
> really, I'm not criticizing it - I really would like to know why it
> would matter whether someone (such as ClusterVisionOS (tm)) would use
> debian or another distro.
>

From my interactions with others re: Debian, it's usually about true opensourceness, since Debian claims that every package distributed by them is GPLed, or somehow meets some open source legal criteria. Also, I don't think there's any plan for Debian to go corporate, release an enterprise version, and effectively bite the hand that feeds it, like Red Hat and SUSE did.

Those are not technical issues, but philosophical/legal/political issues.

Me? I use RH and its derivatives for a couple of reasons. Here they are in historical order:

1. When I started learning Linux on my own, all the Linux authorities (websites, LJ, etc) recommended RH b/c RPM made it easy to install software, and if you bought a boxed version, you got the Metro-X X-server, which supported much more video hardware than XFree86 did at the time, and had an easy to use GUI to configure X.

2. Now that I'm a professional system admin who often has to support commercial apps, I find I have to use a RH-based distro for two reasons:

A. Most commercial software "supports" only Red Hat. Some go so far as to refuse to install if RH is not detected. The most extreme case of this is EMC PowerPath, whose kernel modules won't install if it's not a RH (or SUSE) kernel.

B. Red Hat has done such a good job of spreading FUD about the other Linux distros, management has a cow if you tell them you're installing something other than RH. This is why I consider Red Hat the Microsoft of Linux.

None of those are technical issues, either. Since the term "Linux" applies to the kernel only in the strictest sense, there should be no technical reasons to choose one distro over another. Issues like nice GUI management tools are human issues not technical issues.

--
Prentice

From perry at piermont.com Wed Jul 2 08:23:27 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Wed, 02 Jul 2008 11:23:27 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: (Robert G. Brown's message of "Wed\, 2 Jul 2008 11\:12\:05 -0400 \(EDT\)")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <87d4lweagv.fsf@snark.cb.piermont.com> <486B7AB9.9050202@aei.mpg.de>
Message-ID: <87hcb8b98w.fsf@snark.cb.piermont.com>

"Robert G. Brown" writes:
> On Wed, 2 Jul 2008, Carsten Aulbert wrote:
>> Which corresponds exactly to the maximum achievable mounts of 358 right
>> now. Besides, I'm far from being an expert on TCP/IP, but is it possible
>> for a local process to bind to a port which is already in use but to
>> another host? I don't think so, but may be wrong.
>
> AFAIK, no they don't. The way TCP daemons that listen on a
> well-known/privileged port work is that they accept a connection on that
> port, then fork a connection on a higher unprivileged (>1023) port on
> both ends so that the daemon can listen once again.

Try netstat on a heavily loaded SMTP box.
You'll see all these connections from some random foreign port to port 25 locally -- lots of connections to port 25 at the same time. You don't switch to a different port number after the connection comes in, you stay on it. You can in theory talk to up to (nearly) 2^48 different foreign host/port combos off of local port 25, because every remote host/remote port pair makes for a different 4-tuple. > Many daemons have a limit that can be set on the number of > simultaneous connections they can manage. That's a resource issue, not a TCP architecture issue per se. You might not have enough memory, CPU, etc. to handle more than a certain number of connections. By the way, you can now design daemons to handle tens of thousands of simultaneous connections with clean event driven design on a modern multiprocessor with plenty of memory. This is way off topic, though. > However, this is for TCP ports that maintain a persistent connection. > UDP ports are "connectionless" and hence somewhat different. I'm assuming they're doing NFS over TCP. If they're using UDP, things are somewhat different because of the existence of "connectionless" UDP. However, they *should* use TCP for performance. (I know people used to claim the opposite, but it turns out you really want TCP so you get proper congestion control.) Perry -- Perry E. Metzger perry at piermont.com From peter.st.john at gmail.com Wed Jul 2 08:25:19 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Wed, 2 Jul 2008 11:25:19 -0400 Subject: [Beowulf] Re: energy costs and poor grad students In-Reply-To: References: Message-ID: Mark, Would it be feasible to downclock your three nodes? All you physicists know better than I, that the power draw and heat production are not linear in GHz. A 1 GHz processor is less than half the cost per tick than a 2GHz, so if power budget is more urgent for you than time to completion then that might help; continue running all of your nodes, but slower. 
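As a rough illustration of the non-linearity mentioned above, here is a minimal sketch, not a measurement. It assumes the textbook idealization that dynamic power goes as C*V^2*f with the voltage lowered in proportion to the clock, so dynamic power goes roughly as f^3. The wattages, the cycle count and the function name job_energy_wh are made-up placeholders, and real boards may not allow the voltage to drop along with the clock at all.

```python
def job_energy_wh(freq_ghz, cycles=7.2e12, base_freq_ghz=2.0,
                  dynamic_watts_at_base=60.0, static_watts=40.0):
    """Energy in watt-hours to run a fixed number of cycles at a given clock.

    Assumption (idealized): dynamic power ~ C * V^2 * f with V scaled
    linearly with f, so dynamic power scales as f^3.
    static_watts stands for everything that does not scale with the
    clock (disks, RAM, fans, PSU losses) and is drawn for the whole run.
    All default numbers are hypothetical placeholders.
    """
    scale = freq_ghz / base_freq_ghz
    dynamic_watts = dynamic_watts_at_base * scale ** 3
    seconds = cycles / (freq_ghz * 1e9)   # slower clock means a longer run
    return (dynamic_watts + static_watts) * seconds / 3600.0


if __name__ == "__main__":
    for f in (2.0, 1.0):
        print(f"{f:.1f} GHz: {job_energy_wh(f):.1f} Wh with fixed draw, "
              f"{job_energy_wh(f, static_watts=0.0):.1f} Wh CPU-only")
```

Under these assumptions the CPU-only energy for the same job drops to a quarter at half the clock, but once a fixed 40 W draw is included the saving almost disappears, because that fixed draw is paid for twice as long. Whether downclocking helps therefore depends on how much of the node's power actually scales with the clock.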
But I've never done this myself. OTOH as a mathematician I don't have to :-) See http://xkcd.com/435/ ("Purity") Peter
From prentice at ias.edu Wed Jul 2 08:28:53 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 02 Jul 2008 11:28:53 -0400 Subject: [Beowulf] A press release In-Reply-To: References: <4863E551.8090802@scalableinformatics.com> <0D49B15ACFDF2F46BF90B6E08C90048A04884918AC@quadbrsex1.quadrics.com> <486A6760.5010006@ias.edu> <486A6D59.7020704@scalableinformatics.com> Message-ID: <486B9EB5.6020906@ias.edu> Mark Hahn wrote: >> Hmmm.... for me, its all about the kernel. Thats 90+% of the battle. >> Some distros use good kernels, some do not.
I won't mention who I >> think is in the latter category. > > I was hoping for some discussion of concrete issues. for instance, > I have the impression debian uses something other than sysvinit - does > that work out well? is it a problem getting commercial packages > (pathscale/pgi/intel compilers, gaussian, etc) to run? > > the couple debian people I know tend to have more ideological motives > (which I do NOT impugn, except that I am personally more swayed by > practical, concrete reasons.) I agree. I follow the same pragmatic rational paradigm. -- Prentice From atchley at myri.com Wed Jul 2 08:32:54 2008 From: atchley at myri.com (Scott Atchley) Date: Wed, 2 Jul 2008 11:32:54 -0400 Subject: [Beowulf] automount on high ports In-Reply-To: <486B8C1E.2090007@tamu.edu> References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de> <878wwlk65p.fsf@snark.cb.piermont.com> <20080701164747.GA15901@gretchen.aei.uni-hannover.de> <87fxqtuzh8.fsf@snark.cb.piermont.com> <486B2DC2.9010604@aei.mpg.de> <486B6501.5000108@aei.mpg.de> <04BB8220-B185-42A2-8E34-DA61066B6D51@myri.com> <486B8C1E.2090007@tamu.edu> Message-ID: On Jul 2, 2008, at 10:09 AM, Gerry Creager wrote: >> Although I believe Lustre's robustness is very good these days, I >> do not believe that it will not work in your setting. I think that >> they currently do not recommend mounting a client on a node that is >> also working as a server as you are doing with NFS. I believe it is >> due to memory contention leading to deadlock. > > Lustre is good enough that it's the parallel FS at TACC for the > Ranger cluster. And, I've had no real problems as a user thereof. > We're brining up glustre on our new cluster here ( > CentOS/RHEL5, not debian ). We looked at zfs but didn't > have sufficient experience to go that path. I believe that all the large DOE labs are using Lustre and would not if it were not reliable. My only concern was Carsten not having dedicated server nodes and mounting directly on those nodes. 
I may be off-base and hopefully one of the Lustre/SUN people might
correct me if so. :-)

Scott

From rgb at phy.duke.edu Wed Jul 2 08:46:51 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 2 Jul 2008 11:46:51 -0400 (EDT)
Subject: [Beowulf] automount on high ports
In-Reply-To: <87hcb8b98w.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>
    <87hcb8b98w.fsf@snark.cb.piermont.com>
Message-ID:

On Wed, 2 Jul 2008, Perry E. Metzger wrote:

> You don't switch to a different port number after the connection comes
> in, you stay on it. You can in theory talk to up to (nearly) 2^48
> different foreign host/port combos off of local port 25, because every
> remote host/remote port pair makes for a different 4-tuple.

Ah. I should have known that.

>> Many daemons have a limit that can be set on the number of
>> simultaneous connections they can manage.
>
> That's a resource issue, not a TCP architecture issue per se. You
> might not have enough memory, CPU, etc. to handle more than a certain
> number of connections.
>
> By the way, you can now design daemons to handle tens of thousands of
> simultaneous connections with clean event driven design on a modern
> multiprocessor with plenty of memory. This is way off topic, though.

Not on a cluster list. Networking in a very real sense IS the topic.
I've written forking daemons (which is why I should have known, or
remembered, about the four-tuple thing:-) because they are an essential
component of IPCs in a network-based cluster or cluster distributed
apps.
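A minimal sketch (not from the original thread) of the 4-tuple point
quoted above: every connection accepted on a listening socket keeps the
listener's local port, and the connections are told apart by the remote
address/port. The names port_of and same_local_port_demo are
hypothetical, the loopback address and kernel-chosen port are
assumptions made to keep the demo self-contained, and error paths leak
descriptors for brevity.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: local (peer == 0) or remote (peer != 0) port of
 * a connected socket, in host byte order; -1 on error. */
static int port_of(int fd, int peer)
{
    struct sockaddr_in a;
    socklen_t len = sizeof(a);
    int rc = peer ? getpeername(fd, (struct sockaddr *)&a, &len)
                  : getsockname(fd, (struct sockaddr *)&a, &len);
    return rc < 0 ? -1 : ntohs(a.sin_port);
}

/* Hypothetical demo: returns 1 if two accepted connections share the
 * listener's local port but differ in remote port, 0 if not, and -1 on
 * any setup error. */
int same_local_port_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int lfd, c1, c2, a1, a2, lport, ok;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    c1 = socket(AF_INET, SOCK_STREAM, 0);
    c2 = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0 || c1 < 0 || c2 < 0)
        return -1;

    /* Listen on 127.0.0.1; port 0 asks the kernel for any free port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 2) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
        return -1;
    lport = ntohs(addr.sin_port);

    /* Two clients connect to the same listening address and port. */
    if (connect(c1, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        connect(c2, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    a1 = accept(lfd, NULL, NULL);
    a2 = accept(lfd, NULL, NULL);
    if (a1 < 0 || a2 < 0)
        return -1;

    /* Same local port on the server side, different remote ports. */
    ok = port_of(a1, 0) == lport && port_of(a2, 0) == lport &&
         port_of(a1, 1) != port_of(a2, 1);

    close(a1); close(a2); close(c1); close(c2); close(lfd);
    return ok;
}
```

On a typical Linux host a call to same_local_port_demo() is expected to
return 1, which is the behaviour Perry describes: nothing migrates to a
new server-side port after accept().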
Even though PVM and MPI make it easy to write portable code (and
may well provide you with better performance than you can easily get on
your own) there may well be occasions for cluster software writers to
need to write their own networking, in band or out of band.

>> However, this is for TCP ports that maintain a persistent connection.
>> UDP ports are "connectionless" and hence somewhat different.
>
> I'm assuming they're doing NFS over TCP. If they're using UDP, things
> are somewhat different because of the existence of "connectionless"
> UDP. However, they *should* use TCP for performance. (I know people
> used to claim the opposite, but it turns out you really want TCP so
> you get proper congestion control.)

Yah. To make UDP reliable, you have to load it down with most of the
stuff in TCP anyway; it isn't clear that it was ever a great choice.
IIRC PVM was originally built on UDP for similar reasons, but I think
-- am not sure but think -- it is TCP today because it wasn't worth the
hassle. I'm too lazy to crank up a PVM app to find out, though...;-)

rgb

--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Jul 2 08:53:31 2008
From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu)
Date: Wed, 2 Jul 2008 17:53:31 +0200 (CEST)
Subject: [Beowulf] automount on high ports
In-Reply-To:
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>
Message-ID:

On Wed, 2 Jul 2008, Robert G.
Brown wrote:

> The way TCP daemons that listen on a well-known/privileged port work
> is that they accept a connection on that port, then fork a
> connection on a higher unprivileged (>1023) port on both ends so
> that the daemon can listen once again.

'man 7 socket' and look up SO_REUSEADDR. I don't quite know what you
mean by 'forking a connection'; when the daemon encounters a fork() all
open file descriptors (including sockets) are being kept in both the
parent and the child. The child (usually the part of the daemon that
processes the content that comes on that connection) gets the same
4-tuple as the parent. The parent closes its file handle so that only
the child is then active on that connection.

> You can see this by running e.g. netstat -a.

I seriously doubt that you have seen such a behaviour. Empirical
evidence which might pass easier than theoretical one: on the e-mail
server that I admin, there is an iptable rule to only allow incoming
connections to port 25 - if connections would suddenly be migrated to
different ports they would be blocked and I would not receive any
e-mails from this list. But I do, especially during the past few
days... (not that I complain :-))

--
Bogdan Costescu

IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany
Phone: +49 6221 54 8869/8240, Fax: +49 6221 54 8868/8850
E-mail: bogdan.costescu at iwr.uni-heidelberg.de

From perry at piermont.com Wed Jul 2 09:33:06 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Wed, 02 Jul 2008 12:33:06 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: (Robert G.
Brown's message of "Wed, 2 Jul 2008 11:46:51 -0400 (EDT)")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>
    <87hcb8b98w.fsf@snark.cb.piermont.com>
Message-ID: <87vdzo9rgd.fsf@snark.cb.piermont.com>

"Robert G. Brown" writes:
> On Wed, 2 Jul 2008, Perry E. Metzger wrote:
>> By the way, you can now design daemons to handle tens of thousands of
>> simultaneous connections with clean event driven design on a modern
>> multiprocessor with plenty of memory. This is way off topic, though.
>
> Not on a cluster list.

Well, it actually kind of is. Typically, a box in an HPC cluster is
running stuff that's compute bound and whose primary job isn't serving
vast numbers of teeny high latency requests. That's much more what a
web server does. However...

> I've written forking daemons (which is why I should have known, or
> remembered, about the four-tuple thing:-) because they are an essential
> component of IPCs in a network-based cluster or cluster distributed
> apps.

One is best off *not* forking, actually. There's a good site on
concurrency management for high performance servers. It is a bit old
now but covers the topic well:

http://www.kegel.com/c10k.html

Myself, I'm a believer in event driven code. One thread, one core. All
other concurrency management should be handled by events, not by
multiple threads. Thread context switching is very very expensive, and
threads are very expensive. Doing event driven programming wins
overwhelmingly in such contexts. It is hard to impossible, on a modern
machine, to handle tens of thousands of connections with forking or
threads, but it is easy with events.

I'm a fan of Niels Provos' "libevent" for such purposes. There are a
lot of other libraries that plug in to it well, too.

--
Perry E.
Metzger perry at piermont.com

From perry at piermont.com Wed Jul 2 09:37:55 2008
From: perry at piermont.com (Perry E. Metzger)
Date: Wed, 02 Jul 2008 12:37:55 -0400
Subject: [Beowulf] automount on high ports
In-Reply-To: (Bogdan Costescu's message of "Wed, 2 Jul 2008 17:53:31 +0200 (CEST)")
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>

Message-ID: <87iqvo9r8c.fsf@snark.cb.piermont.com>

Bogdan Costescu writes:
> 'man 7 socket' and look up SO_REUSEADDR.

Incidentally, I believe this may be part of the problem for the NFS
client code in Linux.

--
Perry E. Metzger perry at piermont.com

From rgb at phy.duke.edu Wed Jul 2 10:54:37 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 2 Jul 2008 13:54:37 -0400 (EDT)
Subject: [Beowulf] automount on high ports
In-Reply-To:
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>

Message-ID:

On Wed, 2 Jul 2008, Bogdan Costescu wrote:

> On Wed, 2 Jul 2008, Robert G. Brown wrote:
>
>> The way TCP daemons that listen on a well-known/privileged port work is
>> that they accept a connection on that port, then fork a connection on a
>> higher unprivileged (>1023) port on both ends so that the daemon can listen
>> once again.
>
> 'man 7 socket' and look up SO_REUSEADDR. I don't quite know what you mean by
> 'forking a connection'; when the daemon encounters a fork() all open file
> descriptors (including sockets) are being kept in both the parent and the
> child. The child (usually the part of the daemon that processes the content
> that comes on that connection) gets the same 4-tuple as the parent. The
> parent closes its file handle so that only the child is then active on that
> connection.

I'm stating it badly and incorrectly, confusing port with socket. See
the following code. Server listens, bound to a specific port. When a
connection is initiated by a (possibly remote) client, it accepts it
(creating a socket with its own FD), leaving the original server socket
FD unaffected. It then forks and the child CLOSES the original socket
lest there be trouble. The server/parent similarly closes the client
fd. The client typically got a "random" (kernel chosen) port on ITS
side from the list of available unprotected ports when it formed its
original socket, and it forms one side of the stream connection, with
the server "accept" socket being the other.

What I was trying to convey remarkably poorly is that once you've
created a daemon and bound it to a port, if you try to start up a second
daemon on that port you'll get an EADDRINUSE on the bind (and fail the
loop that checks below), and so if you DON'T fork off the sockets with
listen/accept you'll usually block the port indefinitely while handling
each connection.
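The EADDRINUSE behaviour described above can be checked with a small
sketch (not part of the original message). It assumes a Linux/POSIX
host, binds on the loopback address with a kernel-chosen port, and
deliberately sets no SO_REUSEADDR on either socket; the function name
second_bind_errno is hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: bind and listen on one socket, then try to bind a
 * second socket to the very same address and port.  Returns the errno
 * of the second bind(), 0 if it unexpectedly succeeded, or -1 if the
 * setup itself failed. */
int second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int first, second, err = 0;

    first = socket(AF_INET, SOCK_STREAM, 0);
    second = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0 || second < 0)
        return -1;

    /* First "daemon": 127.0.0.1, port 0 lets the kernel pick a port. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;
    if (bind(first, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(first, 1) < 0 ||
        getsockname(first, (struct sockaddr *)&addr, &len) < 0) {
        close(first);
        close(second);
        return -1;
    }

    /* Second "daemon": same address and port, so bind() should fail. */
    if (bind(second, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        err = errno;   /* save before close() can disturb errno */

    close(first);
    close(second);
    return err;
}
```

Comparing the return value against EADDRINUSE shows the failure a second
daemon would hit while the first one still holds the port.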
I haven't tried (at least, not deliberately:-) not going through
the asymmetric close so that the two processes both have all the FDs,
but I'd guess bad things would happen if I did, a crap shoot race
condition as to which process gets the data or worse.

OTOH, some applications (esp nfsd and httpd) DO fork several child
processes with the original open socket fd so that if incoming requests
for a connection come while one of them is "busy" with the creation of a
child of its own to handle the connection, another will pick it up round
robin. Unless I'm misunderstanding how they work this or why. Mail is
even more interesting, as imapd has to stick around to manage each
persistent imap connection, so an imapd server has umpty zillion
instances of imapd. I don't know exactly what smtp daemons do --
postfix or sendmail.

Anyway, some generic forking daemon code, adapted IIRC from Stevens
originally and hacked around some to avoid TIME_WAIT and so on:

  server_fd = socket(AF_INET,SOCK_STREAM,0);
  if (server_fd < 0){
    fprintf(stderr,"socket: %.100s", strerror(errno));
    exit(1);
  }

  /*
   * Set socket options. We try to make the port reusable and have it
   * close as fast as possible without waiting in unnecessary wait states
   * on close.
   */
  setsockopt(server_fd, SOL_SOCKET, SO_REUSEADDR, (void *)&on, sizeof(on));
  linger.l_onoff = 1;   /* Enable SO_LINGER... */
  linger.l_linger = 0;  /* ...with zero timeout: do NOT linger -- exit and discard data. */
  setsockopt(server_fd, SOL_SOCKET, SO_LINGER, (void *)&linger, sizeof(linger));

  serverlen = sizeof(serverINETaddress);
  bzero( (char*) &serverINETaddress,serverlen);           /* clear structure */
  serverINETaddress.sin_family = AF_INET;                 /* Internet domain */
  serverINETaddress.sin_addr.s_addr = htonl(INADDR_ANY);  /* Accept all */
  serverINETaddress.sin_port = htons(port);               /* Server port number */
  serverSockAddrPtr = (struct sockaddr*) &serverINETaddress;

  /*
   * Bind the socket to the desired port. Try up to six times (30sec) IF the
   * port is in use
   */
  retries = 6;
  errno = 0;   /* To zero any possible garbage value */
  while(retries--){
    if(bind(server_fd,serverSockAddrPtr,serverlen) < 0) {
      if(errno != EADDRINUSE){
        close(server_fd);
        fprintf(stderr,"bind: %.100s\n", strerror(errno));
        fprintf(stderr,"socket bind to port %d failed: %d.\n", port,errno);
        exit(255);
      }
    } else {
      /* Success: clear any EADDRINUSE left over from an earlier try */
      errno = 0;
      break;
    }
    /* printf("Got no port: %s\n",strerror(errno)); */
    sleep(5);
  }
  if(errno){
    if(errno == EADDRINUSE){
      fprintf(stderr,"Timeout (tried to bind six times five seconds apart)\n");
    }
    close(server_fd);
    fprintf(stderr,"bind to port %d failed: %.100s\n",port,strerror(errno));
    exit(255);
  }

  /*
   * Socket exists. Service it. Queue up to n_connxns incoming connections
   * or die. Default 10 matches the limits in the default xinetd.
   */
  if(listen(server_fd,nconnxns) < 0){
    fprintf(stderr,"listen: %.100s", strerror(errno));
    exit(255);
  }

  /* Arrange SIGCHLD to be caught. */
  signal(SIGCHLD, sigchld_handler);

  /*
   * Initialize client structures.
   */
  clientlen = sizeof(clientINETaddress);
  clientSockAddrPtr = (struct sockaddr*) &clientINETaddress;

  /*
   * Loop "forever", or until daemon crashes or is killed with a signal.
   */
  while(1){

    /* Accept a client connection */
    if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: Accepting Client connection...\n");
    }

    /*
     * Wait in select until there is a connection. Presumably this is
     * more efficient than just blocking on the accept
     */
    FD_ZERO(&fdset);
    FD_SET(server_fd, &fdset);
    ret = select(server_fd + 1, &fdset, NULL, NULL, NULL);
    if (ret < 0 || !FD_ISSET(server_fd, &fdset)) {
      if (errno == EINTR) continue;
      fprintf(stderr,"select: %.100s", strerror(errno));
      continue;
    }

    /*
     * A call is waiting. Accept it.
     */
    client_fd = accept(server_fd,clientSockAddrPtr,&clientlen);
    if (client_fd < 0){
      if (errno == EINTR) continue;
      fprintf(stderr,"accept: %.100s", strerror(errno));
      continue;
    }
    if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: ...client connection made.\n");
    }

    /*
     * IF I GET HERE...
     * ...I'm a real daemon. I therefore fork and have the child process
     * the connection. The parent continues listening and can service
     * multiple connections in parallel.
     */

    /*
     * CHILD. Close the listening (server) socket, and start using the
     * accepted (client) socket. We break out of the (infinite) loop to
     * handle the connection.
     */
    if ((pid = fork()) == 0){
      close(server_fd);
      break;
    }

    /*
     * PARENT. Stay in the loop. Close the client socket (it's the child's)
     * but leave the server socket open.
     */
    if (pid < 0)
      fprintf(stderr,"fork: %.100s", strerror(errno));
    else if((verbose == D_ALL) || (verbose == D_DAEMON)){
      printf("D_DAEMON: Forked child %d to handle socket %d.\n", pid,client_fd);
    }
    close(client_fd);

  }

  /* No need to wait for children -- I'm the child */
  signal(SIGCHLD, SIG_DFL);
  /* Dissociate from calling process group and control terminal */
  setsid();

>
>> You can see this by running e.g. netstat -a.
>
> I seriously doubt that you have seen such a behaviour. Empirical evidence
> which might pass easier than theoretical one: on the e-mail server that I
> admin, there is an iptable rule to only allow incoming connections to port 25
> - if connections would suddenly be migrated to different ports they would be
> blocked and I would not receive any e-mails from this list. But I do,
> especially during the past few days... (not that I complain :-))
>
> --

Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C.
27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977

From rgb at phy.duke.edu Wed Jul 2 11:03:31 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Wed, 2 Jul 2008 14:03:31 -0400 (EDT)
Subject: [Beowulf] automount on high ports
In-Reply-To: <87vdzo9rgd.fsf@snark.cb.piermont.com>
References: <20080701093643.GA17845@gretchen.aei.uni-hannover.de>
    <878wwlk65p.fsf@snark.cb.piermont.com>
    <20080701164747.GA15901@gretchen.aei.uni-hannover.de>
    <87fxqtuzh8.fsf@snark.cb.piermont.com>
    <486B2DC2.9010604@aei.mpg.de>
    <87d4lweagv.fsf@snark.cb.piermont.com>
    <486B7AB9.9050202@aei.mpg.de>
    <87hcb8b98w.fsf@snark.cb.piermont.com>
    <87vdzo9rgd.fsf@snark.cb.piermont.com>
Message-ID:

On Wed, 2 Jul 2008, Perry E. Metzger wrote:

>
> "Robert G. Brown"