From deadline at eadline.org Mon Dec 1 10:31:29 2008
From: deadline at eadline.org (Douglas Eadline)
Date: Thu Mar 18 01:08:06 2010
Subject: [Beowulf] Free Webinar: Cool Crunching: Understanding Green HPC
In-Reply-To: <571f1a060811292333q4aa31ab6x5b7995baa4145445@mail.gmail.com>
References: <571f1a060811292333q4aa31ab6x5b7995baa4145445@mail.gmail.com>
Message-ID: <34891.192.168.1.213.1228156289.squirrel@mail.eadline.org>

I'm moderating a webinar called:

Cool Crunching: Understanding Green HPC

on Wednesday (Dec 3) at 11AM EST. More info and registration:

http://linux-mag.com/id/7172

--
Doug

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.

From deadline at eadline.org Mon Dec 1 10:58:33 2008
From: deadline at eadline.org (Douglas Eadline)
Date: Thu Mar 18 01:08:06 2010
Subject: [Beowulf] cli alternative to cluster top?
In-Reply-To: <4932C58C.6020706@scalableinformatics.com>
References: <4932C58C.6020706@scalableinformatics.com>
Message-ID: <38215.192.168.1.213.1228157913.squirrel@mail.eadline.org>

maybe Joe means this:

http://www.basement-supercomputing.com/content/view/19/45/

which is being updated for SGE 6.*

I basically hacked it together because I could not easily understand qstat. Source is available; be advised the code is ugly, as it started out as a hack while I was debugging SGE parallel environments and test suites. I should have a new version "real soon now".

There are also some web-based tools that do this for SGE as well (http://xml-qstat.org/).

--
Doug

> Thomas Vixel wrote:
>> I've been googling for a top-like cli tool to use on our cluster, but
>> the closest thing that comes up is Rocks' "cluster top" script. That
>> could be tweaked to work via the cli, but due to factors beyond my
>> control (management) all functionality has to come from a pre-fab
>> program rather than a software stack with local, custom modifications.
>> >> I'm sure this has come up more than once in the HPC sector as well -- >> could anyone point me to any top-like apps for our cluster? > > We have a ctop we have written a while ago. Depends upon pdsh, though > with a little effort, even that could be removed (albeit being a > somewhat slower program as a result). Our version is Perl based, open > source, and quite a few of our customers do use it. I had looked at > hooking it into wulfstat at some point. > > Doug Eadline has a top he had written (is that correct Doug?) for > clusters some time ago. > >> >> For reference, wulfware/wulfstat was nixed as well because of the >> xmlsysd dependency. > > Sometimes I wonder about the 'logic' underpinning some of the decisions > I hear about. > > ctop could work with plain ssh, though you will need to make sure that > all nodes are able to be reached via passwordless ssh (shouldn't be an > issue for most of todays clusters), and you will need some mechanism to > tell ctop which nodes you wish to include in the list. We have used > /etc/cluster/hosts.cluster in the past to list hostnames/ip addresses of > the cluster nodes. > > Let me know if you have pdsh implemented. BTW: ctop is OSS (GPLv2). > It should be available on our download site as an RPM/source RPM > (http://downloads.scalableinformatics.com). If there is enough interest > in it, I'll put it into our public Mercurial repository as well. 
> > > > > -- > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics LLC, > email: landman@scalableinformatics.com > web : http://www.scalableinformatics.com > http://jackrabbit.scalableinformatics.com > phone: +1 734 786 8423 x121 > fax : +1 866 888 3112 > cell : +1 734 612 4615 > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- > This message has been scanned for viruses and > dangerous content by MailScanner, and is > believed to be clean. > -- Doug -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean. From deadline at eadline.org Mon Dec 1 11:02:28 2008 From: deadline at eadline.org (Douglas Eadline) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] cli alternative to cluster top? In-Reply-To: <49330EC5.50700@scalableinformatics.com> References: <571f1a060811292333q4aa31ab6x5b7995baa4145445@mail.gmail.com> <20081130182839.GA17239@bx9> <49330EC5.50700@scalableinformatics.com> Message-ID: <54979.192.168.1.213.1228158148.squirrel@mail.eadline.org> > Robert G. Brown wrote: >> On Sun, 30 Nov 2008, Greg Lindahl wrote: >> >>> >>> On Sun, Nov 30, 2008 at 11:45:44AM -0500, Robert G. Brown wrote: >>> >>>> That's fine, but I'm curious. How do you expect to run a cluster >>>> information tool over a network without a socket at both ends? >>> >>> There's always "qstat". The OP didn't really say what sorts of >>> information he was looking for... >> >> :-) Hey, didn't think of that -- an enormous Quake cluster? >> >> Although I didn't realize that qstat worked by electronic telepathy;-) > > to bad we can't use EPR pairs for this ... Well maybe not in this universe ... 
--
Doug

From rgb at phy.duke.edu Mon Dec 1 15:33:40 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu Mar 18 01:08:06 2010
Subject: [Beowulf] cli alternative to cluster top?
In-Reply-To:
References: <571f1a060811292333q4aa31ab6x5b7995baa4145445@mail.gmail.com>
Message-ID:

On Mon, 1 Dec 2008, Thomas Vixel wrote:

> The main requirements were that 1) "it must look like top", 2) it must
> be cli-based, 3) it should not introduce another piece of (server)
> software (and thus point of failure) into the system, 4) and it should
> not require any local hacks on our part.
>
> Since this is a web cluster, I suppose the most logical transport
> would be HTTP, but everything I've seen so far would require me to
> violate (4) to satisfy 1-3. SSH & SNMP are also available, I could see
> SNMP being problematic for a project like this. SSH *might* work and
> *might* be scalable for a project like this since the main expense is
> in the construction of the connections (and they can be re-used across
> the course of the execution), but I have yet to find a top-like
> program that leverages it.
>
> xmlsysd & wulfstat would appear to be the least expensive solution to
> this problem, but as I said, management already nixed it.

Um, I don't know what you call a "local hack", but it IS called xmlsysd for a reason. You don't have to run it as an actual public daemon. It produces xml, and in inetd mode it reads from stdin and writes to stdout.

Try this. Install xmlsysd on something. Then in the command line, enter:

  ./xmlsysd -i 7880

(the port number isn't important -- you're just telling it to use inetd mode). It will run and nothing will happen. Then WITHOUT BREAKING OUT just enter

  init
  send

See? I'd think you could write a TRIVIAL PHP or perl wrapper and pop this out through a webserver.

The other end is a bit trickier, but I think doable. wulfstat/wulflogger don't currently speak GET, but the parser should still work. If you only wanted a few objects (e.g. load average), you could again write a pretty trivial perl script to get them. Or if you don't need it today (or the next week or two) over break I could probably hack wulfstat OR wulflogger to connect to a web interface and just use GET to get an update, and maybe write a "permanent" CGI wrapper that puts xmlsysd output on a web address when called (in lieu of "send").

rgb

>
> Honestly, if it weren't for (4), I'd probably just grab the top source
> and graft it onto a SSH library. It might not be the most efficient
> solution, but for a HA cluster it doesn't necessarily HAVE to be.

I'd think the solution above would be a lot easier.

>
> On 11/30/08, Robert G. Brown wrote:
>> On Sat, 29 Nov 2008, Greg Kurtzer wrote:
>>
>>> Warewulf has a real time top like command for the cluster nodes and
>>> has been known to scale up to the thousands of nodes:
>>>
>>> http://www.runlevelzero.net/images/wwtop-screenshot.png
>>>
>>> We are just kicking off Warewulf development again now that Perceus
>>> has gotten to a critical mass and Caos NSA 1.0 has been released. We
>>> should have our repositories for Warewulf-3 pre-releases up shortly
>>> but if you need something ASAP, please contact me offline and I will
>>> get you what you need.
>>> >>> Thanks! >>> Greg >>> >>> On Wed, Nov 26, 2008 at 12:39 PM, Thomas Vixel wrote: >>>> I've been googling for a top-like cli tool to use on our cluster, but >>>> the closest thing that comes up is Rocks' "cluster top" script. That >>>> could be tweaked to work via the cli, but due to factors beyond my >>>> control (management) all functionality has to come from a pre-fab >>>> program rather than a software stack with local, custom modifications. >>>> >>>> I'm sure this has come up more than once in the HPC sector as well -- >>>> could anyone point me to any top-like apps for our cluster? >>>> >>>> For reference, wulfware/wulfstat was nixed as well because of the >>>> xmlsysd dependency. >> >> That's fine, but I'm curious. How do you expect to run a cluster >> information tool over a network without a socket at both ends? If not >> xmlsysd, then something else -- sshd, xinetd, dedicated or general >> purpose, where the latter almost certainly will have have higher >> overhead? Or are you looking for something with a kernel level network >> interface, more like scyld? >> >> rgb >> >>>> _______________________________________________ >>>> Beowulf mailing list, Beowulf@beowulf.org >>>> To change your subscription (digest mode or unsubscribe) visit >>>> http://www.beowulf.org/mailman/listinfo/beowulf >>>> >>> >>> >>> >>> -- >>> Greg Kurtzer >>> http://www.infiscale.com/ >>> http://www.runlevelzero.net/ >>> http://www.perceus.org/ >>> http://www.caoslinux.org/ >>> _______________________________________________ >>> Beowulf mailing list, Beowulf@beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> >> Robert G. Brown Phone(cell): 1-919-280-8443 >> Duke University Physics Dept, Box 90305 >> Durham, N.C. 
27708-0305 >> Web: http://www.phy.duke.edu/~rgb >> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php >> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 >> > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From prentice at ias.edu Tue Dec 2 07:24:15 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error Message-ID: <4935531F.6040807@ias.edu> I'm getting this error when I run ibchecknet on my cluster: #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 Error check on lid 1 (aurora HCA-1) port 1: FAILED I've googled around this morning, but haven't found anything helpful. Most of the hits turn up code with the phrase "VL15Dropped", but nothing explaining what this error means, what causes it, or how to fix it. After clearing the counters with 'perfquery -r', the VL15Dropped count starts increasing from zero almost immediately. Any ideas what this error represents or how to fix? Could it be a bad cable? -- Prentice From Shainer at mellanox.com Tue Dec 2 11:01:49 2008 From: Shainer at mellanox.com (Gilad Shainer) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F017A3A81@mtiexch01.mti.com> I can try to help you here, and would need to understand your setup and on which port the drop is occurring on. Bad cable causing this seems very unlikely. Gilad. 
-----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Prentice Bisbal Sent: Tuesday, December 02, 2008 7:24 AM To: Beowulf Mailing List Subject: [Beowulf] InfiniBand VL15 error I'm getting this error when I run ibchecknet on my cluster: #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 Error check on lid 1 (aurora HCA-1) port 1: FAILED I've googled around this morning, but haven't found anything helpful. Most of the hits turn up code with the phrase "VL15Dropped", but nothing explaining what this error means, what causes it, or how to fix it. After clearing the counters with 'perfquery -r', the VL15Dropped count starts increasing from zero almost immediately. Any ideas what this error represents or how to fix? Could it be a bad cable? -- Prentice _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From tvixel at gmail.com Mon Dec 1 13:57:31 2008 From: tvixel at gmail.com (Thomas Vixel) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] cli alternative to cluster top? In-Reply-To: References: <571f1a060811292333q4aa31ab6x5b7995baa4145445@mail.gmail.com> Message-ID: The main requirements were that 1) "it must look like top", 2) it must be cli-based, 3) it should not introduce another piece of (server) software (and thus point of failure) into the system, 4) and it should not require any local hacks on our part. Since this is a web cluster, I suppose the most logical transport would be HTTP, but everything I've seen so far would require me to violate (4) to satisfy 1-3. SSH & SNMP are also available, I could see SNMP being problematic for a project like this. 
SSH *might* work and *might* be scalable for a project like this since the main expense is in the construction the connections (and they can be re-used across the course of the execution), but I have yet to find a top-like program that leverages it. xmlsysd & wulfstat would appear to be the least expensive solution to this problem, but as I said, management already nixed it. Honestly, if it weren't for (4), I'd probably just grab the top source and graft it onto a SSH library. It might not be the most efficient solution, but for a HA cluster it doesn't necessarily HAVE to be. On 11/30/08, Robert G. Brown wrote: > On Sat, 29 Nov 2008, Greg Kurtzer wrote: > >> Warewulf has a real time top like command for the cluster nodes and >> has been known to scale up to the thousands of nodes: >> >> http://www.runlevelzero.net/images/wwtop-screenshot.png >> >> We are just kicking off Warewulf development again now that Perceus >> has gotten to a critical mass and Caos NSA 1.0 has been released. We >> should have our repositories for Warewulf-3 pre-releases up shortly >> but if you need something ASAP, please contact me offline and I will >> get you what you need. >> >> Thanks! >> Greg >> >> On Wed, Nov 26, 2008 at 12:39 PM, Thomas Vixel wrote: >>> I've been googling for a top-like cli tool to use on our cluster, but >>> the closest thing that comes up is Rocks' "cluster top" script. That >>> could be tweaked to work via the cli, but due to factors beyond my >>> control (management) all functionality has to come from a pre-fab >>> program rather than a software stack with local, custom modifications. >>> >>> I'm sure this has come up more than once in the HPC sector as well -- >>> could anyone point me to any top-like apps for our cluster? >>> >>> For reference, wulfware/wulfstat was nixed as well because of the >>> xmlsysd dependency. > > That's fine, but I'm curious. How do you expect to run a cluster > information tool over a network without a socket at both ends? 
If not > xmlsysd, then something else -- sshd, xinetd, dedicated or general > purpose, where the latter almost certainly will have have higher > overhead? Or are you looking for something with a kernel level network > interface, more like scyld? > > rgb > >>> _______________________________________________ >>> Beowulf mailing list, Beowulf@beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> >> >> >> -- >> Greg Kurtzer >> http://www.infiscale.com/ >> http://www.runlevelzero.net/ >> http://www.perceus.org/ >> http://www.caoslinux.org/ >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > Robert G. Brown Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > From tvixel at gmail.com Mon Dec 1 15:22:35 2008 From: tvixel at gmail.com (Thomas Vixel) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] cli alternative to cluster top? In-Reply-To: References: Message-ID: That does sound interesting, but more for some of my personal projects. It wouldn't work for the situation at hand because: 1) It sounds like it introduces a SPF (the head node). 2) Giving our developers cluster-wide 'killall' & 'kill' functionality makes me cringe. Most of them only know just enough about Linux to be dangerous. 3) It would require completely reworking our current cluster solution; a daunting task to say the least. 4) There isn't much love for commercial & non-OSS software at our company. 
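
[Editor's sketch] For readers weighing the plain-ssh transport raised earlier in this thread (passwordless ssh to every node, with hostnames taken from a list such as /etc/cluster/hosts.cluster), the following is a minimal sketch of what a top-like load summary could look like. It is not ctop, wulfstat, or any tool named in the thread, and it would count as a "local hack" under requirement (4). The function names, the table layout, and the choice to read /proc/loadavg are all assumptions made for illustration.

```python
import subprocess


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict of numbers."""
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    running, total = (int(x) for x in fields[3].split("/"))
    return {"load1": load1, "load5": load5, "load15": load15,
            "running": running, "total": total}


def fetch_loadavg(host, timeout=5):
    """Read /proc/loadavg on one node over passwordless ssh.

    BatchMode=yes makes ssh fail instead of prompting for a password.
    """
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, "cat", "/proc/loadavg"],
        capture_output=True, text=True, timeout=timeout, check=True)
    return result.stdout


def render(samples):
    """Format {host: parsed loadavg} as a table, busiest node first."""
    lines = ["%-16s %6s %6s %6s %9s"
             % ("NODE", "1MIN", "5MIN", "15MIN", "RUN/TOTAL")]
    ordered = sorted(samples.items(),
                     key=lambda item: item[1]["load1"], reverse=True)
    for host, s in ordered:
        procs = "%d/%d" % (s["running"], s["total"])
        lines.append("%-16s %6.2f %6.2f %6.2f %9s"
                     % (host, s["load1"], s["load5"], s["load15"], procs))
    return "\n".join(lines)
```

A caller would loop over the host list, call fetch_loadavg() for each node, feed the results through parse_loadavg(), and print render() every few seconds. This version opens a new ssh connection per node per refresh, so it does not address the connection-reuse cost mentioned above.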
On 11/30/08, Donald Becker wrote: > On Wed, 26 Nov 2008, Thomas Vixel wrote: > >> I've been googling for a top-like cli tool to use on our cluster, but >> the closest thing that comes up is Rocks' "cluster top" script. That >> could be tweaked to work via the cli, but due to factors beyond my >> control (management) all functionality has to come from a pre-fab >> program rather than a software stack with local, custom modifications. >> >> I'm sure this has come up more than once in the HPC sector as well -- >> could anyone point me to any top-like apps for our cluster? > > Most remote job mechanisms only think about starting remote processes, not > about the full create-monitor-control-report functionality. > > The Scyld system (currently branded "Clusterware") defaults to using a > built-in unified process space. That presents all of the processes > running over the cluster in a process space on the master machine, with > fully POSIX semantics. It neatly solves your need with... the standard > 'top' program. > > Most scheduling systems also have a way to monitor processes that they > start, but I haven't found one that takes advantage of all information > available and reports it quickly/efficiently. > > There are many advantages of the Scyld implementation > -- no new or modified process management tools need to be written. > Standard utilities such as 'top' and 'ps' work unmodified, > as well as tools we didn't specifically plan for e.g. GUI versions of > 'pstree'. > -- The 'killall' program works over the cluster, efficiently. > -- All signals work as expected, including 'kill -9'. (Most remote > process starting mechanisms will just kill off the local endpoint, > leaving the remote process running-but-confused.) 
> -- Process groups and controlling-TTY groups works properly for job > control and signals > -- Running jobs report their status and statistics accurately -- an > updated 'rusage' structure is sent once per second, and a final > rusage structure and exit status is sent when the process terminates. > > The "downside" is that we explicitly use Linux features and details, > relying on kernel-version-specific features. That's an issue if it's a > one-off hack, but we've been using this approach continuously for > a decade, since the Linux 2.2 kernel and over multiple > architectures. We've been producing supported commercial releases > since 2000, longer than anyone else in the business. > > -- > Donald Becker becker@scyld.com > Penguin Computing / Scyld Software > www.penguincomputing.com www.scyld.com > Annapolis MD and San Francisco CA > > From malcolm.croucher at gmail.com Tue Dec 2 03:05:57 2008 From: malcolm.croucher at gmail.com (malcolm croucher) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question Message-ID: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Hi Guys , I am still thinking about my cluster and most probably will only begin development next year june/july . Question : If i develop my system on 10 computers (nodes) which are all normal desktops and then would like to place this in data hosting facility which has access to real time information . I am going to need to buy new servers (thin 1 u servers ). Would this be the best choice as desktops take up more space and therefore will be more expensive . how do you guys get around this problem ? or dont you ? Regards Malcolm -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081202/694005b0/attachment.html From lindahl at pbm.com Tue Dec 2 13:18:29 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error In-Reply-To: <4935531F.6040807@ias.edu> References: <4935531F.6040807@ias.edu> Message-ID: <20081202211829.GB2920@bx9> On Tue, Dec 02, 2008 at 10:24:15AM -0500, Prentice Bisbal wrote: > #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 > Error check on lid 1 (aurora HCA-1) port 1: FAILED IB is blissfully fading from my brain, but I think this refers to control packets being dropped due to resource limits on the recipient. That takes talent if you're using a Mellanox HCA, as pretty much all of the VL15 packets are interpreted by the processor in the HCA. -- greg From lindahl at pbm.com Tue Dec 2 13:25:29 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Second source for IB switch silicon Message-ID: <20081202212529.GC2920@bx9> As you folks probably know, there is only 1 source for InfiniBand switch silicon today. When IB was first mooted, several companies built switch silicon, but only 2 made it to market, and now only Mellanox makes switch chips. One of the startups doing an IB switch had just gotten their first chip samples back when Intel dropped out of IB, so the startup never powered on the chip. This startup was subsequently bought by QLogic, and did several generations of Fibre Channel switch chips, including some pretty big ones. Well, QLogic announced at SC that they're going to be producing IB switch silicon. No announcement of a ship date, but it'll be nice to have a second source for both HCAs and switches. -- greg Disclaimer: I used to work for QLogic, but don't have any financial interest in them anymore. 
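
[Editor's sketch] On the VL15Dropped question in this digest: one way to watch whether the counter keeps climbing after a reset is to save two samples of counter output and compare them. The sketch below assumes a "Name:....value" line layout for the counter dump, which should be checked against what perfquery actually prints on your system. The threshold of 100 is taken from the ibchecknet warning quoted in the thread. The function names are hypothetical and this is not how ibchecknet itself is implemented.

```python
import re

# Assumed line layout: counter name, a colon, dot padding, a decimal value.
COUNTER_LINE = re.compile(r"^(\w+):\.*(\d+)$")


def parse_counters(text):
    """Return {counter_name: value} for every line matching the layout."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line.strip())
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def over_threshold(counters, thresholds):
    """List (name, value, limit) for counters above their limit."""
    return [(name, counters[name], limit)
            for name, limit in sorted(thresholds.items())
            if counters.get(name, 0) > limit]


def growth(before, after):
    """Increase per counter between two samples, for counters that grew."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] > before[name]}


# Hypothetical sample text, shaped like the assumed layout above.
SAMPLE = """# Port counters: Lid 1 port 1
PortSelect:......................1
SymbolErrors:....................0
VL15Dropped:.....................476
"""
```

Taking one sample right after clearing the counters and another a minute later, then calling growth() on the pair, gives a rough drop rate to compare with the subnet manager's activity.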
From niftyompi at niftyegg.com Tue Dec 2 13:24:14 2008 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error In-Reply-To: <4935531F.6040807@ias.edu> References: <4935531F.6040807@ias.edu> Message-ID: <20081202212414.GA3175@compegg.wr.niftyegg.com> On Tue, Dec 02, 2008 at 10:24:15AM -0500, Prentice Bisbal wrote: > > I'm getting this error when I run ibchecknet on my cluster: > > #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 > Error check on lid 1 (aurora HCA-1) port 1: FAILED > > I've googled around this morning, but haven't found anything helpful. > Most of the hits turn up code with the phrase "VL15Dropped", but nothing > explaining what this error means, what causes it, or how to fix it. > > After clearing the counters with 'perfquery -r', the VL15Dropped count > starts increasing from zero almost immediately. > > Any ideas what this error represents or how to fix? Could it be a bad > cable? > Can you be specific about the hardware (HCA and switch) and software? How large is the fabric? What subnet manager is running and where? The host behind LID-1 is the one of interest. If I recall correctly, VL15 is reserved exclusively for subnet management and is not optional. Traffic to VL15 might be randomly dropped by the switch, SMA or interrupt handler. As long as the subnet is OK modest dropped traffic on VL15 may not be an issue. What is running on the fabric concurrently with ibchecknet (and on the LID-1 host)? Subnet management traffic should be light, very light. Tell us about the subnet manager situation on your fabric. There should only be one active subnet manager. Mixed and uncooperating SMs could cause this, as could basic IB errors (connectors, cables, connections). If the SM is running on LID-1 then traffic will reflect the fabric size. What other IB errors are you seeing.. 
If the port for LID-1 is not seeing IB errors other than VL15 you should be OK -- do look for multiple SMs. If you stop your subnet manager does the counter reflect the pause. -- T o m M i t c h e l l Found me a new hat, now what? From prentice at ias.edu Tue Dec 2 13:33:20 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error In-Reply-To: <20081202211829.GB2920@bx9> References: <4935531F.6040807@ias.edu> <20081202211829.GB2920@bx9> Message-ID: <4935A9A0.2010005@ias.edu> Greg Lindahl wrote: > On Tue, Dec 02, 2008 at 10:24:15AM -0500, Prentice Bisbal wrote: > >> #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 >> Error check on lid 1 (aurora HCA-1) port 1: FAILED > > IB is blissfully fading from my brain, but I think this refers to > control packets being dropped due to resource limits on the recipient. > That takes talent if you're using a Mellanox HCA, as pretty much all > of the VL15 packets are interpreted by the processor in the HCA. > > -- greg > > Just my luck. I'm using Cisco HCAs, which are really Mellanox HCAs: # lspci | grep Infini 0b:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20) Fortunately, Gilad from Mellanox has offered me some assistance off-list. -- Prentice From prentice at ias.edu Tue Dec 2 14:02:59 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error In-Reply-To: <20081202212414.GA3175@compegg.wr.niftyegg.com> References: <4935531F.6040807@ias.edu> <20081202212414.GA3175@compegg.wr.niftyegg.com> Message-ID: <4935B093.70006@ias.edu> See my answers inline. 
Nifty Tom Mitchell wrote: > On Tue, Dec 02, 2008 at 10:24:15AM -0500, Prentice Bisbal wrote: >> I'm getting this error when I run ibchecknet on my cluster: >> >> #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 >> Error check on lid 1 (aurora HCA-1) port 1: FAILED >> >> I've googled around this morning, but haven't found anything helpful. >> Most of the hits turn up code with the phrase "VL15Dropped", but nothing >> explaining what this error means, what causes it, or how to fix it. >> >> After clearing the counters with 'perfquery -r', the VL15Dropped count >> starts increasing from zero almost immediately. >> >> Any ideas what this error represents or how to fix? Could it be a bad >> cable? >> > > Can you be specific about the hardware (HCA and switch) and software? > How large is the fabric? > What subnet manager is running and where? > > The host behind LID-1 is the one of interest. IB Switch: Cisco 7012 D, 144-port HCAs: Cisco, which is really Mellanox: # lspci | grep Infini 0b:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex (Tavor compatibility mode) (rev 20) The subnet manager is OpenSM 3.1.8-1.el5, which is provided by my Linux Distro, PU_IAS 5.2, which is a rebuild of RHEL 5.2. It is running on the master node, aurora. The HCA with the error is on this node (see errors message in original post). > > If I recall correctly, VL15 is reserved exclusively for subnet management > and is not optional. Traffic to VL15 might be randomly dropped by the > switch, SMA or interrupt handler. As long as the subnet is OK modest > dropped traffic on VL15 may not be an issue. > > What is running on the fabric concurrently with ibchecknet (and on the LID-1 host)? Not sure what you mean. Do you want to see the output of ibchecknet? > > Subnet management traffic should be light, very light. Tell us about > the subnet manager situation on your fabric. There should only > be one active subnet manager. 
Mixed and uncooperating SMs could
> cause this, as could basic IB errors (connectors, cables, connections).
> If the SM is running on LID-1 then traffic will reflect the fabric size.

There is only one SM running. It's running on the master node. The other nodes don't even have the OpenSM package installed.

>
> What other IB errors are you seeing.. If the port for LID-1 is not seeing
> IB errors other than VL15 you should be OK -- do look for multiple SMs.

I'm not seeing any other errors. This one is a new development, too.

> If you stop your subnet manager does the counter reflect the pause.
>

Haven't tried yet. And since it's almost quitting time, I'm not going to try until tomorrow.

--
Prentice

From rgb at phy.duke.edu Tue Dec 2 14:12:09 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu Mar 18 01:08:06 2010
Subject: [Beowulf] Intro question
In-Reply-To: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com>
References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com>
Message-ID:

On Tue, 2 Dec 2008, malcolm croucher wrote:

> Hi Guys ,
>
> I am still thinking about my cluster and most probably will only begin
> development next year june/july .
>
> Question :
>
> If i develop my system on 10 computers (nodes) which are all normal
> desktops and then would like to place this in data hosting facility which
> has access to real time information . I am going to need to buy new servers
> (thin 1 u servers ). Would this be the best choice as desktops take up more
> space and therefore will be more expensive . how do you guys get around this
> problem ? or dont you ?

I'm not sure I understand you, but let me try. You plan to buy 10 computers to use for development of a cluster application.
I presume that this means ten actual boxes, not a single box with two quad core processors and a dual core or something like that (my own choice for development would be no more than four machines, each with one quad core processor or two dual core processors depending on just where I expected to be internally bottlenecked, OR a single dual quad).

Then you expect to place this in a data hosting facility -- basically buy rack space and somebody to punch reset for you plus high speed access to -- something you need in production but not in development. Is that right?

The usual rule is that 1U nodes will cost a bit more than equivalently equipped desktops, primarily because the 1U cases cost more and because one has to work a bit harder to ensure that e.g. the CPUs stay cool and so on. I haven't checked marginal differences recently, but would guesstimate $250-500 per box. This is a bit of a hit on ten boxes (if that's what you plan on) but not a crazy one, and if you spend it up front you a) have a MUCH smaller stack of systems in your house or regular office -- ten 1U nodes in a stack is the size of a dorm refrigerator, where ten minitowers stacked up is 2-3 times as much space/volume -- and b) don't have to buy new systems to move them into your hosting facility rack, where they sell you space and where hosting the towers -- if they let you put towers there at all -- will cost you more on the far side.

To put it another way, I THINK that you should probably just get 1U systems from the beginning, but there are so many variables you haven't given us I can't be sure. For example, your money or somebody else's? Academic research or entrepreneurial? HPC cluster or HA cluster? Why "ten nodes" -- a bit of an unusual number. What are the relative costs on the hosting side?
Would a much smaller system (like a single $3300 eight core dual quad) work just as well for prototyping, leaving you with money later to buy as many 1U nodes as you need AFTER prototyping and estimating capacity and so on? rgb > > Regards > > Malcolm > > > > > > > > > > > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lindahl at pbm.com Tue Dec 2 14:35:26 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Message-ID: <20081202223526.GA12378@bx9> On Tue, Dec 02, 2008 at 05:12:09PM -0500, Robert G. Brown wrote: > The usual rule is that 1U nodes will cost a bit more than equivalently > equipped desktops, primarily because the 1U cases cost more and because > one has to work a bit harder to ensure that e.g. the CPUs stay cool and > so on. The single-socket 2U nodes that I buy cost the same as our developer desktops, which have a somewhat expensive case in order to be quiet. Since he's putting these into a hosting facility, it's likely that having 1U boxes doesn't gain him anything... the limit is generally power, not space. And 2U boxes are quieter and more reliable. -- greg From peter.st.john at gmail.com Tue Dec 2 14:52:04 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Message-ID: Malcolm, Just a plug for the recherche' space-minimization solution (I'm still thinking about my own hypothetical cluster, as you are). This article http://www.linuxjournal.com/article/8177 is Ron Minnich's "Beowulf in a Lunchbox"; 16 mini motherboards with risers. 
You could possibly consider a rackable system with diskless nodes (hand-me-down desktop motherboards, with onboard NIC, would fit in 1U slots, right?) and put the disks and power supplies and routers in a dense 2 or 3U space. Then maybe you could just load it all into a rack after the initial development in your basement. But as RGB mentions it depends on lots of things; heat, dust in your office, the budget, etc. Incidentally, if you built say 4 single-slot quad-core nodes with 2 dual-slot 8 core (per board) nodes, then you could find out which bandwidth/mem-channel/per-core configuration works better for your app, instead of anticipating; in the case that your initial prototype needn't be ideal. Peter P.S. incidentally, I think that Ron Minnich is the one I knew in High School; which is a bit odd, parallel to RGB from Duke. Next I'll meet a Beowulfer from my kindergarten class :-) On Tue, Dec 2, 2008 at 6:05 AM, malcolm croucher wrote: > Hi Guys , > > I am still thinking about my cluster and most probably will only begin > development next year june/july . > > Question : > > If i develop my system on 10 computers (nodes) which are all normal > desktops and then would like to place this in data hosting facility which > has access to real time information . I am going to need to buy new servers > (thin 1 u servers ). Would this be the best choice as desktops take up more > space and therefore will be more expensive . how do you guys get around this > problem ? or dont you ? > > Regards > > Malcolm > > > > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081202/3cb7e326/attachment.html From dnlombar at ichips.intel.com Tue Dec 2 15:26:54 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Message-ID: <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> On Tue, Dec 02, 2008 at 02:12:09PM -0800, Robert G. Brown wrote: > On Tue, 2 Dec 2008, malcolm croucher wrote: > > > Hi? Guys , > > > > I am still thinking about my cluster and most probably will only begin > > development next year june/july . > > > > Question : > > > > If i develop my system on? 10 computers (nodes) which are all normal > > desktops and then would like to place this in data hosting facility which > > has access to real time information . I am going to need to buy new servers > > (thin 1 u servers ). Would this be the best choice as desktops take up more > > space and therefore will be more expensive . how do you guys get around this > > problem ? or dont you ? > > I'm not sure I understand you, but let me try. > > (my own choice for development would be no more than four machines, each > with one quad core processor or two dual core processors depending on > just where I expected to be internally bottlenecked, OR a single dual > quad). Agreed. Four nodes should see any issues that 10 would also see. Less and you can miss concurrency issues. > To put it another way, I THINK that you should probably just get 1U > systems from the beginning, but there are so many variables you haven't > given us I can't be sure. An acoustic concern. A 1U is quite a bit louder than the normal desktop as (1) they use itty-bitty fans and (b) there's no incentive to make them quiet, as nobody is expected to have to put up with their screaming... -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. 
From niftyompi at niftyegg.com Tue Dec 2 16:04:47 2008 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] InfiniBand VL15 error In-Reply-To: <4935B093.70006@ias.edu> References: <4935531F.6040807@ias.edu> <20081202212414.GA3175@compegg.wr.niftyegg.com> <4935B093.70006@ias.edu> Message-ID: <20081203000447.GA4279@compegg.wr.niftyegg.com> On Tue, Dec 02, 2008 at 05:02:59PM -0500, Prentice Bisbal wrote: > > See my answers inline. > > Nifty Tom Mitchell wrote: > > On Tue, Dec 02, 2008 at 10:24:15AM -0500, Prentice Bisbal wrote: > >> I'm getting this error when I run ibchecknet on my cluster: > >> > >> #warn: counter VL15Dropped = 476 (threshold 100) lid 1 port 1 > >> Error check on lid 1 (aurora HCA-1) port 1: FAILED > >> > >> I've googled around this morning, but haven't found anything helpful. > >> Most of the hits turn up code with the phrase "VL15Dropped", but nothing > >> explaining what this error means, what causes it, or how to fix it. > >> > >> After clearing the counters with 'perfquery -r', the VL15Dropped count > >> starts increasing from zero almost immediately. > >> > >> Any ideas what this error represents or how to fix? Could it be a bad > >> cable? > >> > > > > Can you be specific about the hardware (HCA and switch) and software? > > How large is the fabric? > > What subnet manager is running and where? > > > > The host behind LID-1 is the one of interest. > > IB Switch: Cisco 7012 D, 144-port > HCAs: Cisco, which is really Mellanox: > > # lspci | grep Infini > 0b:00.0 InfiniBand: Mellanox Technologies MT25208 InfiniHost III Ex > (Tavor compatibility mode) (rev 20) > > The subnet manager is OpenSM 3.1.8-1.el5, which is provided by my Linux > Distro, PU_IAS 5.2, which is a rebuild of RHEL 5.2. It is running on the > master node, aurora. The HCA with the error is on this node (see errors > message in original post). 
> > > > If I recall correctly, VL15 is reserved exclusively for subnet management > > and is not optional. Traffic to VL15 might be randomly dropped by the > > switch, SMA or interrupt handler. As long as the subnet is OK modest > > dropped traffic on VL15 may not be an issue. > > > > What is running on the fabric concurrently with ibchecknet (and on the LID-1 host)? > > Not sure what you mean. Do you want to see the output of ibchecknet? What I was thinking was that on a compute bound system the subnet manager process might not get enough time to service all the management packets. In the Mellanox case the card on a local node can have many SMA actions handled inside the card; larger fabric-wide actions need interrupts and system time. If this was an overloaded IO or compute node the subnet manager may not wake up often enough to handle all the management packets. That is, it may be normal and OK with this load, card, software stack and SM to see VL15 drops. Your Mellanox contact can help answer this... > > > > Subnet management traffic should be light, very light. Tell us about > > the subnet manager situation on your fabric. There should only > > be one active subnet manager. Mixed and uncooperating SMs could > > cause this, as could basic IB errors (connectors, cables, connections). > > If the SM is running on LID-1 then traffic will reflect the fabric size. > > There is only one SM running. It's running on the master node. The other > nodes don't even have the OpenSM package installed. > > > > What other IB errors are you seeing.. If the port for LID-1 is not seeing > > IB errors other than VL15 you should be OK -- do look for multiple SMs. > > I'm not seeing any other errors. This one is a new development, too. > > > If you stop your subnet manager does the counter reflect the pause. > > > > Haven't tried yet. And since it's almost quitting time, I'm not going to > try until tomorrow. Pausing the subnet manager can be diagnostic.
If you pause/stop the SM and reboot a free node, the free node will not be assigned a LID. If you have another SM on the fabric it will get a LID. While multiple subnet managers are legal, the interactions between different versions have too many permutations for good test coverage. It can be good to 'test' for unexpected subnet managers. Do revisit your OpenSM settings. Sweeps for node status may just be too aggressive. There is a chance that your opensm is dated. It looks like: opensm-3.1.8-1.el5.x86_64.rpm. Build Date: Mon Mar 17 14:12:13 2008 Inspect the change log dates ;-0 ftp://ftp.cs.stanford.edu/pub/mirrors/scientific/52/x86_64/SL/repodata/repoview/opensm-0-3.1.8-1.el5.html The current OFED version looks like: opensm-3.1.11-1.ofed1.3.1.src.rpm While OFED and rpm versions do not always track, consider an update. Also note RH is slow picking up many OFED changes as the OFED process is a big bang release process. Others on this list might know if the delta from 3.1.8 to 3.1.11 is important in this regard. -- T o m M i t c h e l l Found me a new hat, now what? From rgb at phy.duke.edu Tue Dec 2 16:11:37 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> Message-ID: On Tue, 2 Dec 2008, Lombard, David N wrote: > An acoustic concern. A 1U is quite a bit louder than the normal desktop as > (1) they use itty-bitty fans and (b) there's no incentive to make them > quiet, as nobody is expected to have to put up with their screaming... A good point. I actually like Greg's suggestion best -- consider (fewer) 2U nodes instead -- quieter, more robust, cooler.
Perhaps four, but that strongly depends on the kind of thing you are trying to do -- tell us what it is if you can do so without having to kill and we'll try to help you estimate your communications issues and likely bottlenecks. For some tasks you are best off getting as few actual boxes as possible with as many as possible CPU cores per box. For others, having more boxes and fewer cores per box will be right. The reason I like four nodes with at least a couple of cores each is that if you don't KNOW what you are likely to need, you can find out (probably) with this many nodes and then "fix" your design if/when you scale up into production. Otherwise you buy eight single core node (if they still make single cores:-) and then learn that you would have been much better off buying a single eight core node. Or vice versa. rgb > > -- > David N. Lombard, Intel, Irvine, CA > I do not speak for Intel Corporation; all comments are strictly my own. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From eugen at leitl.org Tue Dec 2 23:37:57 2008 From: eugen at leitl.org (Eugen Leitl) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <20081202223526.GA12378@bx9> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202223526.GA12378@bx9> Message-ID: <20081203073757.GN11544@leitl.org> On Tue, Dec 02, 2008 at 02:35:26PM -0800, Greg Lindahl wrote: > On Tue, Dec 02, 2008 at 05:12:09PM -0500, Robert G. 
Brown wrote: > > > The usual rule is that 1U nodes will cost a bit more than equivalently > > equipped desktops, primarily because the 1U cases cost more and because > > one has to work a bit harder to ensure that e.g. the CPUs stay cool and > > so on. Our staple box is the SunFire X2100 M2, which sells for about 400 EUR sans VAT (kit has 1.8 GHz dual-core Opteron, 512 MByte DDR2, no disk). I've measured 115 W at idle with 2x TByte SATA disk and 8 GByte DDR2 RAM which is even better than what Sun says http://www.sun.com/servers/entry/x2100/M2calc/index.jsp#calc > The single-socket 2U nodes that I buy cost the same as our developer > desktops, which have a somewhat expensive case in order to be quiet. > Since he's putting these into a hosting facility, it's likely that > having 1U boxes doesn't gain him anything... the limit is generally > power, not space. And 2U boxes are quieter and more reliable. From malcolm.croucher at gmail.com Tue Dec 2 23:25:54 2008 From: malcolm.croucher at gmail.com (malcolm croucher) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> Message-ID: <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> Its gonna be used for computational chemisty , not academic but more private / entrepreneurship. I been doing a lot of research in this area for a while and was hoping to do some more on my own. On Wed, Dec 3, 2008 at 2:11 AM, Robert G. Brown wrote: > On Tue, 2 Dec 2008, Lombard, David N wrote: > > An acoustic concern. A 1U is quite a bit louder than the normal desktop as >> (1) they use itty-bitty fans and (b) there's no incentive to make them >> quiet, as nobody is expected to have to put up with their screaming... >> > > A good point. I actually like Greg's suggestion best -- consider > (fewer) 2U nodes instead -- quieter, more robust, cooler. 
Perhaps four, > but that strongly depends on the kind of thing you are trying to do -- > tell us what it is if you can do so without having to kill and we'll try > to help you estimate your communications issues and likely bottlenecks. > For some tasks you are best off getting as few actual boxes as possible > with as many as possible CPU cores per box. For others, having more > boxes and fewer cores per box will be right. > > The reason I like four nodes with at least a couple of cores each is > that if you don't KNOW what you are likely to need, you can find out > (probably) with this many nodes and then "fix" your design if/when you > scale up into production. Otherwise you buy eight single core node (if > they still make single cores:-) and then learn that you would have been > much better off buying a single eight core node. Or vice versa. > > rgb > > >> -- >> David N. Lombard, Intel, Irvine, CA >> I do not speak for Intel Corporation; all comments are strictly my own. >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > Robert G. Brown Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > -- Malcolm A.B Croucher -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081203/1792d1f6/attachment.html From rgb at phy.duke.edu Wed Dec 3 07:33:27 2008 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> Message-ID: On Wed, 3 Dec 2008, malcolm croucher wrote: > Its gonna be used for computational chemisty , not academic but more private > / entrepreneurship. I been doing a lot of research in this area for a while > and was hoping to do some more on my own. Any idea of the specific software you plan to use? Or do you plan to write your own? There are lots of people on-list that can help you e.g. estimate the likely task granularity if you identify your toolset (only, not what you hope to invent:-). Basically, more important than the case you plan to put your system(s) in is the balance between computation (computer cores at some given clock), memory (bandwidth and contention between cores and memory), and interprocessor communications both within a system (between one core/thread and another) and between systems (network based IPCs). Each of the pathways from a core outward has an associated cost in latency and bandwidth, and very different investment strategies will yield the best bang for a limited supply of bucks for different "kinds" of parallel problems. So the very first step of cluster engineering is typically to analyze your task's patterns of computation, memory access and interprocessor communication. Once that is known, it is usually possible to identify (for example) whether it is better to have fewer processors and a faster network or more processors and a slower network. Since a really fast network can cost as much as two or more cores and since one has to balance network needs against ALL the cores per chassis, this can be a significant tradeoff.
Ditto for tasks that tend to be memory bound -- in that case one might want to opt for fewer cores per box to ensure that each core can access memory at full speed with minimal lost efficiency due to contention. rgb > > On Wed, Dec 3, 2008 at 2:11 AM, Robert G. Brown wrote: > On Tue, 2 Dec 2008, Lombard, David N wrote: > > An acoustic concern. A 1U is quite a bit louder than > the normal desktop as > (1) they use itty-bitty fans and (b) there's no > incentive to make them > quiet, as nobody is expected to have to put up with > their screaming... > > > A good point. I actually like Greg's suggestion best -- consider > (fewer) 2U nodes instead -- quieter, more robust, cooler. Perhaps > four, > but that strongly depends on the kind of thing you are trying to do -- > tell us what it is if you can do so without having to kill and we'll > try > to help you estimate your communications issues and likely > bottlenecks. > For some tasks you are best off getting as few actual boxes as > possible > with as many as possible CPU cores per box. For others, having more > boxes and fewer cores per box will be right. > > The reason I like four nodes with at least a couple of cores each is > that if you don't KNOW what you are likely to need, you can find out > (probably) with this many nodes and then "fix" your design if/when you > scale up into production. Otherwise you buy eight single core node > (if > they still make single cores:-) and then learn that you would have > been > much better off buying a single eight core node. Or vice versa. > > rgb > > > -- > David N. Lombard, Intel, Irvine, CA > I do not speak for Intel Corporation; all comments are > strictly my own. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > Robert G. Brown
Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > > > > > -- > Malcolm A.B Croucher > > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From mathog at caltech.edu Wed Dec 3 09:47:17 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question Message-ID: "Lombard, David N" wrote: > An acoustic concern. A 1U is quite a bit louder than the normal desktop as > (1) they use itty-bitty fans and (b) there's no incentive to make them > quiet, as nobody is expected to have to put up with their screaming... Amen to that - the screech from those tiny fans is unbearable. Even a 2U system with larger (usually 80 mm) fans is going to be substantially louder than a typical desktop (mostly using 120mm now). The 2U, like the 1U, is designed for machine rooms, so cooling trumps quiet every time. To the OP: do not assume that your developer and 10 CPU system will happily work in the same room. (Unless your developer happens to be deaf and not at all sensitive to unusual air temperatures.) Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From eugen at leitl.org Wed Dec 3 10:02:18 2008 From: eugen at leitl.org (Eugen Leitl) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: Message-ID: <20081203180218.GG11544@leitl.org> On Wed, Dec 03, 2008 at 09:47:17AM -0800, David Mathog wrote: > To the OP: do not assume that your developer and 10 CPU system will > happily work in the same room. (Unless your developer happens to be > deaf and not at all sensitive to unusual air temperatures.)
Arguably, it is too distracting even if you're in the room next to it, behind two doors. When switching them on it's a fair approximation of a starting jet. Less an issue in a cube farm, of course. From hearnsj at googlemail.com Wed Dec 3 10:30:59 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: <20081203180218.GG11544@leitl.org> References: <20081203180218.GG11544@leitl.org> Message-ID: <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> I would think about a blade system for this particular application. (Say) one of the Supermicro blade enclosures. You could start small for development with one or two blades, then expand. Should be easy enough to transport to the hosting location when you are ready. You can get acoustically quiet racks to put them in if it is going in an office, but that will inevitably cost more. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081203/3f9c85cf/attachment.html From hahn at mcmaster.ca Wed Dec 3 11:03:20 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: > I would think about a blade system for this particular application. aren't blades still a significant premium? > (Say) one of the Supermicro blade enclosures. You could start small for > development with one or two blades, well, nothing beats the incremental expandability of a stack of separate boxes. given that modern cpus are drastically cooler (esp per flop) than even 2 years ago, fans rarely run fast. > then expand. Should be easy enough to transport to the hosting location when > you are ready. 
a blade chassis usually requires power other than the usual 15A circuit; nothing custom of course (L6-30 is my favorite) but something like a dual-board supermicro 1U is pretty convenient and flexible. > You can get acoustically quiet racks to put them in if it is going in an > office, but that will inevitably cost more. if office, I'd certainly just get minitowers. but I wouldn't consider putting more than a few in an office, since a ~250W minitower already corresponds to the power dissipation of two people... From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Dec 3 12:17:00 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: On Wed, 3 Dec 2008, Mark Hahn wrote: > if office, I'd certainly just get minitowers. For office, I would recommend barebones in SFF (small form factor) cases, like those commonly advertised for HTPC (Home theater PC). I built a cluster of 80 of those (Shuttle SB75G2) in 2004; you can see a rather bad picture in the "IWR Cluster 4 part a" section of: http://www.iwr.uni-heidelberg.de/services/equipment/parallel/ Because of their number, they are located in a well cooled computer room, the total consumption being close to 9 kW under load. These barebones often contain what is needed for a simple cluster node, with only CPU, memory and possibly a disk to be added. The cooling is often better designed than in a normal (mini-)tower case and because of their intended usage they can be even quieter - but the difference between the different models or manufacturers can be huge.
The price is slightly higher than a comparable (mini)tower; the impression that it makes on visitors is however much greater ;-) -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From hahn at mcmaster.ca Wed Dec 3 12:46:57 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: >> if office, I'd certainly just get minitowers. > > For office, I would recommend barebones in SFF (small form factor) cases, > like those commonly advertised for HTPC (Home theater PC). I have built a well, SFF shoeboxes would be a good idea, but they tend to be somewhat specialized. for instance, they normally only have a single cpu socket. but they're good because the PSU is often sized modestly (the sweet spot for PSU efficiency is somewhere around 75% load.) I don't know whether there would be any problem putting a real interconnect card (10G, IB, etc) into one of these - some are designed for GPU cards, so would have 8 or 16x pcie slots. if you're only using gigabit, an SFF shoebox might be quite a good fit, including low-power integrated video. there are "book" format SFF's that might do well for the gigabit/integrated approach too. I wouldn't mention HTPC, though - in stores around here at least that term implies a box specialized to look like AV components, often with milled aluminum bezel, fancy displays, etc. > cluster of 80 of those (Shuttle SB75G2) in 2004, you can see a rather bad > picture in the "IWR Cluster 4 part a" section of: > http://www.iwr.uni-heidelberg.de/services/equipment/parallel/ nice. 
I think SFF's would be very nice, though probably would mean giving up any pretense of server-ish-ness, such as dual sockets or IPMI, and probably sticking to onboard gigabit. for an office, though, a stack of 10 of them would still probably be a problem in total dissipation. From ajt at rri.sari.ac.uk Wed Dec 3 13:40:16 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: <4936FCC0.5050902@rri.sari.ac.uk> Mark Hahn wrote: >>> if office, I'd certainly just get minitowers. >> For office, I would recommend barebones in SFF (small form factor) cases, >> like those commonly advertised for HTPC (Home theater PC). I have built a > > well, SFF shoeboxes would be a good idea, but they tend to be somewhat > specialized. for instance, they normally only have a single cpu socket. > but they're good because the PSU is often sized modestly (the sweet spot > for PSU efficiency is somewhere around 75% load.) I don't know whether > there would be any problem putting a real interconnect card (10G, IB, etc) > into one of these - some are designed for GPU cards, so would have 8 or 16x > pcie slots. if you're only using gigabit, an SFF shoebox might be quite a > good fit, including low-power integrated video. there are "book" format > SFF's that might do well for the gigabit/integrated approach too. Hello, Mark. I bought some IWill Zmaxdp and Zmaxd2 SFF dual Opteron servers with registered ECC memory to use in our Beowulf, but the heat dissipation problems were really *terrible* and the boxes were incredibly noisy. > [...] > I think SFF's would be very nice, though probably would mean giving up any > pretense of server-ish-ness, such as dual sockets or IPMI, and probably > sticking to onboard gigabit. 
for an office, though, a stack of 10 of them > would still probably be a problem in total dissipation. The IWill Zmaxdp/d2 were the only SFF Opteron servers I could find that support registered ECC memory. To cut a long story short, I've had to replace the standard 80W Opterons with 55W Opteron HE's to get the heat burden under control. These are great little boxes, but when you make them work hard they are extremely noisy, and none of my colleagues will put up with them in an office environment! The end of this story for me was that Flextronics bought IWill for their 1U server designs and IWill immediately ceased production of Zmax retail products. They are still around on eBay and one or two vendors if anyone is interested + mine are working fine now :-) http://www.flextronics.com/iwill/product_2.asp?p_id=36 http://www.flextronics.com/iwill/product_2.asp?p_id=105 There was a lot of interest in these IWill SFF servers when they were launched, but many people said it was impossible to keep the systems cool under load. They were right: IWill said that standard 80W Opterons were supported - no way: These SFF servers do work fine with 55W HE's though and if that had been more widely known at the time they may have been more successful. Bye, Tony. -- Dr. 
A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis@abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From Bogdan.Costescu at iwr.uni-heidelberg.de Wed Dec 3 13:54:43 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: On Wed, 3 Dec 2008, Mark Hahn wrote: > I don't know whether there would be any problem putting a real > interconnect card (10G, IB, etc) into one of these - some are > designed for GPU cards, so would have 8 or 16x pcie slots. Yes, they often have 2 slots, a PCIe 16x one for a graphics card and a PCIe 1x or PCI one for a TV card (for HTPC use ;-)). > there are "book" format SFF's that might do well for the > gigabit/integrated approach too. If raw performance is not your main interest, yes. These often have previous generation or significantly lower speed CPUs and memory and a rather bad cooling solution dictated by the space constraints. In comparison, most "normal" SFFs can take current generation CPUs and memory. One advantage coming from their lower speed/power components is that some of the "book" ones use an external power supply, laptop style, which generates no noise and eliminates the possibility of broken fans. > I wouldn't mention HTPC, though - in stores around here at least > that term implies a box specialized to look like AV components, > often with milled aluminum bezel, fancy displays, etc. Is there a law against sexy cluster nodes ? :-) How about making those fancy displays show a job ID and the current CPU load ? Would there be any more need for nagios or ganglia ?
:-) > I think SFF's would be very nice, though probably would mean giving up any > pretense of server-ish-ness, such as dual sockets or IPMI I have looked some years ago at such a SFF barebone with 2 CPU sockets from Iwill. For some reason I couldn't (maybe still can't) easily get Iwill products here, so I quickly lost interest. But this shows that someone did think of it and that it's technically possible. However today, with multi-core CPUs being mainstream, I can't really see a market case for a SFF with more than one CPU socket. IPMI is the one thing that I really miss in these SFF computers. Not so much for sensors monitoring as for power on/power off/reset and console redirection. I was hoping that at least the Intel vPro would be adopted for SFFs... -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From gdjacobs at gmail.com Wed Dec 3 15:42:13 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: <49371955.20200@gmail.com> Bogdan Costescu wrote: > On Wed, 3 Dec 2008, Mark Hahn wrote: > >> I don't know whether there would be any problem putting a real >> interconnect card (10G, IB, etc) into one of these - some are designed >> for GPU cards, so would have 8 or 16x pcie slots. > > Yes, they often have 2 slots, a PCIe 16x one for a graphics card and a > PCIe 1x or PCI one for a TV card (for HTPC use ;-)). > >> there are "book" format SFF's that might do well for the >> gigabit/integrated approach too. > > If raw performance is not your main interest, yes. These often have > previous generation or significantly lower speed CPUs and memory and a > rather bad cooling solution dictated by the space constraints.
In > comparison, most "normal" SFFs can take current generation CPUs and memory. > > One advantage coming from their lower speed/power components is that > some of the "book" ones use an external power supply, laptop style, > which generates no noise and eliminates the possibility of broken fans. > >> I wouldn't mention HTPC, though - in stores around here at least that >> term implies a box specialized to look like AV components, often with >> milled aluminum bezel, fancy displays, etc. > > Is there a law against sexy cluster nodes ? :-) > > How about making those fancy displays show a job ID and the current CPU > load ? Would there be any more need for nagios or ganglia ? :-) > >> I think SFF's would be very nice, though probably would mean giving up >> any pretense of server-ish-ness, such as dual sockets or IPMI > > I have looked some years ago at such a SFF barebone with 2 CPU sockets > from Iwill. For some reason I couldn't (maybe still can't) easily get > Iwill products here, so I quickly lost interest. But this shows that > someone did think of it and that it's technically possible. However > today, with multi-core CPUs being mainstream, I can't really see a > market case for a SFF with more than one CPU socket. > > IPMI is the one thing that I really miss in these SFF computers. Not so > much for sensors monitoring as for power on/power off/reset and console > redirection. I was hoping that at least the Intel vPro would be adopted > for SFFs... > What is the capability of EDAC on AM2 and AM2+ CPUs? Does the motherboard chipset impose any limitations? -- Geoffrey D. Jacobs
From jan.heichler at gmx.net Thu Dec 4 00:19:27 2008 From: jan.heichler at gmx.net (Jan Heichler) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: References: <20081203180218.GG11544@leitl.org> <9f8092cc0812031030j4827b3fej93ad2fb528f7957d@mail.gmail.com> Message-ID: <1722728979.20081204091927@gmx.net> Hello Bogdan, on Wednesday, 3 December 2008, you wrote: BC> On Wed, 3 Dec 2008, Mark Hahn wrote: >> I don't know whether there would be any problem putting a real >> interconnect card (10G, IB, etc) into one of these - some are >> designed for GPU cards, so would have 8 or 16x pcie slots. BC> Yes, they often have 2 slots, a PCIe 16x one for a graphics card and a BC> PCIe 1x or PCI one for a TV card (for HTPC use ;-)). But be careful here. It wouldn't be the first PCIe x16 slot in which nothing but a graphics card works properly. If you browse through web forums you see a lot of people trying to get a RAID controller (for example) working in a PCIe x16 slot - some boards just don't allow that. So better test before you buy.... Jan From diep at xs4all.nl Thu Dec 4 04:23:22 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Message-ID: Oh you mention realtime information. I assume that's stock exchange information. In these volatile markets there is a LOT to earn with that. Last time i discussed with some sysadmins that are busy with this subject, this is all about latency to the RAM, and a LOT of ram and i/o that needs to get replaced within 2 minutes realtime when a disk fails. What you need most likely is 2 identical machines A and B. Lowest latency you get for now with quad socket AMDs. The quad socket intels with CSI aren't there yet regrettably. Maybe end 2009. Quad socket AMD with a mainboard that you can later upgrade the cpu's to shanghai core. Initially you could start maybe even with high clocked dual cores. Memory controller is on die, so the higher the clock frequency of each core, the lower the latency to RAM.
Equip each box with 64-128 GB ECC ddr2 ram and a BIG raid10 array U320. Most here will know how to deal with i/o in best manner. Then build a cluster of 2 nodes, so that machine B runs as a backup of A, so in case of a problem with A then B can take over glueless. You can build these nodes pretty cheap nowadays I saw these quad socket mainboards for like 800 euro and really a lot of dimm slots. Question is how fast you want to replace the disks. If you really want personnel that replaces 'em within 2 minutes and not a second slower, if a disk breaks, that's gonna be costly. In any case the network between the 2 nodes is pretty important. Good luck, Vincent On Dec 2, 2008, at 12:05 PM, malcolm croucher wrote: > Hi Guys , > > I am still thinking about my cluster and most probably will only > begin development next year june/july . > > Question : > > If i develop my system on 10 computers (nodes) which are all > normal desktops and then would like to place this in data hosting > facility which has access to real time information . I am going to > need to buy new servers (thin 1 u servers ). Would this be the best > choice as desktops take up more space and therefore will be more > expensive . how do you guys get around this problem ? or dont you ? 
> > Regards > > Malcolm > > > > > > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Thu Dec 4 09:58:34 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:06 2010 Subject: [Beowulf] Intro question In-Reply-To: <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> Message-ID: <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> On Dec 3, 2008, at 8:25 AM, malcolm croucher wrote: > Its gonna be used for computational chemisty , not academic but > more private / entrepreneurship. I been doing a lot of research in > this area for a while and was hoping to do some more on my own. > That's most interesting, if i google for your name i just get hits in the financial world. How's that possible? Vincent > On Wed, Dec 3, 2008 at 2:11 AM, Robert G. Brown > wrote: > On Tue, 2 Dec 2008, Lombard, David N wrote: > > An acoustic concern. A 1U is quite a bit louder than the normal > desktop as > (1) they use itty-bitty fans and (b) there's no incentive to make them > quiet, as nobody is expected to have to put up with their screaming... > > A good point. I actually like Greg's suggestion best -- consider > (fewer) 2U nodes instead -- quieter, more robust, cooler. Perhaps > four, > but that strongly depends on the kind of thing you are trying to do -- > tell us what it is if you can do so without having to kill and > we'll try > to help you estimate your communications issues and likely > bottlenecks. > For some tasks you are best off getting as few actual boxes as > possible > with as many as possible CPU cores per box. For others, having more > boxes and fewer cores per box will be right. 
> > The reason I like four nodes with at least a couple of cores each is > that if you don't KNOW what you are likely to need, you can find out > (probably) with this many nodes and then "fix" your design if/when you > scale up into production. Otherwise you buy eight single core node > (if > they still make single cores:-) and then learn that you would have > been > much better off buying a single eight core node. Or vice versa. > > rgb > > > -- > David N. Lombard, Intel, Irvine, CA > I do not speak for Intel Corporation; all comments are strictly my > own. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > Robert G. Brown Phone(cell): 1-919-280-8443 > Duke University Physics Dept, Box 90305 > > Durham, N.C. 27708-0305 > Web: http://www.phy.duke.edu/~rgb > Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php > Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 > > > > -- > Malcolm A.B Croucher > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From peter.st.john at gmail.com Thu Dec 4 11:55:07 2008 From: peter.st.john at gmail.com (Peter St. John) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> Message-ID: Vincent, All the guys I know, myself, in finance, started in mathematics. Including one Cole medal logician. I'm sure there are some chemists and physicists who got into finance also. 
I did myself, that was what sucked me into software engineering long ago. we don't necessarily lose our theoretical interests. Peter On 12/4/08, Vincent Diepeveen wrote: > > > On Dec 3, 2008, at 8:25 AM, malcolm croucher wrote: > > Its gonna be used for computational chemisty , not academic but more >> private / entrepreneurship. I been doing a lot of research in this area for >> a while and was hoping to do some more on my own. >> >> > That's most interesting, if i google for your name i just get hits in the > financial world. How's that possible? > > Vincent > > On Wed, Dec 3, 2008 at 2:11 AM, Robert G. Brown wrote: >> On Tue, 2 Dec 2008, Lombard, David N wrote: >> >> An acoustic concern. A 1U is quite a bit louder than the normal desktop as >> (1) they use itty-bitty fans and (b) there's no incentive to make them >> quiet, as nobody is expected to have to put up with their screaming... >> >> A good point. I actually like Greg's suggestion best -- consider >> (fewer) 2U nodes instead -- quieter, more robust, cooler. Perhaps four, >> but that strongly depends on the kind of thing you are trying to do -- >> tell us what it is if you can do so without having to kill and we'll try >> to help you estimate your communications issues and likely bottlenecks. >> For some tasks you are best off getting as few actual boxes as possible >> with as many as possible CPU cores per box. For others, having more >> boxes and fewer cores per box will be right. >> >> The reason I like four nodes with at least a couple of cores each is >> that if you don't KNOW what you are likely to need, you can find out >> (probably) with this many nodes and then "fix" your design if/when you >> scale up into production. Otherwise you buy eight single core node (if >> they still make single cores:-) and then learn that you would have been >> much better off buying a single eight core node. Or vice versa. >> >> rgb >> >> >> -- >> David N. 
Lombard, Intel, Irvine, CA >> I do not speak for Intel Corporation; all comments are strictly my own. >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> >> Robert G. Brown Phone(cell): 1-919-280-8443 >> Duke University Physics Dept, Box 90305 >> >> Durham, N.C. 27708-0305 >> Web: http://www.phy.duke.edu/~rgb >> Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php >> Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 >> >> >> >> -- >> Malcolm A.B Croucher >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081204/7379f5d3/attachment.html From prentice at ias.edu Thu Dec 4 12:53:19 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> Message-ID: <4938433F.1030002@ias.edu> Vincent Diepeveen wrote: > > On Dec 3, 2008, at 8:25 AM, malcolm croucher wrote: > >> Its gonna be used for computational chemisty , not academic but more >> private / entrepreneurship. 
I been doing a lot of research in this >> area for a while and was hoping to do some more on my own. >> > > That's most interesting, if i google for your name i just get hits in > the financial world. How's that possible? > Vincent, You clearly haven't heard of David E. Shaw or know what he's up these days. Here's the Cliff Notes version: He made billions on Wall Street using computer models of the market, then started Schrodinger, a leading vendor of computational chemistry software (my previous employer used it heavily), where he actually wrote some of the code in their products himself. A couple of years ago, his company (D.E Shaw REsearch)had large (~1/4 page) ads in Linux Magazine looking for top Linux HPC admins to build/maintain a highly specialized computer that would be the fastest computer in the world for biochemistry applications (protein folding, etc.). He's very secretive, but the word on the street is that it will use custom processors (FPGAs?) specially designed for molecular bio/comp chem calculations, and is being built at a site in upstate NY. Google him. He's an interesting fellow. Here's a few links to get you started: http://en.wikipedia.org/wiki/David_E._Shaw http://www.deshawresearch.com/ http://www.deshawresearch.com/chiefscientist.html http://www.motherjones.com/news/special_reports/mojo_400/43_shaw.html -- Prentice From rgb at phy.duke.edu Thu Dec 4 17:44:37 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <4938433F.1030002@ias.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> <4938433F.1030002@ias.edu> Message-ID: On Thu, 4 Dec 2008, Prentice Bisbal wrote: > You clearly haven't heard of David E. Shaw or know what he's up these > days. 
Here's the Cliff Notes version: He has had ads in computer magazines -- at least small ads in the back -- for years and years. Usually for physicists and mathematicians. One of the few people it looked like it would be interesting to work for, actually. rgb > > He made billions on Wall Street using computer models of the market, > then started Schrodinger, a leading vendor of computational chemistry > software (my previous employer used it heavily), where he actually wrote > some of the code in their products himself. > > A couple of years ago, his company (D.E Shaw REsearch)had large (~1/4 > page) ads in Linux Magazine looking for top Linux HPC admins to > build/maintain a highly specialized computer that would be the fastest > computer in the world for biochemistry applications (protein folding, > etc.). He's very secretive, but the word on the street is that it will > use custom processors (FPGAs?) specially designed for molecular bio/comp > chem calculations, and is being built at a site in upstate NY. > > Google him. He's an interesting fellow. Here's a few links to get you > started: > > http://en.wikipedia.org/wiki/David_E._Shaw > http://www.deshawresearch.com/ > http://www.deshawresearch.com/chiefscientist.html > http://www.motherjones.com/news/special_reports/mojo_400/43_shaw.html > > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hearnsj at googlemail.com Fri Dec 5 00:56:09 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> <4938433F.1030002@ias.edu> Message-ID: <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> 2008/12/5 Robert G. Brown > > He has had ads in computer magazines -- at least small ads in the back > -- for years and years. Usually for physicists and mathematicians. One > of the few people it looked like it would be interesting to work for, > actually. > > I was very interested in working for DE Shaw - they have an office in London, and it would have been an easy commute for me. It sounded an interesting place to work, and they obviously have some very bright people there. Sadly I didn't make it past the application stage. Que sera sera. From eugen at leitl.org Fri Dec 5 04:48:43 2008 From: eugen at leitl.org (Eugen Leitl) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers Message-ID: <20081205124843.GM11544@leitl.org> (Well, duh). http://www.spectrum.ieee.org/nov08/6912 Multicore Is Bad News For Supercomputers By Samuel K. Moore Image: Sandia Trouble Ahead: More cores per chip will slow some programs [red] unless there's a big boost in memory bandwidth [yellow] With no other way to improve the performance of processors further, chip makers have staked their future on putting more and more processor cores on the same chip.
Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores. The performance is especially bad for informatics applications -- data-intensive programs that are increasingly crucial to the labs' national security function. High-performance computing has historically focused on solving differential equations describing physical systems, such as Earth's atmosphere or a hydrogen bomb's fission trigger. These systems lend themselves to being divided up into grids, so the physical system can, to a degree, be mapped to the physical location of processors or processor cores, thus minimizing delays in moving data. But an increasing number of important science and engineering problems -- not to mention national security problems -- are of a different sort. These fall under the general category of informatics and include calculating what happens to a transportation network during a natural disaster and searching for patterns that predict terrorist attacks or nuclear proliferation failures. These operations often require sifting through enormous databases of information. For informatics, more cores doesn't mean better performance [see red line in "Trouble Ahead"], according to Sandia's simulation. "After about 8 cores, there's no improvement," says James Peery, director of computation, computers, information, and mathematics at Sandia. "At 16 cores, it looks like 2." Over the past year, the Sandia team has discussed the results widely with chip makers, supercomputer designers, and users of high-performance computers.
Unless computer architects find a solution, Peery and others expect that supercomputer programmers will either turn off the extra cores or use them for something ancillary to the main problem. At the heart of the trouble is the so-called memory wall -- the growing disparity between how fast a CPU can operate on data and how fast it can get the data it needs. Although the number of cores per processor is increasing, the number of connections from the chip to the rest of the computer is not. So keeping all the cores fed with data is a problem. In informatics applications, the problem is worse, explains Richard C. Murphy, a senior member of the technical staff at Sandia, because there is no physical relationship between what a processor may be working on and where the next set of data it needs may reside. Instead of being in the cache of the core next door, the data may be on a DRAM chip in a rack 20 meters away and need to leave the chip, pass through one or more routers and optical fibers, and find its way onto the processor. In an effort to get things back on track, this year the U.S. Department of Energy formed the Institute for Advanced Architectures and Algorithms. Located at Sandia and at Oak Ridge National Laboratory, in Tennessee, the institute's work will be to figure out what high-performance computer architectures will be needed five to 10 years from now and help steer the industry in that direction. "The key to solving this bottleneck is tighter, and maybe smarter, integration of memory and processors," says Peery. For its part, Sandia is exploring the impact of stacking memory chips atop processors to improve memory bandwidth.
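A toy model may make the memory-wall argument above concrete. The Python sketch below is not Sandia's simulation. It only assumes that every core streams data at a fixed rate (`per_core_demand`) and that the chip's connections deliver at most `chip_bandwidth`; both names and the numbers used are hypothetical, in arbitrary units such as GB/s. Under those assumptions aggregate throughput is capped once the cores' combined demand exceeds what the memory system can supply, so speedup levels off. The sketch does not model the contention effects that would make performance actually decline.

```python
def throughput(cores, chip_bandwidth, per_core_demand):
    """Aggregate data rate the cores can actually consume.

    Each core wants per_core_demand; the chip as a whole cannot
    receive more than chip_bandwidth (both are assumed, made-up inputs).
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return min(cores * per_core_demand, chip_bandwidth)


def speedup(cores, chip_bandwidth, per_core_demand):
    """Speedup relative to one core under the same bandwidth cap."""
    one_core = throughput(1, chip_bandwidth, per_core_demand)
    return throughput(cores, chip_bandwidth, per_core_demand) / one_core


if __name__ == "__main__":
    CHIP_BANDWIDTH = 32.0   # hypothetical, e.g. GB/s off-chip
    PER_CORE_DEMAND = 4.0   # hypothetical, e.g. GB/s per busy core
    for n in (1, 2, 4, 8, 16, 32):
        print(n, "cores ->", speedup(n, CHIP_BANDWIDTH, PER_CORE_DEMAND))
```

With these made-up numbers the curve is linear up to 8 cores and flat beyond that; raising `CHIP_BANDWIDTH` moves the knee to the right, which is the effect the stacked-memory idea is after.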
The results, in simulation at least, are promising [see yellow line in "Trouble Ahead"]. From hahn at mcmaster.ca Fri Dec 5 05:44:43 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <20081205124843.GM11544@leitl.org> References: <20081205124843.GM11544@leitl.org> Message-ID: > (Well, duh). yeah - the point seems to be that we (still) need to scale memory along with core count. not just memory bandwidth but also concurrency (number of banks), though "ieee spectrum online for tech insiders" doesn't get into that kind of depth :( I still usually explain this as "traditional (ie Cray) supercomputing requires a balanced system." commodity processors are always less balanced than ideal, but to varying degrees. intel dual-socket quad-core was probably the worst for a long time, but things are looking up as intel joins AMD with memory connected to each socket. stacking memory on the processor is a red herring IMO, though they appear to have assumed that the number of dram banks will scale linearly with cores. to me that sounds more like dram-based per-core cache. From prentice at ias.edu Fri Dec 5 05:47:53 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> <4938433F.1030002@ias.edu> <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> Message-ID: <49393109.40300@ias.edu> John Hearns wrote: > > > 2008/12/5 Robert G. Brown > > > > He has had ads in computer magazines -- at least small ads in the back > -- for years and years. Usually for physicists and mathematicians.
One > of the few people it looked like it would be interesting to work for, > actually. > > I was very interested in working for DE Shaw - they have an office in > London, and it would > have been an easy commute for me. It sounded an interesting place to > work, and they obviously > have some very bright people there. > Sadly I didn't make it past the application stage. Que sera sera. I read an article about D.E. Shaw that I wanted to link to, but couldn't find it. In that article, it said that not only is his company very secretive, but you don't apply there as much as they find you. They allegedly read all the academic journals and find the top scientists in the world, and then try to court them to work for D.E. Shaw. Being offered a job there is like winning a Nobel, allegedly. I guess that doesn't necessarily apply to sys admins, since they were advertising very heavily for them a couple of years ago. -- Prentice From rgb at phy.duke.edu Fri Dec 5 05:58:07 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <20081205124843.GM11544@leitl.org> References: <20081205124843.GM11544@leitl.org> Message-ID: On Fri, 5 Dec 2008, Eugen Leitl wrote: > > (Well, duh). Good article, though, thanks. Of course the same could have been written (and probably was) back when dual processors came out sharing a single memory bus, and for every generation since. The memory lag has been around forever -- multicores simply widen the gap out of step with Moore's Law (again). Intel and/or AMD people on list -- any words you want to say about a "road map" or other plan to deal with this? In the context of ordinary PCs the marginal benefit of additional cores after (say) four seems minimal as most desktop users don't need all that much parallelism -- enough to manage multimedia decoding in parallel with the OS base function in parallel with "user activity". 
Higher numbers of cores seem to be primarily of interest to H[A,PC] users -- stacks of VMs or server daemons, large scale parallel numerical computation. In both of these general arenas increasing cores/processor/memory channel beyond a critical limit that I think we're already at simply ensures that a significant number of your cores will be idling as they wait for memory access at any given time... rgb > > http://www.spectrum.ieee.org/nov08/6912 > > Multicore Is Bad News For Supercomputers > > By Samuel K. Moore > > Image: Sandia > > Trouble Ahead: More cores per chip will slow some programs [red] unless > there's a big boost in memory bandwidth [yellow] > > With no other way to improve the performance of processors further, chip > makers have staked their future on putting more and more processor cores on > the same chip. Engineers at Sandia National Laboratories, in New Mexico, have > simulated future high-performance computers containing the 8-core, 16-core, > and 32-core microprocessors that chip makers say are the future of the > industry. The results are distressing. Because of limited memory bandwidth > and memory-management schemes that are poorly suited to supercomputers, the > performance of these machines would level off or even decline with more > cores. The performance is especially bad for informatics > applications -- data-intensive programs that are increasingly crucial to the > labs' national security function. > > High-performance computing has historically focused on solving differential > equations describing physical systems, such as Earth's atmosphere or a > hydrogen bomb's fission trigger. These systems lend themselves to being > divided up into grids, so the physical system can, to a degree, be mapped to > the physical location of processors or processor cores, thus minimizing > delays in moving data. > > But an increasing number of important science and engineering problems -- not to > mention national security problems -- are of a different sort.
These fall under > the general category of informatics and include calculating what happens to a > transportation network during a natural disaster and searching for patterns > that predict terrorist attacks or nuclear proliferation failures. These > operations often require sifting through enormous databases of information. > > For informatics, more cores doesn't mean better performance [see red line in > "Trouble Ahead"], according to Sandia's simulation. "After about 8 cores, > there's no improvement," says James Peery, director of computation, > computers, information, and mathematics at Sandia. "At 16 cores, it looks > like 2." Over the past year, the Sandia team has discussed the results widely > with chip makers, supercomputer designers, and users of high-performance > computers. Unless computer architects find a solution, Peery and others > expect that supercomputer programmers will either turn off the extra cores or > use them for something ancillary to the main problem. > > At the heart of the trouble is the so-called memory wall -- the growing > disparity between how fast a CPU can operate on data and how fast it can get > the data it needs. Although the number of cores per processor is increasing, > the number of connections from the chip to the rest of the computer is not. > So keeping all the cores fed with data is a problem. In informatics > applications, the problem is worse, explains Richard C. Murphy, a senior > member of the technical staff at Sandia, because there is no physical > relationship between what a processor may be working on and where the next > set of data it needs may reside. Instead of being in the cache of the core > next door, the data may be on a DRAM chip in a rack 20 meters away and need > to leave the chip, pass through one or more routers and optical fibers, and > find its way onto the processor. > > In an effort to get things back on track, this year the U.S.
Department of > Energy formed the Institute for Advanced Architectures and Algorithms. > Located at Sandia and at Oak Ridge National Laboratory, in Tennessee, the > institute's work will be to figure out what high-performance computer > architectures will be needed five to 10 years from now and help steer the > industry in that direction. > > "The key to solving this bottleneck is tighter, and maybe smarter, > integration of memory and processors," says Peery. For its part, Sandia is > exploring the impact of stacking memory chips atop processors to improve > memory bandwidth. > > The results, in simulation at least, are promising [see yellow line in > "Trouble Ahead"]. > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Fri Dec 5 06:22:12 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <49393109.40300@ias.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> <4938433F.1030002@ias.edu> <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> <49393109.40300@ias.edu> Message-ID: On Fri, 5 Dec 2008, Prentice Bisbal wrote: > John Hearns wrote: >> >> >> 2008/12/5 Robert G. Brown > >> >> >> He has had ads in computer magazines -- at least small ads in the back >> -- for years and years. Usually for physicists and mathematicians. One >> of the few people it looked like it would be interesting to work for, >> actually.
>> >> I was very interested in working for DE Shaw - they have an office in >> London, and it would >> have been an easy commute for me. It sounded an interesting place to >> work, and they obviously >> have some very bright people there. >> Sadly I didn't make it past the application stage. Que sera sera. > > I read an article about D.E. Shaw that I wanted to link to, but couldn't > find it. In that article, it said that not only is his company very > secretive, but you don't apply there as much as they find you. They > allegedly read all the academic journals and find the top scientists in > the world, and then try to court them to work for D.E. Shaw. Being > offered a job there is like winning a Nobel, allegedly. I guess that > doesn't necessarily apply to sys admins, since they were advertising > very heavily for them a couple of years ago. Sure, but remember the numbers. There are a LOT of scientists, and most of them are busy, too busy to move to NY and work for DES. I think that they used the ad as one way of learning about physics, math, etc Ph.D's who were self-selected interested and demonstrably computer geeks, because the ads appeared as one tiny box at the end of e.g. Sun Expert or Byte -- a cheap ad in the classified section, no pictures, very mysterious, clearly a think-tank sort of thing -- in every issue. Clearly looking for e.g. physicists who were into neural networks and complex systems and so on. Which I was and am, but a) I don't publish in the field -- they're the basis for my own entrepreneurial activities and "secret"; and b) I wouldn't live in NYC for literally any money in the world. If one made a million a year and spent most of it one could probably live decently and not save very much. I really like living in one of the most civilized enclaves in the world in NC. Wait, forget I said that. North Carolina is a TERRIBLE place to live. Nobody should ever move here. 
It's just awful, hot in the summer and cold in the winter, and who cares about basketball and the arts anyway? Yes, you're really better off living where you do already...;-) rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From james.p.lux at jpl.nasa.gov Fri Dec 5 06:53:54 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <49393109.40300@ias.edu> Message-ID: On 12/5/08 5:47 AM, "Prentice Bisbal" wrote: > John Hearns wrote: >> >> >> 2008/12/5 Robert G. Brown > >> >> >> He has had ads in computer magazines -- at least small ads in the back >> -- for years and years. Usually for physicists and mathematicians. One >> of the few people it looked like it would be interesting to work for, >> actually. >> >> I was very interested in working for DE Shaw - they have an office in >> London, and it would >> have been an easy commute for me. It sounded an interesting place to >> work, and they obviously >> have some very bright people there. >> Sadly I didn't make it past the application stage. Que sera sera. > > I read an article about D.E. Shaw that I wanted to link to, but couldn't > find it. In that article, it said that not only is his company very > secretive, but you don't apply there as much as they find you. They > allegedly read all the academic journals And this list? > and find the top scientists in > the world, and then try to court them to work for D.E. Shaw. Being > offered a job there is like winning a Nobel, allegedly. I guess that > doesn't necessarily apply to sys admins, since they were advertising > very heavily for them a couple of years ago. > From rgb at phy.duke.edu Fri Dec 5 07:08:33 2008 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: References: Message-ID: On Fri, 5 Dec 2008, Lux, James P wrote: >> I read an article about D.E. Shaw that I wanted to link to, but couldn't >> find it. In that article, it said that not only is his company very >> secretive, but you don't apply there as much as they find you. They >> allegedly read all the academic journals > > And this list? Omygawsh. All right, will the D. E. Shaw spy please raise his or her hand? rgb > >> and find the top scientists in >> the world, and then try to court them to work for D.E. Shaw. Being >> offered a job there is like winning a Nobel, allegedly. I guess that >> doesn't necessarily apply to sys admins, since they were advertising >> very heavily for them a couple of years ago. >> > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From Dan.Kidger at quadrics.com Fri Dec 5 07:12:44 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <20081202232654.GA16003@nlxdcldnl2.cl.intel.com> <386fa5610812022325g3b8eb4fl185aa8936399beea@mail.gmail.com> <2A28BD5B-CD38-4DA4-AF77-8308848FEB7D@xs4all.nl> <4938433F.1030002@ias.edu> <9f8092cc0812050056l3f7c65c8ob676bc132a2bb97@mail.gmail.com> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A064D922B98@quadbrsex1.quadrics.com> I too had an interview with DE Shaw Research a while back, before I took up my current position. At the time they did not have a UK office, and moving to NY was out of the question for me. In recent times they have been more open - and even gave a Keynote talk at this year's ISC in Dresden. As presented, they are designing custom hardware for computational chemistry. The aim is not how many teraflops they can do, but how quickly they can do each timestep of the simulation (irrespective of the molecule size). Since modeling reactions with computational chemistry needs billions of timesteps, simulations would take months/years of wallclock time, even for a tiny molecule. The target is to push down the wallclock of a single timestep from say 1ms to perhaps 0.1us. With current interconnects being no better than 1us latency this must be quite a challenge. Daniel. From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of John Hearns Sent: 05 December 2008 08:56 To: beowulf@beowulf.org Subject: Re: [Beowulf] Intro question 2008/12/5 Robert G. Brown > He has had ads in computer magazines -- at least small ads in the back -- for years and years. Usually for physicists and mathematicians.
One of the few people it looked like it would be interesting to work for, actually. I was very interested in working for DE Shaw - they have an office in London, and it would have been an easy commute for me. It sounded an interesting place to work, and they obviously have some very bright people there. Sadly I didn't make it past the application stage. Que sera sera. From larry.stewart at sicortex.com Fri Dec 5 08:17:30 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: References: Message-ID: <4939541A.4000600@sicortex.com> I've been to a couple of DE Shaw talks and I always come away puzzled. It's tempting to conclude that they are just smarter than I am, but maybe they are just wrong. My understanding is they are building a special purpose molecular dynamics machine because it will be far faster than a general purpose machine programmed to do MD. In principle this might work, if you get the problem statement right, and you can design and build the machine before the general purpose machines catch up, and you don't make any mistakes, and after it is built you can keep designing new ones. In practice it always seems to take longer than you expected and cost more, and maybe that 7 bit ALU really has to be changed to an 8 bit ALU to keep the precision up. The most effective example I know of are the QCD machines like QCDOC that led to BlueGene, but it was far more general purpose than Shaw's machine. Trying it seems harmless, and a better use of excess capital than buying basketball teams or yachts, but it does divert smart people from other activities. Of course if they succeed I'll have been behind it all the way.
-- -Larry / Sector IX From Dan.Kidger at quadrics.com Fri Dec 5 08:43:19 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <4939541A.4000600@sicortex.com> References: <4939541A.4000600@sicortex.com> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> If I had that much money, I too would try and buy a Nobel Prize in preference to a yacht. D. -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Lawrence Stewart Sent: 05 December 2008 16:18 To: Robert G. Brown Cc: Beowulf Mailing List; Lux, James P Subject: Re: [Beowulf] Intro question I've been to a couple of DE Shaw talks and I always come away puzzled. It's tempting to conclude that they are just smarter than I am, but maybe they are just wrong. My understanding is they are building a special purpose molecular dynamics machine because it will be far faster than a general purpose machine programmed to do MD. In principle this might work, if you get the problem statement right, and you can design and build the machine before the general purpose machines catch up, and you don't make any mistakes, and after it is built you can keep designing new ones. In practice it always seems to take longer than you expected and cost more, and maybe that 7 bit ALU really has to be changed to an 8 bit ALU to keep the precision up. The most effective example I know of are the QCD machines like QCDOC that led to BlueGene, but it was far more general purpose than Shaw's machine. Trying it seems harmless, and a better use of excess capital than buying basketball teams or yachts, but it does divert smart people from other activities. Of course if they succeed I'll have been behind it all the way. 
-- -Larry / Sector IX _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From james.p.lux at jpl.nasa.gov Fri Dec 5 08:59:22 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <4939541A.4000600@sicortex.com> Message-ID: On 12/5/08 8:17 AM, "Lawrence Stewart" wrote: > I've been to a couple of DE Shaw talks and I always come away puzzled. > > It's tempting to conclude that they are just smarter than I am, but > maybe they are just wrong. > > My understanding is they are building a special purpose molecular > dynamics machine because it will be far faster than a general purpose > machine programmed to do MD. > > In principle this might work, if you get the problem statement right, > and you can design and build the machine before the general purpose > machines catch up, and you don't make any mistakes, and after it is > built you can keep designing new ones. In practice it always seems to > take longer than you expected and cost more, and maybe that 7 bit ALU > really has to be changed to an 8 bit ALU to keep the precision up. If the machine is built of reconfigurable FPGAs, then such a change is pretty quick. If you have a basic hardware infrastructure and just respin an ASIC, that's a bit more time consuming, but not particularly expensive. e.g. Say it costs, in round numbers, $1M to do an ASIC. That's 2-3 work years labor costs, so in the overall scheme of things, it's not very expensive, in a relative way. If your overall research effort is, say, $20M/yr (which is big, but not huge), then budgeting for a complete machine rebuild every year is only 5-10%. If that gives you a factor of 3 speed increase, it's probably worth it. Think about it..
You check out your design in FPGAs to make sure it works, then do FPGA>ASIC and crank out a quick 10,000 customized processors, have them assembled into boards, fire it up and go. There are all sorts of economies of scale possible (if you're building 1000 PC boards, on an automated line, it's just not that expensive). For comparison, we regularly have prototype boards made with more than 20 layers and a dozen or so fairly high density parts (a couple Xilinx Virtex II FPGAs, RAMs, CPUs, etc.) and all the stuff around them. In single quantities, it might cost around $15K-$20K each to do these (parts cost included). If we were doing 100 of them, so we could spread the cost of the pick-and-place programming over all of them, etc., it would probably be down in the $5-10K/each range. Get into the 1000 unit quantities where it pays to go to a higher volume house, and you might be down in the few hundred bucks each to fab the board, and now you're just talking parts cost. Consider PC mobos.. The manufacturing cost (including parts) is well under $100. Now consider using that nifty compchem box to go examine thousands of possible drugs. Get a hit, and it can be a real money maker. Consider that Claritin was responsible for about $2B of Schering-Plough's revenue in just 2001. Plavix was almost $4B in 2005. That ED drug that starts with a V that we all get mail about was in the $1B/yr area, although it's dropping. (One article comments that when it comes off patent in 2012 that they'll see a bump in sales: "Recreational use of the product could also be expected to generate substantial revenues.") In this context, spending $100M isn't a huge sum, now, is it?
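For anyone who wants to plug in their own numbers, the respin budget estimate earlier in this message can be checked with a few lines of Python. This is only a back-of-envelope sketch: the $1M ASIC cost and $20M/yr research effort are the round numbers already quoted, while the function name respin_share and the $2M "full rebuild" figure (respin plus boards and assembly) are assumptions made up for illustration, not quoted costs.

```python
def respin_share(rebuild_cost, annual_budget):
    """Fraction of the yearly research budget consumed by one rebuild."""
    return rebuild_cost / annual_budget


annual_budget = 20_000_000  # $20M/yr research effort (round number from the text)
asic_respin = 1_000_000     # $1M to do an ASIC (round number from the text)
full_rebuild = 2_000_000    # ASSUMED: respin plus boards and assembly

low = respin_share(asic_respin, annual_budget)
high = respin_share(full_rebuild, annual_budget)

print(f"ASIC respin alone:    {low:.0%} of the annual budget")
print(f"Assumed full rebuild: {high:.0%} of the annual budget")
```

With these inputs the two cases land at the 5% and 10% ends of the range mentioned above; a larger board run or a more expensive process only moves the second number.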
Jim From landman at scalableinformatics.com Fri Dec 5 09:00:18 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> References: <4939541A.4000600@sicortex.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> Message-ID: <49395E22.6090707@scalableinformatics.com> Dan.Kidger@quadrics.com wrote: > If I had that much money, I too would try and buy a Nobel Prize in preference to a yacht. > > D. > > > -----Original Message----- > From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Lawrence Stewart > Sent: 05 December 2008 16:18 > To: Robert G. Brown > Cc: Beowulf Mailing List; Lux, James P > Subject: Re: [Beowulf] Intro question > > I've been to a couple of DE Shaw talks and I always come away puzzled. > > It's tempting to conclude that they are just smarter than I am, but > maybe they are just wrong. > > My understanding is they are building a special purpose molecular > dynamics machine because it will be far faster than a general purpose > machine programmed to do MD. > > In principle this might work, if you get the problem statement right, > and you can design and build the machine before the general purpose > machines catch up, and you don't make any mistakes, and after it is > built you can keep designing new ones. In practice it always seems to > take longer than you expected and cost more, and maybe that 7 bit ALU > really has to be changed to an 8 bit ALU to keep the precision up. The MDGrape guys might have a thing or three to say. They have been demonstrating some pretty awesome performance for years. 
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From coutinho at dcc.ufmg.br Fri Dec 5 09:09:12 2008 From: coutinho at dcc.ufmg.br (Bruno Coutinho) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: References: <20081205124843.GM11544@leitl.org> Message-ID: 2008/12/5 Robert G. Brown > On Fri, 5 Dec 2008, Eugen Leitl wrote: > > >> (Well, duh). >> > > Good article, though, thanks. > > Of course the same could have been written (and probably was) back when > dual processors came out sharing a single memory bus, and for every > generation since. The memory lag has been around forever -- multicores > simply widen the gap out of step with Moore's Law (again). > > Intel and/or AMD people on list -- any words you want to say about a > "road map" or other plan to deal with this? In the context of ordinary > PCs the marginal benefit of additional cores after (say) four seems > minimal as most desktop users don't need all that much parallelism -- > enough to manage multimedia decoding in parallel with the OS base > function in parallel with "user activity". Higher numbers of cores seem > to be primarily of interest to H[A,PC] users -- stacks of VMs or server > daemons, large scale parallel numerical computation. Datamining is useful for both the commercial and scientific worlds and is very data-intensive, so I think this issue will be addressed, or at least someone (Sun, for example) will build processors for data intensive applications that are more balanced, but several times more expensive.
> In both of these > general arenas increasing cores/processor/memory channel beyond a > critical limit that I think we're already at simply ensures that a > significant number of your cores will be idling as they wait for > memory access at any given time... > > rgb > > > >> http://www.spectrum.ieee.org/nov08/6912 >> >> Multicore Is Bad News For Supercomputers >> >> By Samuel K. Moore >> >> Image: Sandia >> >> Trouble Ahead: More cores per chip will slow some programs [red] unless >> there's a big boost in memory bandwidth [yellow >> >> With no other way to improve the performance of processors further, chip >> makers have staked their future on putting more and more processor cores >> on >> the same chip. Engineers at Sandia National Laboratories, in New Mexico, >> have >> simulated future high-performance computers containing the 8-core, >> 16-core, >> and 32-core microprocessors that chip makers say are the future of the >> industry. The results are distressing. Because of limited memory bandwidth >> and memory-management schemes that are poorly suited to supercomputers, >> the >> performance of these machines would level off or even decline with more >> cores. The performance is especially bad for informatics >> applications -- data-intensive programs that are increasingly crucial to the >> labs' national security function. >> >> High-performance computing has historically focused on solving >> differential >> equations describing physical systems, such as Earth's atmosphere or a >> hydrogen bomb's fission trigger. These systems lend themselves to being >> divided up into grids, so the physical system can, to a degree, be mapped >> to >> the physical location of processors or processor cores, thus minimizing >> delays in moving data. >> >> But an increasing number of important science and engineering problems -- not >> to >> mention national security problems -- are of a different sort.
These fall >> under >> the general category of informatics and include calculating what happens >> to a >> transportation network during a natural disaster and searching for >> patterns >> that predict terrorist attacks or nuclear proliferation failures. These >> operations often require sifting through enormous databases of >> information. >> >> For informatics, more cores doesn't mean better performance [see red line >> in >> "Trouble Ahead"], according to Sandia's simulation. "After about 8 cores, >> there's no improvement," says James Peery, director of computation, >> computers, information, and mathematics at Sandia. "At 16 cores, it looks >> like 2." Over the past year, the Sandia team has discussed the results >> widely >> with chip makers, supercomputer designers, and users of high-performance >> computers. Unless computer architects find a solution, Peery and others >> expect that supercomputer programmers will either turn off the extra cores >> or >> use them for something ancillary to the main problem. >> >> At the heart of the trouble is the so-called memory wall -- the growing >> disparity between how fast a CPU can operate on data and how fast it can >> get >> the data it needs. Although the number of cores per processor is >> increasing, >> the number of connections from the chip to the rest of the computer is >> not. >> So keeping all the cores fed with data is a problem. In informatics >> applications, the problem is worse, explains Richard C. Murphy, a senior >> member of the technical staff at Sandia, because there is no physical >> relationship between what a processor may be working on and where the next >> set of data it needs may reside. Instead of being in the cache of the core >> next door, the data may be on a DRAM chip in a rack 20 meters away and >> need >> to leave the chip, pass through one or more routers and optical fibers, >> and >> find its way onto the processor. >> >> In an effort to get things back on track, this year the U.S.
Department of >> Energy formed the Institute for Advanced Architectures and Algorithms. >> Located at Sandia and at Oak Ridge National Laboratory, in Tennessee, the >> institute's work will be to figure out what high-performance computer >> architectures will be needed five to 10 years from now and help steer the >> industry in that direction. >> >> "The key to solving this bottleneck is tighter, and maybe smarter, >> integration of memory and processors," says Peery. For its part, Sandia is >> exploring the impact of stacking memory chips atop processors to improve >> memory bandwidth. >> >> The results, in simulation at least, are promising [see yellow line in >> "Trouble Ahead >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From diep at xs4all.nl Fri Dec 5 09:15:01 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: References: <20081205124843.GM11544@leitl.org> Message-ID: <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> Well every scientist who says he needs a lot of RAM now, ECC-DDR2 ram has a cost of near nothing right now. Very cheaply you can build nodes now with like 4 cheapo cpu's and 128 GB ram inside.
There is no excuse for those who beg for big RAM to not buy a bunch of those nodes. What happens each time is that at the moment that finally the price of some sort of RAM drops (note that ECC-Registered DDR ram never has gotten cheap, much to my disappointment), that a newer generation RAM is there which again is really expensive. I tend to believe that many algorithms that require really a lot of ram can do with a bit less and profit from todays huge cpu power, using some clever tricks and enhancements and/or new algorithms (sometimes it is difficult to define what is a new algorithm, if it looks so much like a previous one with just a few new enhancements), which probably are far from trivial. Usually programming the 'new' algorithm efficiently low level is the big killerproblem why it doesn't get used yet (as there is no budget to hire people who are specialized here, or simply because they work for some other company or other government body). I would really argue that sometimes you have to give industry some time to mass produce memory, just design a new generation cpu based upon the RAM that's there now and just read massively parallel from that RAM. That also gives a HUGE bandwidth. If some older GPU based upon DDR3 ram claims 106GB/s bandwidth to RAM, versus todays Nehalem claims 32GB/s and is achieving a 17 to 18GB/s, then obviously it wasn't important enough for intel to give us more bandwidth to the RAM. If nvidia/amd GPU's can do it years before, and latest cpu is a factor 4+ off then discussions about bandwidth to RAM are quite artificial. The reason for that is the limitations of SPEC to RAM consumption. They design a benchmark years beforehand to use an amount of RAM that is "common" now. I would argue that those most hungry for bandwidth/core crunching power is the scientific world and/or safety research (air and car industry). Note that i'm speaking of streaming bandwidth above. 
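Streaming bandwidth of the kind compared above can be estimated on a given node with a STREAM-style copy loop. What follows is a minimal Python sketch, assuming CPython (where a same-size bytearray slice assignment is carried out as a bulk memory copy). The function name copy_bandwidth_gbs and the 64 MB default buffer are illustrative choices, and the result is a rough single-threaded figure, not a substitute for the real STREAM benchmark.

```python
import time


def copy_bandwidth_gbs(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate streaming copy bandwidth in GB/s (STREAM-like 'copy' kernel)."""
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy of the whole buffer, no per-byte Python loop
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    # Count one read and one write per byte, as STREAM does for 'copy'.
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"copy bandwidth: {copy_bandwidth_gbs():.1f} GB/s")
```

The buffer should be much larger than the last-level cache, otherwise the loop measures cache bandwidth rather than bandwidth to RAM. A single core usually cannot saturate the memory controllers, so comparing against a vendor's quoted peak would need one copy of this running per core.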
Most scientists do not know the difference between bandwidth and latency, basically because they are right that in the end it is all bandwidth related from theoretical viewpoint. Yet practical there is so many factors influencing the latency. Intel/ AMD/IBM are doing big efforts of course to reduce latency a lot. Maybe 95% of all their work onto a cpu (blindfolded guess from a computer science guy - so not hardware designer)? In the end it is all about the testsets in spec. If we manage to get a bunch of real WELL OPTIMIZED low level codes that eat gigabytes of RAM finally into that spec then within years AMD and Intel will show up with some real fast cpu's for scientific workloads. If all "professors" type RGB make a lot of noise world wide to get that done, then they have to follow. Any criticism against intel and amd with respect to: "why not do this and that", i'm doing it also all the time, but at the same time if you look to what happens in spec, spec is only about "who has the best compiler and the biggest L2 cache that nearly can contain the entire working set size of this tiny RAM program". Get some serious software into SPEC i'd argue. To start looking at myself: the reason i didn't donate Diep is because competitors can also obtain my code, whereas all those compiler and hardware manufacturers i don't care if they have my proggies source code. Vincent On Dec 5, 2008, at 2:44 PM, Mark Hahn wrote: >> (Well, duh). > > yeah - the point seems to be that we (still) need to scale memory > along with core count. not just memory bandwidth but also concurrency > (number of banks), though "ieee spectrum online for tech insiders" > doesn't get into that kind of depth :( > > I still usually explain this as "traditional (ie Cray) supercomputing > requires a balanced system." commodity processors are always less > balanced > than ideal, but to varying degrees. 
intel dual-socket quad-core > was probably the worst for a long time, but things are looking up > as intel > joins AMD with memory connected to each socket. > > stacking memory on the processor is a red herring IMO, though they > appear > to assume that the number of dram banks will scale linearly with > cores. > to me that sounds more like dram-based per-core cache. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From gdjacobs at gmail.com Fri Dec 5 10:24:10 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: References: <20081205124843.GM11544@leitl.org> Message-ID: <493971CA.9000307@gmail.com> Bruno Coutinho wrote: > Datamining is useful for both commercial and scientific world and is > very data-intensive, so I think this issue will be addressed, or at least > someone (Sun, for example) will build processors for data intensive > applications that are more balanced, but several times more expensive. Here's some current hardware which is superior to the norm in terms of I/O (so the manufacturers would claim). http://en.wikipedia.org/wiki/POWER6 http://en.wikipedia.org/wiki/Ultrasparc http://en.wikipedia.org/wiki/Itanium For those with really deep pockets: http://en.wikipedia.org/wiki/NEC_SX-9 Q: Do Hitachi and Fujitsu still do vector machines? I guess my point is that the article itself is a little fluffy. This is just the old problem of the kernel size overflowing cache/memory boundaries inconveniently. The answer is always more I/O and tighter integration. -- Geoffrey D.
Jacobs From prentice at ias.edu Fri Dec 5 10:30:36 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> Message-ID: <4939734C.9060604@ias.edu> Vincent Diepeveen wrote: > Very cheaply you can build nodes now with like 4 cheapo cpu's > and 128 GB ram inside. > Not exactly. 2 GB DIMMs are cheap, but as soon as you go to larger DIMMs (4 GB, 8 GB, etc.), the price goes up exponentially. Less than a year ago, we purchased a couple of servers with 32 GB RAM. We then wanted to purchase one with 64 GB RAM. The cost of the system tripled! Instead, we bought 3 more 32 GB systems. Dell and others advertise systems that support up to 128 GB RAM, but I have yet to meet someone who can afford to put all 128 GB RAM in a single box. -- Prentice From lindahl at pbm.com Fri Dec 5 10:53:04 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <4939734C.9060604@ias.edu> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> <4939734C.9060604@ias.edu> Message-ID: <20081205185304.GA27201@bx9> On Fri, Dec 05, 2008 at 01:30:36PM -0500, Prentice Bisbal wrote: > Not exactly. 2 GB DIMMs are cheap, but as soon as you go to larger DIMMs > (4 GB, 8 GB, etc.), the price goes up exponentially. My last quote for 2GB and 4GB dimms was linear.
-- greg From landman at scalableinformatics.com Fri Dec 5 10:57:25 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <4939734C.9060604@ias.edu> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> <4939734C.9060604@ias.edu> Message-ID: <49397995.4080701@scalableinformatics.com> Prentice Bisbal wrote: > Vincent Diepeveen wrote: > >> Very cheaply you can build nodes now with like 4 cheapo cpu's >> and 128 GB ram inside. >> > > Not exactly. 2 GB DIMMs are cheap, but as soon as you go to larger DIMMs > (4 GB, 8 GB, etc.), the price goes up exponentially. > > Less than a year ago, we purchased a couple of servers with 32 GB RAM. We > then wanted to purchase one with 64 GB RAM. The cost of the system > tripled! Instead, we bought 3 more 32 GB systems. The cost of 64GB went down quite recently. We have sold quite a few of these at this size due to memory cost drops. > > Dell and others advertise systems that support up to 128 GB RAM, but I > have yet to meet someone who can afford to put all 128 GB RAM in a > single box. We have seen a few. It's not as expensive as you think. 256 GB ... yeah that's more. And again, if you don't mind a quick plug for ScaleMP (we are not currently a reseller of theirs, we make no money from them), you can tie a few machines together (various constraints) with lower cost memory/components, and get some nice sized single system images. Of course YMMV.
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From lindahl at pbm.com Fri Dec 5 11:03:10 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Intro question In-Reply-To: <4939541A.4000600@sicortex.com> References: <4939541A.4000600@sicortex.com> Message-ID: <20081205190309.GB27201@bx9> On Fri, Dec 05, 2008 at 11:17:30AM -0500, Lawrence Stewart wrote: > In principle this might work, if you get the problem statement right, > and you can design and build the machine before the general purpose > machines catch up, and you don't make any mistakes, and after it is > built you can keep designing new ones. In practice it always seems to > take longer than you expected and cost more, and maybe that 7 bit ALU > really has to be changed to an 8 bit ALU to keep the precision up. I've seen David talk about this machine a couple of time, and he addressed this issue: he realizes it's risky, and he was hoping to advance the state of the art by 5 years over a commodity cluster. While I was at D. E. Shaw (1996), the most effective headhunter for the systems department was the guy who cold-called sysadmins at good computer science departments. My office-mate was formally a sysadmin at Princeton. He still lived there, too; rgb might want to keep mass transit in mind when he's dissing living near NYC. For the strategies, they mostly hired folks who'd just finished degrees in the hard sciences. 
-- greg From prentice at ias.edu Fri Dec 5 11:31:15 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <49397995.4080701@scalableinformatics.com> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> <4939734C.9060604@ias.edu> <49397995.4080701@scalableinformatics.com> Message-ID: <49398183.3060901@ias.edu> Joe Landman wrote: > Prentice Bisbal wrote: >> Vincent Diepeveen wrote: >> >>> Very cheaply you can build nodes now with like 4 cheapo cpu's >>> and 128 GB ram inside. >>> >> >> Not exactly. 2 GB DIMMs are cheap, but as soon as you go to larger DIMMs >> (4 GB, 8 GB, etc.), the price goes up exponentially. >> >> Less than a year ago, we purchased a couple of server with 32 GB RAM. We >> then wanted to purchase one with 64 GB RAM. The cost of the system >> tripled! Instead, we bought 3 more 32 GB systems. > > The cost of 64GB went down quite recently. We have sold quite a few of > these at this size due to memory cost drops. > My experience was 6-9 months ago. >> >> Dell and others advertise systems that support up to 128 GB RAM, but I >> have yet to meet someone who can afford to put all 128 GB RAM in a >> single box. > > We have seen a few. Its not as expensive as you think. 256 GB ... yeah > thats more. > > And again, if you don't mind a quick plug for ScaleMP (we are not > currently a reseller of theirs, we make no money from them), you can tie > a few machines together (various constraints) with lower cost > memory/components, and get some nice sized single system images. Of > course YMMV. > > Funny you mention ScaleMP. I was thinking the same thing, but forgot to mention it in my last e-mail. I was talking to them at SC08. about that sort of thing. I wonder of cost of ScaleMP is less than large RAM premium. 
--
Prentice

From rpnabar at gmail.com Fri Dec 5 16:21:50 2008
From: rpnabar at gmail.com (Rahul Nabar)
Date: Thu Mar 18 01:08:07 2010
Subject: [Beowulf] mpi error: mca_oob_tcp_accept: accept() failed: Too many open files (24).
Message-ID:

I'm getting huge logs with repeated errors of this sort:

mca_oob_tcp_accept: accept() failed: Too many open files (24).

I googled a bit and see that this seems to be an MPI complaint about too many files open. I checked ulimit and that says "unlimited"

Any tips about what I ought to be looking at; I'm a bit lost as to how I can get to the source of this particular error. Furthermore it does not occur systematically but only once in a while. Unfortunately once it happens it seems to be a catastrophe!

Any suggestions?

--
Rahul

From niftyompi at niftyegg.com Fri Dec 5 16:36:21 2008
From: niftyompi at niftyegg.com (Nifty Tom Mitchell)
Date: Thu Mar 18 01:08:07 2010
Subject: [Beowulf] Multicore Is Bad News For Supercomputers
In-Reply-To: <20081205124843.GM11544@leitl.org>
References: <20081205124843.GM11544@leitl.org>
Message-ID: <20081206003621.GB3134@compegg.wr.niftyegg.com>

On Fri, Dec 05, 2008 at 01:48:43PM +0100, Eugen Leitl wrote:
>
> (Well, duh).
>
> http://www.spectrum.ieee.org/nov08/6912
>
> Multicore Is Bad News For Supercomputers
>

Where do GPUs fit in this?

On the surface a handful of cores in a system with decent cache would quickly displace the need for GPUs and would have about as simple a programming+ compiler model as can be had today.

Additional cores are not magic but can set the stage for better math and IO libraries.

Me I would rather see more transistors thrown at 128+ bit math. In the back of my mind I suspect that current 64bit IEEE math is getting in the way of global science (Weather, Global warming...).
Perhaps 128 integer math ops would be a better place to start. And, Any day now we may need 256 bit integers to manage the national debt. -- T o m M i t c h e l l Found me a new hat, now what? From jhh3851 at yahoo.com Fri Dec 5 18:01:10 2008 From: jhh3851 at yahoo.com (Joseph Han) Date: Thu Mar 18 01:08:07 2010 Subject: [Beowulf] Re: Intro question In-Reply-To: <200812051709.mB5H9iZl030028@bluewest.scyld.com> Message-ID: <692200.60027.qm@web55007.mail.re4.yahoo.com> > > Message: 1 > Date: Fri, 5 Dec 2008 08:59:22 -0800 > From: "Lux, James P" > Subject: Re: [Beowulf] Intro question > To: Lawrence Stewart , "Robert G. Brown" > > Cc: Beowulf Mailing List > Message-ID: > Content-Type: text/plain; charset="iso-8859-1" > > > > > On 12/5/08 8:17 AM, "Lawrence Stewart" wrote: > > > I've been to a couple of DE Shaw talks and I always come away puzzled. > > > > It's tempting to conclude that they are just smarter than I am, but > > maybe they are just wrong. > > > > My understanding is they are building a special purpose molecular > > dynamics machine because it will be far faster than a general purpose > > machine programmed to do MD. > > > > In principle this might work, if you get the problem statement right, > > and you can design and build the machine before the general purpose > > machines catch up, and you don't make any mistakes, and after it is > > built you can keep designing new ones. In practice it always seems to > > take longer than you expected and cost more, and maybe that 7 bit ALU > > really has to be changed to an 8 bit ALU to keep the precision up. > > If the machine is built of reconfigurable FPGAs, then such a change is > pretty quick. > > If you have a basic hardware infrastructure, and just respin an ASIC, and > that's a bit more time consuming, but not particularly expensive. e.g. Say > it costs, in round numbers, $1M to do an ASIC. That's 2-3 work years labor > costs, so in the overall scheme of things, it's not very expensive, in a > relative way. 
If your overall research effort is, say, $20M/yr (which is > big, but not huge), then budgeting for a complete machine rebuild every year > is only 5-10%. If that gives you a factor of 3 speed increase, it's > probably worth it. > > Think about it.. You check out your design in FPGAs to make sure it works, > then do FPGA>ASIC and crank out a quick 10,000 customized processors, have > them assembled into boards, fire it up and go. There are all sorts of > economies of scale possible (if you're building 1000 PC boards, on an > automated line, it's just not that expensive. For comparison, we regularly > have prototype boards made with more than 20 layers and a dozen or so fairly > high density parts (a couple Xilinx Virtex II FPGAs, RAMs, CPUs, etc.) and > all the stuff around them. In single quantities, it might cost around > $15K-$20K each to do these (parts cost included). If we were doing 100 of > them, so we could spread the cost of the pick-and-place programming over all > of them, etc., it would probably be down in the $5-10K/each range. Get into > the 1000 unit quantities where it pays to go to a higher volume house, and > you might be down in the few hundred bucks each to fab the board, and now > you're just talking parts cost. > > Consider PC mobos.. The manufacturing cost (including parts) is well under > $100. > > Now consider using that nifty compchem box to go examine thousands of > possible drugs. Get a hit, and it can be a real money maker. Consider that > Claritin was responsible for about $2B of Schering-Plough's revenue in just > 2001. Plavix was almost $4B in 2005. That ED drug that starts with a V that > we all get mail about was in the $1B/yr area, although its dropping. (One > article comments that when it comes off patent in 2012 that they'll see a > bump in sales:"Recreational use of the product could also be expected to > generate substantial revenues.") > > In this context, spending $100M isn't a huge sum, now, is it. 
>
>
> Jim
>

They've actually become quite a bit more transparent lately because I think that they are close to "releasing" a product. Their website actually has quite a bit of detail now:

http://www.deshawresearch.com/publications.html

This paper was a good introduction IMHO:

David E. Shaw, Martin M. Deneroff, Ron O. Dror, Jeffrey S. Kuskin, Richard H. Larson, John K. Salmon, Cliff Young, Brannon Batson, Kevin J. Bowers, Jack C. Chao, Michael P. Eastwood, Joseph Gagliardo, J.P. Grossman, C. Richard Ho, Douglas J. Ierardi, István Kolossváry, John L. Klepeis, Timothy Layman, Christine McLeavey, Mark A. Moraes, Rolf Mueller, Edward C. Priest, Yibing Shan, Jochen Spengler, Michael Theobald, Brian Towles, and Stanley C. Wang, "Anton, A Special-Purpose Machine for Molecular Dynamics Simulation," Communications of the ACM, vol. 51, no. 7, July 2008, pp. 91-97.

And there is even a free link on their website.

Joseph

From bill at cse.ucdavis.edu Fri Dec 5 18:32:24 2008
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] Multicore Is Bad News For Supercomputers
In-Reply-To:
References: <20081205124843.GM11544@leitl.org>
Message-ID: <4939E438.4040803@cse.ucdavis.edu>

Mark Hahn wrote:
>> (Well, duh).
>
> yeah - the point seems to be that we (still) need to scale memory
> along with core count.

Which seems to be happening. Suddenly designers can get more real world performance by adding bandwidth. This isn't new in the GPU world of course where ATI and Nvidia have been selling devices for $250-$600 with 70-140GB/sec.

This is however rather new for CPUs, Intel's been dominating the market with sub 10GB/sec memory systems for some time now, while AMD has had > 10GB/sec for er, 3 generations now to little effect.

So the older machines had less cores and were more sensitive to latency (and the resulting nasty laws of physics) are transforming into bandwidth limited problems that are very friendly to multicore.
So now Intel's shipping a CPU that can run 8 threads and suddenly has 2-3 times the memory bandwidth. Suddenly intel's gone from trailing AMD by a factor of 2 or more to matching AMD dual sockets with a single socket.

AMD dual socket shanghai:
Number of Threads requested = 8
Function      Rate (MB/s)   Avg time   Min time   Max time
Copy:         21638.5276     0.0370     0.0370     0.0371
Scale:        21605.3675     0.0371     0.0370     0.0371
Add:          21451.1315     0.0560     0.0559     0.0562
Triad:        21399.5102     0.0562     0.0561     0.0563

techreport.com reports 21GB/sec on sandra memory bandwidth with a core i7 and 3 x 1333 MHz. If anyone has a core i7 around I'd be interested in the stream numbers.

> not just memory bandwidth but also concurrency

Indeed, so now amd dual sockets have 4 memory systems, Intel single sockets have 3. Not familiar with ATI/Nvidia details but I assume to make use of 100-140GB/sec memory systems that they must have a high degree of parallelism.

AMD dual socket shanghai:
min threads=1 max threads=8 pagesize=4096 cacheline=64
Each threads will access a 262144 KB array 20 times
1 thread(s), a random cacheline per 73.31 ns, 73.31 ns per thread
2 thread(s), a random cacheline per 37.45 ns, 74.90 ns per thread.
4 thread(s), a random cacheline per 19.28 ns, 77.11 ns per thread.
8 thread(s), a random cacheline per 9.84 ns, 78.74 ns per thread.

> (number of banks), though "ieee spectrum online for tech insiders"
> doesn't get into that kind of depth :(
>
> I still usually explain this as "traditional (ie Cray) supercomputing
> requires a balanced system." commodity processors are always less balanced
> than ideal, but to varying degrees.

If you ignore multicore bandwidth and the effective use of bandwidth (read that as application performance) is going up. Who cares if the "unbalanced" machines are running at 5% of peak, as long as HPC application performance (more closely tied to bandwidth) keeps increasing.
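
A back-of-the-envelope way to connect the two benchmark outputs above: each random access pulls in one 64-byte cache line, so the aggregate random-access traffic is roughly 64 bytes divided by the time per line. The Python sketch below only does that arithmetic on the figures quoted above; the function name is made up for illustration, and it assumes every access moves exactly one cache line and nothing else.

```python
# Figures copied from the latency benchmark output quoted above.
CACHELINE_BYTES = 64  # "cacheline=64"

# (threads, ns per random cacheline, aggregated over all threads)
latency_runs = [(1, 73.31), (2, 37.45), (4, 19.28), (8, 9.84)]


def random_access_gb_per_s(ns_per_line, line_bytes=CACHELINE_BYTES):
    """Bytes per nanosecond is numerically the same as GB/s (10^9 bytes/s)."""
    return line_bytes / ns_per_line


for threads, ns_per_line in latency_runs:
    rate = random_access_gb_per_s(ns_per_line)
    print(f"{threads} thread(s): {rate:.2f} GB/s of random cacheline traffic")
```

With 8 threads this works out to roughly 6.5 GB/sec of random-access traffic, against about 21.6 GB/sec for STREAM Copy on the same box, which is one way to see why concurrency matters alongside raw bandwidth.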
> intel dual-socket quad-core was
> probably the worst for a long time, but things are looking up as intel
> joins AMD with memory connected to each socket.

Indeed, so maybe bandwidth will become more of a design constraint. Possibly a fixed amount of memory per CPU, surface mounted memory, and memory busses wider than is practical with the traditional socket with 4-6 dimms a few inches away.... till it's feasible to put ram and CPU on the same die anyways... IRam here we come.

In the mean time maybe motherboards will start looking more like video cards. So maybe something like:
* 32-64 cores per socket, less than 5 GHz
* 4 GB of high speed ram ( > 150GB/sec) per socket
* multiple hypertransport like connections to slower memory

From lindahl at pbm.com Fri Dec 5 19:20:25 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] Multicore Is Bad News For Supercomputers
In-Reply-To: <4939E438.4040803@cse.ucdavis.edu>
References: <20081205124843.GM11544@leitl.org> <4939E438.4040803@cse.ucdavis.edu>
Message-ID: <20081206032025.GA26076@bx9>

On Fri, Dec 05, 2008 at 06:32:24PM -0800, Bill Broadley wrote:

> This is however rather new for CPUs, Intel's been dominating the market with
> sub 10GB/sec memory systems for some time now, while AMD has had > 10GB/sec
> for er, 3 generations now to little effect.

Hey, now, that's a huge overgeneralization. The HPC people who bought AMD after Core2 came out mostly did so for memory bandwidth reasons. Before that, AMD was better on flops as well as stream.

-- greg

From richard.walsh at comcast.net Fri Dec 5 20:52:35 2008
From: richard.walsh at comcast.net (richard.walsh@comcast.net)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] Multicore Is Bad News For Supercomputers
In-Reply-To: <20081205124843.GM11544@leitl.org>
Message-ID: <1216473113.715371228539155366.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net>

All,

Yes, the stacked DRAM stuff is interesting.
Anyone visit the siXis booth at SC08? They are stacking DRAM and FPGA dies directly onto SiCBs (Silicon Circuits Boards). This allows for dramatically more IOs per chip and finer traces throughout the board which is small, but made entirely of silicon. They promise better byte/flop ratios and more total memory per unit volume.

rbw

----- Original Message -----
From: "Eugen Leitl"
To: info@postbiota.org, Beowulf@beowulf.org
Sent: Friday, December 5, 2008 7:48:43 AM GMT -05:00 US/Canada Eastern
Subject: [Beowulf] Multicore Is Bad News For Supercomputers

(Well, duh).

http://www.spectrum.ieee.org/nov08/6912

Multicore Is Bad News For Supercomputers

By Samuel K. Moore

Image: Sandia
Trouble Ahead: More cores per chip will slow some programs [red] unless there's a big boost in memory bandwidth [yellow]

With no other way to improve the performance of processors further, chip makers have staked their future on putting more and more processor cores on the same chip. Engineers at Sandia National Laboratories, in New Mexico, have simulated future high-performance computers containing the 8-core, 16-core, and 32-core microprocessors that chip makers say are the future of the industry. The results are distressing. Because of limited memory bandwidth and memory-management schemes that are poorly suited to supercomputers, the performance of these machines would level off or even decline with more cores. The performance is especially bad for informatics applications -- data-intensive programs that are increasingly crucial to the labs' national security function.

High-performance computing has historically focused on solving differential equations describing physical systems, such as Earth's atmosphere or a hydrogen bomb's fission trigger. These systems lend themselves to being divided up into grids, so the physical system can, to a degree, be mapped to the physical location of processors or processor cores, thus minimizing delays in moving data.
But an increasing number of important science and engineering problems -- not to mention national security problems -- are of a different sort. These fall under the general category of informatics and include calculating what happens to a transportation network during a natural disaster and searching for patterns that predict terrorist attacks or nuclear proliferation failures. These operations often require sifting through enormous databases of information.

For informatics, more cores doesn't mean better performance [see red line in "Trouble Ahead"], according to Sandia's simulation. "After about 8 cores, there's no improvement," says James Peery, director of computation, computers, information, and mathematics at Sandia. "At 16 cores, it looks like 2." Over the past year, the Sandia team has discussed the results widely with chip makers, supercomputer designers, and users of high-performance computers. Unless computer architects find a solution, Peery and others expect that supercomputer programmers will either turn off the extra cores or use them for something ancillary to the main problem.

At the heart of the trouble is the so-called memory wall -- the growing disparity between how fast a CPU can operate on data and how fast it can get the data it needs. Although the number of cores per processor is increasing, the number of connections from the chip to the rest of the computer is not. So keeping all the cores fed with data is a problem.

In informatics applications, the problem is worse, explains Richard C. Murphy, a senior member of the technical staff at Sandia, because there is no physical relationship between what a processor may be working on and where the next set of data it needs may reside. Instead of being in the cache of the core next door, the data may be on a DRAM chip in a rack 20 meters away and need to leave the chip, pass through one or more routers and optical fibers, and find its way onto the processor.
In an effort to get things back on track, this year the U.S. Department of Energy formed the Institute for Advanced Architectures and Algorithms. Located at Sandia and at Oak Ridge National Laboratory, in Tennessee, the institute's work will be to figure out what high-performance computer architectures will be needed five to 10 years from now and help steer the industry in that direction.

"The key to solving this bottleneck is tighter, and maybe smarter, integration of memory and processors," says Peery. For its part, Sandia is exploring the impact of stacking memory chips atop processors to improve memory bandwidth. The results, in simulation at least, are promising [see yellow line in "Trouble Ahead"].

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf

From james.p.lux at jpl.nasa.gov Fri Dec 5 21:18:38 2008
From: james.p.lux at jpl.nasa.gov (Lux, James P)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] Multicore Is Bad News For Supercomputers
In-Reply-To: <1216473113.715371228539155366.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net>
Message-ID:

On 12/5/08 8:52 PM, "richard.walsh@comcast.net" wrote:

> Yes, the stacked DRAM stuff is interesting. Anyone visit the siXis booth at
> SC08? They are stacking DRAM and FPGA dies directly onto SiCBs (Silicon
> Circuits Boards). This allows for dramatically more IOs per chip and finer
> traces throughout the board which is small, but made entirely of silicon.
> They promise better byte/flop ratios and more total memory per unit volume.
>
>

3dplus also does this sort of thing. They take standard ICs and machine the package and stack them with little PC boards on the leads.
We use this sort of thing in space applications to get the density up. Particularly if you're looking at something like Flash or DRAM, where the power dissipation isn't huge, it's a very clever idea. 3dplus has done much more sophisticated stacks, too. From csamuel at vpac.org Sat Dec 6 04:03:07 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <460867941.3143441228564946208.JavaMail.root@mail.vpac.org> Message-ID: <1195007640.3143461228564987619.JavaMail.root@mail.vpac.org> ----- "Eugen Leitl" wrote: > (Well, duh). Hmm, they seem to be rehashing the whole SC'06 "Multicore: Breakthrough or Breakdown?" session which looked at this through various peoples eyes. Presentations by the various speakers (including a certain list admin) here: http://www.cct.lsu.edu/~tron/SC06.html I'm wondering though if we're starting to see a subtle shift in direction with more and more emphasis getting placed on accelerators (mainly GPGPU, but including Cell, FPGA's, etc) ? If OpenCL can deliver on its promise of a hardware independent platform that's open source then perhaps it could assist with the proliferation of cores, considering that GPGPUs aren't known for a large RAM:core ratio ? Caveat: I'm a sysadmin, not a programmer, so be gentle.. :) -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. 
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Sat Dec 6 04:21:35 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <1064990643.3143521228565902389.JavaMail.root@mail.vpac.org> Message-ID: <276455194.3143541228566095953.JavaMail.root@mail.vpac.org> ----- "Prentice Bisbal" wrote: > Dell and others advertise systems that support up > to 128 GB RAM, but I have yet to meet someone who > can afford to put all 128 GB RAM in a single box. The geophysics people at Monash University who we do a lot of work with got one back in mid 2007 for the AuScope project. :-) http://www.xenon.com.au/press/releases/?i=5 cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From franz.marini at mi.infn.it Sat Dec 6 06:13:15 2008 From: franz.marini at mi.infn.it (Franz Marini) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <1195007640.3143461228564987619.JavaMail.root@mail.vpac.org> References: <1195007640.3143461228564987619.JavaMail.root@mail.vpac.org> Message-ID: <1228572795.15695.15.camel@merlino.mi.infn.it> On Sat, 2008-12-06 at 23:03 +1100, Chris Samuel wrote: > I'm wondering though if we're starting to see a > subtle shift in direction with more and more > emphasis getting placed on accelerators (mainly > GPGPU, but including Cell, FPGA's, etc) ? Starting ? Am I the only one remembering accelerators boards (based on FPGA, Transputers, Motorola 88k, Intel i960, various DSPs and other processors) being produced and advertised in, e.g., Byte magazine back in the 80s and early 90s ? 
The problems with those solutions have always been the extremely proprietary nature of the products, and therefore the lack of libraries and (community) support, and last but not least, cost. Things are better now with, say, CUDA, mainly because of the huge installed base and the low cost. OpenCL may shape to be an interesting solution. Should someone develop a, e.g., FPGA-based accelerator board, he would (only) need to support OpenCL to overcome all, except maybe cost, the problems that plagued the older solutions I mentioned before... Interesting times ahead ;) F. --------------------------------------------------------- Franz Marini Prof. R. A. Broglia Theoretical Physics of Nuclei, Atomic Clusters and Proteins Research Group Dept. of Physics, University of Milan, Italy. web : http://merlino.mi.infn.it/proteins/ email : franz.marini@mi.infn.it phone : +39 02 50317226 --------------------------------------------------------- From james.p.lux at jpl.nasa.gov Sat Dec 6 07:26:20 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <1228572795.15695.15.camel@merlino.mi.infn.it> Message-ID: On 12/6/08 6:13 AM, "Franz Marini" wrote: > > > On Sat, 2008-12-06 at 23:03 +1100, Chris Samuel wrote: >> I'm wondering though if we're starting to see a >> subtle shift in direction with more and more >> emphasis getting placed on accelerators (mainly >> GPGPU, but including Cell, FPGA's, etc) ? > > Starting ? Am I the only one remembering accelerators boards (based on > FPGA, Transputers, Motorola 88k, Intel i960, various DSPs and other > processors) being produced and advertised in, e.g., Byte magazine back > in the 80s and early 90s ? > > The problems with those solutions have always been the extremely > proprietary nature of the products, and therefore the lack of libraries > and (community) support, and last but not least, cost. 
I don't think "proprietary" is quite the right word here, at least in the sense of a closed architecture. A lot of those coprocessor boards had complete documentation and anyone who knew how to program, say, a TMS320, could use them. I think the real problem was that they were always sort of niche products (often, a commercial product derived from a specific custom device meeting a specific custom need) and unless you had just the right problem to solve, they didn't buy you very much in performance. The other problem was toolchains. Back then, there was no gnu tool chain. The FPGA folks (like xilinx and altera) were using the ASIC design model for their tools (i.e. Charge a huge amount, because they save enough engineer time over graph paper and rubylith that you can charge a FTE's wages as an annual license fee and still come out ahead). The boards themselves weren't particularly expensive compared to other add-on boards for your PC or (dare I say it) S-100 chassis. (I note that some of these things are really still available, at least in functionally similar form. A lot of FPGA development is done on various cards that plug into a PCI bus..See the offerings from, e.g., Nallatech) > > Things are better now with, say, CUDA, mainly because of the huge > installed base and the low cost. That's exactly it. The special purpose hardware has become commodity. > > OpenCL may shape to be an interesting solution. Should someone develop > a, e.g., FPGA-based accelerator board, he would (only) need to support > OpenCL to overcome all, except maybe cost, the problems that plagued the > older solutions I mentioned before... My general impression is that it is an order of magnitude more difficult to build a FPGA solution for a given computational problem than for a general purpose CPU/VonNeumann style machine. So, you're not going to see compilers that take an algorithm description (at a high level) and crank out optimized FPGA bitstreams any time soon. 
After all, we've had 50 years to do compilers for conventional architectures. (I'm not talking here about generating code for a CPU instantiated on an FPGA.. I'm talking purpose specific gate designs). There are high level design tools for FPGAs (Signal Processing Workbench, etc.) but they're hardly common or cheap. For all intents and purposes, doing FPGA designs today is basically like coding in assembler on a bare machine with no operating system, etc. There are libraries of standard components available under GPL (e.g. Gaisler's GRLIB), but it's still pretty low level. (in software terms: Oh, we've got MACROS in our assembler! And include files! And a linker!) Jim From mathog at caltech.edu Sat Dec 6 11:05:34 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] cloning issue, hidden module dependency Message-ID: Short version: there is an odd problem cloning a Mandriva 2008.1 system (2.6.24.7 kernel) from one type of motherboard to another. (Source Tyan S2466 Athlon MP, destination Gigabyte Athlon XP). On previous releases all that was needed was: 1. image /boot and / (S2466->Gigabyte) 2. put in appropriate /etc/modprobe.conf and /etc/modprobe.preload 3. run lilo 4. reboot Here that doesn't work. First, I obtained the proper files by doing a fresh install (with as few packages as possible) on the gigabyte system, tested that it would reboot ok, then saving /etc before doing the preceding steps, copying the two modprobe files from the saved /etc. Once treated as above, the new system (gigabyte) tries to start using a module from the old system (amd74xx, it should use via82cxxx) and it is all downhill from there. I tried making a zero length /etc/sysconfig/harddrake2/previous_hw (which initially contained the amd74 string), and replacing the entire /etc/udev directory structure from the saved one, neither of which made any difference. 
I could not even find the string amd74 elsewhere in the cloned /etc, other than in modprobe.conf.s2466 and modprobe.preload.s2466, which were also present but should not have had any effect.

More details here:

http://groups.google.com/group/alt.os.linux.mandriva/browse_thread/thread/e83093381c57058d?hl=en&q=mathog+modprobe.conf#50fbb0e96cc09180

Or google in groups for: mathog "where is Mandriva 2008.1 hiding"

Contents of modprobe.conf:

    alias eth0 8139too
    install usb-interface /sbin/modprobe uhci_hcd; /bin/true
    install ide-controller /sbin/modprobe via82cxxx; /bin/true
    alias pci:v000010ECd00008139sv000010ECsd00008139bc02sc00i00 8139too

Contents of modprobe.preload:

    via_agp

Contents of lilo.conf

    default="linux"
    boot=/dev/hda
    map=/boot/map
    install=menu
    menu-scheme=wb:bw:wb:bw
    compact
    prompt
    nowarn
    timeout=100
    message=/boot/message
    image=/boot/vmlinuz
        label="linux"
        root="/dev/hda3"
        initrd=/boot/initrd.img
        append=" resume=/dev/hda2"
    image=/boot/vmlinuz
        label="failsafe"
        root="/dev/hda3"
        initrd=/boot/initrd.img
        append=" failsafe"

Am I missing something obvious? Any ideas where else the amd74xx module load command might be hidden away?

Thanks,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From hahn at mcmaster.ca Sat Dec 6 11:33:21 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] cloning issue, hidden module dependency
In-Reply-To:
References:
Message-ID:

> Any ideas where else the amd74xx module load command might be hidden away?

in the initrd, I bet...
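
One quick way to test the initrd theory is to decompress the image and look for the module name inside it. What follows is a minimal Python sketch, not a definitive tool: it assumes the initrd is a gzip-compressed image (common for initrds of that era, but not guaranteed) and that the module file is stored uncompressed inside it, so its name shows up as plain bytes. The function name is hypothetical.

```python
import gzip


def initrd_mentions(initrd_path, needle):
    """Return True if the gzip-compressed image at initrd_path contains needle.

    needle must be bytes, e.g. b"amd74xx". The whole image is decompressed
    into memory, which is fine for initrds of a few megabytes.
    Assumption: the image is gzip-compressed; other formats raise OSError.
    """
    with gzip.open(initrd_path, "rb") as image:
        return needle in image.read()
```

Calling `initrd_mentions("/boot/initrd.img", b"amd74xx")` on the cloned system would show whether the old IDE driver was baked into the image; if it was, rebuilding the initrd on the new hardware and rerunning lilo is the likely fix.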
From csamuel at vpac.org Sun Dec 7 19:33:57 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1678446336.3242931228707093445.JavaMail.root@mail.vpac.org> Message-ID: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Hi folks, We've been tearing our hair out over this for a little while and so I'm wondering if anyone else has seen anything like this before, or has any thoughts about what could be happening ? Very occasionally we find one of our Barcelona nodes with a SuperMicro H8DM8-2 motherboard powered off. IPMI reports it as powered down too. No kernel panic, no crash, nothing in the system logs. Nothing in the IPMI logs either, it's just sitting there as if someone has yanked the power cable (and we're pretty sure that's not the cause!). There had not been any discernible pattern to the nodes affected, and we've only a couple nodes where it's happened twice, the rest only have had it happen once and scattered over the 3 racks of the cluster. For the longest time we had no way to reproduce it, but then we noticed that for 3 of the power off's there was a particular user running Fluent on there. They've provided us with a copy of their problem and we can (often) reproduce it now with that problem. Sometimes it'll take 30 minutes or so, sometimes it'll take 4-5 hours, sometimes it'll take 3 days or so and sometimes it won't do it at all. It doesn't appear to be thermal issues as (a) there's nothing in the IPMI logs about such problems and (b) we inject CPU and system temperature into Ganglia and we don't see anything out of the ordinary in those logs. :-( We've tried other codes, including HPL, and Advanced Clustering's Breakin PXE version, but haven't managed to (yet) get one of the nodes to fail with anything except Fluent. 
:-( The only oddity about Fluent is that it's the only code on the system that uses HP-MPI, but we used the command line switches to tell it to use the Intel MPI it ships with and it did the same then too! I just cannot understand what is special about Fluent, or even how a user code could cause a node to just turn off without a trace in the logs. Obviously we're pursuing this through the local vendor and (through them) SuperMicro, but to be honest we're all pretty stumped by this. Does anyone have any bright ideas ? cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From award at uda.ad Mon Dec 8 02:39:39 2008 From: award at uda.ad (Alan Ward) Date: Thu Mar 18 01:08:08 2010 Subject: RS: [Beowulf] Odd SuperMicro power off issues References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: Hi. Dunno if this is a bright idea, but what about the power supply temperature? There are usually no measurements done in there, and a hot power supply could easily have a thermal fuse that gets tripped. It may be worthwhile trying with a different power box, if possible with a higher power rating. Cheers, -Alan -----Original message----- From: beowulf-bounces@beowulf.org on behalf of Chris Samuel Sent: Mon 08/12/2008 04:33 To: Beowulf List Cc: David Bannon; Brett Pemberton Subject: [Beowulf] Odd SuperMicro power off issues Hi folks, We've been tearing our hair out over this for a little while and so I'm wondering if anyone else has seen anything like this before, or has any thoughts about what could be happening ? Very occasionally we find one of our Barcelona nodes with a SuperMicro H8DM8-2 motherboard powered off. IPMI reports it as powered down too. No kernel panic, no crash, nothing in the system logs.
Nothing in the IPMI logs either, it's just sitting there as if someone has yanked the power cable (and we're pretty sure that's not the cause!). There had not been any discernible pattern to the nodes affected, and we've only a couple nodes where it's happened twice, the rest only have had it happen once and scattered over the 3 racks of the cluster. For the longest time we had no way to reproduce it, but then we noticed that for 3 of the power off's there was a particular user running Fluent on there. They've provided us with a copy of their problem and we can (often) reproduce it now with that problem. Sometimes it'll take 30 minutes or so, sometimes it'll take 4-5 hours, sometimes it'll take 3 days or so and sometimes it won't do it at all. It doesn't appear to be thermal issues as (a) there's nothing in the IPMI logs about such problems and (b) we inject CPU and system temperature into Ganglia and we don't see anything out of the ordinary in those logs. :-( We've tried other codes, including HPL, and Advanced Clustering's Breakin PXE version, but haven't managed to (yet) get one of the nodes to fail with anything except Fluent. :-( The only oddity about Fluent is that it's the only code on the system that uses HP-MPI, but we used the command line switches to tell it to use the Intel MPI it ships with and it did the same then too! I just cannot understand what is special about Fluent, or even how a user code could cause a node to just turn off without a trace in the logs. Obviously we're pursuing this through the local vendor and (through them) SuperMicro, but to be honest we're all pretty stumped by this. Does anyone have any bright ideas ? cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. 
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081208/e40e8550/attachment.html From atchley at myri.com Mon Dec 8 04:10:06 2008 From: atchley at myri.com (Scott Atchley) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: <03CF324A-B4E7-4942-9595-97E9EB8AF5CF@myri.com> Hi Chris, We had a customer with Opterons experience reboots with nothing in the logs, etc. The only thing we saw with "ipmitool sel list" was: 1 | 11/13/2007 | 10:49:44 | System Firmware Error | We traced it to a HyperTransport deadlock, which by default reboots the node. Our engineer found this AMD note: reset through sync-flooding is described in chapter "13.15 Error Handling" in the following document: http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/32559.pdf When we changed the default PCI setting for this option (0x50) to off (i.e. no reboot, 0x40), the node did not reboot but it did hang and required an IPMI reboot. Our working assumption is that the traffic of one particular application running over our NICs induced some pattern of traffic that caused a flow-control deadlock in HT. Scott On Dec 7, 2008, at 10:33 PM, Chris Samuel wrote: > Hi folks, > > We've been tearing our hair out over this for a little > while and so I'm wondering if anyone else has seen anything > like this before, or has any thoughts about what could be > happening ?
> > Very occasionally we find one of our Barcelona nodes with > a SuperMicro H8DM8-2 motherboard powered off. IPMI reports > it as powered down too. > > No kernel panic, no crash, nothing in the system logs. > > Nothing in the IPMI logs either, it's just sitting there > as if someone has yanked the power cable (and we're pretty > sure that's not the cause!). > > There had not been any discernible pattern to the nodes > affected, and we've only a couple nodes where it's happened > twice, the rest only have had it happen once and scattered > over the 3 racks of the cluster. > > For the longest time we had no way to reproduce it, but then > we noticed that for 3 of the power off's there was a particular > user running Fluent on there. They've provided us with a copy > of their problem and we can (often) reproduce it now with that > problem. Sometimes it'll take 30 minutes or so, sometimes it'll > take 4-5 hours, sometimes it'll take 3 days or so and sometimes > it won't do it at all. > > It doesn't appear to be thermal issues as (a) there's nothing in > the IPMI logs about such problems and (b) we inject CPU and system > temperature into Ganglia and we don't see anything out of the > ordinary in those logs. :-( > > We've tried other codes, including HPL, and Advanced Clustering's > Breakin PXE version, but haven't managed to (yet) get one of the > nodes to fail with anything except Fluent. :-( > > The only oddity about Fluent is that it's the only code on > the system that uses HP-MPI, but we used the command line > switches to tell it to use the Intel MPI it ships with and > it did the same then too! > > I just cannot understand what is special about Fluent, > or even how a user code could cause a node to just turn > off without a trace in the logs. > > Obviously we're pursuing this through the local vendor > and (through them) SuperMicro, but to be honest we're > all pretty stumped by this. > > Does anyone have any bright ideas ? 
> > cheers, > Chris > -- > Christopher Samuel - (03) 9925 4751 - Systems Manager > The Victorian Partnership for Advanced Computing > P.O. Box 201, Carlton South, VIC 3053, Australia > VPAC is a not-for-profit Registered Research Agency > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From larry.stewart at sicortex.com Mon Dec 8 04:10:21 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: <3D3A6B5B-4C93-4ADA-87A6-2048C0C3B210@sicortex.com> I agree with Alan, this sort of sounds like power. Proving it might be difficult, but some ideas are: * Use a different PS on a unit you can make fail * Reduce the power demand somehow: unplug memories, disks, whatever is unpluggable that you don't need The ugliest idea I have is that fluent might have a pattern of power demand that is resonant with something in the power system, so it causes cyclical voltage or current demand that trips the power supply. Proving that could be really hard. These are multicore processors so does it depend on how many of them are running fluent and how many are doing something else? -L From smulcahy at aplpi.com Mon Dec 8 04:59:00 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: <493D1A14.7060300@aplpi.com> Chris Samuel wrote: > Very occasionally we find one of our Barcelona nodes with > a SuperMicro H8DM8-2 motherboard powered off. 
IPMI reports > it as powered down too. Hi Chris, We had a similar experience with one of our compute nodes - intermittent power-offs when running our model and absolutely nothing in the logs. I modified Ganglia to track voltage and temp in an effort to see if anything unusual happened to those before-hand but there were no discernible trends. I ran memtest86+ a number of times on the problem node and neither it nor mcelog showed any problems. Subsequent to that, I found a BIOS upgrade for those systems which included an Opteron microcode update to fix an AMD processor erratum (sp?) - I can dig out the details if the specific problem is of interest. Around the same time, we finally started to see memory errors, so we also replaced the bad memory in the system. Unfortunately I can't tell you which was responsible for fixing the problem. My understanding is that Fluent is quite memory and I/O intensive - do you run other equally intensive models without seeing the failure? Anyways, in summary - if you're totally stumped - try swapping out the memory and/or rolling to the latest firmware and see if that improves the stability. -stephen -- Stephen Mulcahy Applepie Solutions Ltd. http://www.aplpi.com Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway) From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 05:29:45 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: On Mon, 8 Dec 2008, Chris Samuel wrote: > Very occasionally we find one of our Barcelona nodes with > a SuperMicro H8DM8-2 motherboard powered off. IPMI reports > it as powered down too. > > No kernel panic, no crash, nothing in the system logs. So IPMI still works ?
Then this is _not_ like yanking the power cable, in which case IPMI would not work anymore. I've seen this exact behaviour (computer is off, IPMI works and reports that the computer is off) being triggered by computational loads on SuperMicro H8QC8. I've had several nodes and I was able to swap power supplies - the problem moved with the power supplies, so exchanging the "faulty" ones made this behaviour disappear. There is no Fluent running here, but other codes like Gromacs that are known to load the system quite well. The power supplies are supposed to deliver a max. of 1KW for a system with 4 Opteron 875, 8GB RAM and 2 internal disks. The "turning off" behaviour was also quite random, sometimes appearing within an hour, sometimes taking hours-days; it has started to appear about 5-6 months after the nodes were purchased. I still have one node where this occurs so rarely (about once a month) that it's not accepted as an excuse for exchange ;-( -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From gerry.creager at tamu.edu Mon Dec 8 05:51:32 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: <493D2664.2030703@tamu.edu> Bogdan Costescu wrote: > On Mon, 8 Dec 2008, Chris Samuel wrote: > >> Very occasionally we find one of our Barcelona nodes with >> a SuperMicro H8DM8-2 motherboard powered off. IPMI reports >> it as powered down too. >> >> No kernel panic, no crash, nothing in the system logs. > > So IPMI still works ? Then this is _not_ like yanking the power cable, > in which case IPMI would not work anymore. 
> > I've seen this exact behaviour (computer is off, IPMI works and reports > that the computer is off) being triggered by computational loads on > SuperMicro H8QC8. I've had several nodes and I was able to swap power > supplies - the problem moved with the power supplies, so exchanging the > "faulty" ones made this behaviour disappear. There is no Fluent running > here, but other codes like Gromacs that are known to load the system > quite well. The power supplies are supposed to deliver a max. of 1KW for > a system with 4 Opteron 875, 8GB RAM and 2 internal disks. The "turning > off" behaviour was also quite random, sometimes appearing within an > hour, sometimes taking hours-days; it has started to appear about 5-6 > months after the nodes were purchased. I still have one node where this > occurs so rarely (about once a month) that it's not accepted as an > excuse for exchange ;-( Continuing on the thread of power-related issues, this is beginning to sound like a thermal-related mechanical problem. In the power industry it is common to assume that there is a finite life for circuit breakers based on the number of times they cycle (are tripped and reset). I'm extrapolating here, as I've not had time to track down my power supply guru and ask him... however, some time back there was a company that introduced the "polyfuse" which is a thermal-trip breaker that auto-resets after it trips, upon cooling down. I used a number of these years ago while at NASA, and saw some evidence of a phenomenon similar to the breaker limited life scenario described above. I'm wondering if there might be a single voltage that's over-taxed and that opening a breaker in that supply might cause the halt-to-quiescent while leaving IPMI alive... 
gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From prentice at ias.edu Mon Dec 8 05:58:03 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <276455194.3143541228566095953.JavaMail.root@mail.vpac.org> References: <276455194.3143541228566095953.JavaMail.root@mail.vpac.org> Message-ID: <493D27EB.2060606@ias.edu> Chris Samuel wrote: > ----- "Prentice Bisbal" wrote: > >> Dell and others advertise systems that support up >> to 128 GB RAM, but I have yet to meet someone who >> can afford to put all 128 GB RAM in a single box. > > The geophysics people at Monash University who we do > a lot of work with got one back in mid 2007 for the > AuScope project. :-) > > http://www.xenon.com.au/press/releases/?i=5 Technically, my statement is still correct - I haven't met anyone from Monash University. Being that Monash U. is in Australia and I'm in the US, my statement may still be true for some time. ;) -- Prentice From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 06:11:04 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Intro question In-Reply-To: References: Message-ID: On Fri, 5 Dec 2008, Lux, James P wrote: > Now consider using that nifty compchem box to go examine thousands > of possible drugs. I don't think that this is the main purpose of the DEShaw machine. Thousands of independent trials can be done today with thousands of machines: f.e. one trial per machine, much like Monte Carlo calculations. 
Machines like the DEShaw one are useful in 2 cases: - simulating a much larger system, getting to the point of simulating whole cells or at least mitochondria - simulating a much longer period of simulated time in the same real time, as processes of biological interest happen on timescales that are many orders of magnitude larger than what can be currently simulated. In each of these cases there is only one simulation running - a very fine grained one... -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 06:20:31 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Intro question In-Reply-To: <49395E22.6090707@scalableinformatics.com> References: <4939541A.4000600@sicortex.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> <49395E22.6090707@scalableinformatics.com> Message-ID: On Fri, 5 Dec 2008, Joe Landman wrote: > The MDGrape guys might have a thing or three to say. They have been > demonstrating some pretty awesome performance for years. True, but my impression was that they were focused on getting the most performance from one unit, while the DEShaw approach factored in a high degree of parallelization from the beginning. I've heard the talk in Dresden earlier this year and I liked hearing about an idea that I've also had some time ago but not heard talking about on this list: interconnect hardware being able to DMA directly to/from CPU cache. I don't know how useful such a feature is for a general purpose interconnect (or MPI library) but it certainly fits well in the specialized frame of molecular dynamics (or rather, of how MD is implemented today). 
-- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From r.rankin at qub.ac.uk Mon Dec 8 06:22:18 2008 From: r.rankin at qub.ac.uk (Richard Rankin) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] RE: moab In-Reply-To: <8E50F960A9F3F6448D39155B882544E817D4FE2E@EX2K7-VIRT-1.ads.qub.ac.uk> References: <8E50F960A9F3F6448D39155B882544E817D4FE2E@EX2K7-VIRT-1.ads.qub.ac.uk> Message-ID: <8E50F960A9F3F6448D39155B882544E818717F79@EX2K7-VIRT-1.ads.qub.ac.uk> I will have funding available to purchase some new clusters in the new year. I was hoping to be able to have a cluster with a mix of Linux and windows nodes so that the mix could be varied depending on the work load. I have been pointed to http://www.clusterresources.com/pages/products/moab-hybrid-cluster.php Has anyone any experience of this Ricky ______________________ Principal Analyst Information Services Queen's University Belfast tel: 02890 974824 fax: 02890 976586 email: r.rankin@qub.ac.uk -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081208/cab575ba/attachment.html From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 07:00:40 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <493D2664.2030703@tamu.edu> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> <493D2664.2030703@tamu.edu> Message-ID: On Mon, 8 Dec 2008, Gerry Creager wrote: > I'm wondering if there might be a single voltage that's over-taxed > and that opening a breaker in that supply might cause the > halt-to-quiescent while leaving IPMI alive... 
I don't quite understand the "opening a breaker in that supply might cause the halt-to-quiescent" part, but just to clear up some of the things I've written before: if the computer crashes, I expect IPMI to tell me that the computer is in "on" power state; I might be able to use the IPMI console redirection to see what (if anything) is printed on the console, like OOM, kernel oops, etc. In the behaviour that I have described previously for these SuperMicro boards, IPMI reported power to be "off", similar to the result of running "/sbin/poweroff" from Linux or sending a "power off" IPMI command; at the end of these 2 commands there is no console output anymore as the CPU is powered off. So somehow the BMC was notified that the computer is not "on" anymore... or maybe it was the BMC which made the decision to turn off in the first place. -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From larry.stewart at sicortex.com Mon Dec 8 07:19:17 2008 From: larry.stewart at sicortex.com (Lawrence Stewart) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <4939541A.4000600@sicortex.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> <49395E22.6090707@scalableinformatics.com> Message-ID: <493D3AF5.2090806@sicortex.com> Bogdan Costescu wrote: > On Fri, 5 Dec 2008, Joe Landman wrote: > >> The MDGrape guys might have a thing or three to say. They have been >> demonstrating some pretty awesome performance for years. > > True, but my impression was that they were focused on getting the most > performance from one unit, while the DEShaw approach factored in a > high degree of parallelization from the beginning.
> > I've heard the talk in Dresden earlier this year and I liked hearing > about an idea that I've also had some time ago but not heard talking > about on this list: interconnect hardware being able to DMA directly > to/from CPU cache. I don't know how useful such a feature is for a > general purpose interconnect (or MPI library) but it certainly fits > well in the specialized frame of molecular dynamics (or rather, of how > MD is implemented today). > Well the NIC should read from cache or update the cache if the data happens to be there. Don't all well designed I/O systems do that? A mathematician woke up one night to find his wastebasket on fire. He poured water into it and went back to sleep. The next night, he woke up again to find the desk lamp on fire, so he put it in the wastebasket, reducing the issue to a previously solved problem. Flushing caches for I/O is like that. -- -Larry / Sector IX From patrick at myri.com Mon Dec 8 07:21:03 2008 From: patrick at myri.com (Patrick Geoffray) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Intro question In-Reply-To: References: <4939541A.4000600@sicortex.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> <49395E22.6090707@scalableinformatics.com> Message-ID: <493D3B5F.3030208@myri.com> Bogdan Costescu wrote: > about on this list: interconnect hardware being able to DMA directly > to/from CPU cache. I don't know how useful such a feature is for a You can do something similar today using Direct Cache Access (DCA) on (recent) Intel chips with IOAT. It's an indirect cache access, you tag a DMA to automatically prefetch the data in the L3 of a specific socket. It does nothing for latency, since polling will fetch the cache line just as fast, but it works well if there is a delay between the data being delivered and the data being used. The best example is a communication overlapped by computation: cache prefetching is overlapped as well, no more memory latency. 
Patrick From herborn at usna.edu Mon Dec 8 08:12:15 2008 From: herborn at usna.edu (Steve Herborn) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> Message-ID: <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> Good day to the group. I would like to make a brief introduction to myself and raise my first question to the forum. My name is Steve Herborn and I am a new employee at the United States Naval Academy in the Advanced Research Computing group which supports the IT systems used for faculty research. Part of my responsibilities will be the care & feeding of our Beowulf Cluster which is a commercially procured Cluster from Aspen Systems. It was purchased & installed about four or five years ago. As delivered the system was originally configured with two Head nodes each with 32 compute nodes. One head node was running SUSE 9.x and the other Head Node was running Scyld (version unknown) also with 32 compute nodes. While I don't know all of the history, apparently this system was not very actively maintained and had numerous hardware & software issues, to include losing the array on which Scyld was installed. Prior to my arrival a decision was made to reconfigure the system from having two different head nodes running two different OS Distributions to one Head Node controlling all 64 Compute Nodes. In addition SUSE Linux Enterprise Server (10SP2) (X86-64) was selected as the OS for all of the nodes. Now on to my question which will more than likely be the first of many. In the collective group wisdom what would be the most efficient & effective way to "push" the SLES OS out to all of the compute nodes once it is fully installed & configured on the Head Node?
In my research I've read about various Cluster packages/distributions that have that capability built in, such as ROCKS & OSCAR which appear to have the innate capability to do this as well as some additional tools that would be very nice to use in managing the system. However, from my current research it appears that they do not support SLES 10sp2 for the AMD 64-bit Architecture (although since I am so new at this I could be wrong). Are there any other "free" (money is always an issue) products or methodologies I should be looking at to push the OS out & help me manage the system? It appears that a commercial product Moab Cluster Builder will do everything I need & more, but I do not have the funds to purchase a solution. I also certainly do not want to perform a manual OS install on all 64 Compute Nodes. Thanks in advance for any & all help, advice, guidance, or pearls of wisdom that you can provide this Neophyte. Oh and please don't ask why SLES 10sp2, I've already been through that one with management. It is what I have been provided & will make work. Steven A. Herborn U.S. Naval Academy Advanced Research Computing 410-293-6480 (Desk) 757-418-0505 (Cell) -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081208/abb20a3a/attachment.html From hearnsj at googlemail.com Mon Dec 8 09:14:20 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> Message-ID: <9f8092cc0812080914t27a2c87o606d6d02c7650c98@mail.gmail.com> 2008/12/8 Steve Herborn > * While I don't know all of the history, apparently this system was not > very actively maintain and had numerous hardware & software issues, to > include losing the array on which Scyld was installed. * > Cough. Splutter. Out of the mouths of babes etc... Seriously, have you thought of either: a) arranging a convenient fire which (sadly) gets oh-so-close to this system (*) b) contacting Aspen Systems and seeing to what extent they will still support this system c) as in my previous email, tell your bosses this system is too old/unreliable and get pricing for a new one (*) Not a good idea in Great Britain, where arson in the Queen's Dockyard is still a hanging offence, and I'd bet the judges would say that a Naval Academy was part of a Dockyard. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081208/b6b7568f/attachment.html From hearnsj at googlemail.com Mon Dec 8 09:40:29 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> Message-ID: <9f8092cc0812080940k47176cfpaf0fa551d660b567@mail.gmail.com> Aha! 
Are you referring to the OS Detect scripts in: /opt/oscar/lib/OSCAR/OCA/OS_Detect? On an SGI Tempo system there are the following: CentOS.pm Debian.pm Mandriva.pm RedHat.pm SLES.pm ScientificLinux.pm SuSE.pm A quick look reveals that SLES.pm looks for different strings in /etc/SuSE-release - which of course it should. Can anyone confirm/deny if SLES.pm is present in a 'plain vanilla' Oscar install? -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081208/0094fd87/attachment.html From landman at scalableinformatics.com Mon Dec 8 10:34:44 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> Message-ID: <493D68C4.8060400@scalableinformatics.com> Steve Herborn wrote: > > > Good day to the group. I would like to make a brief introduction to > myself and raise my first question to the forum. > > > > My name is Steve Herborn and I am a new employee at the United States > Naval Academy in the Advanced Research Computing group which supports Greetings Steve > the IT systems used for faculty research. Part of my responsibilities > will be the care & feeding of our Beowulf Cluster which is a > commercially procured Cluster from Aspen Systems. It purchased & > installed about four or five years ago. As delivered the system was > originally configured with two Head nodes each with 32 compute nodes. > One head node was running SUSE 9.x and the other Head Node was running > // Scyld (version unknown) also with 32 compute nodes. 
While I don't > know all of the history, apparently this system was not very actively > maintain and had numerous hardware & software issues, to include losing > the array on which Scyld was installed. Prior to my arrival a Ouch ... if you call the good folks at Aspen, they could help with that (ping me if you need a contact) > decision was made to reconfigure the system from having two different > head nodes running two different OS Distributions to one Head Node > controlling all 64 Compute Nodes. In addition SUSE Linux Enterprise > Server (10SP2) (X86-64) was selected as the OS for all of the nodes. Ok. > Now on to my question which will more then likely be the first of many. > In the collective group wisdom what would be the most efficient & Danger Will Robinson ... for the N people who answer, you are likely to get N+2 answers, and N/2 arguments going ... not a bad thing, but to steal from the Perl motto "there is more than one way to do these things ..." > effective way to "push" the SLES OS out to all of the compute nodes once > it is fully installed & configured on the Head Node. In my research First: Stateless (e.g. diskless) versus Stateful (e.g. local installation). Scyld is "stateless" though Don will likely correct me (as this is massively oversimplified). SuSE can be installed Stateless or Stateful. Its installation can be automated ... we have been doing this for years (one of the few vendors to have done this with SuSE). It can also be run diskless ... we have booted compute nodes with Infiniband to fully operational compute nodes visible in all aspects within the cluster in under 60 seconds. This is the case for 9.3, 10.x SuSE flavors. > I've read about various Cluster packages/distributions that have that > capability built in, such as ROCKS & OSCAR which appear to have the > innate capability to do this as well as some additional tools that would > be very nice to use in managing the system.
However, from my current > research it appears that they do not support SLES 10sp2 for the AMD Rocks only supports Red Hat and its rebuilds, so I wouldn't recommend it for the task as you have described it. Oscar might be able to handle this, though I haven't kept up on it, so I am not sure how active it is. You want to look at xCAT v2 (open source), and Warewulf/Perceus (open source). Our package (Tiburon) is not ready to be released, and we will likely make it a meta package atop Perceus at some point soon, though it is already used in production at several large commercial companies specifically for SuSE clusters. > 64-bit Architecture (although since I am so new at this I could be > wrong). Are there any other "free" (money is always an issue) products > or methodologies I should be looking at to push the OS out & help me > manage the system? It appears that a commercial product Moab Cluster See above. If you want a prepackaged system, likely you are going to need to spend money. Moab is a possibility, though for SuSE, I would recommend looking at Concurrent Thinking's appliance. It will cost money, but they solve pretty much all of the problems for you. > Builder will do everything I need & more, but I do not have the funds to > purchase a solution. I also certainly do not want to perform a manual > OS install on all 64 Compute Nodes. No... in all likelihood, you really don't want to do any installation to the nodes (stateless if possible). > > > > Thanks in advance for any & all help, advice, guidance, or pearls of > wisdom that you can provide this Neophyte. Oh and please don't ask why > SLES 10sp2, I've already been through that one with management. It is > what I have been provided & will make work. It's not an issue, though we recommend better kernels/kernel updates. Compared to the RHEL kernels, it uses modern stuff. Joe > > > > > > Steven A. Herborn > > U.S.
Naval Academy > > Advanced Research Computing > > 410-293-6480 (Desk) > > 757-418-0505 (Cell) > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From gus at ldeo.columbia.edu Mon Dec 8 10:45:23 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> Message-ID: <493D6B43.2090004@ldeo.columbia.edu> Hello Steve and list In the likely case that the original vendor will no longer support this 5-year-old cluster, you can try installing the Rocks cluster suite, which is free from SDSC and which you have already come across: http://www.rocksclusters.org/wordpress/ This would be the path of least resistance, and may get your cluster up and running again with relatively small effort. Of course there are many other solutions, but they may require more effort from the system administrator. Rocks is well supported and documented. It is based on CentOS (free version of RHEL). There is no support for SLES on Rocks, so if you must keep the current OS distribution, it won't work for you. I read your last paragraph, but you may argue with your bosses that the age of this machine doesn't justify being picky about the particular OS flavor.
Bringing it back to life, making it a useful asset, with a free software stack, would be a great benefit. You would spend money only on application software (e.g. Fortran compiler, Matlab, etc). Other solutions (e.g. Moab) will cost money, and may not work with this old hardware. Sticking to SLES may be a catch-22, a shot in the foot. Rocks has a relatively large user base, and an active mailing list for help. Moreover, for Rocks you must have at a minimum 1GB of RAM on every node, two Ethernet ports on the head node, and one Ethernet port on each compute node. Check the hardware you have. Although PXE boot capability is not strictly required, it makes installation much easier. Check your motherboard and BIOS. I have a small cluster made of five salvaged Dell Precision 410 (dual Pentium III) running Rocks 4.3, and it works well. For old hardware Rocks is a very good solution, requiring a modest investment of time, and virtually no money. (In my case I only had to buy cheap SOHO switches and Ethernet cables, but you probably already have switches.) If you are going to run parallel programs with MPI, the cheapest thing would be to have GigE ports and switches. I wouldn't invest in a fancier interconnect on such an old machine. (Do you have any fancier interconnect already, say Myrinet?) However, you can buy cheap GigE NICs for $15-$20, and high end ones (say Intel Pro 1000) for $30 or less. This would be needed only if you don't have GigE ports on the nodes already. Probably your motherboards have dual GigE ports, I don't know. MPI over 100BaseT Ethernet is a real pain, don't do it, unless you are a masochist. A 64-port GigE switch to support MPI traffic would also be a worthwhile investment. Keeping MPI on a separate network, distinct from the I/O and cluster control net, is a good thing. It avoids contention and improves performance. A natural precaution would be to back up all home directories before you start, and any precious data or filesystems.
I suggest sorting out the hardware issues before anything else. It would be good to evaluate the status of your RAID, and perhaps use that particular node as a separate storage appliance. You can try just rebuilding the RAID, and see if it works, or perhaps replace the defective disk(s), if the RAID controller is still good. Another thing to look at is how functional your Ethernet (or GigE) switch or switches are, and if you have more than one switch how they are/can be connected to each other. (One for the whole cluster? Two or more separate? Some specific topology connecting many switches?) I hope this helps, Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA ---------------------------------------------------------------------
From djholm at fnal.gov Mon Dec 8 11:05:19 2008 From: djholm at fnal.gov (Don Holmgren) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> References: <1782429857.3242961228707237079.JavaMail.root@mail.vpac.org> Message-ID: Hi Chris - We've had similar problems on two different clusters using Barcelonas with two different motherboards. Our new cluster uses SuperMicro TwinU's (two H8DMT-INF+ motherboards in each) and was delivered in early November. Out of the roughly 590 motherboards, we had maybe 20 that powered down under load. Like yours, IPMI was still working, and so we could power these up remotely. For nearly all of these, swapping memory fixed the problem. For systems that multiple memory swaps did not fix the problem, the vendor swapped motherboards. I do not believe we've had to swap a power supply yet for this. On an older, smaller cluster, which uses Asus KFSN4-DRE motherboards, the incidence rate has been much higher - 20% or so - and swapping memory has not fixed the problem. On some of the systems, slowing the memory clock fixes this, but of course this causes lower computational throughput. We are still working with the vendor to fix the problem nodes; for now, we are scheduling only 6 of 8 available cores. For the job mix on that cluster, this has been a temporary solution for most of the power off issues. Like you, many of the codes that our users run do not cause a problem.
On the Asus-based cluster, a computational cosmology code will trigger the power shutdowns. The best torture code that we've found has been xhpl (linpack) built using a threaded version of libgoto; when this is executed on a single dual Barcelona node with "-np 8", each of the 8 MPI processes spawns 8 threads. This particular binary will cause our bad nodes to power off very quickly (you are welcome to a copy of the binary - just let me know). The power draw from our Barcelona systems is very strongly dependent on the code. The power draw difference between the xhpl binary mentioned above and the typical Lattice QCD codes we run is at least 25%. Because of this we've always suspected thermal or power issues, but the vendor of our Asus-based cluster has done the obvious things to check both (eg, using active coolers on the CPU's, using larger power supplies, and so forth) and hasn't had any luck. Also, the fact that swapping memory on our SuperMicro systems helps without affecting computational performance probably means that it is not a thermal issue on the CPU's. Don Holmgren Fermilab On Mon, 8 Dec 2008, Chris Samuel wrote: > Hi folks, > > We've been tearing our hair out over this for a little > while and so I'm wondering if anyone else has seen anything > like this before, or has any thoughts about what could be > happening ? > > Very occasionally we find one of our Barcelona nodes with > a SuperMicro H8DM8-2 motherboard powered off. IPMI reports > it as powered down too. > > No kernel panic, no crash, nothing in the system logs. > > Nothing in the IPMI logs either, it's just sitting there > as if someone has yanked the power cable (and we're pretty > sure that's not the cause!). > > There had not been any discernible pattern to the nodes > affected, and we've only a couple nodes where it's happened > twice, the rest only have had it happen once and scattered > over the 3 racks of the cluster. 
> > For the longest time we had no way to reproduce it, but then > we noticed that for 3 of the power off's there was a particular > user running Fluent on there. They've provided us with a copy > of their problem and we can (often) reproduce it now with that > problem. Sometimes it'll take 30 minutes or so, sometimes it'll > take 4-5 hours, sometimes it'll take 3 days or so and sometimes > it won't do it at all. > > It doesn't appear to be thermal issues as (a) there's nothing in > the IPMI logs about such problems and (b) we inject CPU and system > temperature into Ganglia and we don't see anything out of the > ordinary in those logs. :-( > > We've tried other codes, including HPL, and Advanced Clustering's > Breakin PXE version, but haven't managed to (yet) get one of the > nodes to fail with anything except Fluent. :-( > > The only oddity about Fluent is that it's the only code on > the system that uses HP-MPI, but we used the command line > switches to tell it to use the Intel MPI it ships with and > it did the same then too! > > I just cannot understand what is special about Fluent, > or even how a user code could cause a node to just turn > off without a trace in the logs. > > Obviously we're pursuing this through the local vendor > and (through them) SuperMicro, but to be honest we're > all pretty stumped by this. > > Does anyone have any bright ideas ? 
> > cheers, > Chris From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 11:32:07 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <4939734C.9060604@ias.edu> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> <4939734C.9060604@ias.edu> Message-ID: On Fri, 5 Dec 2008, Prentice Bisbal wrote: > Dell and others advertise systems that support up to 128 GB RAM, but > I have yet to meet someone who can afford to put all 128 GB RAM in a > single box. Rather than saying "we've been doing this for a long time", I'll mention that we've had lots of problems with some AMD Opteron based systems. We've always filled up all possible memory slots with the highest capacity (but still affordable ;-)) memory modules in mainboards with 4 or 8 sockets; this allowed us, e.g., to reach 64GB in 2006 and 128GB in 2007, but created lots of problems with instability under load. Although we've been given many assurances that the configurations were fully supported by CPU, mainboard and memory manufacturers, in practice random memory errors occurred and they could only be eliminated by running the memory at a lower speed or halving the memory size - unacceptable as these computers were by contract required to run the full memory at the full speed. Some of the involved manufacturers denied any knowledge of problems on similar configurations, only to say 6 months later that such problems do exist in many cases; after having many memory modules, CPUs and mainboards exchanged, we could have arrived at the same conclusion by ourselves ;-| For the latest purchase of this type, we have chosen a Tier 1 vendor and also changed the memory architecture to Intel shared bus - but for a different reason - and so far the 128GB hasn't shown any errors.
Hope they stay that way ;-) -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From mathog at caltech.edu Mon Dec 8 11:37:07 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] cloning issue, hidden module dependency Message-ID: Mark Hahn wrote: > > Any ideas where else the amd74xx module load command might be hidden away? > > in the initrd, I bet... That was it, unfortunately. Mandriva went from a very small, and very general sort of "init" script to a larger, very machine specific "init". See the thread cited in the original post for more details. Sort of a PITA for cloning purposes - much easier to shuffle around a couple of small files like modprobe.conf than to have to build a new initrd for each node with different hardware. FC, RedHat, CentOS etc. might have similar changes, since they are all closely related. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Mon Dec 8 13:05:20 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] cloning issue, hidden module dependency In-Reply-To: References: Message-ID: <493D8C10.5080007@scalableinformatics.com> David Mathog wrote: > Mark Hahn wrote: >>> Any ideas where else the amd74xx module load command might be hidden > away? >> in the initrd, I bet... > > That was it, unfortunately. Mandriva went from a very small, and very > general sort of "init" script to a larger, very machine specific "init". > See the thread cited in the original post for more details. Sort of a > PITA for cloning purposes - much easier to shuffle around a couple of > small files like modprobe.conf than to have to build a new initrd for > each node with different hardware. FC, RedHat, CentOS etc. 
might > have similar changes, since they are all closely related. Well RHEL is annoying in that if you decide to use a custom kernel and a software raid, you are, for lack of a better term, toast (if you stick with their tools/config). This is not to say that it is impossible, in fact it works very well in other distributions. We worked around this for some customers, but the surgery is neither easy nor pleasant. It involves upgrading nash, initrd-tools, and quite a few other things. This is because RHEL 5.x still uses dmraid for building software RAID while FCx (x>=8) have switched to mdadm (go figure). The latter works. Unfortunately, all of this is buried in initrd. Doing initrd surgery is not for the faint of heart. Joe -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From gus at ldeo.columbia.edu Mon Dec 8 13:17:47 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D6B43.2090004@ldeo.columbia.edu> Message-ID: <493D8EFB.5080004@ldeo.columbia.edu> Hello Steve and list Steve Herborn wrote: >The hardware suite is actually quite sweet, but has been mismanaged rather >badly. It has been left in a machine room that is too hot & on power that >is more then flaky with no line conditioners. One of the very first things >I had to do was replace almost two-dozen Power Supplies that were DOA. > > Yes, 24 power supplies may cost as much as the savings in UPS, plus the headache of replacing them, plus failing nodes. 
>I think I have most of the hardware issues squared away right now and need >to focus on getting her up & running, but even installing the OS on a >head-Node is proving to be troublesome. > > Besides my naive encouragement to use Rocks, I remember some recent discussions here on the Beowulf list about different techniques to set up a cluster. See this thread, and check the postings by Bogdan Costescu, from the University of Heidelberg. He seems to administer a number of clusters, some of which have constraints comparable to yours, and to use a variety of tools for this: http://www.beowulf.org/archive/2008-October/023433.html http://www.iwr.uni-heidelberg.de/services/equipment/parallel/ >I really wish I could get away with using ROCKS as there would be such a >greater reach back for me over SUSE. Right now I am exploring AutoYaST to >push the OS out to the compute nodes, > Long ago I looked into System Imager, which was then part of Oscar, but I don't know if it is current/maintained: http://wiki.systemimager.org/index.php/Main_Page >but that is still going to leave me >short on any management tools. > > > > That is true. Tell your bosses they are asking you to reinvent the Rocks wheel. Good luck, Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- >Steven A. Herborn >U.S.
Naval Academy >Advanced Research Computing >410-293-6480 (Desk) >757-418-0505 (Cell) > > >-----Original Message----- >From: Gus Correa [mailto:gus@ldeo.columbia.edu] >Sent: Monday, December 08, 2008 1:45 PM >To: Beowulf >Cc: Steve Herborn >Subject: Re: [Beowulf] Personal Introduction & First Beowulf Cluster >Question > >[...]
> >I hope this helps, >Gus Correa > > > From landman at scalableinformatics.com Mon Dec 8 13:31:12 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <493D8EFB.5080004@ldeo.columbia.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D6B43.2090004@ldeo.columbia.edu> <493D8EFB.5080004@ldeo.columbia.edu> Message-ID: <493D9220.3080200@scalableinformatics.com> Gus Correa wrote: >> I really wish I could get away with using ROCKS as there would be such a >> greater reach back for me over SUSE. Right now I am exploring >> AutoYast to >> push the OS out to the compute nodes, This is what we use for building SuSE clusters. It is actually not painful. We have operational autoyast.xml for 10.x and 9.3. Basically you boot it with a pointer to the autoyast file and get out of the way. [...] > That is true. > Tell bosses they are asking you to reinvent the Rocks wheel. Hmmm .... Rocks isn't everything, and there are a number of criticisms that certainly could be leveled at the system. I won't go there right now, but it is worth noting, just as with Linux != Redhat, Linux Cluster != Rocks clusters. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Mon Dec 8 13:55:09 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] cloning issue, hidden module dependency Message-ID: Joe Landman wrote: > Well RHEL is annoying in that if you decide to use a custom kernel and a > software raid, you are, for lack of a better term, toast (if you stick > with their tools/config). 
My compute nodes aren't that complicated. The straw that broke this particular camel's back was a decision (presumably by Mandriva, maybe by RedHat) to change BLK_DEV_IDE and BLK_DEV_IDEDISK in the kernel config from y to m; similarly, DEV_AMD74XX (etc.) also changed from y to m. As a consequence, they went from a system where a simple initrd would boot anywhere (as all the needed drivers were built into the kernel) to one where a much more complex initrd ended up being highly machine specific. In terms of having it "just work", having the disk drivers built into the kernel is a lot simpler. Ubuntu 8.04.1 also has these as modules, and it has an immense initrd. (Even so, it might not have worked on my S2466 systems because the initrd did not build an AMD74XX module.)

On the plus side, I did finally figure out how to PXE boot the PLD rescue CD, which in the last week has helped me escape from a couple of tight spots. The tricky part was that the online instructions said to use this in the APPEND line:

    initrd=rescue_pld_201/rescue.cpi,rescue_pld_201/custom/custom.cpi

and that doesn't work, at least not for me. The two cpio archives are treated as if they were one file name, which of course does not exist. For future reference, it is done this way:

1. Download and mount the PLD rescue CD ISO, then copy the file hierarchy into /tftpboot/rescue_pld_201 (or whatever).

2. Put this in /tftpboot/pxelinux.cfg/default:

    LABEL PLD_X86_201
    KERNEL rescue_pld_201/boot/isolinux/vmlinuz
    APPEND initrd=rescue_pld_201/rescue.cpi root=/dev/ram0

3. Put this in /tftpboot/message.txt:

    PLD_X86_201 : PLD rescue disk 2.01 (hdX disks)

PXE boot a node, choose PLD_X86_201 on the PXE menu on its console, and it comes up at the text prompt for the PLD rescue CD. This provides a lot more tools than boel, but it is still light enough at 58M to boot over a 100baseT network.
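For anyone who wants to script step 2, here is a minimal Python sketch. It only renders the stanza shown above as text; the function name pxelinux_stanza and its parameters are illustrative and not part of PXELINUX or the PLD rescue CD, and the single-initrd check simply mirrors what worked in this setup rather than a general rule.

```python
def pxelinux_stanza(label, kernel, initrd, root="/dev/ram0"):
    """Render one pxelinux.cfg entry as text (hypothetical helper).

    A comma-separated initrd list is rejected because, in the setup
    described above, only a single initrd path booted correctly.
    """
    if "," in initrd:
        raise ValueError("expected a single initrd path")
    lines = [
        f"LABEL {label}",
        f"KERNEL {kernel}",
        f"APPEND initrd={initrd} root={root}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Paths are the ones quoted in the message, relative to /tftpboot.
    print(pxelinux_stanza(
        "PLD_X86_201",
        "rescue_pld_201/boot/isolinux/vmlinuz",
        "rescue_pld_201/rescue.cpi",
    ), end="")
```

The output could be appended to /tftpboot/pxelinux.cfg/default by hand or from a deployment script; writing the file is left out on purpose.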
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From gus at ldeo.columbia.edu Mon Dec 8 14:41:00 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <493D9220.3080200@scalableinformatics.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D6B43.2090004@ldeo.columbia.edu> <493D8EFB.5080004@ldeo.columbia.edu> <493D9220.3080200@scalableinformatics.com> Message-ID: <493DA27C.40707@ldeo.columbia.edu> Hi Joe, Steve, list Joe Landman wrote: > Gus Correa wrote: > >> That is true. >> Tell bosses they are asking you to reinvent the Rocks wheel. > > > Hmmm .... Rocks isn't everything, There is no doubt about this ... > and there are a number of criticisms that certainly could be leveled > at the system. I won't go there right now, but it is worth noting, > just as with Linux != Redhat, Linux Cluster != Rocks clusters. > ... or about that either. The argument here, which was lost on previous emails, is that, for 5-year old cluster hardware as this one, from which very high performance is not expected, Rocks is a quite convenient and cost-effective solution. It won't take much effort and time to install it out of the box, have the cluster up and running, and the basic tools to administer the cluster will be also available, no major tweaking required. That is how I maintain a Pentium III little cluster, and my 1993 Honda. :) Would you take such a jewel to the dealership for an oil change? In any case, Steve has a requirement to use SUSE, and this rules Rocks out. Cheers, Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. 
Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- From Bogdan.Costescu at iwr.uni-heidelberg.de Mon Dec 8 14:40:35 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] cloning issue, hidden module dependency In-Reply-To: References: Message-ID: On Mon, 8 Dec 2008, David Mathog wrote: > The straw that broke this particular camel's back was a decision > (presumably by Mandriva, maybe by RedHat) to change in the kernel > config BLK_DEV_IDE and BLK_DEV_IDEDISK from y to m, similarly, > DEV_AMD74XX (and etc.) also changed from y to m. You were just lucky previously that Red Hat engineers found it a good idea to put those into the kernel. How would you have felt if you were booting an all-SCSI (to stay with old tech) system, where the IDE drivers present in the kernel would not have helped? > As a consequence, they went from a system where a simple initrd > would boot anywhere (as all the needed drivers were built into the > kernel) to one where a much more complex initrd ended up being > highly machine specific. Sorry to disappoint you... the initrd was always machine specific. All Red Hat docs specify that after modifying /etc/modules.conf or /etc/modprobe.conf the initrd should be regenerated via mkinitrd so that the next boot will use the proper drivers/settings. As to the complexity of initrd: my current method of choice for setting up compute nodes is to sync a root FS from the master server during the initrd, which means that I have to build an initrd. As I already know what hardware components are in the node (which is also the case e.g. when I run mkinitrd), it's easy to just add these modules to the initrd archive and insert a few 'insmod module.ko' in the proper order in the init script.
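The "proper order" matters because a module has to be loaded after the modules it depends on. As a rough illustration only, the Python sketch below orders a set of modules so that dependencies come first and prints the matching insmod lines. The dependency map is a made-up example (on a real system this information would come from the modules.dep file written by depmod), and the function name insmod_lines is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def insmod_lines(deps):
    """Return insmod commands ordered so that dependencies load first.

    deps maps a module name to the set of modules it needs
    (an assumed input format, not something mkinitrd produces).
    """
    order = TopologicalSorter(deps).static_order()
    return [f"insmod {name}.ko" for name in order]


# Hypothetical dependency map for an IDE-based node; names are illustrative.
EXAMPLE_DEPS = {
    "ide-core": set(),
    "amd74xx": {"ide-core"},
    "ide-disk": {"ide-core"},
}

if __name__ == "__main__":
    for line in insmod_lines(EXAMPLE_DEPS):
        print(line)
```

The resulting lines could be pasted into the init script of the initrd; the relative order of modules that do not depend on each other is not significant.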
Having a monolithic kernel that "just works" on a large variety of hardware means answering "y" to most drivers; the kernel itself would then grow as large as the "immense initrd" that you mention. How would that be better? -- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From alsimao at gmail.com Fri Dec 5 09:52:36 2008 From: alsimao at gmail.com (Alcides Simao) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Re: Beowulf Digest, Vol 58, Issue 9 In-Reply-To: <200812051644.mB5GhqRt029376@bluewest.scyld.com> References: <200812051644.mB5GhqRt029376@bluewest.scyld.com> Message-ID: <7be8c36b0812050952h4225e5d3hd15bc9431906ead3@mail.gmail.com> Hello all! I was thinking of how to 'empower' a Beowulf cluster. I remember that a while ago an Intel Atom was successfully overclocked to 2.4 GHz. Could it be possible to build a cooling apparatus sufficient to raise the clock speed of the Beowulf CPUs? Regards, Alcides -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081205/0f83ffb9/attachment.html From rssr at lncc.br Fri Dec 5 12:21:36 2008 From: rssr at lncc.br (rssr@lncc.br) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <20081205185304.GA27201@bx9> References: <20081205124843.GM11544@leitl.org> <5FB03D13-79AF-48B2-8B68-A60A2559E26B@xs4all.nl> <4939734C.9060604@ias.edu> <20081205185304.GA27201@bx9> Message-ID: <42238.128.83.67.198.1228508496.squirrel@webmail.lncc.br> Hi The problem is not only the size; the most important thing is how we can access the memory: bus or other topology. Do not forget the memory refresh time, which is always bigger than the memory access time. Renato > On Fri, Dec 05, 2008 at 01:30:36PM -0500, Prentice Bisbal wrote: > >> Not exactly.
2 GB DIMMs are cheap, but as soon as you go to larger DIMMs >> (4 GB, 8 GB, etc.), the price goes up exponentially. > > My last quote for 2GB and 4GB dimms was linear. > > -- greg > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081205/8c3bcd3b/attachment.html From spambox at emboss.co.nz Fri Dec 5 12:36:44 2008 From: spambox at emboss.co.nz (Michael Brown) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: References: <20081205124843.GM11544@leitl.org> Message-ID: <1280DBE407554B99A636961A75B1DDD1@Forethought> Mark Hahn wrote: >> (Well, duh). > > yeah - the point seems to be that we (still) need to scale memory > along with core count. not just memory bandwidth but also concurrency > (number of banks), though "ieee spectrum online for tech insiders" > doesn't get into that kind of depth :( I think this needs to be elaborated a little for those who don't know the layout of SDRAM ... A typical chip that may be used in a 4 GB DIMM would be a 2 Gbit SDRAM chip, of which there would be 16 (total 32 Gbits = 4 Gbytes). Each chip contributes 8 bits towards the 64-bit DIMM interface, so there are two "ranks", each comprising 8 chips. Each rank operates independently of the other, but they share (and are limited by) the bandwidth of the memory channel. From here I'm going to be using the Micron MT47H128M16 as the SDRAM chip, because I have the datasheet, though other chips are probably very similar. Each SDRAM chip internally is made up of 8 banks of 32 K * 8 Kbit memory arrays. Each bank can be controlled separately but shares the DIMM bandwidth, much like each rank does.
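The DIMM geometry described above can be checked with a little arithmetic. This is only a minimal sketch: the figures are the ones quoted in the message, and the variable names are illustrative rather than taken from any datasheet.

```python
# Figures as stated in the message; all names here are illustrative.
CHIP_BITS = 2 * 2**30         # 2 Gbit per SDRAM chip
CHIPS_PER_DIMM = 16
CHIP_WIDTH_BITS = 8           # bits each chip contributes to the DIMM interface
BUS_WIDTH_BITS = 64           # width of the DIMM interface
BANKS_PER_CHIP = 8
ROWS_PER_BANK = 32 * 2**10    # 32 K rows per bank
ROW_BITS = 8 * 2**10          # 8 Kbit per row

# Total DIMM capacity in bytes: 16 chips of 2 Gbit each.
dimm_bytes = CHIP_BITS * CHIPS_PER_DIMM // 8

# A rank is as many chips as it takes to fill the 64-bit interface.
chips_per_rank = BUS_WIDTH_BITS // CHIP_WIDTH_BITS
ranks = CHIPS_PER_DIMM // chips_per_rank

# Cross-check: 8 banks of 32 K rows of 8 Kbit should add up to one chip.
chip_bits_from_banks = BANKS_PER_CHIP * ROWS_PER_BANK * ROW_BITS

print(f"DIMM capacity: {dimm_bytes // 2**30} GB")
print(f"Chips per rank: {chips_per_rank}, ranks: {ranks}")
print(f"Bank layout matches chip size: {chip_bits_from_banks == CHIP_BITS}")
```

The cross-check at the end is the useful part: it confirms that the bank, row and row-size figures are mutually consistent with a 2 Gbit chip.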
Before accessing a particular memory cell, the whole 8 Kbit "row" needs to be activated. Only one row can be active per bank at any point in time. Once the memory controller is done with a particular row, it needs to be "precharged", which basically equates to writing it back into the main array. Activating and precharging are relatively expensive operations - precharging one row and activating another takes at least 11 cycles (tRTP + tRP) and 7 cycles (tRCD) respectively at top speed (DDR2-1066) for the Micron chips mentioned, during which no data can be read from or written to the bank. Precharging takes another 4 cycles if you've just written to the bank.

The second thing to know is that processors operate in cacheline sized blocks. Current x86 cache lines are 64 bytes, IIRC. In a dual-channel system with channel interleaving, odd-numbered cachelines come from one channel, and even numbered cachelines from the other. So each cacheline fill requires 8 bytes read per chip (which fits in nicely with the standard burst length of 8, since each read is 8 bits), coming out to 128 cachelines per row. Like channel interleaving, bank interleaving is also used. So:

[] Cacheline 0 comes from channel 0, bank 0
[] Cacheline 1 comes from channel 1, bank 0
[] Cacheline 2 comes from channel 0, bank 1
[] Cacheline 3 comes from channel 1, bank 1
   :
   :
[] Cacheline 14 comes from channel 0, bank 7
[] Cacheline 15 comes from channel 1, bank 7

So this pattern repeats every 1 KB, and every 128 KB a new row needs to be opened on each bank. IIRC, rank interleaving is done on AMD quad-core processors, but not the older dual-core processors nor Intel's discrete northbridges. I'm not sure about Nehalem.

This is all fine and dandy on a single-core system. The bank interleaving allows the channel to be active by using another bank when one bank is being activated or precharged. With a good prefetcher, you can hit close to 100% utilization of the channel.
However, it can cause problems on a multi-core system. Say if you have two cores, each scanning through separate 1 MB blocks of memory. Each core is demanding a different row from the same bank, so the memory controller has to keep on changing rows. This may not appear to be an issue at first glance - after all, we have 128 cycles between each CPU hitting a particular bank (8 bursts * 8 cycles per burst * 2 processors sharing bandwidth), so we've got 64 cycles between row changes. That's over twice what we need (unless we're using 1 GB or smaller DIMMS, which only have 4 pages so things become tight). The killer though is latency - instead of 4-ish cycles CAS delay per read, we're now looking at 22 for a precharge + activate + CAS. In a streaming situation, this doesn't hurt too much as a good prefetcher would already be indicating it needs the next cacheline. But if you've got access patterns that aren't extremely prefetcher-friendly, you're going to suffer.

Simply cranking up the number of banks doesn't help this. You've still got thrashing, you're just thrashing more banks. Turning up the cacheline size can help, as you transfer more data per stall.

The extreme solution is to turn off bank interleaving. Our memory layout now looks like:

[] Cacheline 0 comes from channel 0, bank 0, row 0, offset 0 bits
[] Cacheline 1 comes from channel 1, bank 0, row 0, offset 0 bits
[] Cacheline 2 comes from channel 0, bank 0, row 0, offset 64 bits
[] Cacheline 3 comes from channel 1, bank 0, row 0, offset 64 bits
   :
   :
[] Cacheline 254 comes from channel 0, bank 0, row 0, offset 8 K - 64 bits
[] Cacheline 255 comes from channel 1, bank 0, row 0, offset 8 K - 64 bits
[] Cacheline 256 comes from channel 0, bank 0, row 1, offset 0 bits
[] Cacheline 257 comes from channel 1, bank 0, row 1, offset 0 bits

So a new row every 16 KB, and a new bank every 512 MB (and a new rank every 4 GB).
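The two cacheline layouts listed in this message (with and without bank interleaving) can be written as small address-mapping functions. This is a rough sketch under the message's stated figures (64-byte cachelines, 2 channels, 8 banks, 128 cachelines per row per channel, 32 K rows per bank); the function names are made up for illustration, and real memory controllers use mappings that may differ.

```python
CACHELINE_BYTES = 64
CHANNELS = 2
BANKS = 8
LINES_PER_ROW = 128          # cachelines per row on one channel (8 KB / 64 B)
ROWS_PER_BANK = 32 * 1024    # 32 K rows per bank


def interleaved(line):
    """Channel + bank interleaving: return (channel, bank, row) for a cacheline."""
    channel = line % CHANNELS
    bank = (line // CHANNELS) % BANKS
    row = line // (CHANNELS * BANKS * LINES_PER_ROW)
    return channel, bank, row


def non_interleaved(line):
    """Channel interleaving only: return (channel, bank, row) for a cacheline."""
    channel = line % CHANNELS
    row_index = line // (CHANNELS * LINES_PER_ROW)
    bank = (row_index // ROWS_PER_BANK) % BANKS
    row = row_index % ROWS_PER_BANK
    return channel, bank, row


def first_line_where(mapping, field, value):
    """Find the first cacheline whose mapped field (1=bank, 2=row) equals value."""
    line = 0
    step = CHANNELS * LINES_PER_ROW  # rows never change inside this stride
    while mapping(line)[field] != value:
        line += step
    return line


# Distance, in bytes, before each layout has to open a new row.
new_row_interleaved = first_line_where(interleaved, 2, 1) * CACHELINE_BYTES
new_row_plain = first_line_where(non_interleaved, 2, 1) * CACHELINE_BYTES
new_bank_plain = first_line_where(non_interleaved, 1, 1) * CACHELINE_BYTES

print(new_row_interleaved // 1024, "KB per row change with bank interleaving")
print(new_row_plain // 1024, "KB per row change without bank interleaving")
print(new_bank_plain // 2**20, "MB per bank change without bank interleaving")
```

Under these assumptions the sketch reproduces the intervals quoted in the message: a row change every 128 KB with bank interleaving, and every 16 KB (with a bank change every 512 MB) without it.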
For a single core, this generally doesn't have a big effect, since the 18 cycle precharge+activate delay can often be hidden by a good prefetcher, and in any case only comes around every 16 KB (as opposed to every 128 KB for bank interleaving, so it's a bit more frequent, though for large memory blocks it's a wash). However, this is a big killer for multicore - if you have two cores walking through the same 512 MB area, they'll be thrashing the same bank. Not only does latency suffer, but bandwidth as well since the other 7 banks can't be used to cover up the wasted time. Every 8 cycles of reading will require 18 cycles of sitting around waiting for the bank, dropping bandwidth by about 70%. However, with proper OS support this can be a bit of a win. By associating banks (512 MB memory blocks) to cores in the standard NUMA way, each core can be operating out of its own bank. There's no bank thrashing at all, which allows much looser requirements on activation and precharge, which in turn can allow higher speeds. With channel interleaving, we can have up to 8 cores/threads operating in this way. With independent channels (ala Barcelona) we can do 16. Of course, this isn't ideal either. A row change will stall the associated CPU and can't be hidden, so ideally we want at least 2 banks per CPU, interleaved. Also, shared memory will be hurt under this scheme (bandwidth and latency) since it will experience bank thrashing and will only have 2 banks. To cover the activate and precharge times, we need at least 4 banks, so for a quad core CPU we need a total of 16 memory banks in the system, partly interleaved. 8 banks per core can improve performance further with certain access patterns. Also, to keep good single-core performance, we'll need to use both channels. 
In this case, 4-way bank interleaving per channel (so two sets of 4-way interleaves), with channel interleaving and no rank interleaving would work, though again 8-way bank interleaving would be better if there's enough to go around. This setup is electronically obtainable in current systems, if you use two dual-rank DIMMs per channel and no rank interleaving. In this case, you have 8-way bank interleaving, with channel interleaving and with the 4 ranks in contiguous memory blocks. With AMD's Barcelona, you can get away with a single dual-rank DIMM per channel if you run the two channels independently (though in this case single-threaded performance is compromised, because each core will tend to only access memory on a single controller). An 8-thread system like Nehalem + hyperthreading would ideally like 64 banks. Because of Nehalem's wonky memory controller (seriously, who was the guy in charge who settled on three channels? I can imagine the joy of the memory controller engineers when they found out they'd have to implement a divide-by-three in a critical path) it'd be a little more difficult to get working there, though there are still enough banks to go around (12 banks per thread). However, I'm not aware of any OSes that support this quasi-NUMA. I'm guessing it could be hacked into Linux without too much trouble, given that real NUMA support is already there. It's something I've been meaning to look into for a while, but I've never had the time to really get my hands dirty trying to figure out Linux's NUMA architecture.
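The cycle counts and bank budgets quoted earlier in this message can be checked with a few lines of arithmetic. This is a hedged sketch: the timings are the ones stated in the message for the Micron part at DDR2-1066, the names are illustrative, and the Nehalem DIMM population is an assumption that the message does not state explicitly.

```python
# Timings as quoted in the message; names are illustrative.
PRECHARGE_CYCLES = 11   # tRTP + tRP
ACTIVATE_CYCLES = 7     # tRCD
CAS_CYCLES = 4          # the "4-ish" CAS delay per read
BURST_CYCLES = 8        # useful transfer cycles between row changes when thrashing

# Cost of swapping one row for another in a bank.
row_change_cycles = PRECHARGE_CYCLES + ACTIVATE_CYCLES

# Latency of a read that misses the open row.
row_miss_latency = row_change_cycles + CAS_CYCLES

# Fraction of bandwidth lost when every burst is followed by a row change.
bandwidth_kept = BURST_CYCLES / (BURST_CYCLES + row_change_cycles)
bandwidth_lost = 1.0 - bandwidth_kept

# Bank budgets: 4 banks per core for a quad core, 8 per thread for 8 threads.
BANKS_PER_RANK = 8
banks_quad_core = 4 * 4
banks_eight_threads = 8 * BANKS_PER_RANK

# ASSUMPTION (not stated in the message): three channels, each populated with
# two dual-rank DIMMs, shared by 8 threads.
nehalem_banks_per_thread = (3 * 2 * 2 * BANKS_PER_RANK) // 8

print(f"Row change: {row_change_cycles} cycles, row-miss read: {row_miss_latency} cycles")
print(f"Bandwidth lost to thrashing: {bandwidth_lost:.0%}")
print(f"Banks wanted: {banks_quad_core} (quad core), {banks_eight_threads} (8 threads)")
print(f"Nehalem banks per thread under the assumption above: {nehalem_banks_per_thread}")
```

The bandwidth figure comes out slightly under 70%, which matches the "about 70%" estimate, and the assumed Nehalem population is one configuration that yields the 12 banks per thread mentioned.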
Cheers, Michael From crhea at mayo.edu Sat Dec 6 13:22:38 2008 From: crhea at mayo.edu (Cris Rhea) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Re: Multicore Is Bad News For Supercomputers In-Reply-To: <200812062000.mB6K07M9029867@bluewest.scyld.com> References: <200812062000.mB6K07M9029867@bluewest.scyld.com> Message-ID: <20081206212238.GA23215@kaizen.mayo.edu> ----- "Prentice Bisbal" wrote: > Dell and others advertise systems that support up > to 128 GB RAM, but I have yet to meet someone who > can afford to put all 128 GB RAM in a single box. They aren't *that* expensive these days... the key for these boxes is that they have 4 CPU sockets-- this allows one to use lower-density DIMMS than trying to put 128GB on a dual socket board. Without getting into discounts, a fairly decked-out Dell R905 (4 x quad-core 2.7GHz Opteron, 128GB memory) is under $35K (USD) (Assuming no Microsoft Licenses). If you have large memory apps (and users who don't want to break them down to run on cluster nodes), these are sweet machines for the money. --- Cris -- Cristopher J. Rhea Mayo Clinic - Research Computing Facility 200 First St SW, Rochester, MN 55905 crhea@Mayo.EDU (507) 284-0587 From james.p.lux at jpl.nasa.gov Mon Dec 8 15:14:52 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <493DA27C.40707@ldeo.columbia.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D6B43.2090004@ldeo.columbia.edu> <493D8EFB.5080004@ldeo.columbia.edu> <493D9220.3080200@scalableinformatics.com> <493DA27C.40707@ldeo.columbia.edu> Message-ID: > very high performance is not expected, Rocks is a quite > convenient and cost-effective solution. > > That is how I maintain a Pentium III little cluster, and my > 1993 Honda. :) Would you take such a jewel to the dealership > for an oil change? 
>

You put rocks in the crankcase of your 93 Honda? Doesn't that make a lot of noise at high revs?

Jim

From mathog at caltech.edu Mon Dec 8 15:15:40 2008
From: mathog at caltech.edu (David Mathog)
Date: Thu Mar 18 01:08:08 2010
Subject: [Beowulf] cloning issue, hidden module dependency
Message-ID:

Bogdan Costescu wrote:

> Having a monolithic kernel that "just works" on a large variety of
> hardware means answering "y" to most drivers; the kernel itself would
> then grow as large as the "immense initrd" that you mention.

I don't think so. It only has to work with a large variety of disks, and at that, not necessarily at optimal speeds. Basically it just has to function well enough to access the OS files on disk, where the rest of the modules are, so that those drivers can be loaded later. The boot kernel need not have every video, network, etc. driver in it.

In any case, the sizes of the vmlinuz/initrd files discussed so far are:

Distro           Kernel       vmlinuz  initrd   Kernel has IDE builtin
Mandriva 2007.1  (2.6.19.3)   1607583   357892  Y
Mandriva 2008.1  (2.6.24.7)   1787352  2214302  N
Ubuntu 8.04.1    (2.6.24.16)  1903448  7906356  N

Sure this is apples and oranges, but to me it looks like taking the IDE stuff (and maybe other drivers) out of the boot kernel is resulting in larger and larger initrd files, with the size of initrd going up faster than the size of vmlinuz, by a lot.
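The growth claim can be made concrete by computing ratios from the sizes listed above. This is only an illustrative sketch: the byte counts are copied from the message, and the choice of Mandriva 2007.1 as the baseline is an assumption made here for comparison.

```python
# (distro, vmlinuz bytes, initrd bytes), copied from the table in the message.
sizes = [
    ("Mandriva 2007.1", 1607583, 357892),
    ("Mandriva 2008.1", 1787352, 2214302),
    ("Ubuntu 8.04.1", 1903448, 7906356),
]

# Baseline chosen for illustration: the one kernel with IDE built in.
_, base_vmlinuz, base_initrd = sizes[0]

# Growth of each file relative to the baseline, plus the combined boot payload.
growth = {
    name: (vmlinuz / base_vmlinuz, initrd / base_initrd)
    for name, vmlinuz, initrd in sizes
}
totals = {name: vmlinuz + initrd for name, vmlinuz, initrd in sizes}

for name, (vmlinuz_ratio, initrd_ratio) in growth.items():
    print(f"{name}: vmlinuz x{vmlinuz_ratio:.2f}, initrd x{initrd_ratio:.2f}, "
          f"total {totals[name]} bytes")
```

On these numbers vmlinuz grows by less than a fifth across the three systems while the initrd grows more than twentyfold, which is the asymmetry the message points at.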
Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From gus at ldeo.columbia.edu Mon Dec 8 16:32:20 2008 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D6B43.2090004@ldeo.columbia.edu> <493D8EFB.5080004@ldeo.columbia.edu> <493D9220.3080200@scalableinformatics.com> <493DA27C.40707@ldeo.columbia.edu> Message-ID: <493DBC94.9030201@ldeo.columbia.edu> Hello James, list Lux, James P wrote: >>very high performance is not expected, Rocks is a quite >>convenient and cost-effective solution. >> >>That is how I maintain a Pentium III little cluster, and my >>1993 Honda. :) Would you take such a jewel to the dealership >>for an oil change? >> >> >> > >You put rocks in the crankcase of your 93 Honda? Doesn't that make a lot of noise at high revs? > >Jim > > Had I figured how to break polymers of Silicon, rather than polymers of Carbon, to fuel my 1993 Honda (that lovely XX century relic), I might as well sell the technology to NASA, for the rovers. No more heavy batteries or big solar panels required. Get a sack of Martian pebbles, and move! No global warming either, environmentally friendly just like the Flintstones. Unfortunately the reactions are not exothermic, as Bowen discovered long ago: http://en.wikipedia.org/wiki/Bowen's_reaction_series To break pyroxene into olivine, you need to heat things up, and oxidation (weathering) takes a looong time ... Not like burning octane into CO2, where to make a big boom all you need is air and a spark. In any case, I use rocks in the trunk of my Honda, for winter ballast against skidding on the snow. Very effective, and high-tech. I recommended to BMW and other brands, for improved stability. 
Gus Correa -- --------------------------------------------------------------------- Gustavo J. Ponce Correa, PhD - Email: gus@ldeo.columbia.edu Lamont-Doherty Earth Observatory - Columbia University P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- From lindahl at pbm.com Mon Dec 8 17:08:08 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Intro question In-Reply-To: <493D3AF5.2090806@sicortex.com> References: <4939541A.4000600@sicortex.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BA8@quadbrsex1.quadrics.com> <49395E22.6090707@scalableinformatics.com> <493D3AF5.2090806@sicortex.com> Message-ID: <20081209010808.GC28677@bx9> On Mon, Dec 08, 2008 at 10:19:17AM -0500, Lawrence Stewart wrote: > Well the NIC should read from cache or update the cache if the > data happens to be there. Don't all well designed I/O systems do that? There are a small number of systems that don't. Needless to say, it's a bit confusing for library writers to get I/O right in that circumstances. BTW, any interconnect that sends messages using PIO sends from cache. I wish people would invent a real receive-to-cache, since it would be nice overhead reducer for small messages on InfiniPath -- a small latency benefit (~ 7%), a bigger overhead benefit (~ 15%). -- greg From csamuel at vpac.org Mon Dec 8 19:06:49 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Odd SuperMicro power off issues In-Reply-To: <1227275645.3297031228791704093.JavaMail.root@mail.vpac.org> Message-ID: <1100390970.3297111228792009220.JavaMail.root@mail.vpac.org> ----- "Chris Samuel" wrote: > Does anyone have any bright ideas ? Wow, thanks so much to everyone who responded on this both to the list and in private, very much appreciated! 
Given there were so many of these I thought I'd try and comment on the main points that people raised rather than reply individually. 1) Power (lots of people) The vendor swapped in a new PSU in one of these nodes this morning, so we are resuming attempts to reproduce this failure now. The odd thing that we've noticed is that this often seems to happen when the node is only partly loaded (though not exclusively); for instance at one point we saw a node fail with Fluent running on 4 cores and a home grown code on another core (3 spare). 2) HT lockups (Scott and potentially Don) We've seen the same "System Firmware Error" messages on some of our nodes, sometimes associated with a system lockup, so we're going to look into BIOS upgrades. 3) Fluent Well we had a node power off this morning that wasn't running Fluent, but instead had a 4 CPU Gaussian job, some NAMD processes from various jobs and some random user compiled code. I don't know whether to be glad that Fluent isn't so special or worried that other code can kill nodes. :-/ 4) IPMI (Bogdan) We wondered if the IPMI/BMC module might have done the power off too, but we would hope that we would see something in the logs. Anyway, we'll carry on with this using the hints and tips that people have provided and when (if?) we solve this I'll certainly update the list with what we find! Once again thanks so much to all of you who took the time to reply. All the best, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O.
Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Mon Dec 8 19:48:04 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <1395903999.3297661228794272014.JavaMail.root@mail.vpac.org> Message-ID: <484419194.3297711228794483965.JavaMail.root@mail.vpac.org> ----- "John Hearns" wrote: > (*) Not a good idea in Great Britain, where arson in the Queen's > Dockyard is still a hanging offence, and I'd bet the judges would > say that a Naval Academy was part of a Dockyard. No longer the case I'm afraid (well, actually quite glad!): http://www.capitalpunishmentuk.org/abolish.html # On the 10th of December 1999, International Human Rights Day, # the government ratified Second Optional Protocol to the # International Covenant on Civil and Political Rights thus # totally abolishing capital punishment in Britain. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From iioleynik at gmail.com Mon Dec 8 21:54:31 2008 From: iioleynik at gmail.com (Ivan Oleynik) Date: Thu Mar 18 01:08:08 2010 Subject: [Beowulf] Cluster quote Message-ID: I know that many readers of this forum work for cluster vendors. Therefore, I am sending this email to get some responses from interested parties. I am going to purchase a computational cluster very soon (by the end of this year) and would like to get a quote for the configuration: 36 compute nodes (no dedicated master node), node config: 2x AMD Shanghai Opteron 2380, 2.5 GHz CPUs per node, 8 GB (2x4) DDR2 667 MHz memory, 250 GB HD, IPMI, Infiniband DDR card.
Networking: 36 port Infiniband DDR switch (Mellanox, not interested in expensive Qlogic), 48 port managed Gigabit switch Rack: standard size, 3 simple PDUs (no expensive network managed) I would appreciate receiving your quotes asap. Best wishes, Ivan Oleynik -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/95231efa/attachment.html From jan.heichler at gmx.net Mon Dec 8 22:03:39 2008 From: jan.heichler at gmx.net (Jan Heichler) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Cluster quote In-Reply-To: References: Message-ID: <152838081.20081209070339@gmx.net> Hello Ivan, since this list is read by readers from many countries and vendors are normally active in certain geographical areas you should specify where the cluster will be located... Jan On Tuesday, 9 December 2008, you wrote: I know that many readers of this forum work for cluster vendors. Therefore, I am sending this email to get some responses from interested parties. I am going to purchase a computational cluster very soon (by the end of this year) and would like to get a quote for the configuration: 36 compute nodes (no dedicated master node), node config: 2x AMD Shanghai Opteron 2380, 2.5 GHz CPUs per node, 8 GB (2x4) DDR2 667 MHz memory, 250 GB HD, IPMI, Infiniband DDR card. Networking: 36 port Infiniband DDR switch (Mellanox, not interested in expensive Qlogic), 48 port managed Gigabit switch Rack: standard size, 3 simple PDUs (no expensive network managed) I would appreciate receiving your quotes asap. Best wishes, Ivan Oleynik Bye Jan -------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/e1993213/attachment.html From iioleynik at gmail.com Mon Dec 8 22:18:30 2008 From: iioleynik at gmail.com (Ivan Oleynik) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Re: Cluster quote In-Reply-To: References: Message-ID: In my previous post I forgot to mention the location of my cluster: Tampa, FL, USA. Thanks to Jan Heichler (from Germany?) who sent me this reminder. Ivan On Tue, Dec 9, 2008 at 12:54 AM, Ivan Oleynik wrote: > I know that many readers of this forum work for cluster vendors. Therefore, > I am sending this email to get some responses from interested parties. > > I am going to purchase a computational cluster very soon (by the end of > this year) and would like to get a quote for the configuration: > > 36 compute nodes (no dedicated master node), > > node config: 2x AMD Shanghai Opteron 2380, 2.5 GHz CPUs per node, 8 GB > (2x4) DDR2 667 MHz memory, 250 GB HD, IPMI, Infiniband DDR card. > > Networking: 36 port Infiniband DDR switch (Mellanox, not interested in > expensive Qlogic), 48 port managed Gigabit switch > > Rack: standard size, 3 simple PDUs (no expensive network managed) > > I would appreciate receiving your quotes asap. > > Best wishes, > > Ivan Oleynik > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/ed91ac0f/attachment.html From eugen at leitl.org Mon Dec 8 23:51:40 2008 From: eugen at leitl.org (Eugen Leitl) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Multicore Is Bad News For Supercomputers In-Reply-To: <1280DBE407554B99A636961A75B1DDD1@Forethought> References: <20081205124843.GM11544@leitl.org> <1280DBE407554B99A636961A75B1DDD1@Forethought> Message-ID: <20081209075140.GZ11544@leitl.org> On Sat, Dec 06, 2008 at 07:36:44AM +1100, Michael Brown wrote: > I think this needs to be elaborated a little for those who don't know the > layout of SDRAM ...
Thank you, most useful information. [SNIP] I don't think this is very applicable to custom DRAM stacked on top of core, or SRAM/eDRAM (eventually MRAM?) in the core (e.g. like the Cell does it). There the most natural way is to structure it into very wide words, and access it that way. Add an array of ALUs on top of it along with shifts, n-bit swaps and the like and you'll get a very beefy machine on each die. Add a router to each die, and you've got potential for wafer-scale integration, by routing around dead dies from production or dynamically remapping failed grains during operation. This might not look like commodity, but eventually graphics accelerators must go there due to memory bandwidth limitations, and eventually CPUs will converge. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From tjrc at sanger.ac.uk Tue Dec 9 01:07:14 2008 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] cloning issue, hidden module dependency In-Reply-To: References: Message-ID: <09FE4A3A-EB6B-4A10-A30F-7FA659806BD8@sanger.ac.uk> On 8 Dec 2008, at 11:15 pm, David Mathog wrote: > Bogdan Costescu wrote: > >> Having a monolithic kernel that "just works" on a large variety of >> hardware means answering "y" to most drivers; the kernel itself would >> then grow as large as the "immense initrd" that you mention. > > I don't think so. It only has to work with a large variety of disks, > and > at that, not necessarily at optimal speeds. Basically it just has to > function well enough to access the OS files on disk, where the rest of > the modules are, so that those drivers can be loaded later. The boot > kernel need not have every video, network, etc. driver in it.
> > In any case, the sizes of the vmlinuz/initrd files discussed so far > are: > > Distro Kernel vmlinuz initrd Kernel has IDE builtin > Mandriva 2007.1 (2.6.19.3) 1607583 357892 Y > Mandriva 2008.1 (2.6.24.7) 1787352 2214302 N > Ubuntu 8.04.1 (2.6.24.16) 1903448 7906356 N > > Sure this is apples and oranges, but to me it looks like taking the > IDE > stuff (and maybe other drivers) out of the boot kernel is resulting > in larger and larger initrd files, with the size of initrd going up > faster than the size of vmlinuz, by a lot. Ubuntu put quite a lot of other stuff into the initrd which has nothing to do with device drivers. For example, the initrd includes casper and all its support scripts, which provide support for things like persistent USB storage when running as a Live CD. But they also do put the entire kitchen sink in there in terms of device drivers; Ubuntu does aim to cover as wide as possible a range of possible hardware. It's very easy to build a custom kernel package with just what you want using the 'make-kpkg' command, and then you can strip out all the extraneous cruft if you want (I never bother - it's only modules that don't get loaded, and the only performance issue would be if you're PXE booting) Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From ajt at rri.sari.ac.uk Tue Dec 9 04:07:02 2008 From: ajt at rri.sari.ac.uk (Tony Travis) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] cloning issue, hidden module dependency In-Reply-To: <09FE4A3A-EB6B-4A10-A30F-7FA659806BD8@sanger.ac.uk> References: <09FE4A3A-EB6B-4A10-A30F-7FA659806BD8@sanger.ac.uk> Message-ID: <493E5F66.3050006@rri.sari.ac.uk> Tim Cutts wrote: > [...] 
> It's very easy to build a custom kernel package > with just what you want using the 'make-kpkg' command, and then you > can strip out all the extraneous cruft if you want (I never bother - > it's only modules that don't get loaded, and the only performance > issue would be if you're PXE booting) Hello, Tim. That's right, I PXE boot openMosix without an initrd, with the drivers needed to access the root filesystem built-in: Everything else is loaded as a module from /lib. Bye, Tony. -- Dr. A.J.Travis, University of Aberdeen, Rowett Institute of Nutrition and Health, Greenburn Road, Bucksburn, Aberdeen AB21 9SB, Scotland, UK tel +44(0)1224 712751, fax +44(0)1224 716687, http://www.rowett.ac.uk mailto:a.travis@abdn.ac.uk, http://bioinformatics.rri.sari.ac.uk/~ajt From herborn at usna.edu Tue Dec 9 05:45:30 2008 From: herborn at usna.edu (Steve Herborn) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <493D68C4.8060400@scalableinformatics.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D68C4.8060400@scalableinformatics.com> Message-ID: <85E4A12B2A64449A88808239EB84667C@dynamic.usna.edu> Joe; In relation to your Perl Motto; I'm more than aware that there is always more than one way to skin a cat and great debate will surround the subject. Sometimes the exercise can be useful, if not bloody. Unfortunately for me I'm not currently in a decision maker position on any of this and am being "directed" to do certain things in conjunction with a path that somebody already established, but it was in their mind, not written down. The system's compute nodes were originally built to be "Stateful" and the current power player on my team wants it to remain that way. As things sit as of today I'm looking at using AutoYast and am also evaluating Xcat to perform the task.
The biggest issue with AutoYast is that it will assist me in getting the OS out to the Nodes; it really doesn't provide any of the Cluster Management Tools that I would like to get installed. Now you maybe asking yourself why "Stateful" Compute Nodes as I did. It appears to me at this time that along with occasionally using these nodes as part of a Cluster, they also use them as plain old Servers/Workstations as I've found User Accounts & home directories on some of the compute nodes. As I said in my first post I'm new to this position & organization and not quite sure with exactly how & for what the system is even used for. I was simply told to get'er up. Steven A. Herborn U.S. Naval Academy Advanced Research Computing 410-293-6480 (Desk) 757-418-0505 (Cell) -----Original Message----- From: Joe Landman [mailto:landman@scalableinformatics.com] Sent: Monday, December 08, 2008 1:35 PM To: Steve Herborn Cc: beowulf@beowulf.org Subject: Re: [Beowulf] Personal Introduction & First Beowulf Cluster Question Steve Herborn wrote: > > > Good day to the group. I would like to make a brief introduction to > myself and raise my first question to the forum. > > > > My name is Steve Herborn and I am a new employee at the United States > Naval Academy in the Advanced Research Computing group which supports Greetings Steve > the IT systems used for faculty research. Part of my responsibilities > will be the care & feeding of our Beowulf Cluster which is a > commercially procured Cluster from Aspen Systems. It purchased & > installed about four or five years ago. As delivered the system was > originally configured with two Head nodes each with 32 compute nodes. > One head node was running SUSE 9.x and the other Head Node was running > // Scyld (version unknown) also with 32 compute nodes. 
While I don't > know all of the history, apparently this system was not very actively > maintain and had numerous hardware & software issues, to include losing > the array on which Scyld was installed. //Prior to my arrival a Ouch ... if you call the good folks at Aspen, they could help with that (ping me if you need a contact) > decision was made to reconfigure the system from having two different > head nodes running two different OS Distributions to one Head Node > controlling all 64 Compute Nodes. In addition SUSE Linux Enterprise > Server (10SP2) (X86-64) was selected as the OS for all of the nodes. Ok. > Now on to my question which will more then likely be the first of many. > In the collective group wisdom what would be the most efficient & Danger Will Robinson ... for the N people who answer, you are likely to get N+2 answers, and N/2 arguments going ... not a bad thing, but to steal from the Perl motto "there is more than one way to do these things ..." > effective way to "push" the SLES OS out to all of the compute nodes once > it is fully installed & configured on the Head Node. In my research First: Stateless (e.g. diskless) versus Stateful (e.g. local installation). Scyld is "stateless" though Don will likely correct me (as this is massively oversimpilfied). SuSE can be installed Stateless or Stateful. Its installation can be automated ... we have been doing this for years (one of the few vendors to have done this with SuSE). It can also be run diskless ... we have booted compute nodes with Infiniband to fully operational compute nodes visible in all aspects within the cluster in under 60 seconds. This is the case for 9.3, 10.x SuSE flavors. > I've read about various Cluster packages/distributions that have that > capability built in, such as ROCKS & OSCAR which appear to have the > innate capability to do this as well as some additional tools that would > be very nice to use in managing the system. 
However, from my current > research in appears that they do not support SLES 10sp2 for the AMD Rocks only supports Redhat and rebuilds, I wouldn't recommend it for the task as you have indicated. Oscar might be able to handle this, though I haven't kept up on it, so I am not sure how active it is. You want to look at xCat v2 (open source), and Warewulf/Perceus (open source). Our package (Tiburon) is not ready to be released, and we will likely make it a meta package atop Perceus at some point soon. Though it is used in production at several large commercial companies specifically for SuSE clusters. > 64-bit Architecture (although since I am so new at this I could be > wrong). Are there any other "free" (money is always an issue) products > or methodologies I should be looking at to push the OS out & help me > manage the system? It appears that a commercial product Moab Cluster See above. If you want a prepackaged system, likely you are going to need to spend money. Moab is a possibility, though for SuSE, I would recommend looking at Concurrent Thinking's appliance. It will cost money, but they solve pretty much all of the problems for you. > Builder will do everything I need & more, but I do not have the funds to > purchase a solution. I also certainly do not want to perform a manual > OS install on all 64 Compute Nodes. No... in all likelihood, you really don't want to do any installation to the nodes (stateless if possible). > > > > Thanks in advance for any & all help, advice, guidance, or pearls of > wisdom that you can provide this Neophyte. Oh and please don't ask why > SLES 10sp2, I've already been through that one with management. It is > what I have been provided & will make work. It's not an issue, though we recommend better kernels/kernel updates. Compared to the RHEL kernels, it uses modern stuff. Joe > > > > > > ** Steven A. Herborn ** > > * * U.S. 
* * ** Naval Academy ** > > ** Advanced Research Computing ** > > ** 410-293-6480 (Desk) ** > > ** 757-418-0505 (Cell) **** ** > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From herborn at usna.edu Tue Dec 9 05:47:28 2008 From: herborn at usna.edu (Steve Herborn) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com><381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu><493D6B43.2090004@ldeo.columbia.edu><493D8EFB.5080004@ldeo.columbia.edu><493D9220.3080200@scalableinformatics.com> <493DA27C.40707@ldeo.columbia.edu> Message-ID: <7ED78016F3BB4B1CB147EB8C212E608A@dynamic.usna.edu> -----Original Message----- From: Lux, James P [mailto:james.p.lux@jpl.nasa.gov] Sent: Monday, December 08, 2008 6:15 PM To: Gus Correa; Beowulf; Steve Herborn Subject: RE: [Beowulf] Personal Introduction & First Beowulf Cluster Question > very high performance is not expected, Rocks is a quite > convenient and cost-effective solution. > > That is how I maintain a Pentium III little cluster, and my > 1993 Honda. :) Would you take such a jewel to the dealership > for an oil change? > You put rocks in the crankcase of your 93 Honda? Doesn't that make a lot of noise at high revs? Jim And off we go on a rocky side-road. I was wondering how long that would take. 
:) From landman at scalableinformatics.com Tue Dec 9 06:12:34 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <85E4A12B2A64449A88808239EB84667C@dynamic.usna.edu> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D68C4.8060400@scalableinformatics.com> <85E4A12B2A64449A88808239EB84667C@dynamic.usna.edu> Message-ID: <493E7CD2.2090702@scalableinformatics.com> Steve Herborn wrote: > The system's compute nodes were originally built to be "Stateful" and the > current power player on my team wants it to remain that way. As things sit Ok, not a problem. > as of today I'm looking at either using AutoYast and am also evaluating Xcat > to perform the task. The biggest issue with AutoYast is that it will assist > me in getting the OS out to the Nodes; it really doesn't provide any of the > Cluster Management Tools that I would like to get installed. Which tools do you have in mind? The Autoyast package that we have set up for our customers installs the OS locally, as well as pdsh, ganglia, and several other tools. Then in our finishing scripts which the autoyast.xml file links to, we set up SGE, adjust NIS/mounts, ... As I indicated, we get operational compute nodes shortly after turning them on. The current version of autoyast.xml + finishing scripts we have constructed also builds a RAID0 for local scratch, uses xfs file systems for root and scratch, installs OFED RPMs (on SuSE), updates the kernel to a late model (2.6.23.14 or so) and does some sysctl tuning. > Now you maybe asking yourself why "Stateful" Compute Nodes as I did. It Not really ... end users and customers have preferences. Our job is to help them understand the good and bad elements of each. Once they understand, if they prefer to make the decision, then we have them decide and go from there. 
If they leave it up to us, we try to help them make the best choice.

> appears to me at this time that along with occasionally using these nodes as
> part of a Cluster, they also use them as plain old Servers/Workstations as
> I've found User Accounts & home directories on some of the compute nodes.

Ow. A central "enterprise" disk is definitely needed.

> As I said in my first post I'm new to this position & organization and not
> quite sure with exactly how & for what the system is even used for. I was
> simply told to get'er up.

:)

Bug me offline if you want our autoyast.xml, and access to our finishing scripts (parts of our Tiburon package). Check out xCat as well.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web : http://www.scalableinformatics.com
http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax : +1 866 888 3112
cell : +1 734 612 4615

From kilian.cavalotti.work at gmail.com Tue Dec 9 06:35:03 2008
From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI)
Date: Thu Mar 18 01:08:09 2010
Subject: [Beowulf] Rear-door heat exchangers and condensation
Message-ID: <200812091535.03920.kilian.cavalotti.work@gmail.com>

Hi all,

I'd be curious to know if some of you use or have some real-life experience with rear-door heat exchangers, such as those from SGI [1] or IBM [2].

I'm especially interested in feedback about condensation, and operational water temperature.

[1]http://www.sgi.fr/synergie/EpisodeXI/articles/3g.shtml
[2]http://www.ibm.com/servers/eserver/xseries/storage/pdf/IBM_RDHx_Spec_Sheet.pdf

Thanks a lot!
Cheers, -- Kilian From hearnsj at googlemail.com Tue Dec 9 06:52:03 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> 2008/12/9 Kilian CAVALOTTI > Hi all, > > I'd be curious to know if some of you use or have some real-life experience > with rear-door heat exchangers, such as those from SGI [1] or IBM [2]. > Killian, yes indeed. I manage both an SGI Altix with the rear-door heat exchangers, and an ICE cluster. We are lucky enough to have our own lake for a cooling pond. Grin. I think these are the cat's pyjamas - the SGI ones come in four horizontal 'stable doors' so you can swing one open for an extended amount of time to work on the rear of systems without overheating the whole rack. They enable us to run machines in some reasonably small spaces, and have been reliable. Contact me off-list please for temperature notes. John Hearns -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/6f1657b5/attachment.html From hearnsj at googlemail.com Tue Dec 9 07:04:57 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Personal Introduction & First Beowulf Cluster Question In-Reply-To: <493E7CD2.2090702@scalableinformatics.com> References: <386fa5610812020305n764d006dg606b2bf6461278a9@mail.gmail.com> <381BF20CBD854583BF28F0ED39E71E5A@dynamic.usna.edu> <493D68C4.8060400@scalableinformatics.com> <85E4A12B2A64449A88808239EB84667C@dynamic.usna.edu> <493E7CD2.2090702@scalableinformatics.com> Message-ID: <9f8092cc0812090704h50aae5b5v37366f3c3ca32f4a@mail.gmail.com> 2008/12/9 Joe Landman > > > Which tools do you have in mind? 
The Autoyast package that we have set up
> for our customers installs the OS locally, as well as pdsh, ganglia, and
> several other tools. Then in our finishing scripts which the autoyast.xml
> file links to, we set up SGE, adjust NIS/mounts, ...
>
> As I indicated, we get operational compute nodes shortly after turning them
> on. The current version of autoyast.xml + finishing scripts we have
> constructed also builds a RAID0 for local scratch, uses xfs file systems for
> root and scratch, installs OFED RPMs (on SuSE), updates the kernel to a late
> model (2.6.23.14 or so) and does some sysctl tuning.
>

That's how Streamline originally installed their clusters. It works fine: you do a generic SuSE install, let the Autoyast tools do all the 'heavy lifting', then run a post-install script which integrates your nodes into the cluster by, as Joe says, enabling NIS binding, copying across the batch startup script (yada yada...).

I agree with Joe this would probably be a good way forward for you.

From hearnsj at googlemail.com Tue Dec 9 07:00:55 2008
From: hearnsj at googlemail.com (John Hearns)
Date: Thu Mar 18 01:08:09 2010
Subject: [Beowulf] Rear-door heat exchangers and condensation
In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com>
References: <200812091535.03920.kilian.cavalotti.work@gmail.com>
Message-ID: <9f8092cc0812090700s3a19af52xa5917ec85ea5c03e@mail.gmail.com>

2008/12/9 Kilian CAVALOTTI

>
> [1]http://www.sgi.fr/synergie/EpisodeXI/articles/3g.shtml
>

If you look on SGI Techpubs you can find their site install guide:
http://techpubs.sgi.com/library/tpl/cgi-bin/summary.cgi?coll=hdwr&db=bks&docnumber=007-5021-001
Chapter 4 has the specs for the water cooled racks.
-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/b7fbe045/attachment.html From iioleynik at gmail.com Tue Dec 9 07:09:51 2008 From: iioleynik at gmail.com (Ivan Oleynik) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> Message-ID: John, What is the water rate requirement? Can it be fitted to any standard 42 rack, not only SGI made? How much did it cost (rough estimate would suffice)? Thanks, Ivan On Tue, Dec 9, 2008 at 9:52 AM, John Hearns wrote: > > > 2008/12/9 Kilian CAVALOTTI > >> Hi all, >> >> I'd be curious to know if some of you use or have some real-life >> experience >> with rear-door heat exchangers, such as those from SGI [1] or IBM [2]. >> > Killian, yes indeed. I manage both an SGI Altix with the rear-door heat > exchangers, and an ICE cluster. > We are lucky enough to have our own lake for a cooling pond. Grin. > > I think these are the cat's pyjamas - the SGI ones come in four horizontal > 'stable doors' so you can swing one open for an extended amount of time to > work on the rear of systems without overheating the whole rack. > They enable us to run machines in some reasonably small spaces, and have > been reliable. > > Contact me off-list please for temperature notes. > > John Hearns > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/ec93782b/attachment.html From lynesh at cardiff.ac.uk Tue Dec 9 07:14:46 2008 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> Message-ID: <1228835686.21024.22.camel@w1199.insrv.cf.ac.uk> On Tue, 2008-12-09 at 14:52 +0000, John Hearns wrote: > > > 2008/12/9 Kilian CAVALOTTI > Hi all, > > I'd be curious to know if some of you use or have some > real-life experience > with rear-door heat exchangers, such as those from SGI [1] or > IBM [2]. > Killian, yes indeed. I manage both an SGI Altix with the rear-door > heat exchangers, and an ICE cluster. > We are lucky enough to have our own lake for a cooling pond. Grin. > > I think these are the cat's pyjamas - the SGI ones come in four > horizontal 'stable doors' so you can swing one open for an extended > amount of time to work on the rear of systems without overheating the > whole rack. How much cooling do you lose when opening the rack to do work on it? Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From gerry.creager at tamu.edu Tue Dec 9 07:26:29 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: <493E8E25.1060507@tamu.edu> Our p575 has cool doors. Our campus chill water temp is spec'd at 42F but ranges up as high as 48F. We are seeing no condensation I'm aware of, but I'll ask the operations guys. 
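
Whether chilled water at 42F to 48F makes a door or its pipework sweat depends on the dew point of the room air rather than on the water temperature alone: condensation can form on any surface that is colder than the dew point. A minimal Python sketch of that check follows, using the Magnus approximation for dew point. The room conditions in the example (22 C, 50% relative humidity) are assumed purely for illustration and are not measurements from this thread, and the function names are made up here; substitute your own machine-room readings.

```python
import math

# Magnus approximation constants (saturation over water); a commonly
# quoted parameter set, reasonable for ordinary machine-room conditions.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees C


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Approximate dew point (C) from air temperature (C) and RH (%)."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def may_condense(surface_temp_c, air_temp_c, rel_humidity_pct):
    """True if the surface is at or below the dew point of the room air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rel_humidity_pct)


if __name__ == "__main__":
    # ASSUMED room conditions, for illustration only.
    room_c, room_rh = 22.0, 50.0
    print("dew point: %.1f C" % dew_point_c(room_c, room_rh))
    # Chilled-water temperatures quoted in the message above.
    for water_f in (42.0, 48.0):
        water_c = f_to_c(water_f)
        print("%.0f F water (%.1f C): condensation possible = %s"
              % (water_f, water_c, may_condense(water_c, room_c, room_rh)))
```

The supply water temperature is only a lower bound on the temperature of the surfaces exposed to room air (insulated pipes and the air side of the coil run warmer), so this is a conservative first check rather than a prediction.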
gerry Kilian CAVALOTTI wrote: > Hi all, > > I'd be curious to know if some of you use or have some real-life experience > with rear-door heat exchangers, such as those from SGI [1] or IBM [2]. > > I'm especially interested in feedback about condensation, and operational > water temperature. > > [1]http://www.sgi.fr/synergie/EpisodeXI/articles/3g.shtml > [2]http://www.ibm.com/servers/eserver/xseries/storage/pdf/IBM_RDHx_Spec_Sheet.pdf > > Thanks a lot! > Cheers, -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From hearnsj at googlemail.com Tue Dec 9 07:33:12 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <1228835686.21024.22.camel@w1199.insrv.cf.ac.uk> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812090652t505a0d85y35c6a103cf17bf42@mail.gmail.com> <1228835686.21024.22.camel@w1199.insrv.cf.ac.uk> Message-ID: <9f8092cc0812090733y4ece1f45i28247e9411044977@mail.gmail.com> 2008/12/9 Huw Lynes > > > How much cooling do you lose when opening the rack to do work on it? > > Good question! It must be about a quarter! Joking aside, if there's any interest I could take some IPMI temperature data before and after opening a door for (say) half an hour. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/69303f62/attachment.html From hearnsj at googlemail.com Tue Dec 9 08:06:44 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Re: Beowulf Digest, Vol 58, Issue 9 In-Reply-To: <7be8c36b0812050952h4225e5d3hd15bc9431906ead3@mail.gmail.com> References: <200812051644.mB5GhqRt029376@bluewest.scyld.com> <7be8c36b0812050952h4225e5d3hd15bc9431906ead3@mail.gmail.com> Message-ID: <9f8092cc0812090806q64c6b2a7ub14505cca19339f6@mail.gmail.com> 2008/12/5 Alcides Simao > Hello all! > > I was thinking of how to 'enpower' a Beowulf cluster. I remember back a > while ago that a Intel Atom was overclocked sucessfully to 2.4 GHz > Could it be possible to build a cooling apparatus sufficient to upgrade the > velocity of the beowulf cpu? > I don't see why you could not run a cluster with overclocked CPUs and (say) some heatpipe coolers. Just don't ask your vendor for a warranty! Seriously though, regarding Intel Atom and funky cooling schemes, have a look at: http://www.theregister.co.uk/2008/11/20/sgi_molecule_concept/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081209/248dc4ec/attachment.html From rgb at phy.duke.edu Tue Dec 9 09:02:21 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India Message-ID: Daily Dec 09 2008 TOP NEWS from www.siliconindia.com: 8 Indian supercomputers enter global top 500 list With India making a mark in every sector of the technology field, the country has shown its importance in the supercomputing race too. Eight of the top 500 supercomputers are of India with Tata Group's Eka, a HP based system leading the race. Go India! You rock! (The crowd goes wild...:-) rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From lindahl at pbm.com Tue Dec 9 10:06:33 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] ntpd wonky? Message-ID: <20081209180633.GA21193@bx9> Ever since the US daylight savings time change, I've been seeing a lot of jitter in the ntp servers I'm synched to... I'm using the redhat pool. Has anyone else noticed this? On 200 machines I get several complaints per day of >100 ms jitter from my hourly check-ntp cronjob. -- greg From diep at xs4all.nl Tue Dec 9 10:45:08 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: References: Message-ID: Maybe some decades from now all power per flop wasting supercomputers will be located in India. In the long run, they're the only ones on the planet who can afford the energy real cheap, and supercomputers usually burn a lot more power per gflop than they should, power6 up to factor 10. So even existing government rules (within EU that is) already would forbid building supercomputers as they waste too much power per double precision gflop as compared to the objective norm. Vincent On Dec 9, 2008, at 6:02 PM, Robert G. Brown wrote: > > Daily Dec 09 2008 TOP NEWS from www.siliconindia.com: > > 8 Indian supercomputers enter global top 500 list > > With India making a mark in every sector of the technology field, > the > country has shown its importance in the supercomputing race too. > Eight > of the top 500 supercomputers are of India with Tata Group's Eka, > a HP > based system leading the race. > > Go India! You rock! > > (The crowd goes wild...:-) > > rgb > > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 
27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From steffen.grunewald at aei.mpg.de Tue Dec 9 01:53:29 2008 From: steffen.grunewald at aei.mpg.de (Steffen Grunewald) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Tesla systems in Germany? Message-ID: <20081209095329.GY16423@casco.aei.mpg.de> Hi, I'm looking for someone in Germany who already has access to a Tesla system. I have received a request by a scientist for "a very powerful machine", and would like him to run some tests before spending and possibly wasting money. (To me it isn't clear whether his code would be suited at all, and he wasn't able to convince me...) Anyone? Cheers, Steffen -- Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/ * e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298} No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html From diep at xs4all.nl Tue Dec 9 16:47:25 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: Fwd: [Beowulf] Tesla systems in Germany? References: <20081209095329.GY16423@casco.aei.mpg.de> Message-ID: Nominated for "what i want to have for christmas" posting of the year 2008, from the beowulf mailing list: "a very powerful machine": Begin forwarded message: > From: Steffen Grunewald > Date: December 9, 2008 10:53:29 AM GMT+01:00 > To: Beowulf mailing list > Subject: [Beowulf] Tesla systems in Germany? > > Hi, > > I'm looking for someone in Germany who already has access to a > Tesla system. > I have received a request by a scientist for "a very powerful > machine", and > would like him to run some tests before spending and possibly > wasting money. 
> (To me it isn't clear whether his code would be suited at all, and he wasn't
> able to convince me...)
>
> Anyone?
>
> Cheers,
> Steffen
>
> --
> Steffen Grunewald * MPI Grav.Phys.(AEI) * Am Mühlenberg 1, D-14476 Potsdam
> Cluster Admin * http://pandora.aei.mpg.de/merlin/ * http://www.aei.mpg.de/
> * e-mail: steffen.grunewald(*)aei.mpg.de * +49-331-567-{fon:7233,fax:7298}
> No Word/PPT mails - http://www.gnu.org/philosophy/no-word-attachments.html

From diep at xs4all.nl Tue Dec 9 16:52:43 2008
From: diep at xs4all.nl (Vincent Diepeveen)
Date: Thu Mar 18 01:08:09 2010
Subject: [Beowulf] Tesla systems in Germany?
In-Reply-To: <20081209095329.GY16423@casco.aei.mpg.de>
References: <20081209095329.GY16423@casco.aei.mpg.de>
Message-ID: <893D66AE-D468-47AA-B83D-1EEC6D070DF5@xs4all.nl>

heh Steffen,

On a more serious note: what type of code does your friend want to run on the machine? Do you have the algorithm in some sort of stripped-down form that shows the size of the working set it reads from?

Figuring out Tesla-type devices is not so stupid right now. It has 240 cores @ 32 bits (either integer or floating point) clocked at say 1.2+ GHz or so (1 instruction a cycle, forget the BS they quote online). Very powerful.

Some algorithms can get rewritten. It would be fun to practice on some physicist code, rewriting it from memory-intensive to instruction-intensive code.

As I have nothing to do at Christmas, I wanted to write some CUDA code anyway. Of course I have to do it as a dry run, as I have no CUDA-capable devices set up here, let alone the budget to buy an 8800 card, let alone a Tesla.

Vincent

From smulcahy at aplpi.com Wed Dec 10 00:21:13 2008
From: smulcahy at aplpi.com (stephen mulcahy)
Date: Thu Mar 18 01:08:09 2010
Subject: [Beowulf] For grins...India
In-Reply-To:
References:
Message-ID: <493F7BF9.70306@aplpi.com>

Vincent Diepeveen wrote:
> Maybe some decades from now all power per flop wasting supercomputers
> will be located in India.
>
> In the long run, they're the only ones on the planet who can afford the
> energy real cheap,
> and supercomputers usually burn a lot more power per gflop than they
> should, power6 up to factor 10.

Iceland have energy literally pumping out of the ground - if they can sort out their connectivity to the US and Europe I think they'll quickly become the data centre to the world. Okay, they have some minor issues with seismic activity to deal with but you can't win em all.

-stephen

--
Stephen Mulcahy Applepie Solutions Ltd. http://www.aplpi.com
Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway)

From hearnsj at googlemail.com Wed Dec 10 01:11:13 2008
From: hearnsj at googlemail.com (John Hearns)
Date: Thu Mar 18 01:08:09 2010
Subject: [Beowulf] Tesla systems in Germany?
In-Reply-To: <20081209095329.GY16423@casco.aei.mpg.de> References: <20081209095329.GY16423@casco.aei.mpg.de> Message-ID: <9f8092cc0812100111k6ba52f0ev1ae3571bb1b0aa32@mail.gmail.com> 2008/12/9 Steffen Grunewald > Hi, > > I'm looking for someone in Germany who already has access to a Tesla > system. > I have received a request by a scientist for "a very powerful machine", and > would like him to run some tests before spending and possibly wasting > money. > In that case, why not just buy a standard Nvidia graphics card? They run the same CUDA code. You can run your tests and get an idea of possible speedups, or indeed if the code will run under CUDA, before committing to buy Tesla. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/6c56e56c/attachment.html From hearnsj at googlemail.com Wed Dec 10 01:18:02 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Tesla systems in Germany? In-Reply-To: <20081209095329.GY16423@casco.aei.mpg.de> References: <20081209095329.GY16423@casco.aei.mpg.de> Message-ID: <9f8092cc0812100118m23d2fdffp433e6a25addac202@mail.gmail.com> 2008/12/9 Steffen Grunewald > Hi, > > I'm looking for someone in Germany who already has access to a Tesla > system. > I have received a request by a scientist for "a very powerful machine", and > A "very powerful machine" could mean a lot of things - a cluster with a high core count. A large SMP machine with a huge amount of memory. A dedicated machine like the QCD calculators. As Vincent says, you need to look at what the code is before hitting the "I need Cuda" button. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/e17becc2/attachment.html From kilian.cavalotti.work at gmail.com Wed Dec 10 01:21:42 2008 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <493E8E25.1060507@tamu.edu> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <493E8E25.1060507@tamu.edu> Message-ID: <200812101021.42824.kilian.cavalotti.work@gmail.com> Hi Gerry, On Tuesday 09 December 2008 16:26:29 Gerry Creager wrote: > Our p575 has cool doors. Our campus chill water temp is spec'd at 42F > but ranges up as high as 48F. We are seeing no condensation I'm aware > of, but I'll ask the operations guys. Thanks, that's helpful. I was afraid that a low temp for chilled water would generate condensation on the pipes, or even on the doors themselves. Cheers, -- Kilian From diep at xs4all.nl Wed Dec 10 03:08:37 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <493F7BF9.70306@aplpi.com> References: <493F7BF9.70306@aplpi.com> Message-ID: On Dec 10, 2008, at 9:21 AM, stephen mulcahy wrote: > Vincent Diepeveen wrote: >> Maybe some decades from now all power per flop wasting >> supercomputers will be located in India. >> In the long run, they're the only ones on the planet who can >> afford the energy real cheap, >> and supercomputers usually burn a lot more power per gflop than >> they should, power6 up to factor 10. > > Iceland have energy literally pumping out of the ground - if they > can sort out their connectivity to the US and Europe I think > they'll quickly become the data centre to the world. Okay, they > have some minor issues with seismic activity to deal with but you > can't win em all. > Now in iceland, i was there not so long ago, i won't say i saw the credit crisis come when i was there. That would be a bit overoptimistic. 
But realize it's just a fishers society with some sheep on the rocky (vulcano) ground and people living at high american standards driving around in jeeps with huge wheels as there are no roads. There is in total around 300k inhabitants there. Buying a hamburger there always was a big ripoff for tourists (i paid for a simple meal 15 euro or so), as everything gets imported to the island. So the industry that eats 90+% of all energy isn't there. Forget the idea of energy centrals the use heat from the underground. That's just to keep happy the environmental lobby which is bad in doing math. See below. These energy centrals are very expensive and the biggest and most expensive one produces a factor 40 less than what a normal nuclear reactor produces for a cheap price. Additionally a nuclear reactor for sure can produce coming 25 years whereas digging in the underground always is complicated and unsure business. Additionally there is going to be a new treaty within EU about CO2 reduction. Idea is to reduce 20% CO2 or so the coming years. Industry that can compete gets exempted from the treaty. Basically that's all industry, on paper that's 96% of all industry now, and i do not know why the other 4% were so stupid to not ask for an exception, maybe there application is still 4 layers down some office desk. Anyway the energy centrals are not excepted from this treaty, so CO2 reduction it will be for them. So these nuclear reactors will get built massively coming years and India can build most nuclear reactors of us all at a cheap price and they keep producing cheap there forever. Unlike Europe they probably do not have a '25 year limit' in which an energy central must pay itself back after which it has to get destroyed; practical there is a big need for energy so it still keeps producing for the coming 100 years. 
If each scientific commission of each nation is on its own deciding what type of nuclear reactor gets built, it's gonna be a watercooled reactor (very safe and cheap), which burns up the worldwide stockpile of easily extractable pile of uranium quickly, as the amount of reactors that's gonna get built in Europe is gonna be for sure more than tesla has stream processors. It is obvious that replacing these centrals by nuclear centrals is the only manner for energy industry to reduce CO2. Meanwhile when economy is going to boom again in Europe, of course Germany is going to use even more coals for industry (the 96% that falls in the exemption), so a year or 10 from now of course from the original plans to reduce CO2 output will be a joke of course. Be happy if it didn't double by 2018. In either case, it's good news for Australian export. Vincent > -stephen > > -- > Stephen Mulcahy Applepie Solutions Ltd. http:// > www.aplpi.com > Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, > Galway) > From hearnsj at googlemail.com Wed Dec 10 03:11:08 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Tesla systems in Germany? In-Reply-To: <20081210092947.GK16423@casco.aei.mpg.de> References: <20081209095329.GY16423@casco.aei.mpg.de> <9f8092cc0812100118m23d2fdffp433e6a25addac202@mail.gmail.com> <9f8092cc0812100111k6ba52f0ev1ae3571bb1b0aa32@mail.gmail.com> <20081210092947.GK16423@casco.aei.mpg.de> Message-ID: <9f8092cc0812100311s319dc3ebg6f34c3f464ada564@mail.gmail.com> 2008/12/10 Steffen Grunewald > > Cluster with high core count: this would give the opportunity to do stupid > things on the "several hundreds" scale, but not speed up the single stupid > thing. > Steffen, if I'm not wrong you have just restated Amdahl's Law. > > > As Vincent says, you need to look at what the code is before hitting the > "I > > need Cuda" button. 
> > Sometimes the approach to "throw enough money at a problem, and it will > resolve itself" is the easier one, compared with the need to power-up your > brains :( > > Thanks for your patience, No problem. Sounds to me actually like you need to encourage some code profiling, before saying that any particular machine is the answer to this one. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/f4c7a904/attachment.html From Dan.Kidger at quadrics.com Wed Dec 10 03:15:15 2008 From: Dan.Kidger at quadrics.com (Dan.Kidger@quadrics.com) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <493F7BF9.70306@aplpi.com> References: <493F7BF9.70306@aplpi.com> Message-ID: <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> And I am sure Iceland would find it much easier to do the machine room cooling than say Spain or the Southern USA Daniel -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of stephen mulcahy Sent: 10 December 2008 08:21 To: Vincent Diepeveen Cc: Beowulf Mailing List; Robert G. Brown Subject: Re: [Beowulf] For grins...India Vincent Diepeveen wrote: > Maybe some decades from now all power per flop wasting supercomputers > will be located in India. > > In the long run, they're the only ones on the planet who can afford the > energy real cheap, > and supercomputers usually burn a lot more power per gflop than they > should, power6 up to factor 10. Iceland have energy literally pumping out of the ground - if they can sort out their connectivity to the US and Europe I think they'll quickly become the data centre to the world. Okay, they have some minor issues with seismic activity to deal with but you can't win em all. -stephen -- Stephen Mulcahy Applepie Solutions Ltd. http://www.aplpi.com Registered in Ireland, no. 
289353 (5 Woodlands Avenue, Renmore, Galway) _______________________________________________ Beowulf mailing list, Beowulf@beowulf.org To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From rgb at phy.duke.edu Wed Dec 10 05:37:54 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> Message-ID: On Wed, 10 Dec 2008, Dan.Kidger@quadrics.com wrote: > And I am sure Iceland would find it much easier to do the machine room > cooling than say Spain or the Southern USA Or the same people that are bringing you e-paper in e-book readers and superphones will figure out how to spray a processor core, a GB of sram, and a GB of nvram onto a piece of vinyl the size of a postage stamp that is powered by the spray-on-solar cell that is sprayed on top of it, and your kilonode supercomputer will become your desktop, literally, as long as you don't cover it all with papers so ambient light can't power it or spill your coffee on it. In the meantime, the advent of the overdue ice age will a) put a whole lot of climatologists out of business, but don't worry, the ones that aren't actually lynched will move on to get important work in public relations or cleaning roadsides wearing lovely orange jumpers or working in the coal-from-ground extraction industry in the forlorn hope that somehow getting enough CO_2 into the air will actually delay the inevitable progress of planetary orbits in interaction with the solar cycle; b) make the idea of a nice, warm desktop computer very attractive once again. 
DEC/Compaq/HP (which by then will have been taken over by Toshiba) will trot out a new release of the Alpha and we will once again have a small computer that is entirely capable of heating a standard office. Iceland will become distinctly unfavorable real estate as it is once again covered with glaciers -- DEEP glaciers. Of course, so will Europe, most of North Asia and Canada down to roughly Ohio. The world will wistfully discover that global warming was actually rather a lovely dream, and that being warm, wet and fertile is GOOD even at the expense of some coastline where having 1/3 of the planet's surface, including most of its wheat growing regions, covered in permafrost is really, really bad. Bad. Did I mention that it won't be good? North Carolina, of course, will thrive, with a climate roughly like that of Nova Scotia today, and we'll do our best to accommodate all of you yankees reading this to tend our farms and bring us our mint juleps to sip. Just remember, you heard it here first. I estimate a roughly one in a hundred chance of the current low in the solar cycle triggering the next (expected) Maunder minimum perhaps by altering the thermohaline circulation that is some five or six orders of magnitude more important a global climate determiner than any greenhouse gas (with water a similar number of orders of magnitude more important than mere CO_2) into deep freeze mode. In any event, out here at 10,000+ years of interglacial (the second longest in the last ten) our ass is definitely hanging over the abyss and we may even live to see the fall. So cooling clusters may not be that much of a problem very, very soon. rgb > Daniel > > > -----Original Message----- > From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of stephen mulcahy > Sent: 10 December 2008 08:21 > To: Vincent Diepeveen > Cc: Beowulf Mailing List; Robert G.
Brown > Subject: Re: [Beowulf] For grins...India > > Vincent Diepeveen wrote: >> Maybe some decades from now all power per flop wasting supercomputers >> will be located in India. >> >> In the long run, they're the only ones on the planet who can afford the >> energy real cheap, >> and supercomputers usually burn a lot more power per gflop than they >> should, power6 up to factor 10. > > Iceland have energy literally pumping out of the ground - if they can > sort out their connectivity to the US and Europe I think they'll quickly > become the data centre to the world. Okay, they have some minor issues > with seismic activity to deal with but you can't win em all. > > -stephen > > -- > Stephen Mulcahy Applepie Solutions Ltd. http://www.aplpi.com > Registered in Ireland, no. 289353 (5 Woodlands Avenue, Renmore, Galway) > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From deadline at eadline.org Wed Dec 10 05:57:52 2008 From: deadline at eadline.org (Douglas Eadline) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Small request Message-ID: <33558.192.168.1.213.1228917472.squirrel@mail.eadline.org> Fellow geeks and/or other assorted HPC riff-raff: I have posted a one-question survey as part of my weekly Linux Magazine column. It is about how the economy is affecting your HPC plans for next year.
I also invite comments if you have any ... http://linux-mag.com/id/7198 There is a free registration required to post comments. I'll be summarizing the results of the poll and comments next week. Thanks -- Doug From prentice at ias.edu Wed Dec 10 06:50:26 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Small request In-Reply-To: <33558.192.168.1.213.1228917472.squirrel@mail.eadline.org> References: <33558.192.168.1.213.1228917472.squirrel@mail.eadline.org> Message-ID: <493FD732.2070002@ias.edu> Douglas Eadline wrote: > Fellow geeks and/or other assorted HPC riff-raff: > > I have posted a one-question survey as part of my weekly > Linux Magazine column. It is about how the economy > is affecting your HPC plans for next year. I also > invite comments if you have any ... > > http://linux-mag.com/id/7198 > > There is a free registration required to post comments. > I'll be summarizing the results of the poll and comments > next week. > > Thanks > > -- > Doug > Doug, I was going to post a comment, but it wouldn't be anonymous. I thought your article mentioned that the comments would be anonymous. -- Prentice From deadline at eadline.org Wed Dec 10 07:09:14 2008 From: deadline at eadline.org (Douglas Eadline) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Small request In-Reply-To: <493FD732.2070002@ias.edu> References: <33558.192.168.1.213.1228917472.squirrel@mail.eadline.org> <493FD732.2070002@ias.edu> Message-ID: <53464.192.168.1.213.1228921754.squirrel@mail.eadline.org> Well, you may be correct, as it depends on how you registered with the site. Many people register for sites with names like "clusterbunny@gmail.com" so they are somewhat anonymous. If anyone has any comments they want kept anonymous, send them directly to me.
And, because I suffer from CRS*, your name will probably get dropped from my memory like a bad packet. -- Doug * Can't Remember Shit > Douglas Eadline wrote: >> Fellow geeks and/or other assorted HPC riff-raff: >> >> I have posted a one-question survey as part of my weekly >> Linux Magazine column. It is about how the economy >> is affecting your HPC plans for next year. I also >> invite comments if you have any ... >> >> http://linux-mag.com/id/7198 >> >> There is a free registration required to post comments. >> I'll be summarizing the results of the poll and comments >> next week. >> >> Thanks >> >> -- >> Doug >> > > > Doug, > > I was going to post a comment, but it wouldn't be anonymous. I thought > your article mentioned that the comments would be anonymous. > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Doug From niftyompi at niftyegg.com Wed Dec 10 12:08:12 2008 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812101021.42824.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <493E8E25.1060507@tamu.edu> <200812101021.42824.kilian.cavalotti.work@gmail.com> Message-ID: <20081210200812.GA3449@compegg.wr.niftyegg.com> On Wed, Dec 10, 2008 at 10:21:42AM +0100, Kilian CAVALOTTI wrote: > > Hi Gerry, > > On Tuesday 09 December 2008 16:26:29 Gerry Creager wrote: > > Our p575 has cool doors. Our campus chill water temp is spec'd at 42F > > but ranges up as high as 48F.
We are seeing no condensation I'm aware > > of, but I'll ask the operations guys. > > Thanks, that's helpful. I was afraid that a low temp for chilled water would > generate condensation on the pipes, or even on the doors themselves. > Watch dew point numbers in the room. Dew point is dominantly a function of humidity... http://en.wikipedia.org/wiki/Dew_point If the dew point is higher than the chilled water temp condensation is possible if the heat exchanger surface cools that much. Condensation on normal cold water pipes and chillers in the large and small construction like home or office is common so the correct insulation materials are easy to find and install. Many frost free home refrigerators solve this problem by running the heated exhaust air over the catch pan so any frost/ condensation is promptly evaporated. With clever airflow management drains may not be needed but water rots wood, breeds bacteria and attracts bugs and may be problematic. The bacteria issue is important.... see Legionella pneumophila. Right now the outside air dew point in Bryan, Texas is about 19F and historically gets as high as 69F in December. So yes condensation from 42F cooling pipes is possible and should be part of the management/ monitoring process. I suspect that the campus AC manages the dew point to the high end of a comfort range that might be about 50 - 54 °F in the US keeping things all OK. i.e. If the building AC manages humidity you may not have to if they have the capacity to control it at the building air inlets. Of course the weather in France is not the same as Texas... looks nice ;-) something like. 52 °F / 11 °C Light Rain Humidity: 82% Dew Point: 46 °F / 8 °C -- T o m M i t c h e l l Found me a new hat, now what?
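The dew point rule of thumb in the message above can be turned into a quick check. What follows is only a minimal sketch, assuming the Magnus approximation for dew point; the coefficients and the function names (dew_point_c, f_to_c, condensation_possible) are illustrative choices, not anything given in the thread. Fed the France readings quoted above (11 °C, 82% humidity), it comes out near the reported 8 °C dew point, which is above 42F chilled water, so condensation would be possible on a surface that cold.

```python
import math

# Magnus approximation coefficients. This particular pair is one common
# parameterization and is an assumption here, not taken from the thread.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(air_temp_c, rel_humidity_pct):
    """Approximate dew point (deg C) from air temperature and relative humidity."""
    gamma = (math.log(rel_humidity_pct / 100.0)
             + MAGNUS_A * air_temp_c / (MAGNUS_B + air_temp_c))
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def f_to_c(temp_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


def condensation_possible(surface_temp_c, dew_point):
    """A surface colder than the air's dew point can collect condensation."""
    return surface_temp_c < dew_point


if __name__ == "__main__":
    chilled_water_c = f_to_c(42.0)      # chilled water spec from the thread
    room_dew_c = dew_point_c(11.0, 82.0)  # the France readings from the thread
    print("dew point (C): %.1f" % room_dew_c)
    print("condensation possible:",
          condensation_possible(chilled_water_c, room_dew_c))
```

In practice the inputs would be the machine room's own temperature and humidity readings rather than outside weather, and the surface temperature of the coil or pipe rather than the nominal water temperature.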
From niftyompi at niftyegg.com Wed Dec 10 13:39:29 2008 From: niftyompi at niftyegg.com (Nifty Tom Mitchell) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> Message-ID: <20081210213929.GB3449@compegg.wr.niftyegg.com> On Wed, Dec 10, 2008 at 08:37:54AM -0500, Robert G. Brown wrote: > On Wed, 10 Dec 2008, Dan.Kidger@quadrics.com wrote: > >> And I am sure Iceland would find it much easier to do the machine room >> cooling than say Spain or the Southern USA > ..... > > In the meantime, the advent of the overdue ice age will... ---- And in many of the 'global warming' research groups are those that are looking at 'anoxic' regions in the ocean as bad side effects of global warming. In a geologic perspective it is exactly the environment that sequestered so much carbon as coal. These regions and processes may be critical in keeping the lid on CO2 in the atmosphere. As for the north polar cap it would be interesting to model the warm water flow of the Japan Current as it encounters the Bering Strait. The strait is only 53 miles wide, so the change in warm water flow into the Arctic from less than a meter of sea level rise would be large (in percentage terms) and have a butterfly effect on the Arctic. On the converse, a project to place a meter+ thick gravel flow barrier would be an engineering project akin to a railroad ballast 53 miles long (easy). With GPS locators dredge/ fill/ rock could be placed with precision to this end and PERHAPS reverse the shrinking of the Arctic ice sheet and increase the albedo of the earth, perhaps restoring the status quo in this regard. OK, grossly simplified, but there are not many environmental pinch points with as much global leverage. http://en.wikipedia.org/wiki/Kuroshio Others are thinking about this. But are they able to model it?
http://psc.apl.washington.edu/HLD/Bstrait/bstrait.html -- T o m M i t c h e l l Found me a new hat, now what? PS: the critical point that the Bering Strait might play here was first expressed to me by Ed McCullough then dean of Geology at the University of Arizona c. 1969. From hahn at mcmaster.ca Wed Dec 10 12:34:24 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812101021.42824.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <493E8E25.1060507@tamu.edu> <200812101021.42824.kilian.cavalotti.work@gmail.com> Message-ID: > Thanks, that's helpful. I was afraid that a low temp for chilled water would > generate condensation on the pipes, or even on the doors themselves. condensation happens when air passes over a surface which is below the air's dew point. it's more likely that the main chiller's cold output will be at a lower temperature than the door coil, so any condensation will happen there. that's assuming you don't have oddities like major moisture sources, and that you avoid undesirable airflow. From diep at xs4all.nl Wed Dec 10 15:01:23 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <20081210213929.GB3449@compegg.wr.niftyegg.com> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> Message-ID: What is most interesting from supercomputer viewpoint seen is the comments i got from some scientists when speaking about climate calculations. At a presentation at SARA at 11 september 2008 with some bobo's there (minister bla bla), there was a few sheets from the North-Atlantic. It was done in rectangles from 40x40KM. 
The real question raised by the other scientists who weren't there at the presentation is: "why such an ugly resolution the commercial software we use is far superior to this and capable of calculating more". So my question basically here to the climatologists here would be: "what does it take to accurately calculate the effects of "global warming" and the fact that the ocean will react, triggering according to some predictions a new iceage. It should be really possible to do a lot of calculations there also presenting what errors there can be in the calculations to hold true. Especially the resolutions at which things got calculated so far in climate change area, most scientists who are more busy with airwings (some others of those North Atlantic Software Association type guys post also in this group), influence of moon and so on, onto all kind of models. They do not really understand why all this hasn't been calculated before very well. Maybe the format used to calculate is too generic and therefore not storing enough information? Would GPU's help speeding up calculations here? So far most models were to say polite, total laymen models. A good guess from a scientist so far always has been better than any calculation. You realize that this meter rise calculation, i checked out that source code myself back in 2003 which ran on Earth machine and SARA's 1024 processor Origin3800. I wasn't impressed to read that their conclusion was the rise would be 1 meter and in some sort of file that i would call now bugfix.log there was the comments: "oops we fixed a bug, the meter was initialized a meter too high". I could be wrong of course reading that, as it might be it was just the first half million CPU node hours that got wasted... Is the software too generic to be accurate? How low level has it been optimized? Not seldom if some low level programmers go busy with such software it suddenly speeds up factor 1000. 
Vincent On Dec 10, 2008, at 10:39 PM, Nifty Tom Mitchell wrote: > On Wed, Dec 10, 2008 at 08:37:54AM -0500, Robert G. Brown wrote: >> On Wed, 10 Dec 2008, Dan.Kidger@quadrics.com wrote: >> >>> And I am sure Iceland would find it much easier to do the machine >>> room >>> cooling than say Spain or the Southern USA >> > ..... >> >> In the meantime, the advent of the overdue ice age will... > ---- > > And in many of the 'global warming' reserch groups are those that are > looking at 'anoxic' ocean regons in the ocean as bad side effects of > global warming. In a geologic perspective it is exactly the > environment > that sequestered so much carbon as coal. These regions and processes > may be critical in keeping the lid on CO2 in the atmosphere. > > As for the north polar cap it would be interesting to model the > warm water > flow of the Japan Current as it encounters the Bering Strait. Only > 53 Miles > wide the warm water flow change into the artic with less than a > meter rise > in the sea level would be large (%age) and have a butterfly effect > on the artic. > On the converse, a probject to place a meter+ thick gravel flow > barrier would > be an engineering project akin to a railroad ballast 53 miles long > (easy). > With GPS locators dredge/ fill/ rock could be placed with precision > to this end and PERHAPS > reverse the shrinking of the artic ice sheet and increase the > albedo of > the earth and perhaps restoring the status quo in this regard. > > OK grosly simplified but there are not many environmental pinch points > with as much global leverage. > > http://en.wikipedia.org/wiki/Kuroshio > > Others are thinking about this. But are they able to modeling it? > > http://psc.apl.washington.edu/HLD/Bstrait/bstrait.html > > > -- > T o m M i t c h e l l > Found me a new hat, now what? > > PS: the critical point that the Bering Strait might play here was > first expressed to me by Ed McCullough then dean of Geology at the > University > of Arizona c. 1969. 
> > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From diep at xs4all.nl Wed Dec 10 15:07:37 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <20081210213929.GB3449@compegg.wr.niftyegg.com> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> Message-ID: On Dec 10, 2008, at 10:39 PM, Nifty Tom Mitchell wrote: > On Wed, Dec 10, 2008 at 08:37:54AM -0500, Robert G. Brown wrote: >> On Wed, 10 Dec 2008, Dan.Kidger@quadrics.com wrote: >> >>> And I am sure Iceland would find it much easier to do the machine >>> room >>> cooling than say Spain or the Southern USA >> > ..... >> >> In the meantime, the advent of the overdue ice age will... > ---- > > And in many of the 'global warming' reserch groups are those that are > looking at 'anoxic' ocean regons in the ocean as bad side effects of > global warming. In a geologic perspective it is exactly the > environment > that sequestered so much carbon as coal. These regions and processes > may be critical in keeping the lid on CO2 in the atmosphere. > > As for the north polar cap it would be interesting to model the > warm water > flow of the Japan Current as it encounters the Bering Strait. Only > 53 Miles > wide the warm water flow change into the artic with less than a > meter rise > in the sea level would be large (%age) and have a butterfly effect > on the artic. > On the converse, a probject to place a meter+ thick gravel flow > barrier would > be an engineering project akin to a railroad ballast 53 miles long > (easy). 
> With GPS locators dredge/ fill/ rock could be placed with precision > to this end and PERHAPS > reverse the shrinking of the artic ice sheet and increase the > albedo of > the earth and perhaps restoring the status quo in this regard. > Of course as usual such a barrier has some political implications. Putin's building 5 new aircraft carriers not even days after oil was found underneath the northpole with a russian flag on the bottom already. If there is ice once again over there how is he gonna get out the oil out of there? > OK grosly simplified but there are not many environmental pinch points > with as much global leverage. > > http://en.wikipedia.org/wiki/Kuroshio > > Others are thinking about this. But are they able to modeling it? > What we need is accurate calculations. All the accuracy goes to military currently not to climate modelling it seems. Why is that? Politicians just 4 years to power each one of 'em? > http://psc.apl.washington.edu/HLD/Bstrait/bstrait.html > > > -- > T o m M i t c h e l l > Found me a new hat, now what? > > PS: the critical point that the Bering Strait might play here was > first expressed to me by Ed McCullough then dean of Geology at the > University > of Arizona c. 1969. > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From bill at cse.ucdavis.edu Wed Dec 10 15:14:21 2008 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: <49404D4D.6060203@cse.ucdavis.edu> Since you mentioned the rear door exchangers, I figured I'd mention a related solution for machine rooms that can't handle the heat density of today's 1U/blade racks. 
Liebert makes a rack-top cooler that blows air in front of the rack (the cold aisle), sucks in hot air from the rear, and dumps the heat into a water source. Seems like a pretty reasonable design and seems to work well. It doesn't make it any harder to work on the rack/nodes, although I do recommend a wide-brimmed hat if you don't like high volumes of cold air blowing on your forehead when you are working on the console. One complication I saw in a design with the retrofitted rear rack cooler was the maximum flow rate it was designed for and how changes in that rate would affect the resulting cooling. Vendors I talked to didn't immediately have CFM numbers for nodes, nor did the rear door vendor have any graphs for cooling delivered vs air temperature and pressure. Nor did the 1U vendors have graphs of airflow delivered relative to backpressure (potentially caused by the rear door). It wasn't at all clear to me if a rear door would work similarly with a 10 kW rack with zero room cooling as it would with a 20 kW rack with 10 kW of room cooling. Not to mention blades/1Us designed to dissipate 20 kW per rack would likely have significantly higher airflow. With all that said, I've seen installations that were pretty happy with the rear door solutions as well. From lindahl at pbm.com Wed Dec 10 15:29:43 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> Message-ID: <20081210232943.GC21119@bx9> On Thu, Dec 11, 2008 at 12:01:23AM +0100, Vincent Diepeveen wrote: > Not seldom if some low level programmers go busy with such software it > suddenly speeds up factor 1000. Vincent, the people I know who do climate compare notes on how many model years per cpu year they can compute at a given resolution and algorithm and machine. 
If someone made a mistake and was 1000X slower than the competition, they'd be aware of it. Your wild claims can be funny at times, but really, you should start a blog instead of posting them here. -- greg From diep at xs4all.nl Wed Dec 10 16:22:23 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <20081210232943.GC21119@bx9> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> <20081210232943.GC21119@bx9> Message-ID: <4CC1AA71-F8A4-4D30-A7BB-9D26580AD7E6@xs4all.nl> Heh Greg, nice to hear something from you. But... ...since when are you expert on garbage dumps as well? For someone who said he would shredder all emails i shipped, you seem to have the remarkable quality to recover stuff from the garbage dump. Vincent On Dec 11, 2008, at 12:29 AM, Greg Lindahl wrote: > On Thu, Dec 11, 2008 at 12:01:23AM +0100, Vincent Diepeveen wrote: > >> Not seldom if some low level programmers go busy with such >> software it >> suddenly speeds up factor 1000. > > Vincent, the people I know who do climate compare notes on how many > model years per cpu year they can compute at a given resolution and > algorithm and machine. If someone made a mistake and was 1000X slower > than the competition, they'd be aware of it. > > Your wild claims can be funny at times, but really, you should start a > blog instead of posting them here. 
> > -- greg > > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From lindahl at pbm.com Wed Dec 10 17:17:01 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: <4CC1AA71-F8A4-4D30-A7BB-9D26580AD7E6@xs4all.nl> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> <20081210232943.GC21119@bx9> <4CC1AA71-F8A4-4D30-A7BB-9D26580AD7E6@xs4all.nl> Message-ID: <20081211011701.GA3780@bx9> On Thu, Dec 11, 2008 at 01:22:23AM +0100, Vincent Diepeveen wrote: > For someone who said he would shredder all emails i shipped, > you seem to have the remarkable quality to recover stuff from the > garbage dump. Vincent, Stop making stuff up. I don't believe I've ever said I was "shredder"ing your emails. I did say (Oct 15th 2008), in a personal email: | Vincent, I don't read most of your blather on the Beowulf | list. Emailing me personally is a waste of your time and mine. I can only hope that your memory for HPC is better than your memory of past disagreements. 
-- greg From csamuel at vpac.org Wed Dec 10 19:58:54 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] cloning issue, hidden module dependency In-Reply-To: <1617901603.60651228967269477.JavaMail.root@mail.vpac.org> Message-ID: <117871361.61381228967934181.JavaMail.root@mail.vpac.org> ----- "Tim Cutts" wrote: > It's very easy to build a custom kernel package with just what > you want using the 'make-kpkg' command, and then you can strip > out all the extraneous cruft if you want Be warned that with 2.6.27 and later you will most likely need to patch the kernel-package scripts to avoid putting firmware directly into /lib/firmware as otherwise you'll get conflicts with other 2.6.27+ packages. The Ubuntu bug report: https://bugs.launchpad.net/ubuntu/+source/kernel-package/+bug/256983 has a link to the upstream fix that I applied manually to my system to fix this. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Wed Dec 10 20:09:27 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] RE: moab In-Reply-To: <8E50F960A9F3F6448D39155B882544E818717F79@EX2K7-VIRT-1.ads.qub.ac.uk> Message-ID: <900957656.61771228968567397.JavaMail.root@mail.vpac.org> ----- "Richard Rankin" wrote: > I will have funding available to purchase some new clusters in the new > year. > > I was hoping to be able to have a cluster with a mix of Linux and > windows nodes so that the mix could be varied depending on the work > load. > > I have been pointed to > http://www.clusterresources.com/pages/products/moab-hybrid-cluster.php > > Has anyone any experience of this Not in that scenario (no Windows here) but I do know a University here who do power up/down nodes based on demand using Moab. 
We're not doing that as we always have a backlog of jobs waiting to run! Hope that's of some use to you ? cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From algomantra at gmail.com Tue Dec 9 16:47:33 2008 From: algomantra at gmail.com (AlgoMantra) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] For grins...India In-Reply-To: References: Message-ID: <6171110d0812091647h730be9f7sd4323c34c5cda077@mail.gmail.com> >Maybe some decades from now all power per flop wasting supercomputers will be located in India. >In the long run, they're the only ones on the planet who can afford the energy real cheap.... (Disclaimer: I'm located in Jaipur, India). Vincent, I'm curious why you think we will be able to afford this energy cheaply and where will it come from! ------- -.- 1/f ))) --. ------- ... http://www.algomantra.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/e9861e7a/attachment.html From steffen.grunewald at aei.mpg.de Wed Dec 10 01:29:47 2008 From: steffen.grunewald at aei.mpg.de (Steffen Grunewald) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Tesla systems in Germany? In-Reply-To: <9f8092cc0812100118m23d2fdffp433e6a25addac202@mail.gmail.com> <9f8092cc0812100111k6ba52f0ev1ae3571bb1b0aa32@mail.gmail.com> References: <20081209095329.GY16423@casco.aei.mpg.de> <9f8092cc0812100118m23d2fdffp433e6a25addac202@mail.gmail.com> <20081209095329.GY16423@casco.aei.mpg.de> <9f8092cc0812100111k6ba52f0ev1ae3571bb1b0aa32@mail.gmail.com> Message-ID: <20081210092947.GK16423@casco.aei.mpg.de> Thanks John for your thoughts (accidentally they match with mine) > > I'm looking for someone in Germany who already has access to a Tesla > > system. 
> > I have received a request by a scientist for "a very powerful machine", and > > would like him to run some tests before spending and possibly wasting > > money. > > > In that case, why not just buy a standard Nvidia graphics card? They run the > same CUDA code. You can run your tests and get an idea of possible speedups, > or indeed if the code will run under CUDA, before committing to buy Tesla. > A "very powerful machine" could mean a lot of things - a cluster with a high > core count. A large SMP machine with a huge amount of memory. A dedicated > machine like the QCD calculators. Since he would have been able to use MPI (at least locally to use the available multi-core architecture), and didn't do that, it's still a lot of linear code. Large memory (but not excessive) - yes. SMP - not really. Cluster with high core count: this would give the opportunity to do stupid things on the "several hundreds" scale, but not speed up the single stupid thing. > As Vincent says, you need to look at what the code is before hitting the "I > need Cuda" button. Sometimes the approach to "throw enough money at a problem, and it will resolve itself" is the easier one, compared with the need to power-up your brains :( Actually, I was facing an outcome of "5% speedup if nothing is done about code efficiency", and I wouldn't like wasting money for that result - that's why I was asking for a way to confront that guy with the need to re-work his code, you understand? Thanks for your patience, Steffen From drcoolsanta at gmail.com Wed Dec 10 02:32:06 2008 From: drcoolsanta at gmail.com (Dr Cool Santa) Date: Thu Mar 18 01:08:09 2010 Subject: [Beowulf] Developing software for MPICH clusters? Message-ID: <86b56470812100232n5307706cm9aad031c38c4381d@mail.gmail.com> Till now I have only created Beowulf clusters for my mother who is a theoretical chemist. She needed clusters to run applications and study chemical aspects of various substances to help in her research. 
Being a programmer, I found it quite exciting. I have sometimes written software to automate my work; however, sometimes that work is so much for my computer to handle that it takes hours or days. I was wondering if someone could tell me how I could convert it into MPI-based code. Basically my aim is to divide the work among the computers that are on the cluster. The cluster comprises 4 dual-core machines, so you can understand how much more powerful they would be compared to my computer. Also, I generally program in C or C++, but I can program in a wide range of languages. I can explain the main features of such programs with an example. They would compute results for consecutive numbers and store them in some file or database so they don't have to be computed again later, or something similar. This is just an example; my work is more complicated. Basically what I wanted to say is that the work in itself isn't difficult, but there is a lot of it, and so it needs to be divided. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/32242bde/attachment.html From drcoolsanta at gmail.com Wed Dec 10 05:51:43 2008 From: drcoolsanta at gmail.com (Dr Cool Santa) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists Message-ID: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> Currently in the lab we use Schrodinger and we are looking into NWchem. We'd be interested in knowing about software that a chemist could use that makes use of a parallel supercomputer. And better if it is linux. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081210/f4c78c0c/attachment.html From oneal at dbi.udel.edu Wed Dec 10 06:00:24 2008 From: oneal at dbi.udel.edu (Doug ONeal) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: The IBM doors look great - will they fit any 19" rack? I have APC Netshelter VX racks and the powered ventilation rear doors are not sufficient any more. Doug On 12/09/2008 09:35 AM, Kilian CAVALOTTI wrote: > Hi all, > > I'd be curious to know if some of you use or have some real-life experience > with rear-door heat exchangers, such as those from SGI [1] or IBM [2]. > > I'm especially interested in feedback about condensation, and operational > water temperature. > > [1]http://www.sgi.fr/synergie/EpisodeXI/articles/3g.shtml > [2]http://www.ibm.com/servers/eserver/xseries/storage/pdf/IBM_RDHx_Spec_Sheet.pdf > > Thanks a lot! > Cheers, From andrew.robbie at gmail.com Wed Dec 10 07:27:27 2008 From: andrew.robbie at gmail.com (Andrew Robbie (GMail)) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Tesla systems in Germany? In-Reply-To: <20081209095329.GY16423@casco.aei.mpg.de> References: <20081209095329.GY16423@casco.aei.mpg.de> Message-ID: On Tue, Dec 9, 2008 at 8:53 PM, Steffen Grunewald < steffen.grunewald@aei.mpg.de> wrote: > Hi, > > I'm looking for someone in Germany who already has access to a Tesla > system. > I have received a request by a scientist for "a very powerful machine", and > would like him to run some tests before spending and possibly wasting > money. Use the dilbert principle: http://pics.livejournal.com/allah_sulu/pic/0002f3h8/g13 i.e. he won't know the difference between a Tesla and something else... An nVidia GTX 280 graphics card isn't much different from a Quadro 5800. 
The Tesla is basically a Quadro 5800 without a graphics port. In my experience, by the time your researcher has ported code to work on the new system, a much faster new iteration will have been released. Andrew -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/d22bdf11/attachment.html From brahmaforces at gmail.com Wed Dec 10 21:32:13 2008 From: brahmaforces at gmail.com (arjuna) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Setting up First Beowulf System: Recommendations re racking, linux flavour, and up to date books Message-ID: Dear Beowulfers, After dipping my toes in the pool of Beowulfery (doing research here and there) I am about to sail my ship by creating my Beowulf system. I have four PCs that were cutting edge in their time over the past 5 years. I am thinking of mounting them on a rack, connecting them with Ethernet cables. I would summon your wide and deep experiences on the following: 1) Rack ideas, materials and warnings 2) Up-to-date classic Beowulfery books for 4 to 16 nodes 3) The right up-to-date books on parallel programming 4) Which flavour of Linux is well adapted for Beowulfery and has all the required tools as standard? Any online resources on getting the hardware aspect of it going, i.e. from the box to the rack... Thanks in advance... -- Best regards, arjuna http://www.brahmaforces.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/222f31f8/attachment.html From brahmaforces at gmail.com Wed Dec 10 21:51:57 2008 From: brahmaforces at gmail.com (arjuna) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware Message-ID: Hello all again: I thought I would add a little more background about myself and the intended cluster. 
I am an artist and a computer programmer and am planning on using this cluster as a starting point to do research on building an ideal cluster for animation for my own personal/entrepreneurial work. It would reside in my art studio. As an artist, the idea of rack-mounting the commodity PCs is much more fun than piling up the PCs. I was thinking of working with a local hardware friend and figuring out how to screw motherboards onto hardware-type racks. I'm sure there are better tried and tested racks out there that are not expensive. Any suggestions on the actual physical hardware for constructing racks for up to 16 PCs? Also, any thoughts on racks versus piles of PCs? A lot of the posts on the internet are old and out of date. I am wondering what the up-to-date trends are in racking commodity computers to create Beowulf clusters. What should I be reading? -- Best regards, arjuna http://www.brahmaforces.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/c3ba31f9/attachment.html From hearnsj at googlemail.com Thu Dec 11 01:33:33 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> References: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> Message-ID: <9f8092cc0812110133q269f58f0o2685ec95a590b727@mail.gmail.com> 2008/12/10 Dr Cool Santa > Currently in the lab we use Schrodinger and we are looking into NWchem. > We'd be interested in knowing about software that a chemist could use that > makes use of a parallel supercomputer. And better if it is linux. > > It's probably worth it for you to join the Computational Chemistry list: http://www.ccl.net/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/efc41906/attachment.html From rgb at phy.duke.edu Thu Dec 11 05:33:45 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Developing software for MPICH clusters? In-Reply-To: <86b56470812100232n5307706cm9aad031c38c4381d@mail.gmail.com> References: <86b56470812100232n5307706cm9aad031c38c4381d@mail.gmail.com> Message-ID: On Wed, 10 Dec 2008, Dr Cool Santa wrote: > Till now I have only created Beowulf clusters for my mother who is a > theoretical chemist. She needed clusters to run applications and study > chemical aspects of various substances to help in her research. > > I being a programmer found it quite exciting. I sometimes have had > programmed software to automate my work, however sometimes that work is too > much for my computer to handle that it takes hours and days. I was thinking > if someone could tell me how I could convert it into MPI based code. > Basically my aim is to divide the work among the computers that are on the > cluster. The cluster is comprised of 4 dual core machines so you can > understand how much powerful they would be compare to my computer. Also I > generally program in C or C++ but I have a vast range of languages to > program in. > I can explain the main features of such programs with an example. They would > compute results of consecutive numbers and store them in some file or > database so it doesn't have to compute them again later or something > similar. This is just an example my work is more complicates. > Basically what I wanted to tell was that the work in itself isn't difficult > but the quantity is a lot and so it needs to be divided. It sounds like you already have a collection of systems running linux in a "beowulf" (cluster) configuration, so I'll focus on just the MPI aspect. Pick an MPI. 
There are at least three or four to choose from, and I have no particular religious bias towards any of them and expect that all of them would work for your problem. For example, LAM is often a "yum install" or "apt-get" away, as is Open MPI. IIRC mpich(2) has to be built, but it is EASY to build with e.g. src rpms ready to fire up. Look in the following places for MPI examples, in order: * Online, e.g. in articles on www.clustermonkey.net. I think you could very likely find a complete set of tutorials there alone that would take you through your first few programs and out to where you could write/run YOUR code in MPI. * In the source or documentation trees. There are almost always simple example programs there that serve as templates for more complicated parallel programs (they e.g. compute pi or evaluate chunks of the Mandelbrot set). * Books. There are some decent books on MPI programming available that should suffice to at least get you started, as before. I think C will do just fine. Good luck. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From landman at scalableinformatics.com Thu Dec 11 05:40:21 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> References: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> Message-ID: <49411845.2010803@scalableinformatics.com> Dr Cool Santa wrote: > Currently in the lab we use Schrodinger and we are looking into NWchem. > We'd be interested in knowing about software that a chemist could use > that makes use of a parallel supercomputer. And better if it is linux. Depends upon the calculations you wish to do. 
GAMESS for electronic structure is a very nice parallel code, though setting up the parallel system can be a little challenging for the un-initiated. There are quite a few others (Amber, Charmm, ...) What types of calculation do you want to do? -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From rgb at phy.duke.edu Thu Dec 11 06:00:32 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: Message-ID: On Thu, 11 Dec 2008, arjuna wrote: > Hello all again: > > I thought I would add a little more background about myself and the intended > cluster. I am an artist and a computer programmer and am planning on using > this cluster as a starting point to do research on building an ideal cluster > for Animation for my own personal/entrepreneurial work. It would reside in > my art studio. As an artist the idea of rack mounting the commodity PCS is > much more fun that piling up the PCS. > > I was thinking of working with a local hardware friend and figuring out how > to screw on motherboards onto hardware type racks. Im sure there are better > tried and tested racks out there that are not expensive. Any suggestions on > the actual physical hardware for constructing racks for upto 16PCs. > > Also any thoughts on racks versus piles of PCS. > > A lot of the posts on the internet are old and out of date. I am wondering > what the upto date trends are in racking commodity computers to create > beowulf clusters. What should i be reading? Look in the online archives -- they aren't old or out of date at all. We just had a brief discussion of rack cases vs tower cases last week, for example. 
The consensus view from that was (that rack cases and racks tend to be more expensive than tower cases and cheap shelving, that 2U cases were likely to be quieter and less fussy than 1U cases, that there some very "nifty" relatively new micro- form factor cases that work quite well and attractively in shelved machine room environments (suggesting that racks aren't the only way to get an "artistically clean" looking cluster:-), and that either rack or micro is likely to produce a smaller footprint cluster than the old/classic shelf full of towers model. Bladed systems were (as always) mentioned as an alternative and (as always) it was pointed out that bladed systems are an alternative for the truly deep pocketed as they are even more expensive (if more compact) than racked systems. There are enormous clusters built on all of these models. It sounds like you are interested in building a rendering farm. The original render farms for the original rendered cartoon movies were IIRC shelves full of towers, as is IIRC Google, but I'm sure that a lot of them now are racked up. As for "trends" -- I doubt that there are any. Beowulfery is all about designing a cluster to meet your specific needs given your specific application space and budget. A rackmount cluster has certain advantages, but they cost a certain amount extra. At some point you have to face the question of whether you are better off in the long run spending the extra for rackmount boxes or would prefer to get cheaper form factors and get more systems. I hesitate to make pronouncements on what SHOULD differentiate these choices as no matter what I say there will be somebody on list who chose differently, quite probably for good reasons. 
So with a LARGE grain of salt, I'd say that very very loosely, if you are building your first cluster, a hobby cluster, a low-budget cluster, a small cluster (say less than 32 nodes total), or a production cluster in an environment with lots of physical room and AC/power resources, one or more shelf units of towers is either optimal or perfectly reasonable. If you are a professional with experience building a commercial-grade production cluster, especially one expected to have >=32 nodes in a real machine-room environment, and you aren't horribly constrained in your budget, you're more likely to go with rackmount FF nodes, or in the richest and most space-constrained environments, even blades. But these lines and differentiators are FAR from sharp. I'm sure there are people on list with 100's of shelved nodes (some of them just posted in last week's discussion). There are also bound to be people with racks containing just four or five nodes (somewhat more likely if their four or five nodes are just SOME of the systems in the preexisting rack). At home thus far I've tended to go with towers, at Duke I started with towers years ago but now would only get rackmounts, and I've thought pretty seriously about getting e.g. a half-height rolling rack for home and starting to populate it a few U a year with my very limited budget. The obstacle is that racks are expensive enough that it will cost me AT LEAST one node just to get set up with a rack and a single rackmount system, compared to just buying two equivalent power towers and popping them into my existing $60 steel shelving. OTOH, I could get at least four cores even in a single rackmount chassis, for cheap. OTOOH, I could probably get eight cores one way or another in towers. And while it isn't exactly "my money" I'm spending, the particular pocket of OPM I'm using is quite finite. So ultimately, your decision here will come down to what you want to spend and what you expect to get for it. Beauty? 
Ease of maintenance and access? A professional look to attract investors? All of these things bear, which is why the choice is not simple. As far as books are concerned, I'll let others answer. My online book is free (so it costs you nothing to start there) but I'm the first to admit that it is dated at this point, especially in its (lack of) treatment of the more advanced networks. Clustermonkey resources are arguably more up to date, also free. Many of the print books on the subject are either similarly out of date or are written by people I've never heard of, which basically means that they don't frequent this list and participate in it, which means that I am skeptical about their value (of the books). The ones I've picked up in the store to thumb through have mostly been pretty forgettable. > Best regards, > arjuna > http://www.brahmaforces.com Two names near and dear to my heart, given that I love the Mahabharata and named my first cluster "Brahma" as well...;-) rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From prentice at ias.edu Thu Dec 11 06:14:54 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] For grins...India In-Reply-To: <20081210232943.GC21119@bx9> References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com> <20081210232943.GC21119@bx9> Message-ID: <4941205E.10008@ias.edu> Greg Lindahl wrote: > On Thu, Dec 11, 2008 at 12:01:23AM +0100, Vincent Diepeveen wrote: > >> Not seldom if some low level programmers go busy with such software it >> suddenly speeds up factor 1000. > > Vincent, the people I know who do climate compare notes on how many > model years per cpu year they can compute at a given resolution and > algorithm and machine. 
If someone made a mistake and was 1000X slower > than the competition, they'd be aware of it. > > Your wild claims can be funny at times, but really, you should start a > blog instead of posting them here. > > -- greg I agree. Vincent constantly makes ridiculous political statements that simply have no place on this list. -- Prentice From prentice at ias.edu Thu Dec 11 06:35:15 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> References: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> Message-ID: <49412523.9030307@ias.edu> Dr Cool Santa wrote: > Currently in the lab we use Schrodinger and we are looking into NWchem. > We'd be interested in knowing about software that a chemist could use > that makes use of a parallel supercomputer. And better if it is linux. > Clarification: I supported Schrodinger up until a year ago. Unless things have changed, Jaguar is the only Schrodinger application that can truly make use of a "parallel" supercomputer. The other Schrodinger programs perform calculations that are embarrassingly parallel and require no inter-process communication during calculations. In these cases, the data to be analyzed is broken up into smaller pieces that are analyzed individually by the computers with no communication between them. When they are done, the main program reassembles their output into a final result. This is parallel computing, but doesn't require a "parallel supercomputer". It works great, BTW. OpenEye provides some commercial computational chemistry software (conformer generation, docking, etc.) that uses parallel code. Their code uses PVM instead of MPI, which makes OpenEye kind of an odd duck. Last I spoke to OpenEye, they were planning on porting their code to MPI, but I don't know if that's been done yet, since I no longer support their software. 
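To make the "break it up, analyze the pieces separately, reassemble" pattern above concrete, here is a minimal sketch in Python. It is only an illustration of the general idea, not how the Schrodinger tools are actually implemented: the names split_range, analyze_chunk and run_embarrassingly_parallel are hypothetical, squaring each number is a stand-in for the real per-piece analysis, and a local thread pool stands in for the separate computers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, pieces):
    """Split [start, stop) into `pieces` contiguous chunks of near-equal size."""
    base, extra = divmod(stop - start, pieces)
    chunks = []
    lo = start
    for i in range(pieces):
        # The first `extra` chunks each take one leftover item.
        hi = lo + base + (1 if i < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def analyze_chunk(chunk):
    """Hypothetical stand-in for the real work; uses no data from other chunks."""
    lo, hi = chunk
    return [n * n for n in range(lo, hi)]


def run_embarrassingly_parallel(start, stop, workers):
    """Hand each chunk to a worker, then reassemble the outputs in order."""
    chunks = split_range(start, stop, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input chunks.
        partial_results = list(pool.map(analyze_chunk, chunks))
    merged = []
    for part in partial_results:
        merged.extend(part)
    return merged
```

Because no chunk needs anything from any other chunk, each one could just as well be submitted as its own batch job on a different node, with a final step that concatenates the output files; no MPI or PVM is required for this kind of workload.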
If you're doing molecular simulations, there's LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator), which is open-source. I actually submitted a *very* small patch to it when I ported it to IRIX 6.5 a few years ago. NAMD is also parallel, but I don't know much about it. I compiled it and installed it, but then I don't think the comp chemists ever used it (don't you hate that?). -- Prentice From kilian.cavalotti.work at gmail.com Thu Dec 11 06:47:45 2008 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <20081210200812.GA3449@compegg.wr.niftyegg.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <200812101021.42824.kilian.cavalotti.work@gmail.com> <20081210200812.GA3449@compegg.wr.niftyegg.com> Message-ID: <200812111547.45559.kilian.cavalotti.work@gmail.com> Hi Tom, On Wednesday 10 December 2008 21:08:12 Nifty Tom Mitchell wrote: > Watch dew point numbers in the room. > Dew point is dominantly a function of humidity... > http://en.wikipedia.org/wiki/Dew_point Oh right, that's interesting: """ The dew point is associated with relative humidity. A high relative humidity indicates that the dew point is closer to the current air temperature. Relative humidity of 100% indicates that the dew point is equal to the current temperature (and the air is maximally saturated with water). When the dew point stays constant and temperature increases, relative humidity will decrease. """ I guess that controlling the relative humidity level (most CRAC units can do that, can't they?) and keeping it below say 60% is a pretty simple way to avoid condensation, then. > Many frost free home refrigerators solve this problem by running the > heated exhaust air over the catch pan so any frost/ condensation is > promptly evaporated. 
With clever airflow management drains may not be
> needed but water rots wood, breeds bacteria and attracts bugs and may be
> problematic. The bacteria issue is important.... see Legionella
> pneumophila.

I was only thinking about the hassle of having to mop down your racks every morning, but the point about bacteria is very relevant. That should justify a bonus, working in hazardous areas. :)

> Right now the outside air dew point in Bryan, Texas is about 19F and
> historically gets as high as 69F in December. So yes condensation from
> 42F cooling pipes is possible and should be part of the management/
> monitoring process. I suspect that the campus AC manages the dew point
> to the high end of a comfort range that might be about 50 - 54°F in the
> US keeping things all OK. i.e. If the building AC manages humidity you
> may not have to if they have the capacity to control it at the building
> air inlets.

There's no such thing as "building AC" where I am, unless you call opening a window "managing the dew point". I guess we won't be able to avoid local equipment in the server room to control relative humidity.

> Of course the weather in France is not the same as Texas... looks nice ;-)
> something like.
> 52 °F / 11 °C Light Rain Humidity: 82% Dew Point: 46 °F / 8 °C

That's pretty much it, even on the south coast: gray, rainy and cold. Man, I so miss California... :)

Thanks for the insight,
Cheers,
--
Kilian

From kilian.cavalotti.work at gmail.com Thu Dec 11 06:57:37 2008
From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI)
Date: Thu Mar 18 01:08:10 2010
Subject: [Beowulf] Re: Rear-door heat exchangers and condensation
In-Reply-To:
References: <200812091535.03920.kilian.cavalotti.work@gmail.com>
Message-ID: <200812111557.37555.kilian.cavalotti.work@gmail.com>

Hi Doug,

On Wednesday 10 December 2008 15:00:24 Doug ONeal wrote:
> The IBM doors look great - will they fit any 19" rack?
> I have APC Netshelter VX racks and the powered ventilation rear doors are > not sufficient any more. If I'm not mistaken, the IBM RDHx doors are manufactured by Vette Corp, and according to their spec sheet, it looks like they can be installed on a variety of standard racks, including Netshelter VXes, with the help of a "transition frame". See : http://www.vettecorp.com/information_center/LiquiCoolRDHx_DataSheet.pdf Cheers, -- Kilian From prentice at ias.edu Thu Dec 11 07:00:45 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Setting up First Beowulf System: Recommendations re racking, linux flavour, and up to date books In-Reply-To: References: Message-ID: <49412B1D.1040205@ias.edu> arjuna wrote: > I have four PCS that were cutting edge in their time over the past 5 > years. I am thinking of mounting them on a rack, connecting them with > ethernet cables. > > I would summon your wide and deep experiences on the following: > > 1) Rack ideas, materials and warnings > 2) Upto date classic Beowulfery books for 4 to 16 nodes > 3) The right uptodate books on parrallel programming > 4) Which flavour of linux is well adapted for beowulfery and has all the > required tools standardly? > > Any online resources on getting the hardware aspect of it going, ie from > the box to the rack... 3) There are two aspects to parallel programming: The concepts of parallel programming, and using an actual programming language. Personally, I recommend starting with the "why" and learning the theory of parallel programming. It will make designing effective parallel programs easier. 
I have these two parallel computing textbooks on my bookshelf:

Parallel Programming: Techniques and Applications Using Networked Workstations and Parallel Computers (2nd Edition)
by Barry Wilkinson and Michael Allen
http://www.amazon.com/Parallel-Programming-Techniques-Applications-Workstations/dp/0131405632

Introduction to Parallel Computing (2nd Edition) (Hardcover)
by Ananth Grama, George Karypis, Vipin Kumar, Anshul Gupta
http://www.amazon.com/Introduction-Parallel-Computing-Ananth-Grama/dp/0201648652

I haven't read either one cover to cover, but I have read portions, and both are relatively easy to read. Most parallel programming is done using MPI, so you might want to start there for actually writing parallel programs. For that, this is a good book:

Parallel Programming With MPI (Paperback)
by Peter Pacheco
http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395/

Again, I haven't read this one in its entirety; it's more of a reference for me, since I hardly ever do MPI programming as an admin. It looks very easy to read. I'd go so far as to say it's the "gold standard" on this topic, since I've seen it recommended over and over again.

4) Any major Linux distro (Red Hat, SUSE, Debian, Ubuntu) will work well. I use a rebuild of RHEL. Not sure which distros have all you need right out of the box.

--
Prentice

From rgb at phy.duke.edu Thu Dec 11 07:40:28 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu Mar 18 01:08:10 2010
Subject: [Beowulf] Setting up First Beowulf System: Recommendations re racking, linux flavour, and up to date books
In-Reply-To: <49412B1D.1040205@ias.edu>
References: <49412B1D.1040205@ias.edu>
Message-ID:

On Thu, 11 Dec 2008, Prentice Bisbal wrote:

> Personally, I recommend starting with the "why" and learning the theory
> of parallel programming. It will make designing effective parallel
> programs easier. I have these two parallel computing textbooks on my
> bookshelf:

Excellent point.
Don't forget Ian Foster's book: http://www-unix.mcs.anl.gov/dbpp/ This has the advantage of being available for free online as well as in hardcover if you prefer it that way. So you can read it NOW and see if it meets your needs, and explore the other books below (where I haven't read Wilkinson and Allen but have looked through GKKG and agree that it's a lovely book) as you can obtain a copy. rgb > Parallel Programming: Techniques and Applications Using Networked > Workstations and Parallel Computers (2nd Edition) > by Barry Wilkinson and Michael Allen > http://www.amazon.com/Parallel-Programming-Techniques-Applications-Workstations/dp/0131405632 > > Introduction to Parallel Computing (2nd Edition) (Hardcover) > by Ananth Grama, George Karypis, Vipin Kumar, Anshul Gupta > http://www.amazon.com/Introduction-Parallel-Computing-Ananth-Grama/dp/0201648652 > > I haven't read either one cover to cover, but I have read portions, an > both are relatively easy to read. Most parallel programming is done > using MPI, so you might want to start there for actually writing > parallel programs. For that, this is a good book: > > Parallel Programming With MPI (Paperback) > by Peter Pacheco > http://www.amazon.com/Parallel-Programming-MPI-Peter-Pacheco/dp/1558603395/ > > Again, I haven't read this one in it's entirety, more of a reference for > me, since I hardly actually do MPI programming as an admin. It's looks > very easy to read. I'd go so far as to say it's the "gold standard" on > this topic, since I've seen it recommended over and over again. > > 4) Any major Linux distro (Red Hat, SUSE, Debian, Ubuntu) will work > well. I use a rebuild of RHEL. Not sure which distros have all you need > right out of the box. > > > > -- > Prentice > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. 
Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hearnsj at googlemail.com Thu Dec 11 08:59:07 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Computing in the clouds Message-ID: <9f8092cc0812110859y4fde842tdcda0ae56dddce50@mail.gmail.com> I'm sure Joe is too self-effacing to trumpet this excellent article on the relevance of Cloud Computing to HPC: http://www.linux-mag.com/id/7196/1/ -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/a597e91e/attachment.html From landman at scalableinformatics.com Thu Dec 11 10:26:11 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Computing in the clouds In-Reply-To: <9f8092cc0812110859y4fde842tdcda0ae56dddce50@mail.gmail.com> References: <9f8092cc0812110859y4fde842tdcda0ae56dddce50@mail.gmail.com> Message-ID: <49415B43.3030708@scalableinformatics.com> John Hearns wrote: > I'm sure Joe is too self-effacing to trumpet this excellent article on > the relevance of Cloud Computing to HPC: > > http://www.linux-mag.com/id/7196/1/ Thanks for the pointer :) I hope the (feeble) attempt at humor up front (clouds ... vapor ... hot air) doesn't put anyone off ... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From mathog at caltech.edu Thu Dec 11 12:10:58 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: cloning issue, hidden module dependency Message-ID: Finally wrote a working image on the one node with a different mobo, it was a minor mess doing so. 
In order to figure out the values for modprobe.conf:

  alias eth0 8139too
  install usb-interface /sbin/modprobe uhci_hcd; /bin/true
  install ide-controller /sbin/modprobe via82cxxx; /sbin/modprobe ide_generic; /bin/true
  alias pci:v000010ECd00008139sv000010ECsd00008139bc02sc00i00 8139too

I still had to do a basic install on that node. The first 3 lines I could probably have figured out eventually from the modprobe.conf from the previous release, but that last line, no way.

For future reference:

1. write / and /boot from an image made from an S2466 system using a
   boel3 script (this also wrote all known node specific files)
2. ^C to break to boel shell
3. mkdir /a
   mount /dev/hda3 /a
   mount /dev/hda1 /a/boot
   chroot /a
   # file name will change for each release!!!
   rm -f /boot/initrd-2.6.24.7-desktop-2mnb.img
   # expect and ignore warnings
   mkinitrd /boot/initrd-2.6.24.7-desktop-2mnb.img 2.6.24.7-desktop-2mnb
   # the preceding stomped our custom inittab, put it back
   cp -f /etc/inittab.saf /etc/inittab
   exit
   reboot
4. Need to run lilo now or at reboot it STILL comes up using the
   wrong modules. You can guess how I found that out. Lilo won't work
   reliably chroot from boel3 with this Mandriva distro, and boel3 has
   no lilo of its own. So pxe boot the node with PLD 2.01, then at the
   prompt:
   mkdir /a
   mount /dev/hda3 /a
   mount /dev/hda1 /boot
   lilo -C /a/etc/lilo.conf
   #reset dhcpd on the master so this node will boot from internal disk
   reboot

At least reinstalling on this node would now be less painful, since the new initrd has been stored on the SI server, and could be put in place by just copying it in the installation script.

All in all, though, this isn't a very elegant way to install a system image on a different type of machine.
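On that last modprobe.conf line: the long alias has the form of the kernel's PCI modalias string, which on a running 2.6 system can normally be read from /sys/bus/pci/devices/<address>/modalias. As a hedged sketch of how that string is assembled from the vendor, device, subsystem and class IDs (the helper name `pci_modalias` is made up for illustration, and the ID values in the usage note are inferred from the alias above, not read from the actual node):

```python
def pci_modalias(vendor, device, subvendor, subdevice, pci_class):
    """Build a PCI modalias string from numeric IDs.

    pci_class is the 24-bit class code: base class, subclass and
    programming interface packed as 0xBBSSII.
    """
    base_class = (pci_class >> 16) & 0xFF
    sub_class = (pci_class >> 8) & 0xFF
    interface = pci_class & 0xFF
    return "pci:v%08Xd%08Xsv%08Xsd%08Xbc%02Xsc%02Xi%02X" % (
        vendor, device, subvendor, subdevice,
        base_class, sub_class, interface,
    )
```

Calling `pci_modalias(0x10EC, 0x8139, 0x10EC, 0x8139, 0x020000)` (Realtek vendor ID, 8139 device ID, Ethernet controller class) reproduces the alias used for 8139too above, so on a similar node the line could probably be derived from lspci -n output rather than from a full install.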
Regards,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From lindahl at pbm.com Thu Dec 11 13:38:57 2008
From: lindahl at pbm.com (Greg Lindahl)
Date: Thu Mar 18 01:08:10 2010
Subject: [Beowulf] SSD prices
Message-ID: <20081211213857.GA7355@bx9>

I was recently surprised to learn that SSD prices are down in the $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag and it was:

256 gigs = $700
128 gigs =  300
 64 gigs =  180
 32 gigs =   70

Also, Micron is saying that they're going to get into the business of PCIe-attached flash, which will give us a second source for what Fusion-io is shipping today.

If you're on the "I like a real system disk" side of the diskless/diskfull fence, these SSDs ought to be a lot more reliable than traditional disks. And I'd like to get rid of the mirrored disks in our developer desktops...

-- greg

From rgb at phy.duke.edu Thu Dec 11 14:28:15 2008
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu Mar 18 01:08:10 2010
Subject: [Beowulf] SSD prices
In-Reply-To: <20081211213857.GA7355@bx9>
References: <20081211213857.GA7355@bx9>
Message-ID:

On Thu, 11 Dec 2008, Greg Lindahl wrote:

> I was recently surprised to learn that SSD prices are down in the
> $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag
> and it was:
>
> 256 gigs = $700
> 128 gigs = 300
> 64 gigs = 180
> 32 gigs = 70
>
> Also, Micron is saying that they're going to get into the business of
> PCIe-attached flash, which will give us a second source for what
> Fusion-io is shipping today.
>
> If you're on the "I like a real system disk" side of the
> diskless/diskfull fence, these SSDs ought to be a lot more reliable
> than traditional disks. And I'd like to get rid of the mirrored
> disks in our developer desktops...

Very useful information, as I'm on the "real system disk" side. Lagging real hard disk by what, a decade?
But catching up, and 32 GB is really plenty anyway, whether for a node or for a workstation, right up to where you start putting your entire music/movie collection on it. I'll have to get a 32 GB chip and see if I can boot my laptop from it. I definitely can boot from USB flash (and carry linux in my pocket routinely these days:-) and it is down to something like 16 GB for $30. 16 GB is actually the size of / on my current laptop, and a fairly fat Fedora 9 (lots of games and other stuff to play with and try out) still leaves 5 GB. Thanks! rgb > > -- greg > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From hahn at mcmaster.ca Thu Dec 11 14:57:30 2008 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: > Netshelter VX racks and the powered ventilation rear doors are not > sufficient any more. why do you have doors on your rack? normally, a rack is filled with servers with fans that generate the standard front-to-back airflow. that means that you want no doors, or at least only high-perf mesh ones. I've seen some wacky things in machinerooms - closed racks with just a small fan in the top, for instance. or racks with 1U servers each carefully separated by 2-3U of open space. here's the way I think of it: try to make your airflow cycle as close to a simple cycle as possible. get all the air coming out of the chiller to impinge (only and as naturally as possible) on the front of the rack(s). 
get all the hot air from the back to to the chiller intake as naturally as possible. no mixing, no bypass, no counter-rotation, minimizing total air path as well as changes in the airflow vectors. ideally, machine room should be divided into hot and cold halves, with no free flow between them. there are, of course some sweet spots (as well as "sour" ones): very close to the chiller, cold air velocity may be high enough to under-supply hot racks. far away from the chillers, the problem is both supply and the hot-air/return path. ducting and plenums are invaluable, but IMO the main goal is partitioning hot from cold. once airflow is relatively sane and controllable, it's more doable to measure dissipation in a rack, as well as its airflow and delta-t to see whether an auxiliary chiller is necessary. I would be very reluctant to add "spot fixes" to a particular rack without having a very good handle on the full flow/circulation/temp/power picture... >> I'm especially interested in feedback about condensation, and operational >> water temperature. incidentally, before committing to chilled water, make sure that it runs year-round with reasonable temperature and flow. our first machineroom was stable only during summer, because the campus chilled water loop was warmer and poorly chilled during the winter when (office) load was low. From lindahl at pbm.com Thu Dec 11 15:14:52 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: <20081211231452.GC29359@bx9> On Thu, Dec 11, 2008 at 05:57:30PM -0500, Mark Hahn wrote: > why do you have doors on your rack? normally, a rack is filled with > servers with fans that generate the standard front-to-back airflow. > that means that you want no doors, or at least only high-perf mesh ones. 
Sometimes a server which is off is too hot to turn on, thanks to its neighbors. One way to solve that is to have fans in the rack back door. But these days it's hard to get enough airflow unless the entire door is perforated, which makes fans in the door useless. -- greg From csamuel at vpac.org Thu Dec 11 15:44:29 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Developing software for MPICH clusters? In-Reply-To: Message-ID: <1939446205.112551229039069647.JavaMail.root@mail.vpac.org> /* Second attempt, this time with caffeine.. */ ----- "Robert G. Brown" wrote: > For example, lam is often a "yum install" or "apt get" > away, as is openmpi. I would suggest that if someone is starting out and interested in LAM-MPI then they try OpenMPI first as LAM-MPI is now just in maintenance mode with all development work switched to OpenMPI. According to the LAM web page the developers are trying to encourage "all users to try migrating to Open MPI" cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From dnlombar at ichips.intel.com Thu Dec 11 15:36:04 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081211213857.GA7355@bx9> References: <20081211213857.GA7355@bx9> Message-ID: <20081211233604.GA12826@nlxdcldnl2.cl.intel.com> On Thu, Dec 11, 2008 at 01:38:57PM -0800, Greg Lindahl wrote: > I was recently surprised to learn that SSD prices are down in the > $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag > and it was: > > 256 gigs = $700 > 128 gigs = 300 > 64 gigs = 180 > 32 gigs = 70 FWIW, newegg shows a $20 rebate on the 32g. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. 
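As a quick check of the "$2-$3 per gbyte" figure, the quoted OCZ prices can simply be divided out. The sketch below is only arithmetic on the four price points quoted in this thread; the names `prices_usd` and `dollars_per_gb` are arbitrary.

```python
# Capacity in GB -> quoted price in US dollars (from the NexTag survey above).
prices_usd = {256: 700, 128: 300, 64: 180, 32: 70}


def dollars_per_gb(prices):
    """Return price per GB, rounded to cents, for each capacity."""
    return {gb: round(usd / gb, 2) for gb, usd in prices.items()}


if __name__ == "__main__":
    for gb, cost in sorted(dollars_per_gb(prices_usd).items()):
        print("%3d GB: $%.2f per GB" % (gb, cost))
```

All four sizes land between roughly $2.19 and $2.81 per GB, so the range quoted at the top of the thread holds, with the 32 GB drive cheapest per GB and the 64 GB drive the most expensive.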
From csamuel at vpac.org Thu Dec 11 15:54:57 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <1951474958.112651229039486564.JavaMail.root@mail.vpac.org> Message-ID: <2119423501.112691229039697517.JavaMail.root@mail.vpac.org> ----- "Prentice Bisbal" wrote: > NAMD is also parallel, but I don't know much about it. I compiled it, > installed it, but then I don't think the comp chemists ever used it > (don't you hate that?). We've got NAMD here (it's a molecular dynamics program), it's not what you'd call a trivial application to build. ;-) Building it as an MPI application (i.e. getting Charm++ to use MPI rather than its own custom framework) makes it a lot easier to use though, especially if you use PBS and have a TM aware MPI launcher installed (like OpenMPI, LAM or Pete Wyckoffs excellent mpiexec replacement). cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From csamuel at vpac.org Thu Dec 11 16:04:47 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> Message-ID: <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> ----- "Greg Lindahl" wrote: > If you're on the "I like a real system disk" side of the > diskless/diskfull fence, these SSDs ought to be a lot more reliable > than tradtional disks. And I'd like to get rid of the mirrored > disks in our developer desktops... Hmm, I was thinking that until I read this blog post by one of the kernel filesystem developers (Val Henson from Intel) who had some (possibly Apple specific) concerns about data corruption & reliability and why she still chooses spinning disks over SSD. 
http://valhenson.livejournal.com/25228.html This is one of the reasons I'm *really* interested to get btrfs going on my Dell E4200 which has a 128GB SSD, data checksums (and duplicate copies of data) are good.. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From lindahl at pbm.com Thu Dec 11 16:18:59 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> Message-ID: <20081212001859.GA7929@bx9> On Fri, Dec 12, 2008 at 11:04:47AM +1100, Chris Samuel wrote: > Hmm, I was thinking that until I read this blog post by > one of the kernel filesystem developers (Val Henson from > Intel) who had some (possibly Apple specific) concerns > about data corruption & reliability and why she still > chooses spinning disks over SSD. Nothing new in that blog post. We'll find out the actual reliability of this generation of flash SSD when they've been around for a while, and not an anecdote sooner. -- greg From james.p.lux at jpl.nasa.gov Thu Dec 11 17:01:23 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> Message-ID: > > Hmm, I was thinking that until I read this blog post by one > of the kernel filesystem developers (Val Henson from > Intel) who had some (possibly Apple specific) concerns about > data corruption & reliability and why she still chooses > spinning disks over SSD. 
> > http://valhenson.livejournal.com/25228.html > > This is one of the reasons I'm *really* interested to get > btrfs going on my Dell E4200 which has a 128GB SSD, data > checksums (and duplicate copies of data) are good.. > She raises some interesting points, the most significant of which is that "real data" is very hard to come by. We are starting to use Flash memory for storage on spacecraft, and, of course, wear-out is a big deal. OTOH, it's something we know how to deal with (since spacecraft like Galileo used magnetic tape for storage), even when the medium gets old and decrepit. There's also some traps for the unwary for flash that don't apply to mechanical storage: What if a software bug hammers on one location accidentally, and whips through all 100,000 cycles of its life in a day? Of course, in the space biz, we don't have the issue she identifies of different suppliers. Hah.. We have "traceability to sand", and if someone finds out that a contaminated cigarette butt was dropped on that sand back in 1953, we'll get an alert for all parts potentially made from that sand. (Real fun when the part is something like a 2N2222 NPN transistor or a 51 ohm resistor) From lindahl at pbm.com Thu Dec 11 19:22:54 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> Message-ID: <20081212032254.GA29300@bx9> On Thu, Dec 11, 2008 at 05:01:23PM -0800, Lux, James P wrote: > We are starting to use Flash memory for storage on spacecraft, and, > of course, wear-out is a big deal. Yeah, but you have the huge advantage that you get to write your own wear-leveling software. -- greg From rgb at phy.duke.edu Thu Dec 11 20:38:58 2008 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081212001859.GA7929@bx9> References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> <20081212001859.GA7929@bx9> Message-ID: On Thu, 11 Dec 2008, Greg Lindahl wrote: > On Fri, Dec 12, 2008 at 11:04:47AM +1100, Chris Samuel wrote: > >> Hmm, I was thinking that until I read this blog post by >> one of the kernel filesystem developers (Val Henson from >> Intel) who had some (possibly Apple specific) concerns >> about data corruption & reliability and why she still >> chooses spinning disks over SSD. > > Nothing new in that blog post. We'll find out the actual reliability > of this generation of flash SSD when they've been around for a while, > and not an anecdote sooner. It does look worth noting that one should get SLC and not MLC SSD for any "disk like" application. It's faster (10x faster) and they argue much more reliable. More expensive, too, of course. I think somebody mentioned Transcend -- they apparently are concentrating on SLC only (and claim ECC) and their prices still look pretty competitive. I'm not sure SSD is perfect for userspace hard storage, but for basic operating system images it seems reasonable. How many times does one write to the read-mostly stuff in /, /usr, /lib, /etc? Surely nothing like the thousands of times minimum one is SUPPOSED to be able to rewrite. rgb Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From james.p.lux at jpl.nasa.gov Thu Dec 11 20:46:30 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081212032254.GA29300@bx9> Message-ID: On 12/11/08 7:22 PM, "Greg Lindahl" wrote: > On Thu, Dec 11, 2008 at 05:01:23PM -0800, Lux, James P wrote: > >> We are starting to use Flash memory for storage on spacecraft, and, >> of course, wear-out is a big deal. > > Yeah, but you have the huge advantage that you get to write your own > wear-leveling software. > > -- greg > True enough. But just as the article pointed out with respect to the SSD, we often have some sort of hardware flash controller between the flash and the CPU. Or, you're integrating some subsystem designed and built by someone else (or a spare from a previous mission). But a bigger deal is that the whole wear out thing is not super well understood (in a predictability sense). For instance, it's sensitive to the temperature. From lindahl at pbm.com Thu Dec 11 21:18:06 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> <20081212001859.GA7929@bx9> Message-ID: <20081212051806.GA7589@bx9> On Thu, Dec 11, 2008 at 11:38:58PM -0500, Robert G. Brown wrote: > It does look worth noting that one should get SLC and not MLC SSD for > any "disk like" application. It's faster (10x faster) and they argue > much more reliable. More expensive, too, of course. Transcend has both MLC and SLC, and they charge 3X as much for SLC, if NexTag is finding low prices properly. (Incidentally, NexTag claims the lowest price for a 32G USB stick is about the same as a 32gb OCZ drive. Hm.) But Fusion-io's specsheet says that their MLC board is only a little slower than their SLC board. 
That's a high-end controller with more channels, but you'll see that in low-end drives in the next generation. Perhaps most of the problems people report in the low-end drives might be crappy firmware on that Jmicron controller everyone hates. Certainly Apple doesn't seem to have a problem making flash devices reasonably reliable. But they control all the firmware. > I'm not sure SSD is perfect for userspace hard storage, but for basic > operating system images it seems reasonable. How many times does one > write to the read-mostly stuff in /, /usr, /lib, /etc? Not very often, but there's always /var and /tmp and swap to worry about. The other guys at Blekko had a very bad experience with flash 5 years ago on Linux network appliances at their previous startup. It is unclear to me if that was a bad batch of flash, or dumb software, or what. -- greg p.s. While I'm at it, I think that these SATA-to-USB gizmos are pretty cool: http://www.thinkgeek.com/computing/drives/a7ea/?cpg=ab Lots of people seem to sell 'em. From hunting at ix.netcom.com Thu Dec 11 22:09:10 2008 From: hunting at ix.netcom.com (Michael Huntingdon) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: <20081211231452.GC29359@bx9> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <20081211231452.GC29359@bx9> Message-ID: <7493914765164F50BFEB0183BBA5995C@MichaelPC> ...or rather than worry about door perforation, you make sure you've looked at solutions that take advantage of the best hw/sw currently available. Today it really is not just about cooling a group of 1u systems or a single hptc cabinet. Today maybe you know you need two densely populated cabinets. But you also know your requirements will double or maybe triple a year from now. You're in math, and you know the same applies to CSE and chemistry. 
Let's not forget physics, and oh, by the way, campus central computing wants to bring it all together, manage and maintain it for you. Sound familiar? Are they offering to manage your fans or your rear door heat exchangers? If so, they and/or you are in big trouble. Seems as though it's about how you most effectively/efficiently cool each cabinet in concert with the rest of your cabs (systems, storage, networking) and technology in the data center. There are engineering groups that do little more than eat/sleep/drink this stuff, so let me know if you are seriously interested in talking about how to bring all the technologies together necessary to manage the environmental requirements of your densely architected stand alone cluster, or clusters of clusters within a data center. cheers...michael ----- Original Message ----- From: "Greg Lindahl" To: "Mark Hahn" Cc: "Beowulf Mailing List" Sent: Thursday, December 11, 2008 3:14 PM Subject: Re: [Beowulf] Re: Rear-door heat exchangers and condensation > On Thu, Dec 11, 2008 at 05:57:30PM -0500, Mark Hahn wrote: > >> why do you have doors on your rack? normally, a rack is filled with >> servers with fans that generate the standard front-to-back airflow. >> that means that you want no doors, or at least only high-perf mesh ones. > > Sometimes a server which is off is too hot to turn on, thanks to its > neighbors. One way to solve that is to have fans in the rack back > door. But these days it's hard to get enough airflow unless the entire > door is perforated, which makes fans in the door useless. > > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From bernard at vanhpc.org Thu Dec 11 23:01:31 2008 From: bernard at vanhpc.org (Bernard Li) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] ntpd wonky? 
In-Reply-To: <20081209180633.GA21193@bx9> References: <20081209180633.GA21193@bx9> Message-ID: Hi Greg: On Tue, Dec 9, 2008 at 10:06 AM, Greg Lindahl wrote: > Ever since the US daylight savings time change, I've been seeing a lot > of jitter in the ntp servers I'm synched to... I'm using the redhat > pool. Has anyone else noticed this? On 200 machines I get several > complaints per day of >100 ms jitter from my hourly check-ntp cronjob. Have you tried other pools eg. pool.ntp.org? Cheers, Bernard From eugen at leitl.org Thu Dec 11 23:56:55 2008 From: eugen at leitl.org (Eugen Leitl) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer Message-ID: <20081212075655.GK11544@leitl.org> http://www.goodgearguide.com.au/article/270416/inside_tsubame_-_nvidia_gpu_supercomputer?fp=&fpid=&pf=1 Inside Tsubame - the Nvidia GPU supercomputer Tokyo Tech University's Tsubame supercomputer attained 29th ranking in the new Top 500, thanks in part to hundreds of Nvidia Tesla graphics cards. Martyn Williams (IDG News Service) 10/12/2008 12:20:00 When you enter the computer room on the second floor of Tokyo Institute of Technology's computer building, you're not immediately struck by the size of Japan's second-fastest supercomputer. You can't see the Tsubame computer for the industrial air conditioning units that are standing in your way, but this in itself is telling. With more than 30,000 processing cores buzzing away, the machine consumes a megawatt of power and needs to be kept cool. Tsubame was ranked 29th-fastest supercomputer in the world in the latest Top 500 ranking with a speed of 77.48T Flops (floating point operations per second) on the industry-standard Linpack benchmark. While its position is relatively good, that's not what makes it so special. The interesting thing about Tsubame is that it doesn't rely on the raw processing power of CPUs (central processing units) alone to get its work done. 
Tsubame includes hundreds of graphics processors of the same type used in consumer PCs, working alongside CPUs in a mixed environment that some say is a model for future supercomputers serving disciplines like material chemistry. Graphics processors (GPUs) are very good at quickly performing the same computation on large amounts of data, so they can make short work of some problems in areas such as molecular dynamics, physics simulations and image processing. "I think in the vast majority of the interesting problems in the future, the problems that affect humanity where the impact comes from nature ... requires the ability to manipulate and compute on a very large data set," said Jen-Hsun Huang, CEO of Nvidia, who spoke at the university this week. Tsubame uses 680 of Nvidia's Tesla graphics cards. Just how much of a difference do the GPUs make? Takayuki Aoki, a professor of material chemistry at the university, said that simulations that used to take three months now take 10 hours on Tsubame. Tsubame itself - once you move past the air-conditioners - is split across several rooms in two floors of the building and is largely made up of rack-mounted Sun x4600 systems. There are 655 of these in all, each of which has 16 AMD Opteron CPU cores inside it, and Clearspeed CSX600 accelerator boards. The graphics chips are contained in 170 Nvidia Tesla S1070 rack-mount units that have been slotted in between the Sun systems. Each of the 1U Nvidia systems has four GPUs inside, each of which has 240 processing cores for a total of 960 cores per system. The Tesla systems were added to Tsubame over the course of about a week while the computer was operating. "People thought we were crazy," said Satoshi Matsuoka, director of the Global Scientific Information and Computing Center at the university. "This is a ¥1 billion (US$11 million) supercomputer consuming a megawatt of power, but we proved technically that it was possible."
The result is what university staff call version 1.2 of the Tsubame supercomputer. "I think we should have been able to achieve 85 [T Flops], but we ran out of time so it was 77 [T Flops]," said Matsuoka of the benchmarks performed on the system. At 85T Flops it would have risen a couple of places in the Top 500 and been ranked fastest in Japan. There's always next time: A new Top 500 list is due out in June 2009, and Tokyo Institute of Technology is also looking further ahead. "This is not the end of Tsubame, it's just the beginning of GPU acceleration becoming mainstream," said Matsuoka. "We believe that in the world there will be supercomputers registering several petaflops in the years to come, and we would like to follow suit." Tsubame 2.0, as he dubbed the next upgrade, should be here within the next two years and will boast a sustained performance of at least a petaflop (a petaflop is 1,000 teraflops), he said. The basic design for the machine is still not finalized but it will continue the heterogeneous computing base of mixing CPUs and GPUs, he said. From award at uda.ad Fri Dec 12 02:02:03 2008 From: award at uda.ad (Alan Ward) Date: Thu Mar 18 01:08:10 2010 Subject: RS: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer References: <20081212075655.GK11544@leitl.org> Message-ID: Very interesting, but perhaps a bit of an overkill. How many TFlop/Watt does that figure out as? :-( Cheers, -Alan -----Original Message----- From: beowulf-bounces@beowulf.org on behalf of Eugen Leitl Sent: Fri 12/12/2008 08:56 To: info@postbiota.org; Beowulf@beowulf.org Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer http://www.goodgearguide.com.au/article/270416/inside_tsubame_-_nvidia_gpu_supercomputer?fp=&fpid=&pf=1 Inside Tsubame - the Nvidia GPU supercomputer Tokyo Tech University's Tsubame supercomputer attained 29th ranking in the new Top 500, thanks in part to hundreds of Nvidia Tesla graphics cards.
Martyn Williams (IDG News Service) 10/12/2008 12:20:00 When you enter the computer room on the second floor of Tokyo Institute of Technology's computer building, you're not immediately struck by the size of Japan's second-fastest supercomputer. You can't see the Tsubame computer for the industrial air conditioning units that are standing in your way, but this in itself is telling. With more than 30,000 processing cores buzzing away, the machine consumes a megawatt of power and needs to be kept cool. Tsubame was ranked 29th-fastest supercomputer in the world in the latest Top 500 ranking with a speed of 77.48T Flops (floating point operations per second) on the industry-standard Linpack benchmark. While its position is relatively good, that's not what makes it so special. The interesting thing about Tsubame is that it doesn't rely on the raw processing power of CPUs (central processing units) alone to get its work done. Tsubame includes hundreds of graphics processors of the same type used in consumer PCs, working alongside CPUs in a mixed environment that some say is a model for future supercomputers serving disciplines like material chemistry. Graphics processors (GPUs) are very good at quickly performing the same computation on large amounts of data, so they can make short work of some problems in areas such as molecular dynamics, physics simulations and image processing. "I think in the vast majority of the interesting problems in the future, the problems that affect humanity where the impact comes from nature ... requires the ability to manipulate and compute on a very large data set," said Jen-Hsun Huang, CEO of Nvidia, who spoke at the university this week. Tsubame uses 680 of Nvidia's Tesla graphics cards. Just how much of a difference do the GPUs make? Takayuki Aoki, a professor of material chemistry at the university, said that simulations that used to take three months now take 10 hours on Tsubame. 
Tsubame itself - once you move past the air-conditioners - is split across several rooms in two floors of the building and is largely made up of rack-mounted Sun x4600 systems. There are 655 of these in all, each of which has 16 AMD Opteron CPU cores inside it, and Clearspeed CSX600 accelerator boards. The graphics chips are contained in 170 Nvidia Tesla S1070 rack-mount units that have been slotted in between the Sun systems. Each of the 1U Nvidia systems has four GPUs inside, each of which has 240 processing cores for a total of 960 cores per system. The Tesla systems were added to Tsubame over the course of about a week while the computer was operating. "People thought we were crazy," said Satoshi Matsuoka, director of the Global Scientific Information and Computing Center at the university. "This is a ¥1 billion (US$11 million) supercomputer consuming a megawatt of power, but we proved technically that it was possible." The result is what university staff call version 1.2 of the Tsubame supercomputer. "I think we should have been able to achieve 85 [T Flops], but we ran out of time so it was 77 [T Flops]," said Matsuoka of the benchmarks performed on the system. At 85T Flops it would have risen a couple of places in the Top 500 and been ranked fastest in Japan. There's always next time: A new Top 500 list is due out in June 2009, and Tokyo Institute of Technology is also looking further ahead. "This is not the end of Tsubame, it's just the beginning of GPU acceleration becoming mainstream," said Matsuoka. "We believe that in the world there will be supercomputers registering several petaflops in the years to come, and we would like to follow suit." Tsubame 2.0, as he dubbed the next upgrade, should be here within the next two years and will boast a sustained performance of at least a petaflop (a petaflop is 1,000 teraflops), he said.
The basic design for the machine is still not finalized but it will continue the heterogeneous computing base of mixing CPUs and GPUs, he said. From diep at xs4all.nl Fri Dec 12 02:50:51 2008 From: diep at xs4all.nl (Vincent Diepeveen) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer In-Reply-To: <20081212075655.GK11544@leitl.org> References: <20081212075655.GK11544@leitl.org> Message-ID: <34568AAB-6A20-44B7-B80B-FA8BB92AC1F6@xs4all.nl> On Dec 12, 2008, at 8:56 AM, Eugen Leitl wrote: > > http://www.goodgearguide.com.au/article/270416/inside_tsubame_- > _nvidia_gpu_supercomputer?fp=&fpid=&pf=1 > > Inside Tsubame - the Nvidia GPU supercomputer > > Tokyo Tech University's Tsubame supercomputer attained 29th ranking > in the > new Top 500, thanks in part to hundreds of Nvidia Tesla graphics > cards. > > Martyn Williams (IDG News Service) 10/12/2008 12:20:00 > > When you enter the computer room on the second floor of Tokyo > Institute of > Technology's computer building, you're not immediately struck by > the size of > Japan's second-fastest supercomputer. You can't see the Tsubame > computer for > the industrial air conditioning units that are standing in your > way, but this > in itself is telling. With more than 30,000 processing cores > buzzing away, > the machine consumes a megawatt of power and needs to be kept cool. > 1000000 watt / 77480 gflop = 12.9 watt per gflop. If you run double precision codes on this box it is a big energy waster IMHO. (of course it's very well equipped for all kind of crypto codes using that google library).
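As a rough check on the arithmetic above, and on the TFlop/Watt question asked earlier in the thread, here is a minimal Python sketch. It assumes only the article's rounded figures (about 1 MW of power and 77.48 TFlops on Linpack); the function names are made up for illustration, and the real power draw, with or without cooling, may well differ.

```python
# Back-of-the-envelope energy efficiency from the figures quoted in the article.
# Assumptions: ~1 MW total power draw (rounded) and 77.48 TFlops Linpack.

POWER_WATTS = 1_000_000      # "a megawatt of power" (rounded in the article)
LINPACK_TFLOPS = 77.48       # Top 500 Linpack result quoted in the article


def watts_per_gflops(power_watts, tflops):
    """Watts consumed per sustained GFlops (lower is better)."""
    return power_watts / (tflops * 1000.0)


def gflops_per_watt(power_watts, tflops):
    """Sustained GFlops delivered per watt (higher is better)."""
    return (tflops * 1000.0) / power_watts


if __name__ == "__main__":
    print(f"{watts_per_gflops(POWER_WATTS, LINPACK_TFLOPS):.1f} W per GFlops")
    print(f"{gflops_per_watt(POWER_WATTS, LINPACK_TFLOPS):.4f} GFlops per W")
```

With these inputs the result is roughly 12.9 W per GFlops, or about 0.077 GFlops per watt (on the order of 0.00008 TFlops per watt).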
Vincent > Tsubame was ranked 29th-fastest supercomputer in the world in the > latest Top > 500 ranking with a speed of 77.48T Flops (floating point operations > per > second) on the industry-standard Linpack benchmark. > > While its position is relatively good, that's not what makes it so > special. > The interesting thing about Tsubame is that it doesn't rely on the raw > processing power of CPUs (central processing units) alone to get > its work > done. Tsubame includes hundreds of graphics processors of the same > type used > in consumer PCs, working alongside CPUs in a mixed environment that > some say > is a model for future supercomputers serving disciplines like material > chemistry. > > Graphics processors (GPUs) are very good at quickly performing the > same > computation on large amounts of data, so they can make short work > of some > problems in areas such as molecular dynamics, physics simulations > and image > processing. > > "I think in the vast majority of the interesting problems in the > future, the > problems that affect humanity where the impact comes from > nature ... requires > the ability to manipulate and compute on a very large data set," said > Jen-Hsun Huang, CEO of Nvidia, who spoke at the university this > week. Tsubame > uses 680 of Nvidia's Tesla graphics cards. > > Just how much of a difference do the GPUs make? Takayuki Aoki, a > professor of > material chemistry at the university, said that simulations that > used to take > three months now take 10 hours on Tsubame. > > Tsubame itself - once you move past the air-conditioners - is split > across > several rooms in two floors of the building and is largely made up of > rack-mounted Sun x4600 systems. There are 655 of these in all, each > of which > has 16 AMD Opteron CPU cores inside it, and Clearspeed CSX600 > accelerator > boards. > > The graphics chips are contained in 170 Nvidia Tesla S1070 rack- > mount units > that have been slotted in between the Sun systems. 
Each of the 1U > Nvidia > systems has four GPUs inside, each of which has 240 processing > cores for a > total of 960 cores per system. > > The Tesla systems were added to Tsubame over the course of about a > week while > the computer was operating. > > "People thought we were crazy," said Satoshi Matsuoka, director of > the Global > Scientific Information and Computing Center at the university. > "This is a ¥1 > billion (US$11 million) supercomputer consuming a megawatt of > power, but we > proved technically that it was possible." > > The result is what university staff call version 1.2 of the Tsubame > supercomputer. > > "I think we should have been able to achieve 85 [T Flops], but we > ran out of > time so it was 77 [T Flops]," said Matsuoka of the benchmarks > performed on > the system. At 85T Flops it would have risen a couple of places in > the Top > 500 and been ranked fastest in Japan. > > There's always next time: A new Top 500 list is due out in June > 2009, and > Tokyo Institute of Technology is also looking further ahead. > > "This is not the end of Tsubame, it's just the beginning of GPU > acceleration > becoming mainstream," said Matsuoka. "We believe that in the world > there will > be supercomputers registering several petaflops in the years to > come, and we > would like to follow suit." > > Tsubame 2.0, as he dubbed the next upgrade, should be here within > the next > two years and will boast a sustained performance of at least a > petaflop (a > petaflop is 1,000 teraflops), he said. The basic design for the > machine is > still not finalized but it will continue the heterogeneous > computing base of > mixing CPUs and GPUs, he said.
> _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From Florent.Calvayrac at univ-lemans.fr Fri Dec 12 03:05:56 2008 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac-Castaing) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL In-Reply-To: <20081212075655.GK11544@leitl.org> References: <20081212075655.GK11544@leitl.org> Message-ID: <49424594.1010709@univ-lemans.fr> Eugen Leitl wrote: > http://www.goodgearguide.com.au/article/270416/inside_tsubame_-_nvidia_gpu_supercomputer?fp=&fpid=&pf=1 > > Inside Tsubame - the Nvidia GPU supercomputer > > Tokyo Tech University's Tsubame supercomputer attained 29th ranking in the > new Top 500, thanks in part to hundreds of Nvidia Tesla graphics cards. > > Interesting. I understand why, when I submitted a joint exploratory project about GPU computing two years ago with a Japanese colleague we were ranked first in Japan and last in France ; the idea seems more popular in Japan if they can fork millions on an architecture it is not very quick to program for (at least maybe not as fast as Moore's law is increasing power). By the way, has anyone on the list any idea on the prospects of Apple's OpenCL ? We just have started working seriously on CUDA but maybe it is time to change for something more open and maybe easier to program with. PS : I hope this message gets approved to the list ; I had a few rejected messages in the past but when I read the recent drivel I can't help wonder on some mysteries of life. 
From Bogdan.Costescu at iwr.uni-heidelberg.de Fri Dec 12 03:09:20 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: cloning issue, hidden module dependency In-Reply-To: References: Message-ID: On Thu, 11 Dec 2008, David Mathog wrote: > install usb-interface /sbin/modprobe uhci_hcd; /bin/true > install ide-controller /sbin/modprobe via82cxxx; /sbin/modprobe > ide_generic; /bin/true I don't know why you need these. On all the distributions that I've worked with recently such issues are taken care of by running 'depmod', the resulting files are taken into consideration when running 'mkinitrd'. > alias pci:v000010ECd00008139sv000010ECsd00008139bc02sc00i00 8139too And why do you need this ? Didn't the module detect this hardware ? > 4. Need to run lilo now or at reboot it STILL comes up using the wrong > modules. You can guess how I found that out. Lilo won't work reliably > chroot from boel3 with this Mandriva distro, and boel3 has no lilo > of its own. So pxe boot the node with PLD 2.01, then at the prompt: I think that you've just created more problems by mixing all these different distributions then mixing booting from local disk with PXE... I would not have had such problems with CentOS or Fedora, but I work with these distributions for a long time (and Red Hat Linux before that), so I know them pretty well. Maybe it's time for you to learn some more about the tools that your preferred distribution makes available and how all the pieces are put together ? 
-- Bogdan Costescu IWR, University of Heidelberg, INF 368, D-69120 Heidelberg, Germany Phone: +49 6221 54 8240, Fax: +49 6221 54 8850 E-mail: bogdan.costescu@iwr.uni-heidelberg.de From tortay at cc.in2p3.fr Fri Dec 12 04:01:56 2008 From: tortay at cc.in2p3.fr (Loic Tortay) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL In-Reply-To: <49424594.1010709@univ-lemans.fr> References: <20081212075655.GK11544@leitl.org> <49424594.1010709@univ-lemans.fr> Message-ID: <494252B4.7040106@cc.in2p3.fr> Florent Calvayrac-Castaing wrote: [...] > > Interesting. > > I understand why, when I submitted a joint exploratory project > about GPU computing two years ago with a Japanese > colleague we were ranked first in Japan and last in France ; the > idea seems more popular in Japan if they can fork millions > on an architecture it is not very quick to program for (at least > maybe not as fast as Moore's law is increasing power). > They may be willing to spend millions because they already have programs able to use the GPUs. If I'm not mistaken, the "Tsubame" cluster was initially using Clearspeed accelerators (in Sun X4600 "fat" nodes). Therefore, they probably have appropriate programs that need little adaptation (or less than many) to work on the GPUs. Loïc. From csamuel at vpac.org Fri Dec 12 04:32:29 2008 From: csamuel at vpac.org (Chris Samuel) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL In-Reply-To: <49424594.1010709@univ-lemans.fr> Message-ID: <623914342.124881229085149551.JavaMail.root@mail.vpac.org> ----- "Florent Calvayrac-Castaing" wrote: > By the way, has anyone on the list any idea on > the prospects of Apple's OpenCL ? I think we need something that is hardware independent. If OpenCL can deliver that (and Apple obviously believe it can otherwise they'd not have it in Snow Leopard) then I think that's going to be great.
The big question for me is where are the implementations going to come from ? My understanding is that Snow Leopard will use the LLVM compiler for it [1], and nVidia will ship support it in their CUDA SDK. AMD have already nailed their colours to the mast and based on past behaviour it might be reasonable to expect that they'll use GCC as their base (which would be nice!). As for Intel, well I guess it'll be in their compiler, though I asked about Larabee and OpenCL on the Intel stand at SC and was told "we don't have anyone here who knows about it, we'll get someone to call you" (nothing yet). cheers, Chris [1] - Given that LLVM is BSD licensed it is unclear whether Apples modifications to implement it will be public or not. -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From laytonjb at att.net Fri Dec 12 05:09:07 2008 From: laytonjb at att.net (Jeff Layton) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081212001859.GA7929@bx9> References: <885368590.112911229040025006.JavaMail.root@mail.vpac.org> <1073016340.113081229040287297.JavaMail.root@mail.vpac.org> <20081212001859.GA7929@bx9> Message-ID: <49426273.7030803@att.net> Greg Lindahl wrote: > On Fri, Dec 12, 2008 at 11:04:47AM +1100, Chris Samuel wrote: > > >> Hmm, I was thinking that until I read this blog post by >> one of the kernel filesystem developers (Val Henson from >> Intel) who had some (possibly Apple specific) concerns >> about data corruption & reliability and why she still >> chooses spinning disks over SSD. >> > > Nothing new in that blog post. We'll find out the actual reliability > of this generation of flash SSD when they've been around for a while, > and not an anecdote sooner. > This is one of the ugly secrets about SSD's that haven't gotten out very to the masses. They have data corruption issues. 
JEDEC has certain requirements for data retention basically as a function of capacity. For SSD's at 10% of the rewrite capacity, you need to retain the data for 10 years. Current crops of SSD's with MLC barely meet this goal (actually it's a function of the NAND's). Near 100% of the rewrite life of the SSD they need to hold the data for only 1 year! So if you are near the end of life of the SSD, get your data off of there and pronto! Jeff From laytonjb at att.net Fri Dec 12 05:14:47 2008 From: laytonjb at att.net (Jeff Layton) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081211213857.GA7355@bx9> References: <20081211213857.GA7355@bx9> Message-ID: <494263C7.6030009@att.net> Greg Lindahl wrote: > I was recently surprised to learn that SSD prices are down in the > $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag > and it was: > > 256 gigs = $700 > 128 gigs = 300 > 64 gigs = 180 > 32 gigs = 70 > > Also, Micron is saying that they're going to get into the business of > PCIe-attached flash, which will give us a second source for what > Fusion-io is shipping today. > > If you're on the "I like a real system disk" side of the > diskless/diskfull fence, these SSDs ought to be a lot more reliable > than tradtional disks. And I'd like to get rid of the mirrored > disks in our developer desktops... > Remember that OCZ does not equal Fusion-IO :) There are many factors that go into an SSD that determine performance. So the performance of OCZ is not nearly that of Fusion-IO's product. For example, I've been tracking some performance testing of a wide variety of SSD's and spinning disks in my day job. Some of the SSD's are fairly inexpensive, but the performance is pretty pathetic. For example, if your read/write mix includes more than about 10% writes, then the performance of the SSD's is worse than a spinning disk (this is in terms of IOPS).
If you want to move up the food chain and buy some unbelievably fast SSD's you get can get the performance above spinning disks but the price is several orders of magnitude greater than spinning disks. Reliability is another question and I posted a quick response to this list in a different email. Jeff From gdjacobs at gmail.com Fri Dec 12 06:17:19 2008 From: gdjacobs at gmail.com (Geoff Jacobs) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <494263C7.6030009@att.net> References: <20081211213857.GA7355@bx9> <494263C7.6030009@att.net> Message-ID: <4942726F.1050705@gmail.com> Jeff Layton wrote: > Greg Lindahl wrote: >> I was recently surprised to learn that SSD prices are down in the >> $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag >> and it was: >> >> 256 gigs = $700 >> 128 gigs = 300 >> 64 gigs = 180 >> 32 gigs = 70 >> >> Also, Micron is saying that they're going to get into the business of >> PCIe-attached flash, which will give us a second source for what >> Fusion-io is shipping today. >> >> If you're on the "I like a real system disk" side of the >> diskless/diskfull fence, these SSDs ought to be a lot more reliable >> than tradtional disks. And I'd like to get rid of the mirrored >> disks in our developer desktops... >> > > Remember that OCZ does not equal Fusion-IO :) There are many > factors that go into an SSD that determine performance. So the > performance of OCZ is not nearly that of Fusion-IO's product. > > For example, I've been tracking some performance testing of a > wide variety of SSD's and spinning disks in my day job. Some of > the SSD's are fairly inexpensive, but the performance is pretty > pathetic. For example, if your read/write mix includes more than > about 10% writes, then the performance of the SSD's is worse > than a spinning disk (this is in terms of IOPS). 
> > If you want to move up the food chain and buy some unbelievably > fast SSD's you get can get the performance above spinning disks > but the price is several orders of magnitude greater than spinning > disks. Yeah, there's a few vendors out there selling battery backed dram solutions. Basically maxing out the interface, but stupidly expensive. From the benches I've seen, though, it could be a useful accelerator for workloads akin to databases. > Reliability is another question and I posted a quick response to > this list in a different email. This being my big concern with flash. > > Jeff -- Geoffrey D. Jacobs From hearnsj at googlemail.com Fri Dec 12 06:45:56 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL In-Reply-To: <494252B4.7040106@cc.in2p3.fr> References: <20081212075655.GK11544@leitl.org> <49424594.1010709@univ-lemans.fr> <494252B4.7040106@cc.in2p3.fr> Message-ID: <9f8092cc0812120645x77e3257cq7f477e963bdbf41b@mail.gmail.com> 2008/12/12 Loic Tortay > > If I'm not mistaken, the "Tsubame" cluster was initially using > Clearspeed accelerators (in Sun X4600 "fat" nodes). > > Therefore, they probably have appropriate programs that need little > adaptation (or less than many) to work on the GPUs. > > Emmmm.... I'm no expert on Clearspeed, but AFAIK Clearspeeds selling point is that the cards run standard maths library functions - ie. you just 'drop in' a compatible maths library and the card gets given the computations to do. This is not the same model as GPU programming.
From laytonjb at att.net Fri Dec 12 07:33:11 2008 From: laytonjb at att.net (Jeff Layton) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL In-Reply-To: <9f8092cc0812120645x77e3257cq7f477e963bdbf41b@mail.gmail.com> References: <20081212075655.GK11544@leitl.org> <49424594.1010709@univ-lemans.fr> <494252B4.7040106@cc.in2p3.fr> <9f8092cc0812120645x77e3257cq7f477e963bdbf41b@mail.gmail.com> Message-ID: <49428437.5050704@att.net> John Hearns wrote: > > > 2008/12/12 Loic Tortay > > > > If I'm not mistaken, the "Tsubame" cluster was initially using > Clearspeed accelerators (in Sun X4600 "fat" nodes). > > Therefore, they probably have appropriate programs that need little > adaptation (or less than many) to work on the GPUs. > > Emmmm.... I'm no expert on Clearspeed, but AFAIK Clearspeeds selling > point is that the cards run standard maths library functions - ie. you > just 'drop in' a compatible maths library and the card gets given the > computations to do. > This is not the same model as GPU programming. Yes and No. There are libraries for CUDA for BLAS and FFT's. Clearspeed has this as well. Jeff From laytonjb at att.net Fri Dec 12 07:41:02 2008 From: laytonjb at att.net (Jeff Layton) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <4942726F.1050705@gmail.com> References: <20081211213857.GA7355@bx9> <494263C7.6030009@att.net> <4942726F.1050705@gmail.com> Message-ID: <4942860E.70300@att.net> Geoff Jacobs wrote: > Jeff Layton wrote: > >> Remember that OCZ does not equal Fusion-IO :) There are many >> factors that go into an SSD that determine performance. So the >> performance of OCZ is not nearly that of Fusion-IO's product. >> >> For example, I've been tracking some performance testing of a >> wide variety of SSD's and spinning disks in my day job.
Some of >> the SSD's are fairly inexpensive, but the performance is pretty >> pathetic. For example, if your read/write mix includes more than >> about 10% writes, then the performance of the SSD's is worse >> than a spinning disk (this is in terms of IOPS). >> >> If you want to move up the food chain and buy some unbelievably >> fast SSD's you get can get the performance above spinning disks >> but the price is several orders of magnitude greater than spinning >> disks. >> > > Yeah, there's a few vendors out there selling battery backed dram > solutions. Basically maxing out the interface, but stupidly expensive. > From the benches I've seen, though, it could be a useful accelerator for > workloads akin to databases. > It gets more involved than just adding dram. The controllers can have a huge impact on performance. You will find some high-end drives that have great NAND's but really crappy controllers. This limits performance (I don't know if I've seen any with good controllers and bad NAND's though, but I think there are some out there). Then you have some amazing drives with great controllers and great NAND's - but you will pay dearly for them :) And as others have pointed out, the details of the firmware can also have a big impact on performance. Also, the interface can impact performance as well. Jeff From laytonjb at att.net Fri Dec 12 07:59:29 2008 From: laytonjb at att.net (Jeff Layton) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices - q: how many writes/erases??? In-Reply-To: <20081212153526.GA31871@anuurn.compact> References: <20081211213857.GA7355@bx9> <494263C7.6030009@att.net> <4942726F.1050705@gmail.com> <20081212153526.GA31871@anuurn.compact> Message-ID: <49428A61.3070208@att.net> Peter Jakobi wrote: > On Fri, Dec 12, 2008 at 08:17:19AM -0600, Geoff Jacobs wrote: > Rehi, > >>> Reliability is another question and I posted a quick response to >>> this list in a different email. >>> >> This being my big concern with flash.
>> > > related is this topic on SSD / flashes: > > what's the life time when changing the same file frequently? > aka "mapping block writes to cell erases" > aka "how many erases are possible?" > This is somewhat a complicated question. It depends upon a few factors if you are looking at things from the perspective of the drive. In general the cells have a re-write limit that is a function of what kind of cell it is. I don't remember exact numbers, but I think MLC's are something like 10,000 rewrites and SLC's have like 100,000 rewrites. But the wear-leveling algorithms do a reasonable job of moving data to different cells rather than rewrite. This "levels" out the number of rewrites to the cells. What some people are doing to also help SSD's is to reserve a portion of the drive as "backups" for cells that have reached their limit. For example, you take a 64GB drive and make it appear as a 50GB drive. Then the extra 14GB is used by the drive to replace bad cells when needed (think of it as the SSD approach that SATA drives have with spare blocks that are used by the drive). While you lose space on the drive, overall the drive can last longer because of the spare cells. I think this is a good idea for MLC in particular because of the low rewrite limit. There should be some stuff floating around the web on the topic of SSD's. Just treat some of the more "popular" stuff from sites like Tom's Hardware, etc. with skepticism. :) Jeff From brahmaforces at gmail.com Thu Dec 11 02:04:38 2008 From: brahmaforces at gmail.com (arjuna) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: I am in NewDelhi India. 
However I would prefer to put the cluster together myself, because 1) I am a good python programmer and like programming and playing with computers 2) I will be using the cluster for animation (art + computers) and may have to bend it and tinker with it...therefore it makes sense for me to know it inside out. 3) If I set it up then I can grow it, and i envision it growing, outsourcing the whole thing would be expensive 4) I have been using linux for several years and am comfortable in the environment 5) I have a bunch of old computers lying about which are not so old and run basic versions of linux fast. What is 1u? What is a blade system? I would be putting it in a room with air conditioning. At this time I am trying to figure out the racks. Am meeting the hardware guy on Saturday and we were thinking of opening up the PCS i have lying around and taking measurements of how the mother boards fit into the cases,with the intention of creating a rack from scratch. Any ideas of what goes into a good rack in terms of size and matieral (assuming it has to be insulated) Also again, what might be some upto date books on the subject and any experiences regarding the actual creation of the rack and the physical hardware. I am starting with 3 nodes to be expanded to n nodes....The 3 nodes will allow me to keep complexity down while learning and then i can expand to n nodes once i have it down to increase speed. Am planning to run animation software (like blender) on it. Since animation software requires large processing power i am assuming they have already worked on parrallelizing the code... Anyone using clusters for animation on this list? > > Two pieces of advice > > a) let us know where you are physically. Talk to a clustering company in > your country, or area. > You will be surprised - they will put the whole thing together for you as a > 'turnkey' cluster AND what's more important support it. OK, you don't get > the learning experience which you are after. 
> > b) if this thing is to sit in your office, think about noise, cooling and > how many amps you can draw from a wall socket. > 1U servers have lots of little high speed fans and the noise gets very, > very annoying. > Think of putting this thing in a separate room, with some air conditioning. > Even a small room with a portable wheeled unit, venting to the outside may > be adequate for you. > > > Have you thought about a blade system for your particular situation? Might > be the ideal solution. > > > > > -- Best regards, arjuna http://www.brahmaforces.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/cd4dbefc/attachment.html From drcoolsanta at gmail.com Thu Dec 11 04:17:04 2008 From: drcoolsanta at gmail.com (Dr Cool Santa) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <9f8092cc0812110133q269f58f0o2685ec95a590b727@mail.gmail.com> References: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> <9f8092cc0812110133q269f58f0o2685ec95a590b727@mail.gmail.com> Message-ID: <86b56470812110417p3cf97018na6676ed44213f328@mail.gmail.com> Thanks, seems like a good website. Actually it is my mother who is the chemist. On Thu, Dec 11, 2008 at 3:03 PM, John Hearns wrote: > > > 2008/12/10 Dr Cool Santa > >> Currently in the lab we use Schrodinger and we are looking into NWchem. >> We'd be interested in knowing about software that a chemist could use that >> makes use of a parallel supercomputer. And better if it is linux. >> >> Its probably worth it for you to join the Computational Chemistry list: > > http://www.ccl.net/ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/575e3450/attachment.html From alsimao at gmail.com Thu Dec 11 06:54:03 2008 From: alsimao at gmail.com (Alcides Simao) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] Re: Beowulf Digest, Vol 58, Issue 28 In-Reply-To: <200812111416.mBBEFmHa001826@bluewest.scyld.com> References: <200812111416.mBBEFmHa001826@bluewest.scyld.com> Message-ID: <7be8c36b0812110654h532fdef4ta8c4259c0922a934@mail.gmail.com> Hello 'wulfers! Any news on GPGPU stuff from ATI? Best, Alcides -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081211/f555763e/attachment.html From spambox at emboss.co.nz Thu Dec 11 16:56:13 2008 From: spambox at emboss.co.nz (Michael Brown) Date: Thu Mar 18 01:08:10 2010 Subject: [Beowulf] SSD prices In-Reply-To: <20081211213857.GA7355@bx9> References: <20081211213857.GA7355@bx9> Message-ID: <95046224B0E6475293D97FF9B5819BC4@Forethought> Greg Lindahl wrote: >I was recently surprised to learn that SSD prices are down in the > $2-$3 per gbyte range. I did a survey of one brand (OCZ) at NexTag > and it was: > > 256 gigs = $700 > 128 gigs = 300 > 64 gigs = 180 > 32 gigs = 70 Alas, these drives have lousy random write performance. As in 4 IOPS lousy. Read speed is pretty good, but since it appears to take 250 ms for an erase + write cycle on the flash (during which other reads are blocked as well), it's got really rather limited usefulness. People have reported that Vista won't install on the drives, due to timeouts. This is also why the prices are so low - they're basically dumping them to get rid of them. Note that OCZ aren't alone in this issue - all of the "low cost" SSDs have the same issue since they're all just rebadges of the same OEM drive. For good performance, you're AFAIK limited to the Intel X25's and similar.
The 80 GB X25-M hits you for $528 according to NexTag, other good 64 GB SSDs are around the $450 - $500 mark, depending on the drive (I can't get NexTag to list them, it only shows a very high price for the MTRON 64 GB drive). They're still not a whole lot faster than spinning rust once you start to have some randomness in your writing. Reliability should be fine in laptops, though I'd be less keen to deploy a rack full of them - they're a lot more sensitive to electrical noise than traditional HDDs when both reading and writing, so their reliability in these situations depends on how good the ADC and DAC converters are in the chips, and how much space they burn for ECC. The fact that the manufacturers don't spec the uncorrected/miscorrected error rate under any circumstances makes me a tad worried. Also, the lifespan question is still unanswered - any particular page of MLC flash is still limited to about 10000 writes, so you've got to hope that your workload doesn't tickle the wear levelling algorithm the wrong way. Cheers, Michael From jakobi at acm.org Fri Dec 12 07:35:26 2008 From: jakobi at acm.org (Peter Jakobi) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] SSD prices - q: how many writes/erases??? In-Reply-To: <4942726F.1050705@gmail.com> References: <20081211213857.GA7355@bx9> <494263C7.6030009@att.net> <4942726F.1050705@gmail.com> Message-ID: <20081212153526.GA31871@anuurn.compact> On Fri, Dec 12, 2008 at 08:17:19AM -0600, Geoff Jacobs wrote: Rehi, > > Reliability is another question and I posted a quick response to > > this list in a different email. > > This being my big concern with flash. related is this topic on SSD / flashes: what's the life time when changing the same file frequently? aka "mapping block writes to cell erases" aka "how many erases are possible?" 
In the days of yore, that was the limitation on using flash, as writing a block to the same physical location on the flash (for some to-be-defined sense of physical location :)) requires a whole slew of blocks (let's call it a 'cell', maybe containing a few dozen or a few thousand blocks?) to be erased and a subset of them to be written. Does anyone have current, up-to-date info, or has anyone researched this issue already? If so, thanx!! Peter
=== Some of the questions I see before checking recent kernel sources would be:
- is there some remapping in the hardware of the ide emulation chip space of say compactflash or usb sticks?
- is part of this possible in the ide-emulation in the kernel?
- or is part of this in the filesystem, that is, suddenly after a decade or more, the fs has to cope again with frequent bad blocks, like the old bad blocks lists of the SCSI days 2 decades past? [basically: is there some 'newish' balancing to limit / redistribute the number of erases over all cells? Is there a way to relocate cells that resist erasing, ...?]
- can I place a filesystem containing some files that are always rewritten on flash and use say ordinary ext2 or vfat for this?
- might I even be able to SWAP on flash nowadays?
- Or do I still have to do voodoo with FUSE overlays or other tricks to reduce the number of writes leading to cell erases? Maybe check if there's a real log-structured filesystem available that has seen production use outside of labs (and doesn't fail by keeping some of its frequently changing metadata always in the same location).
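As a rough illustration of the "mapping block writes to cell erases" question, here is a toy Python model. It is a sketch under loud assumptions, not a description of any real controller: the device has a hypothetical `num_cells` erase units, each unit survives exactly `erase_limit` erases, every rewrite of the logical block costs one erase, and the wear-leveling policy is the simplest one imaginable (always use the least-erased unit).

```python
def rewrites_until_wearout(num_cells, erase_limit, wear_level):
    """Count rewrites of ONE logical block before a cell hits its erase limit.

    Toy model: every rewrite costs exactly one erase of one physical cell.
    """
    erases = [0] * num_cells
    rewrites = 0
    target = 0  # without wear leveling the block always lands on cell 0
    while True:
        if wear_level:
            # hypothetical policy: redirect the write to the least-erased cell
            target = min(range(num_cells), key=lambda i: erases[i])
        if erases[target] >= erase_limit:
            return rewrites
        erases[target] += 1
        rewrites += 1


fixed = rewrites_until_wearout(num_cells=8, erase_limit=100, wear_level=False)
leveled = rewrites_until_wearout(num_cells=8, erase_limit=100, wear_level=True)
print(fixed, leveled)
```

In this model, leveling stretches the lifetime of a constantly rewritten block by a factor equal to the number of cells in the pool, and reserving spare capacity simply adds cells to that pool. Real drives also have to move static data around, so the true gain will be smaller.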
-- cu Peter jakobi@acm.org From james.p.lux at jpl.nasa.gov Fri Dec 12 08:58:50 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: ________________________________ From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of arjuna Sent: Thursday, December 11, 2008 2:05 AM To: beowulf@beowulf.org Subject: Re: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware I am in NewDelhi India. However I would prefer to put the cluster together myself, because 1) I am a good python programmer and like programming and playing with computers 2) I will be using the cluster for animation (art + computers) and may have to bend it and tinker with it...therefore it makes sense for me to know it inside out. 3) If I set it up then I can grow it, and i envision it growing, outsourcing the whole thing would be expensive 4) I have been using linux for several years and am comfortable in the environment 5) I have a bunch of old computers lying about which are not so old and run basic versions of linux fast. All decent reasons to put together a cluster. > What is 1u? Standard rack mount systems (with 19" wide RETMA/EIA panels) have certain vertical heights for each component (as well as standard hole patterns). 1 U = 1 Unit = 1 3/4" (4U = 7", 2U = 3.5") As a practical matter, 1 U high systems are quite tight inside, and tough to cool, because the fans can only be an inch or so high (maybe 40mm) and to move any amount of air, they have to spin fast. Fast small fan = low efficiency, lots of noise.. >What is a blade system? Where there's a "card cage" into which one slides cards (or "blades") which are a whole PC.
They all share a common power supply and they're denser because you don't have extra sheetmetal between PCs. OTOH, denser means more heat in a small volume, which aggravates the cooling problem. Why "blades".. -> it sounds cooler (no, really... It's because of marketing. Cards in a card cage is so 1950s.. Why, PDP-8s and IBM 1401s use cards in a card cage..) >I would be putting it in a room with air conditioning. There's "air conditioning" and "AIR CONDITIONING".. Throw a few kilowatts worth of computers in a room, and you'll find out which one you have. >At this time I am trying to figure out the racks. Am meeting the hardware guy on Saturday and we were thinking of opening up the PCS i have lying around and taking measurements of how the mother boards fit into the cases,with the intention of creating a rack from scratch. Any ideas of what goes into a good rack in terms of size and matieral (assuming it has to be insulated) My favorite "field expedient" scheme is to use half or full size aluminum baking sheets with raised edges (aka jelly roll pans or sheet pans) and double stick foam tape. You can slide them into a standard baker's rack. All readily available or improvised, although I'd look for the real rack (they're cheap and sturdy). The pans can just be sheets of aluminum sheared to size, if you like. Search the archives of the list for some website addresses of kitchen supply places where you can see a picture of this kind of thing. There is CERTAINLY some local source where you live for this stuff (ask at any commercial kitchen or bakery) > Also again, what might be some upto date books on the subject and any experiences regarding the actual creation of the rack and the physical hardware. Catalogs are your friend, as far as packaging goes. >I am starting with 3 nodes to be expanded to n nodes....The 3 nodes will allow me to keep complexity down while learning and then i can expand to n nodes once i have it down to increase speed. Good luck.. 
At 3 nodes, you can just throw the PCs under the table and hook em up with a ethernet switch. But, if you find a pallet load of surplus PCs, and want to repackage them a bit more densely, then the cookie sheet approach is easy. From hearnsj at googlemail.com Fri Dec 12 09:13:54 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: <9f8092cc0812120913g573f511fq2b5f63282517d96e@mail.gmail.com> 2008/12/11 arjuna > > What is 1u? > Easy question! a "U" is short for a "rack unit". Rack mounted equipment always comes in multiples of a vertical height unit, which is 1.75 inches. I gather this is actually an old Russian unit of measurement (you can check on Wikipedia). So when you put equipment into standard 19inch wide racks, you ask "how many U high is that equipment). It means that you can mix and match different types of equipment in the same rack. Specifically for this discussion, a 1U computer (server) is a server which takes up 1U of vertical space. They are generally very deep to compensate for the lack of space in height, and use lots of small, fast fans rather than the big one you have in a desktop. Air comes in through slots in the front, past the disk drives, over the motherboard and out of the rear. These systems generally pack more compute power per piece of floorspace, and help make the cabling neater as everything is in a standard place to the rear of the rack. > > What is a blade system? > This is where servers are packaged into standard units, generally a bit smaller than the 1U servers above. The blades plug into a chassis, which in turn is mounted in the rack. 
The chassis provides power to each blade, plus networking connections across a "backplane" In the case of the 1U servers you generally have to connect mains power to each one, and run separate cables for ethernet / Myrinet / Infiniband. This cabling is all wrapped up inside the chassis. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081212/d49e4b61/attachment.html From james.p.lux at jpl.nasa.gov Fri Dec 12 09:15:22 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] SSD prices In-Reply-To: <95046224B0E6475293D97FF9B5819BC4@Forethought> References: <20081211213857.GA7355@bx9> <95046224B0E6475293D97FF9B5819BC4@Forethought> Message-ID: > Reliability should be fine in laptops, though I'd be less > keen to deploy a rack full of them - they're a lot more > sensitive to electrical noise than traditional HDDs when both > reading and writing, so their reliability in these situations > depends on how good the ADC and DAC converters are in the > chips, and how much space they burn for ECC. The fact that > the manufacturers don't spec the uncorrected/miscorrected > error rate under any circumstances makes me a tad worried. That's not a specification that is easily "measureable" or tested, so it doesn't get published. Heck, I'd be happy to run across a "flash memory error simulator" to test our EDAC implementations here. Sure, you can cobble a "flash emulator" up in a FPGA, but who's to say if it's realistic. 
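To make the "uncorrected/miscorrected" distinction concrete, here is a minimal Python sketch of the kind of toy error injector one might cobble together. Everything in it is an assumption for illustration: it uses a textbook Hamming(7,4) code rather than whatever ECC a real flash part uses, and it flips bits independently at a fixed `bit_error_rate`, which real flash almost certainly does not do. The function names are hypothetical.

```python
import random


def hamming74_encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]


def hamming74_decode(code):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # 0 means "no error seen"
    if position:
        c[position - 1] ^= 1
    data = [c[2], c[4], c[5], c[6]]
    return sum(bit << i for i, bit in enumerate(data))


def inject_bit_errors(code, bit_error_rate, rng):
    """Flip each bit independently (an unrealistic noise model)."""
    return [bit ^ (rng.random() < bit_error_rate) for bit in code]


def run_trials(trials, bit_error_rate, seed=0):
    """Classify each trial as clean, corrected, or miscorrected."""
    rng = random.Random(seed)
    counts = {"clean": 0, "corrected": 0, "miscorrected": 0}
    for _ in range(trials):
        nibble = rng.randrange(16)
        sent = hamming74_encode(nibble)
        received = inject_bit_errors(sent, bit_error_rate, rng)
        if received == sent:
            counts["clean"] += 1
        elif hamming74_decode(received) == nibble:
            counts["corrected"] += 1
        else:
            counts["miscorrected"] += 1
    return counts


print(run_trials(trials=10000, bit_error_rate=0.01))
```

With this code a single flipped bit is always repaired, while two flipped bits in one codeword are silently "repaired" into the wrong data, which is exactly the miscorrection case. Whether the numbers mean anything for a real device depends entirely on how realistic the injected error pattern is.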
From hearnsj at googlemail.com Fri Dec 12 09:35:34 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <200812091535.03920.kilian.cavalotti.work@gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: <9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com> I was down in our server room with the ICE this afternoon. It's worth describing how they are put together for the purposes of this thread. Each rack has four blade chassis in it. These are called Independent Rack Units in SGI speak. An IRU has sixteen compute blades, plus the mains PSUs and Infiniband blades. Each IRU has an L1 chassis controller. At the rear of each IRU there is a bank of big fans. Each IRU couples up to a 1/4 sized rear rack door, using a foam gasket. Each of these 1/4 sized doors is a swing out heat exchanger. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081212/b4ff9671/attachment.html From prentice at ias.edu Fri Dec 12 09:42:02 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Can't unload OpenIB kernel modules during a reboot Message-ID: <4942A26A.2080507@ias.edu> When I reboot the nodes in my cluster, the openibd script hangs when shutting down. If I wait long enough (5-10 minutes, probably closer to 10), it eventually completes, or at least fails so the system can continue shutting down. If I do 'service openibd stop' before doing the reboot, the openibd script does its thing in only a few seconds, as expected:
/etc/init.d/openibd stop
Unloading OpenIB kernel modules: [ OK ]
I'm using a RHEL-rebuild distro (PU_IAS 5.2), and the openibd script is part of the openib package that comes with the distro:
rpm -qf /etc/init.d/openibd
openib-1.3-3.el5
Any ideas why this script would behave differently during a shutdown?
-- Prentice From lynesh at cardiff.ac.uk Fri Dec 12 10:05:00 2008 From: lynesh at cardiff.ac.uk (Huw Lynes) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com> Message-ID: <1229105100.5611.1.camel@desktop> On Fri, 2008-12-12 at 17:35 +0000, John Hearns wrote: > At the rear of each IRU there is a bank of big fans. Each IRU couples > up to a 1/4 sized rear rack door, using a foam gasket. Each of these > 1/4 sized doors is a swing out heat exchanger. That's the bit of information I was missing. I'd assumed the entire door swung out as one losing all cooling when you work on the rack. The stable-door approach makes more sense. I still like our APC contained hot-aisle system though. Cheers, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From landman at scalableinformatics.com Fri Dec 12 10:06:45 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Can't unload OpenIB kernel modules during a reboot In-Reply-To: <4942A26A.2080507@ias.edu> References: <4942A26A.2080507@ias.edu> Message-ID: <4942A835.3060307@scalableinformatics.com> Prentice Bisbal wrote: > > Any ideas why this script would behave differently during a shutdown? Hi Prentice Sounds like a race situation. Do you have an NFS mount over IPoIB? 
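One hedged way to start answering the NFS-over-IPoIB question is to list which servers the NFS mounts actually point at. The Python sketch below only parses text in the `/proc/mounts` layout (device, mount point, filesystem type, options); the helper name `nfs_mounts` and the sample hosts and addresses are made up for illustration, and it does not by itself tell you which interface the traffic uses.

```python
def nfs_mounts(mounts_text):
    """Return (server, export, mountpoint, fstype) for each NFS entry."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] not in ("nfs", "nfs4"):
            continue
        device, mountpoint, fstype = fields[:3]
        # NFS devices look like "server:/export/path"
        server, sep, path = device.partition(":/")
        export = "/" + path if sep else ""
        found.append((server, export, mountpoint, fstype))
    return found


# Invented sample in /proc/mounts format, for demonstration only.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
fileserver:/export/home /home nfs rw,vers=3,addr=192.168.1.10 0 0
10.0.0.5:/scratch /scratch nfs rw,vers=3,addr=10.0.0.5 0 0
"""

for server, export, mountpoint, fstype in nfs_mounts(SAMPLE):
    print(server, export, mountpoint, fstype)
```

On a node one would feed it the contents of `/proc/mounts` and then compare the server addresses with the subnet configured on the IPoIB interface, as reported by `ip addr show` (the interface is often named `ib0`, but that is an assumption about the local setup). If none of the servers sit on that subnet, the mounts are not going over IPoIB.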
Joe > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From hearnsj at googlemail.com Fri Dec 12 10:21:09 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <1229105100.5611.1.camel@desktop> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com> <1229105100.5611.1.camel@desktop> Message-ID: <9f8092cc0812121021x4cd4a5b7y354007afd897571d@mail.gmail.com> 2008/12/12 Huw Lynes > > > That's the bit of information I was missing. I'd assumed the entire door > swung out as one losing all cooling when you work on the rack. The > stable-door approach makes more sense. > > I still like our APC contained hot-aisle system though. > > Horses for course, Huw. (*) SGI did an install in Ireland where they have the IRU chassis mounted vertically, in those same APC racks. Seemingly it works quite well - the drawback is that you get three IRUs per rack rather than four. But I guess with the APC racks being narrower you do not lose out that much as you get more racks per aisle. I must measure this up actually. (*) Come on. Its Friday. We have a "race condition" in another thread. Let the horsey puns flow. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081212/79e25238/attachment.html From kus at free.net Fri Dec 12 10:28:19 2008 From: kus at free.net (Mikhail Kuzminsky) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Parallel software for chemists In-Reply-To: <86b56470812100551g277917dag95b93d2dbcaf346@mail.gmail.com> Message-ID: In message from "Dr Cool Santa" (Wed, 10 Dec 2008 19:21:43 +0530): >Currently in the lab we use Schrodinger and we are looking into >NWchem. We'd >be interested in knowing about software that a chemist could use that >makes >use of a parallel supercomputer. And better if it is linux. To say shortly, practically all the modern software for molecular modelling calculations can run "in parallel" on Linux clusters. Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow > >-- >This message has been scanned for viruses and >dangerous content by MailScanner, and is >believed to be clean. > From james.p.lux at jpl.nasa.gov Fri Dec 12 10:35:33 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:11 2010 Subject: 1U for racks? RE: [Beowulf] Newbie Question: Racks... In-Reply-To: <9f8092cc0812120913g573f511fq2b5f63282517d96e@mail.gmail.com> References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> <9f8092cc0812120913g573f511fq2b5f63282517d96e@mail.gmail.com> Message-ID: From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of John Hearns 2008/12/11 arjuna What is 1u? Easy question! a "U" is short for a "rack unit". Rack mounted equipment always comes in multiples of a vertical height unit, which is 1.75 inches. I gather this is actually an old Russian unit of measurement (you can check on Wikipedia). So when you put equipment into standard 19inch wide racks, you ask "how many U high is that equipment). It means that you can mix and match different types of equipment in the same rack. 
-- The Wikipedia entry just says it's coincidence: "Coincidentally, a rack unit is equal to a vershok, an obsolete Russian length unit." I used to think that the rack hole spacing is almost certainly from Western Electric or something like this (back when they were called "relay racks"). The rack dimensions are an old RETMA standard ("Radio-Electronics-Television Manufacturers Association"; RETMA changed to EIA in the late 50s) which I'm pretty sure derives from some older standard, which in turn probably derives from an early telegraphy standard, so it could, like the stories about railroad gauge, be derived from the dimensions of Roman donkeys. A usenet post from Larry Lippman in 1990 gives what sounds like a fairly authoritative description: Ma Bell -> 23", 2" spacing (starting in 1917 or thereabouts) Other older -> 19", 1.75" spacing (he thought RCA, perhaps, in the early 20s) The web is a wonderful place to waste time.. Now I can amaze folks at work with a truly arcane piece of information. From lindahl at pbm.com Fri Dec 12 10:52:41 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: <7493914765164F50BFEB0183BBA5995C@MichaelPC> References: <20081211231452.GC29359@bx9> <7493914765164F50BFEB0183BBA5995C@MichaelPC> Message-ID: <20081212185241.GA11559@bx9> On Thu, Dec 11, 2008 at 10:09:10PM -0800, Michael Huntingdon wrote: > Today it really is not just about cooling a group of 1u systems or a single > hptc cabinet. Today maybe you know you need two densely populated cabinets. Like most people, I can't use very dense systems, due to the power/cubic foot limitation of my colo. I just stack up 2U systems, and it's basically idiot-proof.
-- greg From prentice at ias.edu Fri Dec 12 11:32:37 2008 From: prentice at ias.edu (Prentice Bisbal) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Can't unload OpenIB kernel modules during a reboot In-Reply-To: <4942A835.3060307@scalableinformatics.com> References: <4942A26A.2080507@ias.edu> <4942A835.3060307@scalableinformatics.com> Message-ID: <4942BC55.8010501@ias.edu> Joe Landman wrote: > Prentice Bisbal wrote: > >> >> Any ideas why this script would behave differently during a shutdown? > > Hi Prentice > > > Sounds like a race situation. Do you have an NFS mount over IPoIB? > > Joe I do have NFS mounts, but *NOT* through IPoIB. At least they *shouldn't* be. I don't think that's the problem. If I had NFS mounts over IB, I should get errors when I shut down IB by itself with the 'service openibd stop' command. I have no need for IPoIB at the moment. Is there any way to confirm it's not being used, or to explicitly disable it? -- Prentice From mathog at caltech.edu Fri Dec 12 11:38:56 2008 From: mathog at caltech.edu (David Mathog) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Re: cloning issue, hidden module dependency Message-ID: Bogdan Costescu wrote: > On Thu, 11 Dec 2008, David Mathog wrote: > > > install usb-interface /sbin/modprobe uhci_hcd; /bin/true > > install ide-controller /sbin/modprobe via82cxxx; /sbin/modprobe > > ide_generic; /bin/true > > I don't know why you need these. On all the distributions that I've > worked with recently such issues are taken care of by running > 'depmod', the resulting files are taken into consideration when > running 'mkinitrd'. The kernel doesn't have either the via82cxxx or USB_UHCI_HCD modules built into it. These lines apparently tell mkinitrd to include the needed modules in the initrd, and also to add some corresponding modprobe lines in the init file which it contains. > > > alias pci:v000010ECd00008139sv000010ECsd00008139bc02sc00i00 8139too > > And why do you need this ?
Didn't the module detect this hardware ? This one is mysterious to me too. Those sorts of lines only appeared with Mandriva 2008.1 (kernel 2.6.24.7). The 8139too is built as a module, but it isn't included in the initrd, so that isn't it. All I can tell you is that every Mandriva 2008.1 system I have installed created a similar line for its ethernet driver (whatever that happened to be), and when there were two such interfaces which were identical, there was only one such line. It seems to be related to this line in /lib/modules/*/modules.alias alias pci:v000010ECd00008139sv*sd*bc*sc*i* 8139too There are many other similar lines in modules.alias, differing in the leading numeric field, but only this one pattern matches with the one from modprobe.conf. Why this one and not the others - I have no idea. Regards, David Mathog mathog@caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rgb at phy.duke.edu Fri Dec 12 11:51:07 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: On Thu, 11 Dec 2008, arjuna wrote: > I am in NewDelhi India. However I would prefer to put the cluster together > myself, because Ya, that's where I lived for seven years growing up. > 1) I am a good python programmer and like programming and playing with > computers > 2) I will be using the cluster for animation (art + computers) and may have > to bend it and tinker with it...therefore it makes sense for me to know it > inside out. > 3) If I set it up then I can grow it, and i envision it growing, outsourcing > the whole thing would be expensive > 4) I have been using linux for several years and am comfortable in the > environment > 5) I have a bunch of old computers lying about which are not so old and run > basic versions of linux fast. 
All excellent and traditional reasons, although you'll want to learn a compiled language, either C, C++ or Fortran. Which one is most appropriate depends a little bit on the application space you want to work in, a little bit on your personality. None are terribly like python. > What is 1u? A rack comes in "U"nits of height with prespecified/standard layouts for screws and so on. A "1U" rack chassis occupies 1.75" of vertical rack space in a rack that is typically anywhere from 20U to 42U in height (the latter is basically the height of a person; MUCH higher and you start having difficulty working on the upper slots and can interfere with the ceiling or overhead cable trays, etc). See: http://www.webopedia.com/TERM/1/1U.html While we're on this question, remember Google Is Your Friend (GIYF). These days, so are the various online references, especially e.g. wikipedia. So while we're always HAPPY to answer questions, you should (as a good student:-) always try to answer them yourself first, especially the easy ones. > What is a blade system? Here's another example. I google a second, pick the wikipedia article and: http://en.wikipedia.org/wiki/Blade_server complete with pictures. The Google return also has dozens of links to Tier 1 builders of bladed systems and vendors that resell them. From the latter you can actually look over specific blade servers and maybe even get prices. > I would be putting it in a room with air conditioning. Sure, but ESPECIALLY in New Delhi, with post-monsoon summertime temperatures in the 40C range outdoors (and with monsoon humidity AND heat before that) you will need to take special care with your environment. The AC will need to dehumidify and keep the room cooled down to (ideally) 20C or lower all year long, summer and winter. To help you estimate the cooling capacity: AC is usually sold in "tons". 1 ton of AC can remove roughly 3500 joules of heat per second (3500 watts).
It needs some of this capacity to maintain a temperature DIFFERENTIAL between inside and outside; a 20+ C differential will use an easy 10% of the capacity, maybe more. So you can look at your AC unit and figure out how many systems you can put into the space before it starts to get too warm -- most systems draw between 100 and 300 watts loaded (sorry about the large variation, but there is everything from single core UP to dual quad core out there with lots of combinations of memory and accessory hardware). If you have a half-ton of AC (say), your body and the electric lights are probably 200W, heat infiltration through the walls another 100W or more depending on where the room is, so you can run as many as 10 systems or as few as three, depending. Note that you'll pay for energy twice -- once for the power coming in, again for the power used by the AC to remove it. Oh, and New Delhi has one other unique-ish environmental constraint, unless things have changed a lot since I lived there. Post-monsoon, when it dries out again you have dust storms. I don't think most list members can really imagine them, but I can (I used to climb a tree outside of our house and feel the dust stinging my cheeks and erasing the buildings all from sight). You will need to be able to keep the dust that infiltrates EVERYWHERE in the houses at that time out of the computer room, as computers (especially the cooling fans) don't like dust. After a big one, you may need to shut down and vaccuum out the insides of your systems. > At this time I am trying to figure out the racks. Am meeting the hardware > guy on Saturday and we were thinking of opening up the PCS i have lying > around and taking measurements of how the mother boards fit into the > cases,with the intention of creating a rack from scratch. Any ideas of what > goes into a good rack in terms of size and matieral (assuming it has to be > insulated) Let's talk terminology. What you are calling "a rack" we call "shelving". 
A rack is the thing described in the article up above -- a completely standardized computer/telecom equipment holding arrangement. When somebody talks about "rackmount equipment" they refer to stuff boxed up to "slide into a rack" -- made a precise size and with screws and/or rails in just the right places to accomplish this. What you're talking about is a form of interesting homebrew cluster, I think. Periodically people talk about this sort of racking up of motherboards in a homemade (cheap but still effective) way on list. Search back through the archives and you'll find some great discussions. "Recipes" that I can recall include: a) Mounting motherboards on cookie sheets and using a baking rack for a cluster. b) Mounting motherboards on cookie sheets and using heavy duty steel shelving with wooden shelves to make a sort of "vertically bladed" cluster, sliding the cookie sheets in and out of slots cut into the wood. c) Clusters built into standard file cabinets. and several others. In the discussions were some suggestions concerning safety (fire and otherwise) and electomagnetic isolation and noise. Links to pictures, as well, let's see: http://www.beowulf.org/archive/2006-March/015209.html and as you can see, nearly everything is in the beowulf archives somewhere if you search for it cleverly. I think Andrew is still around and may be listening in case the links have been moved in the meantime. > Also again, what might be some upto date books on the subject and any > experiences regarding the actual creation of the rack and the physical > hardware. People don't build racks. People buy racks. However, if you have a machine shop and access to steel and know how to bend and tap it and weld it, you could probably, from the link up above and perhaps some more stuff like it gleaned from the web, build a simple four poster that would "work" to hold standard rackmount chassis. Heck, even building rackmount cases has been discussed on list. 
Sheet aluminum or steel, cut to spec, fold and weld, and so on. Here it isn't worth the time any more -- rackmount boxes and racks aren't THAT expensive compared to the time needed to DIY -- but I suppose it is possible. > I am starting with 3 nodes to be expanded to n nodes....The 3 nodes will > allow me to keep complexity down while learning and then i can expand to n > nodes once i have it down to increase speed. Sure. Good plan. Get yourself an 8 port (or better) gigabit ethernet switch to use as your first network, too. > Am planning to run animation software (like blender) on it. Since animation > software requires large processing power i am assuming they have already > worked on parrallelizing the code... Assume nothing, unfortunately. However, even if they haven't, if you can partition up the tasks and just run it N times in a batch mode on N systems, that's pretty good parallel speed up right there, and likely doable for a task that is basically embarrassingly parallel. > Anyone using clusters for animation on this list? Don't know. I doubt it. Not QUITE HPC, although I do know physicists who have e.g. animated simulations and so on on clusters. However, the animation itself wasn't done in parallel, only the generation of data to animate. rgb > > ? > ? > Two pieces of advice > ? > a) let us know where you are physically. Talk to a clustering company > in your country, or area. > You will be surprised - they will put the whole thing together for you > as a 'turnkey' cluster AND what's more important support it. OK, you > don't get the learning experience which you are after. > ? > b) if this thing is to sit in your office, think about noise, cooling > and how many amps you can draw from a wall socket. > 1U servers have lots of little high speed fans and the noise gets > very, very annoying. > Think of putting this thing in a separate room, with some air > conditioning. 
Even a small room with a portable wheeled unit, venting > to the outside may be adequate for you. > ? > ? > Have you thought about a blade system for your particular situation? > Might be the ideal solution. > ? > ? > ? > ? > > > > > -- > Best regards, > arjuna > http://www.brahmaforces.com > > Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From i.n.kozin at googlemail.com Fri Dec 12 11:58:58 2008 From: i.n.kozin at googlemail.com (Igor Kozin) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer In-Reply-To: <34568AAB-6A20-44B7-B80B-FA8BB92AC1F6@xs4all.nl> References: <20081212075655.GK11544@leitl.org> <34568AAB-6A20-44B7-B80B-FA8BB92AC1F6@xs4all.nl> Message-ID: 23.55 Mflops/W according to green500 estimates (#488 in thier list) 2008/12/12 Vincent Diepeveen > > On Dec 12, 2008, at 8:56 AM, Eugen Leitl wrote: > > >> http://www.goodgearguide.com.au/article/270416/inside_tsubame_- >> _nvidia_gpu_supercomputer?fp=&fpid=&pf=1 >> >> Inside Tsubame - the Nvidia GPU supercomputer >> >> Tokyo Tech University's Tsubame supercomputer attained 29th ranking in the >> new Top 500, thanks in part to hundreds of Nvidia Tesla graphics cards. >> >> Martyn Williams (IDG News Service) 10/12/2008 12:20:00 >> >> When you enter the computer room on the second floor of Tokyo Institute of >> Technology's computer building, you're not immediately struck by the size >> of >> Japan's second-fastest supercomputer. You can't see the Tsubame computer >> for >> the industrial air conditioning units that are standing in your way, but >> this >> in itself is telling. With more than 30,000 processing cores buzzing away, >> the machine consumes a megawatt of power and needs to be kept cool. >> >> > 1000000 watt / 77480 gflop = 12.9 watt per gflop. > > If you run double precision codes on this box it is a big energy waster > IMHO. 
> (of course it's very well equipped for all kind of crypto codes using that > google library). > > Vincent > > > Tsubame was ranked 29th-fastest supercomputer in the world in the latest >> Top >> 500 ranking with a speed of 77.48T Flops (floating point operations per >> second) on the industry-standard Linpack benchmark. >> >> While its position is relatively good, that's not what makes it so >> special. >> The interesting thing about Tsubame is that it doesn't rely on the raw >> processing power of CPUs (central processing units) alone to get its work >> done. Tsubame includes hundreds of graphics processors of the same type >> used >> in consumer PCs, working alongside CPUs in a mixed environment that some >> say >> is a model for future supercomputers serving disciplines like material >> chemistry. >> >> Graphics processors (GPUs) are very good at quickly performing the same >> computation on large amounts of data, so they can make short work of some >> problems in areas such as molecular dynamics, physics simulations and >> image >> processing. >> >> "I think in the vast majority of the interesting problems in the future, >> the >> problems that affect humanity where the impact comes from nature ... >> requires >> the ability to manipulate and compute on a very large data set," said >> Jen-Hsun Huang, CEO of Nvidia, who spoke at the university this week. >> Tsubame >> uses 680 of Nvidia's Tesla graphics cards. >> >> Just how much of a difference do the GPUs make? Takayuki Aoki, a professor >> of >> material chemistry at the university, said that simulations that used to >> take >> three months now take 10 hours on Tsubame. >> >> Tsubame itself - once you move past the air-conditioners - is split across >> several rooms in two floors of the building and is largely made up of >> rack-mounted Sun x4600 systems. There are 655 of these in all, each of >> which >> has 16 AMD Opteron CPU cores inside it, and Clearspeed CSX600 accelerator >> boards. 
>> >> The graphics chips are contained in 170 Nvidia Tesla S1070 rack-mount >> units >> that have been slotted in between the Sun systems. Each of the 1U Nvidia >> systems has four GPUs inside, each of which has 240 processing cores for a >> total of 960 cores per system. >> >> The Tesla systems were added to Tsubame over the course of about a week >> while >> the computer was operating. >> >> "People thought we were crazy," said Satoshi Matsuoka, director of the >> Global >> Scientific Information and Computing Center at the university. "This is a >> ?1 >> billion (US$11 million) supercomputer consuming a megawatt of power, but >> we >> proved technically that it was possible." >> >> The result is what university staff call version 1.2 of the Tsubame >> supercomputer. >> >> "I think we should have been able to achieve 85 [T Flops], but we ran out >> of >> time so it was 77 [T Flops]," said Matsuoka of the benchmarks performed on >> the system. At 85T Flops it would have risen a couple of places in the Top >> 500 and been ranked fastest in Japan. >> >> There's always next time: A new Top 500 list is due out in June 2009, and >> Tokyo Institute of Technology is also looking further ahead. >> >> "This is not the end of Tsubame, it's just the beginning of GPU >> acceleration >> becoming mainstream," said Matsuoka. "We believe that in the world there >> will >> be supercomputers registering several petaflops in the years to come, and >> we >> would like to follow suit." >> >> Tsubame 2.0, as he dubbed the next upgrade, should be here within the next >> two years and will boast a sustained performance of at least a petaflop (a >> petaflop is 1,000 teraflops), he said. The basic design for the machine is >> still not finalized but it will continue the heterogeneous computing base >> of >> mixing CPUs and GPUs, he said. 
>> _______________________________________________
>> Beowulf mailing list, Beowulf@beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081212/504a9591/attachment.html

From niftyompi at niftyegg.com Fri Dec 12 13:24:47 2008
From: niftyompi at niftyegg.com (Nifty Tom Mitchell)
Date: Thu Mar 18 01:08:11 2010
Subject: [Beowulf] For grins...India
In-Reply-To: References: <493F7BF9.70306@aplpi.com> <0D49B15ACFDF2F46BF90B6E08C90048A064D922BD5@quadbrsex1.quadrics.com> <20081210213929.GB3449@compegg.wr.niftyegg.com>
Message-ID: <20081212212447.GA3146@compegg.wr.niftyegg.com>

On Thu, Dec 11, 2008 at 12:01:23AM +0100, Vincent Diepeveen wrote:
>
> What is most interesting, seen from a supercomputer viewpoint, is the
> comments I got from some scientists when speaking about climate
> calculations.
>
> At a presentation at SARA on 11 September 2008 with some bobos there
> (minister bla bla), there were a few sheets from the North Atlantic.
>
> It was done in rectangles of 40x40 km.

Interesting... the North Atlantic. If the width of the straits on the other side of the Arctic is about 60 miles, the straits might be represented by three data points in the model.

Three data points to ask a global question... seems almost silly from my seat here in the peanut gallery.

--
T o m M i t c h e l l
Found me a new hat, now what?
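The "three data points" remark is simple arithmetic and can be checked with a short sketch. The assumptions here are mine, not the poster's: one data point per grid cell, points spaced 40 km apart along a line across the strait, and the "about 60 miles" width taken as exact. The function name is hypothetical.

```python
import math

KM_PER_MILE = 1.609344  # international mile


def grid_points_across(width_km, cell_km):
    """Upper bound on evenly spaced grid points that fit across a feature.

    Assumes the first point sits on one edge of the feature and the rest
    follow at cell_km intervals, which is the most favourable alignment.
    """
    return math.floor(width_km / cell_km) + 1


if __name__ == "__main__":
    strait_km = 60 * KM_PER_MILE  # "about 60 miles" from the message
    print(round(strait_km, 1), "km wide ->",
          grid_points_across(strait_km, 40.0), "points at most")
```

A strait of roughly 97 km spans a little under two and a half 40 km cells, so three points is the best case and a less favourable alignment would give two.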
From csamuel at vpac.org Fri Dec 12 14:01:31 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Thu Mar 18 01:08:11 2010
Subject: [Beowulf] Inside Tsubame - the Nvidia GPU supercomputer - OpenCL
In-Reply-To: <9f8092cc0812120645x77e3257cq7f477e963bdbf41b@mail.gmail.com>
Message-ID: <894439029.167191229119291931.JavaMail.root@mail.vpac.org>

----- "John Hearns" wrote:

> Emmmm.... I'm no expert on Clearspeed, but AFAIK
> Clearspeed's selling point is that the cards run
> standard maths library functions - i.e. you just
> 'drop in' a compatible maths library and the card
> gets given the computations to do.

AMD are working on a version of their Core Maths Library (ACML) that can offload to a compatible ATI GPU if it's installed.

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From hahn at mcmaster.ca Fri Dec 12 19:20:57 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu Mar 18 01:08:11 2010
Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware
In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com>
Message-ID:

> What is 1u?

rack-mounted hardware is measured in units called "units" ;)
1U means 1 rack unit: roughly 19" wide and 1.75" high. racks are all the same width, and a rackmount unit consumes some number of units in height. (rack depth is moderately variable.) (a full rack is generally 42U.)

a 1U server is a basic cluster building block - pretty well suited, since it's not much taller than a disk, and fits a motherboard pretty nicely (clearance for dimms if designed properly, a couple of optional cards, passive CPU heatsinks.)

> What is a blade system?
it is a computer design that emphasizes an enclosure and fastening mechanism that firmly locks buyers into a particular vendor's high-margin line ;)

in theory, the idea is to factor a traditional server into separate components, such as shared power supply, unified management, and often some semi-integrated network/san infrastructure. one of the main original selling points was power management: that a blade enclosure would have fewer, more fully loaded, more efficient (and/or more reliable) PSUs. blades are often claimed to have superior manageability. both of these factors are very, very arguable, since it's now routine for 1U servers to have nearly the same PSU efficiency, for instance. and in reality, simple manageability interfaces like IPMI are far better (scalably scriptable) than a too-smart gui per enclosure, especially if you have 100 enclosures...

> goes into a good rack in terms of size and material (assuming it has to be
> insulated)

ignoring proprietary crap, MB sizes are quite standardized. and since 10 million random computer shops put them together, they're incredibly forgiving when it comes to mounting, etc. I'd recommend just glue-gunning stuff into place, and not worrying too much.

> Anyone using clusters for animation on this list?

not much, I think. this list is mainly "using commodity clusters to do stuff fairly reminiscent of traditional scientific supercomputing".

animation is, in HPC terms, embarrassingly parallel and often quite IO-intensive. both those are somewhat derogatory. all you need to do an animation farm is some storage, a network, nodes and probably a scheduler or at least task queue-er.
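Because rendering is embarrassingly parallel, the "task queue-er" can start out as nothing more than a static split of the frame range across nodes. The sketch below is a minimal illustration under that assumption; the function name and the node names are hypothetical, and it does not launch any renderer or talk to any real scheduler.

```python
def partition_frames(first_frame, last_frame, nodes):
    """Split an inclusive frame range into contiguous chunks, one per node.

    Returns a dict mapping node name -> (start_frame, end_frame), both inclusive.
    Nodes that would receive no frames are left out of the plan.
    """
    if not nodes:
        raise ValueError("need at least one node")
    if last_frame < first_frame:
        raise ValueError("last_frame must not be before first_frame")

    total = last_frame - first_frame + 1
    base, extra = divmod(total, len(nodes))

    plan = {}
    start = first_frame
    for index, node in enumerate(nodes):
        # The first `extra` nodes take one additional frame each.
        count = base + (1 if index < extra else 0)
        if count > 0:
            plan[node] = (start, start + count - 1)
        start += count
    return plan


if __name__ == "__main__":
    # Hypothetical three-node farm rendering frames 1 through 250.
    for node, (start, end) in partition_frames(
            1, 250, ["node01", "node02", "node03"]).items():
        print(node, start, end)
```

Each (start, end) pair would then be handed to whatever batch render command the animation package provides. A static split like this ignores the fact that some frames take longer than others, which is where a real scheduler or work queue earns its keep.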
From oneal at dbi.udel.edu Fri Dec 12 08:30:31 2008 From: oneal at dbi.udel.edu (Doug ONeal) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Re: Rear-door heat exchangers and condensation In-Reply-To: References: <200812091535.03920.kilian.cavalotti.work@gmail.com> Message-ID: On 12/11/2008 05:57 PM, Mark Hahn wrote: >> Netshelter VX racks and the powered ventilation rear doors are not >> sufficient any more. > > why do you have doors on your rack? normally, a rack is filled with > servers with fans that generate the standard front-to-back airflow. > that means that you want no doors, or at least only high-perf mesh ones. > The rear doors are apc air removal units with three 8" fans that vent the hot air out the top of the unit. It is possible to attach ducts to the units to vent the air out of the server room completely but my physical setup does not allow for that. There are no front doors on the racks. From vlad at geociencias.unam.mx Fri Dec 12 09:05:39 2008 From: vlad at geociencias.unam.mx (Vlad Manea) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] File server on ROCKS cluster Message-ID: <494299E3.8020603@geociencias.unam.mx> Hi, I need to add to my new ROCKS 5.1 cluster a fileserver, the /export partition of the first disk on the frontend might not be enough. First question: Is there any documentation on how rocks do this? Second: is out there anyone with experience on Dell MD3000(i) with rocks? I will probably buy one... Thanks, Vlad From dmitri.chubarov at gmail.com Fri Dec 12 09:47:23 2008 From: dmitri.chubarov at gmail.com (Dmitri Chubarov) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: Hello, my first reply missed the list by mistake so I will repeat a few points that I mentioned there. > What is 1u? > > What is a blade system? 
> Compute clusters are often built of rack-server hardware meaning boxes different from desktop boxes and chipset that have features not necessary for desktop PCs like ECC memory, redundant power supply units, integrated management processors, RAID controllers, that altogether provide better reliability, since the failure rate for a cluster of a 100 nodes is 100 times higher than for a single node. You may not need any of it for a 16 node rendering farm. Anyone using clusters for animation on this list? > We are just writing up on a research project on distributed rendering. Rendering is the part in the animation process that requires the most processing power. We used 3dStudio Max (3DS) for modelling and V-Ray for rendering. 3DS has its own utility, called Backburner, for distributing frames among a number of cluster nodes. We observed that V-Ray failed on some certain frames thus stopping the whole rendering queue, therefore the process was not completely automated. I would also repeat that a storage subsystem that uses an array of disks is essential for performance. > > At this time I am trying to figure out the racks. Am meeting the hardware > guy on Saturday and we were thinking of opening up the PCS i have lying > around and taking measurements of how the mother boards fit into the > cases,with the intention of creating a rack from scratch. Any ideas of what > goes into a good rack in terms of size and matieral (assuming it has to be > insulated) > This sort of rack is more of a research project. On the contrary, the usual kind of rack is an IEC standard server rack, http://en.wikipedia.org/wiki/19_inch_rack > Am planning to run animation software (like blender) on it. Since animation > software requires large processing power i am assuming they have already > worked on parrallelizing the code... 
> Blender does not seem to have a driver to distribute rendering (I might be wrong) but it can generate PovRay scripts and povray can make use of parallel processing in a number of ways. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081212/7d31f14d/attachment.html From william.a.sellers at nasa.gov Fri Dec 12 11:53:31 2008 From: william.a.sellers at nasa.gov (Bill Sellers) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <9f8092cc0812121021x4cd4a5b7y354007afd897571d@mail.gmail.com> References: <200812091535.03920.kilian.cavalotti.work@gmail.com><9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com><1229105100.5611.1.camel@desktop> <9f8092cc0812121021x4cd4a5b7y354007afd897571d@mail.gmail.com> Message-ID: <4942C13B.8010309@nasa.gov> John Hearns wrote: > > > 2008/12/12 Huw Lynes > > > > > That's the bit of information I was missing. I'd assumed the > entire door > swung out as one losing all cooling when you work on the rack. The > stable-door approach makes more sense. > > I still like our APC contained hot-aisle system though. > > Horses for course, Huw. (*) > > > SGI did an install in Ireland where they have the IRU chassis mounted > vertically, in those same APC racks. > Seemingly it works quite well - the drawback is that you get three > IRUs per rack rather than four. > But I guess with the APC racks being narrower you do not lose out that > much as you get more racks > per aisle. I must measure this up actually. > > > (*) Come on. Its Friday. We have a "race condition" in another thread. > Let the horsey puns flow. > > > We have had the SGI water cooled doors here for some time now. They are very effective. Early models had issues with condensation pooling under the rack and general pipe sweating, but newer models have drains. Our facility has plenty of chilled water, so this solution made sense. 
I wouldn't recommend such a system for a single 19" rack. There is quite a bit of plumbing involved and without an economy of scale, it wouldn't make sense to me. http://www.sgi.com/company_info/newsroom/media_coverage/downloads/hpcwire_datacenterchill.pdf Bill -- Bill Sellers, CISSP Team Lead/Systems Administrator, ConITS Sr Systems Analyst, NCI Inc. From hearnsj at googlemail.com Sat Dec 13 01:08:59 2008 From: hearnsj at googlemail.com (John Hearns) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Rear-door heat exchangers and condensation In-Reply-To: <4942C13B.8010309@nasa.gov> References: <200812091535.03920.kilian.cavalotti.work@gmail.com> <9f8092cc0812120935t7a001409oe77ec688965d1cae@mail.gmail.com> <1229105100.5611.1.camel@desktop> <9f8092cc0812121021x4cd4a5b7y354007afd897571d@mail.gmail.com> <4942C13B.8010309@nasa.gov> Message-ID: <9f8092cc0812130108u4cc250e4v705d54c3908eebff@mail.gmail.com> 2008/12/12 Bill Sellers > J I wouldn't recommend such a system for a single 19" rack. There is > quite a bit of plumbing involved and without an economy of scale, it > wouldn't make sense to me. > > Bill, it can also make sense for a small number of racks if you a) have a small room available with either none or a small amount of A/C b) an existing supply of chilled water into the building. In my situation, as I've said in this thread we have our own cooling lake, complete with resident fish and a heron! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20081213/8bab85f9/attachment.html

From landman at scalableinformatics.com Sat Dec 13 20:10:43 2008
From: landman at scalableinformatics.com (Joe Landman)
Date: Thu Mar 18 01:08:11 2010
Subject: [Beowulf] GPU-HMMer for interested people
Message-ID: <49448743.7000605@scalableinformatics.com>

Hi folks

GPU-HMMer (part of the MPI-HMMer effort) has just been announced/released at http://www.mpihmmer.org

MPI-HMMer has itself been improved with parallel-IO and better scalability features. JP has measured a large speedup (about 180x) over a single core on a cluster for the MPI run.

Enjoy!

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615

From brahmaforces at gmail.com Sat Dec 13 03:48:04 2008
From: brahmaforces at gmail.com (arjuna)
Date: Thu Mar 18 01:08:11 2010
Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware
In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com>
Message-ID:

Hello All,

Thank you for your detailed responses. Following your line of thought, advice and web links, it seems that it is not difficult to build a small cluster to get started. I explored the photos of the various clusters that have been posted and it seems quite straightforward.

It seems I have been seized by a mad inspiration to do this... The line of thought is to make a 19 inch rack with aluminum plates on which the motherboards are mounted.

The plan is first to simply create one using the old computers I have... This can be an experimental one to get going... Thereafter it would make sense to research the right motherboards, cooling and so on...
It seems that I am going to take the plunge next week and wire these three computers on a home grown rack... A simple question though...Aluminum plates are used because aluminum is does not conduct electricity. Is this correct? Also for future reference, I saw a reference to dc-dc converters for power supply. Is it possible to use motherboards that do not guzzle electricity and generate a lot of heat and are yet powerful. It seems that not much more is needed that motherboards, CPUs, memory, harddrives and an ethernet card. For a low energy system, has any one explored ultra low energy consuming and heat generating power solutions that maybe use low wattage DC? On Sat, Dec 13, 2008 at 8:50 AM, Mark Hahn wrote: > What is 1u? >> > > rack-mounted hardware is measured in units called "units" ;) > 1U means 1 rack unit: roughly 19" wide and 1.75" high. racks are all > the same width, and rackmount unit consumes some number of units in height. > (rack depth is moderately variable.) (a full rack is generally 42"). > > a 1U server is a basic cluster building block - pretty well suited, > since it's not much taller than a disk, and fits a motherboard pretty > nicely (clearance for dimms if designed properly, a couple optional cards, > passive CPU heatsinks.) > > What is a blade system? >> > > it is a computer design that emphasizes an enclosure and fastening > mechanism > that firmly locks buyers into a particular vendor's high-margin line ;) > > in theory, the idea is to factor a traditional server into separate > components, such as shared power supply, unified management, and often > some semi-integrated network/san infrastructure. one of the main original > selling points was power management: that a blade enclosure would have > fewer, more fully loaded, more efficnet PSUs. and/or more reliable. blades > are often claimed to have superior managability. 
both of these factors are > very, very arguable, since it's now routine for 1U servers to have nearly > the same PSU efficiency, for instance. and in reality, simple managability > interfaces like IPMI are far better (scalably scriptable) > than a too-smart gui per enclosure, especially if you have 100 > enclosures... > > goes into a good rack in terms of size and matieral (assuming it has to be >> insulated) >> > > ignoring proprietary crap, MB sizes are quite standardized. and since 10 > million random computer shops put them together, they're incredibly > forgiving when it comes to mounting, etc. I'd recommend just glue-gunning > stuff into place, and not worring too much. > > Anyone using clusters for animation on this list? >> > > not much, I think. this list is mainly "using commodity clusters to do > stuff fairly reminiscent of traditional scientific supercomputing". > > animation is, in HPC terms, embarassingly parallel and often quite > IO-intensive. both those are somewhat derogatory. all you need to do > an animation farm is some storage, a network, nodes and probably a > scheduler or at least task queue-er. > -- Best regards, arjuna http://www.brahmaforces.com -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20081213/10253b36/attachment.html From james.p.lux at jpl.nasa.gov Sun Dec 14 08:24:03 2008 From: james.p.lux at jpl.nasa.gov (Lux, James P) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: Message-ID: On 12/13/08 3:48 AM, "arjuna" wrote: > Hello All, > > Thank you for your detailed responses. Following your line of thought, advice > and web links, it seems that it is not difficult to build a small cluster to > get started. I explored the photos of the various clusters that have been > posted and it seems quite straightforward. 
> > It seems I have been siezed by a mad inspiration to do this...The line of > thought is t make a 19 inch rack with aluminum plates on which the mother > boards are mounted. > > The plan is first to simply create one using the old computers i have...This > can be an experimental one to get going...Thereafter it would make sense to > research the right mother boards, cooling and so on.. > > It seems that I am going to take the plunge next week and wire these three > computers on a home grown rack... > > A simple question though...Aluminum plates are used because aluminum is does > not conduct electricity. Is this correct? No.. Aluminum is a good conductor. Aluminum is used because it's cheap and easy to work with and doesn't rust. Steel is even cheaper, but harder to work with handtools, heavier, and it needs to be painted. > > Also for future reference, I saw a reference to dc-dc converters for power > supply. Is it possible to use motherboards that do not guzzle electricity and > generate a lot of heat and are yet powerful. It seems that not much more is > needed that motherboards, CPUs, memory, harddrives and an ethernet card. For a > low energy system, has any one explored ultra low energy consuming and heat > generating power solutions that maybe use low wattage DC? In general, the efficiency of line voltage AC to DC power supplies is higher than DC to DC converters, especially once you factor in the need to get the DC that the DC/DC converter starts with. It's a matter of IR losses on the primary side, mostly. For beowulfery, especially for novices, you're looking for inexpensive commodity consumer gear, and that's the standard PC power supplies. As far as the overall power consumption goes, total up the consumption of all the pieces, and it adds up fairly fast. One can use low power devices (e.g. Like those used in battery powered applications such as notebook computers), but typically, you also take a performance hit. 
Since the vast majority of clusters are not battery powered, and you're interested in computational speed, there's no advantage in replacing 5 standard PCs with 10 lowpower, low speed PCs. From amacater at galactic.demon.co.uk Sun Dec 14 08:40:43 2008 From: amacater at galactic.demon.co.uk (Andrew M.A. Cater) Date: Thu Mar 18 01:08:11 2010 Subject: [Beowulf] Newbie Question: Racks versus boxes and good rack solutions for commodity hardware In-Reply-To: References: <9f8092cc0812110131g28d1103cta38d9aa5fe880326@mail.gmail.com> Message-ID: <20081214164043.GA2668@galactic.demon.co.uk> On Sat, Dec 13, 2008 at 05:18:04PM +0530, arjuna wrote: > Hello All, > > Thank you for your detailed responses. Following your line of thought, > advice and web links, it seems that it is not difficult to build a small > cluster to get started. I explored the photos of the various cluster