From richard.walsh at comcast.net Mon Feb 1 07:24:13 2010 From: richard.walsh at comcast.net (richard.walsh at comcast.net) Date: Mon, 1 Feb 2010 15:24:13 +0000 (UTC) Subject: [Beowulf] Re: GPU Beowulf Clusters In-Reply-To: Message-ID: <724359053.1362061265037853168.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> David Mathog wrote: >Jon Forrest wrote: > >> Are there any other issues I'm leaving out? > >Yes, the time and expense of rewriting your code from a CPU model to a >GPU model, and the learning curve for picking up this new skill. (Unless >you are lucky and somebody has already ported the software you use.) Coming in on this late, but to reduce this work load there is PGI's version 10.0 compiler suite which supports accelerator compiler directives. This will reduce the coding effort, but probably suffer from the classical "if it is easy, it won't perform as well" trade-off. My experience is limited, but a nice intro can be found at: http://www.pgroup.com/lit/articles/insider/v1n1a1.htm You might also inquire with PGI about their SC09 course and class notes or Google for them. rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From Shainer at mellanox.com Mon Feb 1 10:24:14 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Mon, 1 Feb 2010 10:24:14 -0800 Subject: [Beowulf] HPC Advisory Council: The 2010 (March 15-17) Switzerland HPC Conference Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F025D8266@mtiexch01.mti.com> Sending on behalf of the HPC Advisory Council. The HPC Advisory Council will hold the Switzerland HPC Workshop in March 2010 together with the Swiss National Supercomputing Centre (www.cscs.ch). The workshop will be dedicated to HPC training (interconnect architecture and advanced features, network management, HPC storage, CPU technologies, High performance visualization, accelerators and more). This is an excellent training and education opportunity for HPC IT professionals. The 3-day conference is free for attendees but registration is required. The workshop will include coffee breaks, lunch and evening events courtesy of the HPC Advisory Council. More info on the workshop can be found on the workshop web site - http://www.hpcadvisorycouncil.com/events/switzerland_workshop/. For registration, please use http://www.hpcadvisorycouncil.com/events/switzerland_workshop/attendee_reg.php. Thanks, Gilad From jlforrest at berkeley.edu Mon Feb 1 11:53:30 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Mon, 01 Feb 2010 11:53:30 -0800 Subject: [Beowulf] Re: GPU Beowulf Clusters In-Reply-To: <724359053.1362061265037853168.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> References: <724359053.1362061265037853168.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Message-ID: <4B67313A.7090009@berkeley.edu> On 2/1/2010 7:24 AM, richard.walsh at comcast.net wrote: > Coming in on this late, but to reduce this work load there is PGI's version > 10.0 compiler suite which supports accelerator compiler directives. This > will reduce the coding effort, but probably suffer from the classical > "if it is > easy, it won't perform as well" trade-off. My experience is limited, but > a nice intro can be found at: I'm not sure how much traction such a thing will get. 
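As I understand it from that article (my paraphrase, not tested, so take the exact syntax with a grain of salt), you leave the loop in place and just mark it up for the compiler, roughly:

    void scale(int n, float *restrict c, const float *a, const float *b)
    {
    #pragma acc region
        {
            /* compiler is asked to offload this loop to the GPU */
            for (int i = 0; i < n; i++)
                c[i] = a[i] * b[i];
        }
    }

with an equivalent !$acc region directive on the Fortran side. That looks painless enough for simple loops, but consider the more common situation.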
Let's say you have a big Fortran program that you want to port to CUDA. Let's assume you already know where the program spends its time, so you know which routines are good candidates for running on the GPU. Rather than rewriting the whole program in C[++], wouldn't it be easiest to leave all the non-CUDA parts of the program in Fortran, and then to call CUDA routines written in C[++]. Since the CUDA routines will have to be rewritten anyway, why write them in a language which would require purchasing yet another compiler? Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From richard.walsh at comcast.net Mon Feb 1 12:54:45 2010 From: richard.walsh at comcast.net (richard.walsh at comcast.net) Date: Mon, 1 Feb 2010 20:54:45 +0000 (UTC) Subject: [Beowulf] Re: GPU Beowulf Clusters In-Reply-To: <4B67313A.7090009@berkeley.edu> Message-ID: <186160193.1523781265057685252.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Jon Forrest wrote: >On 2/1/2010 7:24 AM, richard.walsh at comcast.net wrote: > >> Coming in on this late, but to reduce this work load there is PGI's version >> 10.0 compiler suite which supports accelerator compiler directives. This >> will reduce the coding effort, but probably suffer from the classical >> "if it is >> easy, it won't perform as well" trade-off. My experience is limited, but >> a nice intro can be found at: > >I'm not sure how much traction such a thing will get. >Let's say you have a big Fortran program that you want >to port to CUDA. Let's assume you already know where the >program spends its time, so you know which routines >are good candidates for running on the GPU. > >Rather than rewriting the whole program in C[++], >wouldn't it be easiest to leave all the non-CUDA >parts of the program in Fortran, and then to call >CUDA routines written in C[++]. Since the CUDA >routines will have to be rewritten anyway, why >write them in a language which would require >purchasing yet another compiler? Mmm ... not sure I understand the response, but perhaps this response was to a different message ... ?? In any case, the PGI software supports accelerator directives for both C and Fortran, so for those languages I do not see a need to rewrite whole applications. The question presented is the same as always, what does the performance-programming effort function look like and how well does your code perform with directives to start with. The PGI models is also hardware generic and the code runs on the CPU in parallel when there is no GPU around I believe. What will gate interest is how well PGI compiler group does at delivering performance and how important portability is to the person developing the code. HMPP make offers a similar proposition ... rbw _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michf at post.tau.ac.il Mon Feb 1 15:56:44 2010 From: michf at post.tau.ac.il (Micha) Date: Tue, 02 Feb 2010 01:56:44 +0200 Subject: [Beowulf] Re: GPU Beowulf Clusters In-Reply-To: <186160193.1523781265057685252.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> References: <186160193.1523781265057685252.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Message-ID: <4B676A3C.20504@post.tau.ac.il> On 01/02/2010 22:54, richard.walsh at comcast.net wrote: > > Jon Forrest wrote: > > >On 2/1/2010 7:24 AM, richard.walsh at comcast.net wrote: > > > >> Coming in on this late, but to reduce this work load there is PGI's > version > >> 10.0 compiler suite which supports accelerator compiler directives. This > >> will reduce the coding effort, but probably suffer from the classical > >> "if it is > >> easy, it won't perform as well" trade-off. My experience is limited, but > >> a nice intro can be found at: > > > >I'm not sure how much traction such a thing will get. > >Let's say you have a big Fortran program that you want > >to port to CUDA. Let's assume you already know where the > >program spends its time, so you know which routines > >are good candidates for running on the GPU. > > > >Rather than rewriting the whole program in C[++], > >wouldn't it be easiest to leave all the non-CUDA > >parts of the program in Fortran, and then to call > >CUDA routines written in C[++]. Since the CUDA > >routines will have to be rewritten anyway, why > >write them in a language which would require > >purchasing yet another compiler? > > Mmm ... not sure I understand the response, but perhaps this response > was to a different message ... ?? In any case, the PGI software supports > accelerator directives for both C and Fortran, so for those languages I do > not see a need to rewrite whole applications. The question presented is > the same as always, what does the performance-programming effort function > look like and how well does your code perform with directives to start > with. The PGI models is also hardware generic and the code runs on > the CPU in parallel when there is no GPU around I believe. What will > gate interest is how well PGI compiler group does at delivering performance > and how important portability is to the person developing the code. > As far as I know pgi also has a Cuda Fortran similar to cuda c, not only a directive based approach, but I have to admit that I don't have any experience with it. As for why spend money on a compiler since the code has to be re-written. Even an expensive compiler is cheap with regards to a programmer's time. Even for the salary of a cheap programmer you can buy the compiler in at most two weeks salary's worth. On the other hand, you have a programmer that already knows fortran and a piece of code that is already written and debugged in fortran. Quite a few programs can produce a first unoptimized version with very little work. Just sorting through counter based bugs and memory order bugs can cost you a lot more than the compiler. Fortran is 1 based compared to c that is 0 based (actually fortran 90/95 can use any index range for matrices). Fortran is column order while c is row order. Do you know how much head ache that can bring into the porting? Translating matlab code into fortran is also much easier that into c due to these issues. > HMPP make offers a similar proposition ... 
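(The same cost/effort argument applies to HMPP, I suppose.) To put the index/order point above in concrete terms, this is the kind of trap I mean (toy example; the Fortran original would be a 1-based a(n,n) stored column by column):

    /* Natural Fortran loop nest:
     *   do j = 1, n
     *     do i = 1, n
     *       a(i,j) = 0.0
     * walks memory with stride 1.  Translate it literally to C, which is
     * 0-based and row-major, and every access now strides by N: */
    double a[N][N];          /* assume N is defined, e.g. #define N 1000 */
    int i, j;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            a[i][j] = 0.0;   /* stride-N in C; the Fortran a(i,j) here is stride-1 */

Get a few of those wrong in a big code and you will spend more time chasing them than the compiler costs.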
> > rbw > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From vanw+beowulf at sabalcore.com Mon Feb 1 15:44:39 2010 From: vanw+beowulf at sabalcore.com (Kevin Van Workum) Date: Mon, 1 Feb 2010 18:44:39 -0500 Subject: [Beowulf] sysstat experience Message-ID: <643a61251002011544g6e44d79by86911a3a673d8f80@mail.gmail.com> Does anyone have any experience using the sysstat tool? What are your opinions? My basic concern is its safety. I'd also like to know if you think it gives good results. sysstat is at: http://pagesperso-orange.fr/sebastien.godard/ -- Kevin Van Workum, PhD Sabalcore Computing Inc. Run your code on 500 processors. Sign up for a free trial account. www.sabalcore.com 877-492-8027 ext. 11 -------------- next part -------------- An HTML attachment was scrubbed... URL: From tegner at renget.se Mon Feb 1 23:45:56 2010 From: tegner at renget.se (tegner at renget.se) Date: Tue, 2 Feb 2010 08:45:56 +0100 Subject: [Beowulf] (no subject) Message-ID: This will boil down to a questions eventually, but I need to give some background first. We are a small group doing CFD, and when we several years ago realized that beowulfs would be the right choice for us we decided to extend our computational capabilities gradually. Every year, or every second year we bought two gigabit switches and a bunch of nodes connected to these switches. One of the switches is used for mpi communications and one for connecting the nodes to a fileserver and a master node. As of today we have five "subclusters", all connected to the same filserver and master node (torque/maui is used to distribute the jobs on the different subclusters). This has worked out great for us, and we do believe the strategy of buying gradually has been advantageous to us (instead of doing larger purchases less often), and we want to continue extending our hardware in this fashion. Up till now we have not been hurt by the fact that we have a single fileserver (connected to a bunch of raided drives), but we anticipate there will be issues when we further extend the number of nodes. And we plan on building a separate "infiniband storage network" (consisting of a 24 DDR switch) and connect a number of "gluster nodes" to it. Each subcluster will then be connected to this "infiniband storage network" via one (or maybe several) ports. However, we will still limit the jobs to run within there separate subcluster and we are going to accept lower bandwidth between the subclusters. By doing this we gain the following: (i) We can get more computational nodes, since we are limiting the number of ports used to connect the switches to each other. (ii) For our application I/O is not as demanding as the "mpi-communiction" but we are still getting - hopefully - acceptable I/O performance. (iii) We can extend our storage by adding more "gluster nodes" to the "infiniband storage network" when needed. (iv) We can continue adding subclusters when we have the money. And we can also remove old ones when they "cost" too much (in terms of electricity/performance, maintenance etc.). 
Since we havent worked with infiniband before, the question is simply if there could be issues with this approach? Regards, and thanks, /jon From tegner at renget.se Tue Feb 2 00:36:36 2010 From: tegner at renget.se (tegner at renget.se) Date: Tue, 2 Feb 2010 09:36:36 +0100 Subject: [Beowulf] storage solution/investment strategy Message-ID: Please excuse me, I forgot to put in the subject. Probably best to just disregard my previous post (content is the same). /jon This will boil down to a questions eventually, but I need to give some background first. We are a small group doing CFD, and when we several years ago realized that beowulfs would be the right choice for us we decided to extend our computational capabilities gradually. Every year, or every second year we bought two gigabit switches and a bunch of nodes connected to these switches. One of the switches is used for mpi communications and one for connecting the nodes to a fileserver and a master node. As of today we have five "subclusters", all connected to the same filserver and master node (torque/maui is used to distribute the jobs on the different subclusters). This has worked out great for us, and we do believe the strategy of buying gradually has been advantageous to us (instead of doing larger purchases less often), and we want to continue extending our hardware in this fashion. Up till now we have not been hurt by the fact that we have a single fileserver (connected to a bunch of raided drives), but we anticipate there will be issues when we further extend the number of nodes. And we plan on building a separate "infiniband storage network" (consisting of a 24 DDR switch) and connect a number of "gluster nodes" to it. Each subcluster will then be connected to this "infiniband storage network" via one (or maybe several) ports. However, we will still limit the jobs to run within there separate subcluster and we are going to accept lower bandwidth between the subclusters. By doing this we gain the following: (i) We can get more computational nodes, since we are limiting the number of ports used to connect the switches to each other. (ii) For our application I/O is not as demanding as the "mpi-communiction" but we are still getting (hopefully) acceptable I/O performance. (iii) We can extend our storage by adding more "gluster nodes" to the "infiniband storage network" when needed. (iv) We can continue adding subclusters when we have the money. And we can also remove old ones when they "cost" too much (in terms of electricity/performance, maintenance etc.). Since we havent worked with infiniband before, the question is simply if there could be issues with this approach? Regards, and thanks, /jon From diep at xs4all.nl Tue Feb 2 01:41:44 2010 From: diep at xs4all.nl (Vincent Diepeveen) Date: Tue, 2 Feb 2010 10:41:44 +0100 Subject: [Beowulf] hardware question - which PSU for this? Message-ID: <971A1CC5-ABBD-4E02-8010-1260805E2DD1@xs4all.nl> hi, This seems ideal mainboard for beowulf clusters. built in infiniband it seems. http://cgi.ebay.com/Arima-AMD-Opteron-Quad-Core-Socket-F-3000-series- Server_W0QQitemZ390149471460QQcmdZViewItemQQptZCOMP_EN_Networking_Compon ents?hash=item5ad6b87ce4 they get offered regurarly and cheap. on ebay a while ago 10 of 'em got offered: http://cgi.ebay.com/ws/eBayISAPI.dll? ViewItem&item=360198390884&ssPageName=ADME:X:RTQ:US:1123 The problem is no manufacturer lists these boards and which PSU fits on it and what cpu's is unclear. that's not a good way to build a cluster huh? 
well suppose there is a psu that works, then that's ideal to build a cluster with. those barcelona cores do like $40 - $90 a piece on ebay now. So most expensive part of the machine is the DDR2- ECC-reg ram. Some 'cheap offers' on ebay regrettably are not ecc-reg despite what the description. For the coming time of course pricewise nothing beats 16 core AMD's. Those cpu's way faster than i had thought. Intel is too expensive second hand simply and always will be, as they are very good in marketing their hardware as being fast, despite that already for like half a decade they have no good 4 socket platform getting sold (if we forget about Dunnington - that thing is too expensive anyway still so let's forget about Dunnington). intel and amd seem to have canned all type of new cpu's. i read now nehalem-ex is just 6 cores, no longer 8 and 2.26Ghz, so that's not so interesting as IPC wise for well written software you can mathematical prove that AMD is nearly same speed. If intel has that problem i suppose amd won't have a 12 core version of their chip any soon, besides those are like $2k a piece the highend versions. Too much. At least for my software the barcelona's are way faster than i had thought. DDR3 is a lot faster latency of course than DDR2, but the total cpu speed is what counts in first place and the price you can get it for. Considering testreports of those 8 core cpu's i guess we won't see them soon. What was it 700mm^2? They can't produce those in times like this i guess. Too expensive, no company wants to buy that. So hence my quest for cheap alternatives :) sorry for off-topic posting. guess though more of you are looking for cheap clusters. Seems NASA has to size down also nowadays, so no more intel hardware for it in future either i bet. Vincent From john.hearns at mclaren.com Tue Feb 2 03:24:38 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Tue, 2 Feb 2010 11:24:38 -0000 Subject: [Beowulf] hardware question - which PSU for this? In-Reply-To: <971A1CC5-ABBD-4E02-8010-1260805E2DD1@xs4all.nl> References: <971A1CC5-ABBD-4E02-8010-1260805E2DD1@xs4all.nl> Message-ID: <68A57CCFD4005646957BD2D18E60667B0F245EC1@milexchmb1.mil.tagmclarengroup.com> > > intel and amd seem to have canned all type of new cpu's. i read now > nehalem-ex is just 6 cores, no longer 8 and 2.26Ghz, Vincent, please can you provide a reference for this? My understanding is that Nehalem-EX will be available in eight core versions and a six core edition, which is attractive for HPC use. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From jlforrest at berkeley.edu Tue Feb 2 14:00:37 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Tue, 02 Feb 2010 14:00:37 -0800 Subject: [Beowulf] Transient NFS Problems in New Cluster Message-ID: <4B68A085.2070406@berkeley.edu> I have a new cluster running CentOS 5.3. The cluster uses a Sun 7310 storage server that provides NFS service over a private 1Gb/s ethernet with 9K jumbo frames to the cluster. We've noticed that a number of the compute nodes sometimes generate the automount[15023]: umount_autofs_indirect: ask umount returned busy /home message. When this happens the program running on the node dies. This has happened between 10 and 20 times. We're not sure what's going on on a node when this happens. 
Most of the time everything is fine and the home directories are automounted without problem. I've googled for this problem and I see that other people have seen it too, but I've never seen a resolution, especially not for RHEL5. The auto.master line for this mount is /home /etc/auto.home --timeout=1200 noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768 The network interface configuration is eth0 Link encap:Ethernet HWaddr 00:30:48:B9:F6:52 inet addr:10.1.255.233 Bcast:10.1.255.255 Mask:255.255.0.0 inet6 addr: fe80::230:48ff:feb9:f652/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1 RX packets:32999308 errors:0 dropped:0 overruns:0 frame:0 TX packets:27468315 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:24225053296 (22.5 GiB) TX bytes:73313582546 (68.2 GiB) Interrupt:74 Base address:0x2000 Any advice on what to do? Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 From landman at scalableinformatics.com Tue Feb 2 14:29:13 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 02 Feb 2010 17:29:13 -0500 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68A085.2070406@berkeley.edu> References: <4B68A085.2070406@berkeley.edu> Message-ID: <4B68A739.3020805@scalableinformatics.com> Jon Forrest wrote: > I have a new cluster running CentOS 5.3. > The cluster uses a Sun 7310 storage server > that provides NFS service over a private > 1Gb/s ethernet with 9K jumbo frames to the > cluster. > > We've noticed that a number of the compute > nodes sometimes generate the > > automount[15023]: umount_autofs_indirect: ask umount returned busy /home [...] > Any advice on what to do? We still recommend turning off autofs for home directories. We've seen lots of problems with it on many clusters. Hard mounts are IMO better. That server should be able to handle it. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From jlforrest at berkeley.edu Tue Feb 2 15:27:20 2010 From: jlforrest at berkeley.edu (Jon Forrest) Date: Tue, 02 Feb 2010 15:27:20 -0800 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68A739.3020805@scalableinformatics.com> References: <4B68A085.2070406@berkeley.edu> <4B68A739.3020805@scalableinformatics.com> Message-ID: <4B68B4D8.3010807@berkeley.edu> On 2/2/2010 2:29 PM, Joe Landman wrote: > We still recommend turning off autofs for home directories. We've seen > lots of problems with it on many clusters. Hard mounts are IMO better. > That server should be able to handle it. These problems were also happening for another non-home mount, but I hear what you're saying. This is the only cluster we're seeing the problem on, but then this is the only cluster with a Sun storage server. All the others are using CentOS 5.X. Do you think this problem could be caused by the server? Also, what do you believe the fundamental cause is? 
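If we do end up dropping the automounter in favor of static mounts, I assume the /etc/fstab entry would look something like the line below (the server name is made up and I haven't tried this yet, so please correct me if "hard,intr" isn't what you meant):

    sun7310:/export/home  /home  nfs  hard,intr,noatime,nodiratime,rsize=32768,wsize=32768  0 0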
Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From christiansuhendra at gmail.com Mon Feb 1 22:26:50 2010 From: christiansuhendra at gmail.com (christian suhendra) Date: Mon, 1 Feb 2010 18:26:50 -1200 Subject: [Beowulf] problem of mpich-1.2.7p1 Message-ID: hello guys i have installed mpich-1.2.7p1 on ubuntu 9.04, i have configured hte NFS and RSH.. i use device=ch_p4,, but when i ran my program it's like not working i've got this result : root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon Process 0 of 1 on cluster3 Total Time: 4.316000 msecs root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon Process 0 of 4 on cluster3 Total Time: 21.552000 msecs Process 2 of 4 on cluster2 Process 1 of 4 on cluster1 Process 3 of 4 on cluster1 root at cluster3:/mirror/mpich-1.2.7p1# the process only wotk in 1 node.. but when i test the machine it connected to all node.. root at cluster3:/mirror/mpich-1.2.7p1# /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX Trying true on cluster1 ... Trying true on cluster2 ... Trying true on cluster3 ... Trying true on cluster4 ... Trying ls on cluster1 ... Trying ls on cluster2 ... Trying ls on cluster3 ... Trying ls on cluster4 ... Trying user program on cluster1 ... Trying user program on cluster2 ... Trying user program on cluster3 ... Trying user program on cluster4 ... i don't know where exactly the problem so that my program cannot run in all node.. please help me... my deadline its about 1 week later... i'm very excpeting your help... i attached my listing program so you can test on your system thank you very much... regards christian -------------- next part -------------- An HTML attachment was scrubbed... URL: From gus at ldeo.columbia.edu Tue Feb 2 17:48:07 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 02 Feb 2010 20:48:07 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: Message-ID: <4B68D5D7.1010907@ldeo.columbia.edu> Hi Christian Somehow your program was not attached to the message. In any case, you didn't say anything about your "machinefile" contents. You need to list the nodes you want to use there. The command line will be something like this: mpirun -np 4 -machinefile my_machinefile canon "man mpirun" may help you with the details. (I assume you are using the mpirun that comes with mpich1.) Having said that, I suggest that you move from MPICH-1 to OpenMPI or to MPICH2. MPICH-1 (mpich-1.2.7p1) is old, not maintained or supported anymore, and often times breaks in current Linux kernels. The MPICH developers also recommend upgrading to MPICH2. OpenMPI and MPICH2 are free, easy to install, stable, up to date, and more efficient than MPICH1. Upgrading to one of them is likely to avoid more trouble later, specially with your tight deadline. See: http://www.open-mpi.org/ http://www.mcs.anl.gov/research/projects/mpich2/ I hope this helps, Gus Correa --------------------------------------------------------------------- Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- christian suhendra wrote: > hello guys > i have installed mpich-1.2.7p1 on ubuntu 9.04, i have configured hte NFS > and RSH.. 
> i use device=ch_p4,, > but when i ran my program it's like not working i've got this result : > root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon > Process 0 of 1 on cluster3 > Total Time: 4.316000 msecs > root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon > Process 0 of 4 on cluster3 > Total Time: 21.552000 msecs > Process 2 of 4 on cluster2 > Process 1 of 4 on cluster1 > Process 3 of 4 on cluster1 > root at cluster3:/mirror/mpich-1.2.7p1# > > the process only wotk in 1 node.. > but when i test the machine it connected to all node.. > root at cluster3:/mirror/mpich-1.2.7p1# > /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX > Trying true on cluster1 ... > Trying true on cluster2 ... > Trying true on cluster3 ... > Trying true on cluster4 ... > Trying ls on cluster1 ... > Trying ls on cluster2 ... > Trying ls on cluster3 ... > Trying ls on cluster4 ... > Trying user program on cluster1 ... > Trying user program on cluster2 ... > Trying user program on cluster3 ... > Trying user program on cluster4 ... > > i don't know where exactly the problem so that my program cannot run in > all node.. > please help me... > my deadline its about 1 week later... > i'm very excpeting your help... > > > i attached my listing program so you can test on your system > thank you very much... > > > > > regards > christian > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From gus at ldeo.columbia.edu Tue Feb 2 17:58:01 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 02 Feb 2010 20:58:01 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: <4B68D5D7.1010907@ldeo.columbia.edu> References: <4B68D5D7.1010907@ldeo.columbia.edu> Message-ID: <4B68D829.4010604@ldeo.columbia.edu> PS - And don't run the programs as root! Gus Correa Gus Correa wrote: > Hi Christian > > Somehow your program was not attached to the message. > > In any case, you didn't say anything about your "machinefile" contents. > You need to list the nodes you want to use there. > The command line will be something like this: > > mpirun -np 4 -machinefile my_machinefile canon > > "man mpirun" may help you with the details. > (I assume you are using the mpirun that comes with mpich1.) > > Having said that, I suggest that you move from MPICH-1 to > OpenMPI or to MPICH2. > MPICH-1 (mpich-1.2.7p1) is old, not maintained or supported anymore, > and often times breaks in current Linux kernels. > The MPICH developers also recommend upgrading to MPICH2. > > OpenMPI and MPICH2 are free, easy to install, stable, up to date, > and more efficient than MPICH1. > Upgrading to one of them is likely to avoid more trouble later, > specially with your tight deadline. > > See: > http://www.open-mpi.org/ > http://www.mcs.anl.gov/research/projects/mpich2/ > > > I hope this helps, > Gus Correa > --------------------------------------------------------------------- > Gustavo Correa > Lamont-Doherty Earth Observatory - Columbia University > Palisades, NY, 10964-8000 - USA > --------------------------------------------------------------------- > > > christian suhendra wrote: >> hello guys >> i have installed mpich-1.2.7p1 on ubuntu 9.04, i have configured hte >> NFS and RSH.. 
>> i use device=ch_p4,, >> but when i ran my program it's like not working i've got this result : >> root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon >> Process 0 of 1 on cluster3 >> Total Time: 4.316000 msecs >> root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon >> Process 0 of 4 on cluster3 >> Total Time: 21.552000 msecs >> Process 2 of 4 on cluster2 >> Process 1 of 4 on cluster1 >> Process 3 of 4 on cluster1 >> root at cluster3:/mirror/mpich-1.2.7p1# >> >> the process only wotk in 1 node.. >> but when i test the machine it connected to all node.. >> root at cluster3:/mirror/mpich-1.2.7p1# >> /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX >> Trying true on cluster1 ... >> Trying true on cluster2 ... >> Trying true on cluster3 ... >> Trying true on cluster4 ... >> Trying ls on cluster1 ... >> Trying ls on cluster2 ... >> Trying ls on cluster3 ... >> Trying ls on cluster4 ... >> Trying user program on cluster1 ... >> Trying user program on cluster2 ... >> Trying user program on cluster3 ... >> Trying user program on cluster4 ... >> >> i don't know where exactly the problem so that my program cannot run >> in all node.. >> please help me... >> my deadline its about 1 week later... >> i'm very excpeting your help... >> >> >> i attached my listing program so you can test on your system >> thank you very much... >> >> >> >> >> regards >> christian >> >> >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From gus at ldeo.columbia.edu Tue Feb 2 18:31:49 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Tue, 02 Feb 2010 21:31:49 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: <4B68D829.4010604@ldeo.columbia.edu> References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> Message-ID: <4B68E015.6040202@ldeo.columbia.edu> Hi Christian What is the content of your file /mirror/mpich-1.2.7p1/share/machines.LINUX? Please send it on your next message, it may clarify. It looks like to me that your program is working correctly. (I am guessing a bit, because you didn't send the source code.) When you did "mpirun -np 1 canon" it ran one process on cluster3: See: >>> Process 0 of 1 on cluster3 >>> Total Time: 4.316000 msecs When you did "mpirun -np 4 canon" it ran two processes on cluster1, and one in cluster2 and cluster3. See: >>> Process 0 of 4 on cluster3 >>> Total Time: 21.552000 msecs >>> Process 2 of 4 on cluster2 >>> Process 1 of 4 on cluster1 >>> Process 3 of 4 on cluster1 Did you expect more output than this? Did you expect a different output? Did you expect it to use a different set of computers? Anyway, you would be better off upgrading to OpenMPI or MPICH2. The README file in the OpenMPI tarball has all information you need to install it. Chances are that MPICH1 will break in more complicated programs. And remember not to run user-level programs as root. That's not really safe. I hope this helps. 
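(For reference, machines.LINUX is plain text with one host name per line; if a node has more than one CPU you can also write it as host:ncpus, for example cluster1:2 -- at least that is the ch_p4 syntax as I remember it.)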
Gus Correa --------------------------------------------------------------------- Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- Gus Correa wrote: > PS - And don't run the programs as root! > > Gus Correa > > Gus Correa wrote: >> Hi Christian >> >> Somehow your program was not attached to the message. >> >> In any case, you didn't say anything about your "machinefile" contents. >> You need to list the nodes you want to use there. >> The command line will be something like this: >> >> mpirun -np 4 -machinefile my_machinefile canon >> >> "man mpirun" may help you with the details. >> (I assume you are using the mpirun that comes with mpich1.) >> >> Having said that, I suggest that you move from MPICH-1 to >> OpenMPI or to MPICH2. >> MPICH-1 (mpich-1.2.7p1) is old, not maintained or supported anymore, >> and often times breaks in current Linux kernels. >> The MPICH developers also recommend upgrading to MPICH2. >> >> OpenMPI and MPICH2 are free, easy to install, stable, up to date, >> and more efficient than MPICH1. >> Upgrading to one of them is likely to avoid more trouble later, >> specially with your tight deadline. >> >> See: >> http://www.open-mpi.org/ >> http://www.mcs.anl.gov/research/projects/mpich2/ >> >> >> I hope this helps, >> Gus Correa >> --------------------------------------------------------------------- >> Gustavo Correa >> Lamont-Doherty Earth Observatory - Columbia University >> Palisades, NY, 10964-8000 - USA >> --------------------------------------------------------------------- >> >> >> christian suhendra wrote: >>> hello guys >>> i have installed mpich-1.2.7p1 on ubuntu 9.04, i have configured hte >>> NFS and RSH.. >>> i use device=ch_p4,, >>> but when i ran my program it's like not working i've got this result : >>> root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 1 canon >>> Process 0 of 1 on cluster3 >>> Total Time: 4.316000 msecs >>> root at cluster3:/mirror/mpich-1.2.7p1# mpirun -np 4 canon >>> Process 0 of 4 on cluster3 >>> Total Time: 21.552000 msecs >>> Process 2 of 4 on cluster2 >>> Process 1 of 4 on cluster1 >>> Process 3 of 4 on cluster1 >>> root at cluster3:/mirror/mpich-1.2.7p1# >>> >>> the process only wotk in 1 node.. >>> but when i test the machine it connected to all node.. >>> root at cluster3:/mirror/mpich-1.2.7p1# >>> /mirror/mpich-1.2.7p1/sbin/tstmachines -v LINUX >>> Trying true on cluster1 ... >>> Trying true on cluster2 ... >>> Trying true on cluster3 ... >>> Trying true on cluster4 ... >>> Trying ls on cluster1 ... >>> Trying ls on cluster2 ... >>> Trying ls on cluster3 ... >>> Trying ls on cluster4 ... >>> Trying user program on cluster1 ... >>> Trying user program on cluster2 ... >>> Trying user program on cluster3 ... >>> Trying user program on cluster4 ... >>> >>> i don't know where exactly the problem so that my program cannot run >>> in all node.. >>> please help me... >>> my deadline its about 1 week later... >>> i'm very excpeting your help... >>> >>> >>> i attached my listing program so you can test on your system >>> thank you very much... 
>>> >>> >>> >>> >>> regards >>> christian >>> >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From atp at piskorski.com Tue Feb 2 21:03:05 2010 From: atp at piskorski.com (Andrew Piskorski) Date: Wed, 3 Feb 2010 00:03:05 -0500 Subject: [Beowulf] hardware question - which PSU for this? In-Reply-To: <971A1CC5-ABBD-4E02-8010-1260805E2DD1@xs4all.nl> References: <971A1CC5-ABBD-4E02-8010-1260805E2DD1@xs4all.nl> Message-ID: <20100203050305.GA93678@piskorski.com> On Tue, Feb 02, 2010 at 10:41:44AM +0100, Vincent Diepeveen wrote: > This seems ideal mainboard for beowulf clusters. built in infiniband > it seems. > > http://cgi.ebay.com/Arima-AMD-Opteron-Quad-Core-Socket-F-3000-series-Server_W0QQitemZ390149471460QQcmdZViewItemQQptZCOMP_EN_Networking_Components?hash=item5ad6b87ce4 Vincent, what makes you think that motherboard has Infiniband? Do you see connectors for it in the pictures or something? Arima's older SW500 quad socket 940 Opteron motherboard came in four versions, two of which had Infiniband (Mellanox MT25208). Thus, it's quite possible that this newer quad socket F board also had an Infiniband option, but I have no idea if this particular board has it or not. http://www.flextronics.com/computing/support/server/Product/ViewProduct.asp?View=SW500 > The problem is no manufacturer lists these boards and which PSU fits > on it and what cpu's is unclear. It's an Arima model 40GCMG020-D400-100. Perhaps it was intended as an OEM motherboard for Gateway or Dell; at least, I can't think of any other good reason why there's essentially no information whatsoever about it on web. Arima sold its computer business to Flextronics in 2007, perhaps that's related somehow. Chris Morrell (aka, "[XC] gomeler", a Georgia Tech student and sysadmin) seems to have made good progress in figuring out the functions of most of the pins on the motherboard's non-standard power connectors, but hasn't yet succeeded in getting his to boot: http://www.xtremesystems.org/forums/showpost.php?p=4194423&postcount=137 http://www.xtremesystems.org/forums/showpost.php?p=4224859&postcount=175 http://atrejus.net/arima/arima-gr.jpg http://forums.2cpu.com/showthread.php?s=e603598f0c265e1e0725a0b10b1a3757&p=772296#post772296 -- Andrew Piskorski http://www.piskorski.com/ From henning.fehrmann at aei.mpg.de Tue Feb 2 23:28:45 2010 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Wed, 3 Feb 2010 08:28:45 +0100 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68A085.2070406@berkeley.edu> References: <4B68A085.2070406@berkeley.edu> Message-ID: <20100203072845.GA3809@gretchen.aei.mpg.de> On Tue, Feb 02, 2010 at 02:00:37PM -0800, Jon Forrest wrote: > I have a new cluster running CentOS 5.3. 
> The cluster uses a Sun 7310 storage server > that provides NFS service over a private > 1Gb/s ethernet with 9K jumbo frames to the > cluster. > > We've noticed that a number of the compute > nodes sometimes generate the > > automount[15023]: umount_autofs_indirect: ask umount returned busy /home > > message. When this happens the program running on the > node dies. This has happened between 10 and 20 times. > We're not sure what's going on on a node when this > happens. Most of the time everything is fine and > the home directories are automounted without problem. > > I've googled for this problem and I see that other people > have seen it too, but I've never seen a resolution, > especially not for RHEL5. I guess the problem has not directly something to do with RHEL5. You might want to post this question to autofs at linux.kernel.org They need to know the version of autofs and the kernel. > > The auto.master line for this mount is > > /home /etc/auto.home --timeout=1200 You could try to reduce the timeout. Nothing speaks against a timeout of 60s. Many things can happen in 1200s - especially on the server side. > noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768 You could try nolock on the client side and async on the server side. The user should take care that not two processes are writing into the same files to avoid race conditions. Cheers, Henning From prentice at ias.edu Wed Feb 3 06:56:36 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 03 Feb 2010 09:56:36 -0500 Subject: [Beowulf] GPU Beowulf Clusters In-Reply-To: <4B65C8B0.9060300@pathscale.com> References: <4B61CB86.9060105@berkeley.edu> <20100130143145.43de588a@vivalunalitshi.luna.local> <4B647949.9030700@berkeley.edu> <4B64B83B.9080303@pathscale.com> <4B64DD37.50201@berkeley.edu> <20100131193358.395acc13@vivalunalitshi.luna.local> <4B65C8B0.9060300@pathscale.com> Message-ID: <4B698EA4.3050601@ias.edu> C. Bergstr?m wrote: >> NVidia techs told me that the performance difference can be about 1:2. >> > That used to be true, but I thought they fixed that? (How old is your > information) I heard this myself many times SC09. And that was in reference to Fermi, so doubt it's changed much since then. -- Prentice From prentice at ias.edu Wed Feb 3 07:22:17 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 03 Feb 2010 10:22:17 -0500 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68A085.2070406@berkeley.edu> References: <4B68A085.2070406@berkeley.edu> Message-ID: <4B6994A9.3090304@ias.edu> Jon Forrest wrote: > I have a new cluster running CentOS 5.3. > The cluster uses a Sun 7310 storage server > that provides NFS service over a private > 1Gb/s ethernet with 9K jumbo frames to the > cluster. > > We've noticed that a number of the compute > nodes sometimes generate the > > automount[15023]: umount_autofs_indirect: ask umount returned busy /home > > message. When this happens the program running on the > node dies. This has happened between 10 and 20 times. > We're not sure what's going on on a node when this > happens. Most of the time everything is fine and > the home directories are automounted without problem. > > I've googled for this problem and I see that other people > have seen it too, but I've never seen a resolution, > especially not for RHEL5. 
> > The auto.master line for this mount is > > /home /etc/auto.home --timeout=1200 > noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768 > > The network interface configuration is > Jon, I had this same exact problem a couple of weeks ago after changing the autmounting scheme on our network, requiring all nodes to reread the automounter configuration. It only happened on a few nodes. My only solution was reboot the nodes with the problem. After rebooting, 'service autofs reload' or 'service autofs restart' worked without a problem. I'm sure that's not the answer you were looking for, but that's all I got. Sorry. I suspect its a bug in the automount daemon, but I can't prove it. -- Prentice From prentice at ias.edu Wed Feb 3 07:27:35 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 03 Feb 2010 10:27:35 -0500 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68A739.3020805@scalableinformatics.com> References: <4B68A085.2070406@berkeley.edu> <4B68A739.3020805@scalableinformatics.com> Message-ID: <4B6995E7.5050604@ias.edu> Joe Landman wrote: > Jon Forrest wrote: >> I have a new cluster running CentOS 5.3. >> The cluster uses a Sun 7310 storage server >> that provides NFS service over a private >> 1Gb/s ethernet with 9K jumbo frames to the >> cluster. >> >> We've noticed that a number of the compute >> nodes sometimes generate the >> >> automount[15023]: umount_autofs_indirect: ask umount returned busy /home > > [...] > >> Any advice on what to do? > > We still recommend turning off autofs for home directories. We've seen > lots of problems with it on many clusters. Hard mounts are IMO better. > That server should be able to handle it. > How do you handle situations where home directories are spread across multiple servers? Do have a large /etc/fstab on ever NFS client? -- Prentice From prentice at ias.edu Wed Feb 3 07:29:52 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Wed, 03 Feb 2010 10:29:52 -0500 Subject: [Beowulf] Transient NFS Problems in New Cluster In-Reply-To: <4B68B4D8.3010807@berkeley.edu> References: <4B68A085.2070406@berkeley.edu> <4B68A739.3020805@scalableinformatics.com> <4B68B4D8.3010807@berkeley.edu> Message-ID: <4B699670.9050708@ias.edu> Jon Forrest wrote: > On 2/2/2010 2:29 PM, Joe Landman wrote: > >> We still recommend turning off autofs for home directories. We've seen >> lots of problems with it on many clusters. Hard mounts are IMO better. >> That server should be able to handle it. > > These problems were also happening for another > non-home mount, but I hear what you're saying. > > This is the only cluster we're seeing the problem > on, but then this is the only cluster with a > Sun storage server. All the others are using > CentOS 5.X. Do you think this problem could > be caused by the server? Also, what do you > believe the fundamental cause is? In my case, the fileserver was a NetApp, and the client was a rebuild of RHEL 2.8WS -- Prentice From gus at ldeo.columbia.edu Wed Feb 3 11:02:21 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed, 03 Feb 2010 14:02:21 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> Message-ID: <4B69C83D.2020301@ldeo.columbia.edu> Hi Christian The program attachment didn't come again. You may try to cut and paste the program to the bottom of the message. Now I see, you are worried about MPI performance, not the program correctness at this point. 
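Here is the back-of-the-envelope model I have in mind (my own rough notation, nothing rigorous): with p processes the wall-clock time behaves like

    T(p) ~ T_serial + T_compute/p + T_comm(p)

Only the middle term shrinks as you add processes; the serial setup and the communication terms do not, and over a garden-variety Ethernet T_comm can easily dominate for a small matrix.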
If your program does too little work, it is likely that the initialization/finalization and the whole MPI setup and communication take more time than the actual computation. If this is the case, and particularly if your network is slow (say Ethernet 100), you will see better performance for less nodes when the "problem size is small". There is nothing wrong with this. This phenomenon, and several variants of it, are called "Amdahl's Law": http://en.wikipedia.org/wiki/Amdahl's_law In general the "problem size" is controlled by one or a few numbers on your code or on your parameter files. Problem size may be controlled by, say, the size of an array or matrix, the number of iterations of a main loop, etc. Could you perhaps increase the problem size on your code, say boost it up 10 or 100 times, and see if the performance in many nodes still beats one node alone? I hope this helps, Gus Correa --------------------------------------------------------------------- Gustavo Correa Lamont-Doherty Earth Observatory - Columbia University Palisades, NY, 10964-8000 - USA --------------------------------------------------------------------- christian suhendra wrote: > oh...the content of /mirror/mpich-1.2.7p1/share/ > machines.LINUX are the hostname of each node.. > here is the content: > cluster1 > cluster2 > cluster3 > cluster4 > > when i ran the canon on 1 node i got the total time is 4.316000 msecs > but when i ran canon in 4 node. > see: > mpirun -np 4 the total is 21.552000 msecs > > it takes a long time then i node..it supposed to be more faster then 1 > node/PC.. > i this case i juzt need the mpich or my program work in all of node so > that the total time would be more faster then run in 1 node.. > > i attached my program so you could investigated the problem, but i > thougt the real problem is on the configuration.. > > > thank you so much mr. gus... > i really need your help i don't know how to solve this problem even my > lecturer on my university doesn't know how to solve this..actually this > is my final project for my thesis.. > and i take this because i wants to be an expert on this field sometimes.. > > > regards > christian From kus at free.net Wed Feb 3 11:11:10 2010 From: kus at free.net (Mikhail Kuzminsky) Date: Wed, 03 Feb 2010 22:11:10 +0300 Subject: [Beowulf] top500 clusters using Message-ID: Last top500 lists contain a set of sites which are not from HPC world. I understand that there may be parallelized applications which may use whole cluster for one task, but this task isn't floating-point oriented. But (in top500) there is a set of "unnamed" sites like "IT service provider" etc. IMHO they may use clusters for Web-hosting etc, where may be load balancing is used. I.e. it's not "supercomputer" (I means computer, all CPUs/cores of which may be used for solving of one task). Am I right - or they really work w/applications, which may involve whole cluster for one task solving ? Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow From Shainer at mellanox.com Wed Feb 3 11:34:49 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed, 3 Feb 2010 11:34:49 -0800 Subject: [Beowulf] top500 clusters using References: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F025D8623@mtiexch01.mti.com> Not all of the Top500 systems are really HPC systems. 
Several systems are basically used for enterprise applications but before moving them into the "production" environment, the associate vendor run the Linpack benchmark and submit the system to the list. Check out http://www.hpcwire.com/features/17905159.html. Somewhat old but still valid for some cases. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Mikhail Kuzminsky Sent: Wednesday, February 03, 2010 11:11 AM To: beowulf at beowulf.org Subject: [Beowulf] top500 clusters using Last top500 lists contain a set of sites which are not from HPC world. I understand that there may be parallelized applications which may use whole cluster for one task, but this task isn't floating-point oriented. But (in top500) there is a set of "unnamed" sites like "IT service provider" etc. IMHO they may use clusters for Web-hosting etc, where may be load balancing is used. I.e. it's not "supercomputer" (I means computer, all CPUs/cores of which may be used for solving of one task). Am I right - or they really work w/applications, which may involve whole cluster for one task solving ? Mikhail Kuzminsky Computer Assistance to Chemical Research Center Zelinsky Institute of Organic Chemistry RAS Moscow _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Wed Feb 3 13:27:00 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 3 Feb 2010 16:27:00 -0500 (EST) Subject: [Beowulf] top500 clusters using In-Reply-To: References: Message-ID: > Last top500 lists contain a set of sites which are not from HPC world. sure - it always has. top500 is not particularly HPC-specific, since linpack only weakly measures important factors like memory and interconnect bandwidth and latency. the main appeal of linpack is that it's pretty well-understood and uses hardware (FPUs) present in all conventional machines... > I understand that there may be parallelized applications which may use whole > cluster for one task, but this task isn't floating-point oriented. I would almost say that linpack is not particularly about FP (since you can derive integer rates from the scores if you want.) > etc. IMHO they may use clusters for Web-hosting etc, where may be load > balancing is used. I.e. it's not "supercomputer" (I means computer, all > CPUs/cores of which may be used for solving of one task). I think it's more common than you think for clusters to be load-balanced among many applications - or conversely that single-job clusters, which might strictly be called "capability", are rare. From gus at ldeo.columbia.edu Wed Feb 3 19:01:17 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Wed, 03 Feb 2010 22:01:17 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> Message-ID: <4B6A387D.4000900@ldeo.columbia.edu> Hi Christian Is the code trying to multiply two matrices using block decomposition? In MPICH2 you need to establish passwordless ssh (not rsh!) connection across your machines. You also need to start the mpd daemon ring (although there is now also the Hydra mechanism, but I am not familiar to it). 
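(From memory, so please double-check against the MPICH2 Installer's Guide: you create a ~/.mpd.conf file with a secret word, list your nodes one per line in an mpd.hosts file, start the ring with something like "mpdboot -n 4 -f mpd.hosts", and verify it with "mpdtrace"; "mpdallexit" shuts the ring down.)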
Finally you launch your program with mpirun (now called also mpiexec), and you can pass your machine file in the command line (in MPICH1 you could do the same too). "man mpirun" will help you. In OpenMPI you also need to establish passwordless ssh connection across the machines. However, there is no daemon ring to start, making the process easier. Likewise, you can give a hostfile/machine file on the command line, or you can list the hosts one by one in the command line. In my opinion OpenMPI is easier to use and install (and more flexible, and has more features). I hope this helps, Gus Correa christian suhendra wrote: > please take a look my source code.. > > mr.gus i wanna ask something about the mpich2 > if i swith my mpich1 to mpich2 where do i have supposed to put the list > of machine of each node.. > if in mpich1 the list in machines.LINUX.. > in mpich2 where is it?? > > about amdahl law..i still confused about that..^_^ > maybe i have to read many times so i can understand.. > > thank you mr.gus.. > i still need your advice > > > > > regards > christian christian suhendra wrote: > hello mr.gus here is my source code.. > when i compile that it works.. > > #include > #include > #include > #include > > #define N 100 /* 512*/ > > void Print_Matrix(int x, int y, int myid,int S, int *M); > int readmat(char *fname, int *mat, int n); > int writemat(char *fname, int *mat, int n); > > int A[N][N],B[N][N],C[N][N]; > int main(int argc, char **argv){ > int myid, S, M, nproc,source,dest; > int i,j,k,m,l,repeat,temp,S2; > int namelen; > char processor_name[MPI_MAX_PROCESSOR_NAME]; > MPI_Status status; > double t1, t2,t3,t4; > > int *T,*TA,*TB,*t,*TC; > MPI_Comm GRID_COMM; > int ndims,dims[2]; > int periods[2],coords[2],grid_rank; > > MPI_Init(&argc, &argv); /* initialization MPI */ > MPI_Comm_rank(MPI_COMM_WORLD,&myid); > MPI_Comm_size(MPI_COMM_WORLD,&nproc); /* #procesor */ > MPI_Get_processor_name(processor_name,&namelen); > > printf("Process %d of %d on %s\n",myid, nproc, processor_name); > > if(myid==0) { > /* read data from files: "A_file", "B_file"*/ > if (readmat("A_file.txt", (int *) A,N) < 0) > exit( 1 + printf("file problem\n") ); > if (readmat("B_file.txt", (int *) B, N) < 0) > exit( 1 + printf("file problem\n") ); > /*catat waktu*/ > t1=MPI_Wtime(); > } > /* topologi*/ > M=(int)sqrt(nproc); > S=N/M; /*dimensi blok*/ > S2=S*S; /*dimensi blok*/ > dims[0]=dims[1]=M; /*dimensi topologi*/ > periods[0]=periods[1]=1; > MPI_Cart_create(MPI_COMM_WORLD,2,dims,periods,0,&GRID_COMM); > > MPI_Comm_rank(GRID_COMM,&grid_rank); > MPI_Cart_coords(GRID_COMM,grid_rank,2,coords); > myid=grid_rank; > source=coords[0]; > dest=coords[1]; > > /*place for matrix input and output*/ > TA=(int *)malloc(sizeof(MPI_INT)*S2); > TB=(int *)malloc(sizeof(MPI_INT)*S2); > TC=(int *)malloc(sizeof(MPI_INT)*S2); > > for(i=0; i TC[i]=0; > > /*start cannon*/ > if(myid==0) > { > T=(int *)malloc(sizeof(MPI_INT)*S2); > t3=MPI_Wtime(); /*timing*/ > for(k=0; k MPI_Cart_coords(GRID_COMM,k,2,coords); > if(k==0){ > t=TA; > for(i=k; i temp=(k*S)%N; > for(j=temp; j *t=A[i][j]; > t++; > } > } > } > else{ > t=T; > for(i=coords[0]*S; i<(coords[0]+1)*S; i++) > for(j=coords[1]*S; j<(coords[1]+1)*S; j++){ > *t=A[i][j]; > t++; > } > MPI_Send(T,S2,MPI_INT,k,0,GRID_COMM); > } > } > > for(k=0; k MPI_Cart_coords(GRID_COMM,k,2,coords); > if(k==0){ > t=TB; > for(i=k; i temp=(k*S)%N; > for(j=temp; j *t=B[i][j]; > t++; > } > } > } > else{ > t=T; > for(i=coords[0]*S; i<(coords[0]+1)*S; i++) > for(j=coords[1]*S; j<(coords[1]+1)*S; j++){ > *t=B[i][j]; 
> t++; > } > MPI_Send(T,S2,MPI_INT,k,1,GRID_COMM); > } > } > > coords[0]=source; > coords[1]=dest; > t4= MPI_Wtime(); > > /*matriks multiplication*/ > for(repeat=0; repeat for(i=0; i for(j=0; j for(k=0; k TC[i*S+j]+=TA[i*S+k]*TB[j+k*S]; > } > } > } > /*swivel block & fill the value*/ > MPI_Cart_shift(GRID_COMM,1,-1,&source,&dest); > > MPI_Sendrecv_replace(TA,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > > MPI_Cart_shift(GRID_COMM,0,-1,&source,&dest); > > MPI_Sendrecv_replace(TB,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > } > > for(i=0; i for(j=0; j C[i][j]=TC[i*S+j]; > } > > for(i=1; i l=0; > m=0; > > MPI_Recv(T,S2,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG,GRID_COMM,&status); > MPI_Cart_coords(GRID_COMM,status.MPI_TAG,2,coords); > > for(j=coords[0]*S; j<(coords[0]+1)*S; j++){ > for(k=coords[1]*S; k<(coords[1]+1)*S; k++){ > C[j][k]=T[l*S+m]; > m++; > } > l++; > m=0; > } > } > > t2= MPI_Wtime(); > printf("Total Time: %lf msecs \n",(t2 - t1) / 0.001); > //printf("Transmit Time: %lf msecs \n",(t4 - t3) / 0.001); > writemat("C_file_par", (int *) C, N); > > free(T); > free(TA); > free(TB); > free(TC); > } > else > { > MPI_Recv(TA,S2,MPI_INT,0,0,GRID_COMM,&status); > MPI_Recv(TB,S2,MPI_INT,0,1,GRID_COMM,&status); > > MPI_Cart_shift(GRID_COMM,1,-coords[0],&source,&dest); > MPI_Sendrecv_replace(TA,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > > MPI_Cart_shift(GRID_COMM,0,-coords[1],&source,&dest); > MPI_Sendrecv_replace(TB,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > for(repeat=0; repeat for(i=0; i for(j=0; j for(k=0; k TC[i*S+j]+=TA[i*S+k]*TB[j+k*S]; > } > } > } > > MPI_Cart_shift(GRID_COMM,1,-1,&source,&dest); > > MPI_Sendrecv_replace(TA,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > > MPI_Cart_shift(GRID_COMM,0,-1,&source,&dest); > > MPI_Sendrecv_replace(TB,S2,MPI_INT,dest,0,source,0,GRID_COMM,&status); > } > > MPI_Send(TC,S2,MPI_INT,0,myid,GRID_COMM); > free(TA); > free(TB); > free(TC); > } > MPI_Finalize(); > return(0); > } > /*function of read the input and put to the output/ > #define _mat(i,j) (mat[(i)*n + (j)]) > int readmat(char *fname, int *mat, int n){ > FILE *fp; > int i, j; > if ((fp = fopen(fname, "r")) == NULL) > return (-1); > for (i = 0; i < n; i++) > for (j = 0; j < n; j++) > if (fscanf(fp, "%d", &_mat(i,j)) == EOF){ > fclose(fp); > return (-1); > }; > fclose(fp); > return (0); > } > int writemat(char *fname, int *mat, int n){ > FILE *fp; > int i, j; > > if ((fp = fopen(fname, "w")) == NULL) > return (-1); > for (i = 0; i < n; fprintf(fp, "\n"), i++) > for (j = 0; j < n; j++) > fprintf(fp, "%d ", _mat(i, j)); > fclose(fp); > return (0); > } > void Print_Matrix(int x, int y, int myid, int S, int *M){ > int i,j; > printf("myid:%d\n",myid); > for(i=0; i { > for(j=0; j printf(" %d ",M[i*S+j]); > printf("\n"); > } > } > From tegner at renget.se Thu Feb 4 07:18:13 2010 From: tegner at renget.se (tegner at renget.se) Date: Thu, 4 Feb 2010 16:18:13 +0100 Subject: [Beowulf] Pxe boot over infiniband Message-ID: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> When googling for "pxe boot over infiniband" it seems this is not possible for all types of hardware. Is this correct? And if so, are there other solutions (except connecting all nodes to a gigabit switch as well)? 
Regards, /jon From mdidomenico4 at gmail.com Thu Feb 4 07:48:06 2010 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Thu, 4 Feb 2010 10:48:06 -0500 Subject: [Beowulf] Pxe boot over infiniband In-Reply-To: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> References: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> Message-ID: at one point etherboot had images for mt23108 and mt25208 cards. last i can recall (3yrs ago) mellanox did some work in this area, but i'm not sure if it went very far when i was with qlogic we looked at the feasibility and found that most compute nodes had an ethernet connection anyhow, so the effort and interest was not there to build out the software On Thu, Feb 4, 2010 at 10:18 AM, wrote: > When googling for "pxe boot over infiniband" it seems this is not possible > for all types of hardware. Is this correct? And if so, are there other > solutions (except connecting all nodes to a gigabit switch as well)? > > Regards, > > /jon > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From john.hearns at mclaren.com Thu Feb 4 07:50:39 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Thu, 4 Feb 2010 15:50:39 -0000 Subject: [Beowulf] Pxe boot over infiniband In-Reply-To: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> References: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> Message-ID: <68A57CCFD4005646957BD2D18E60667B0F31D053@milexchmb1.mil.tagmclarengroup.com> > > When googling for "pxe boot over infiniband" it seems this is not > possible > for all types of hardware. Is this correct? And if so, are there other > solutions (except connecting all nodes to a gigabit switch as well)? > Go back to basics - get a keyboard/monitor and a USB CD drive. Spend a fun day on your knees being roasted and deafened at the same time installing by hand from DVD. Or get a long, long patch lead from your install server and spend several happy hours running back and forth. Seriously though - an alternate approach might be to use the functionality in BMC cards to present a virtual floppy/CD/DVD drive and boot from that, either booting a minimal ramdisk with infiniband support, or just install from the virtual DVD. I've never done this, mind... Then again this means running network cables for your BMC/IPMI cards... The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From gus at ldeo.columbia.edu Thu Feb 4 08:35:37 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 04 Feb 2010 11:35:37 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> Message-ID: <4B6AF759.9030208@ldeo.columbia.edu> Hi Christian If you already set up passwordless ssh across the nodes OpenMPI will probably get you up and running faster than MPICH2. OpenMPI is very easy to install, say, with gcc, g++, and gfortran (make sure you have them installed on your main machine, use Ubuntu apt-get, if you don't have them). 
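On an Ubuntu node, the prerequisite step Gus mentions amounts to something like the following. This is only a sketch; the package names are the stock Ubuntu ones and may differ slightly between releases:

# compilers and build tools needed to build OpenMPI from source
sudo apt-get update
sudo apt-get install build-essential gfortran
# an ssh server on every node, so mpirun can start remote processes
sudo apt-get install openssh-server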
Just configure with something like this: ./configure --prefix=/your/OpenMPI/install/dir CC=gcc CXX=g++ F77=gfortran FC=gfortran You can add optimization flags if you want, but their defaults are good. This will give you the C,C++,Fortran 77, and Fortran 90 MPI bindings. You can install OpenMPI a NFS mounted directory, and set your PATH LD_LIBRARY_PATH, and MANPATH on your .bashrc/.cshrc file to point also to the OpenMPI sub-directories. This way you do a single install, no need to install on the other computer nodes also. A more laborious alternative is to install on all nodes. For details check the README file and their FAQ: http://www.open-mpi.org/faq/ I prefer OpenMPI because it is easier to handle and is more flexible, but that is a matter of personal taste and needs. I hope this helps. Gus Correa PS - If this is a Beowulf discussion, please Cc. your messages to Bewoulf . christian suhendra wrote: > yes..the input is on the txt file..matrix A an matrix B > i didn't send you the file because its to large. and i can't attached it > from my PC.. > > > oh thanks.. > i will try to use mpich2 or openmpi.. > what do you prefer of this mpich2 or openmpi?? > i mean for easy configuration,,,.. > thank you very much Mr.gus > > > > regards > christian From hahn at mcmaster.ca Thu Feb 4 09:27:18 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 4 Feb 2010 12:27:18 -0500 (EST) Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: <4B6A387D.4000900@ldeo.columbia.edu> References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> Message-ID: > In MPICH2 you need to establish passwordless ssh (not rsh!) > connection across your machines. it should be said that mpich2 doesn't strictly require this. all mpi flavors can operate with other spawning methods - some have hooks to be spawned by the resource manager, for instance, which bypasses ssh/rsh type access. > In OpenMPI you also need to establish passwordless ssh connection same for OpenMPI: doesn't actually require passwordless ssh. but if you do want passwordless ssh, IMO the only sane solution is to configure hostbased trust. having an unencrypted private key in your home directory is hideous (moral equivalent of putting your password in a file, in the clear...) From Shainer at mellanox.com Thu Feb 4 09:56:38 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Thu, 4 Feb 2010 09:56:38 -0800 Subject: [Beowulf] Pxe boot over infiniband References: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F025D874C@mtiexch01.mti.com> There is a solution that covers at least all the Mellanox InfiniBand adapters. It is called FlexBoot and it is on the Mellanox web site - http://www.mellanox.com/content/pages.php?pg=products_dyn&product_family =34&menu_section=34#tab-two The Mellanox devices supported are: * ConnectX(r) / ConnectX(r)-2 * InfiniHost(r) III Ex * InfiniHost(r) III Lx Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of tegner at renget.se Sent: Thursday, February 04, 2010 7:18 AM To: beowulf at beowulf.org Subject: [Beowulf] Pxe boot over infiniband When googling for "pxe boot over infiniband" it seems this is not possible for all types of hardware. Is this correct? And if so, are there other solutions (except connecting all nodes to a gigabit switch as well)? 
Regards, /jon _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From mathog at caltech.edu Thu Feb 4 10:04:34 2010 From: mathog at caltech.edu (David Mathog) Date: Thu, 04 Feb 2010 10:04:34 -0800 Subject: [Beowulf] problem of mpich-1.2.7p1 Message-ID: Gus Correa wrote > If you already set up passwordless ssh across the nodes > OpenMPI will probably get you up and running faster than MPICH2. > > OpenMPI is very easy to install, say, with gcc, g++, and > gfortran (make sure you have them installed on your main machine, > use Ubuntu apt-get, if you don't have them). Well on Linux maybe, but since OpenMPI has been soundly kicking my butt trying to get it installed and working on a Solaris 5.8 Sparc system for the last day, I can't let that slide as a general statement. OpenMPI 1.4.1 needed a few minor code mods to build at all using gcc on this system (it expects some defines that aren't present, this is with the sunfreeware gcc versions), and those mods were just about counting CPUs, which wasn't an issue in this case because it is a single CPU system. These same issues were also reported by another fellow for 1.3.1 on a Solaris 8 system: http://www.open-mpi.org/community/lists/users/2009/02/7994.php The gcc version works so long as mpirun only sends jobs to itself. Sadly, try to send ANYTHING to a remote machine (linux Intel, in case that matters) and it treats one to: mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor This on a build with no warnings or errors. Definitely a problem on the Solaris side, since any of the linux machines can initiate an mpirun to another node, or all other nodes, that works with the example programs. So with gcc, OpenMPI not too useful for the front end of an MPI cluster. Today I'm trying again using Sun's Forte 7 tools, which requires a fairly complex configure line: ./configure --with-sge --prefix=/opt/ompi141 CFLAGS="-xarch=v8plusa" CXXFLAGS="-xarch=v8plusa" FFLAGS="-xarch=v8plusa" FCFLAGS="-xarch=v8plusa" CC=/opt/SUNWspro/bin/cc CXX=/opt/SUNWspro/bin/CC F77=/opt/SUNWspro/bin/f77 FC=/opt/SUNWspro/bin/f95 CCAS=/opt/SUNWspro/bin/cc CCASFLAGS="-xarch=v8plusa" >configure_4.log 2>&1 & Not sure yet if that is sufficient, as none of the preceding configure variants resulted in a set of Makefiles which would actually run to completion, and this one is still building. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From dnlombar at ichips.intel.com Thu Feb 4 10:36:12 2010 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Thu, 4 Feb 2010 10:36:12 -0800 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> Message-ID: <20100204183612.GA18535@nlxdcldnl2.cl.intel.com> On Thu, Feb 04, 2010 at 10:27:18AM -0700, Mark Hahn wrote: > > but if you do want passwordless ssh, IMO the only sane solution is to > configure hostbased trust. having an unencrypted private key in your > home directory is hideous (moral equivalent of putting your password > in a file, in the clear...) Completely agree that host-based passwordless SSH is the best approach, especially when jobs are submitted via a resource manager.. 
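As an illustration of the resource-manager case: with an OpenMPI built --with-sge (as in David Mathog's configure line above), the job script needs no ssh setup at all, because the ranks are spawned through SGE itself. A sketch, assuming a parallel environment (here called "orte") already exists on the cluster and that the binary is ./a.out:

#!/bin/bash
#$ -N mpitest
#$ -cwd
#$ -pe orte 8
# NSLOTS is filled in by SGE; with tight integration mpirun starts the
# remote ranks through the scheduler rather than through ssh
mpirun -np $NSLOTS ./a.out

OpenMPI has similar hooks for other resource managers; the point is simply that the scheduler, not ssh, does the spawning.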
Also agree that an empty passphrase is a particularly bad approach. But, when done via ssh-agent, I don't see partiularly onerous security issues for a usage where you're manually launching jobs from an interactive session unless you have no faith in the system's integrity at all... -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From hahn at mcmaster.ca Thu Feb 4 10:55:11 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 4 Feb 2010 13:55:11 -0500 (EST) Subject: [Beowulf] Pxe boot over infiniband In-Reply-To: <68A57CCFD4005646957BD2D18E60667B0F31D053@milexchmb1.mil.tagmclarengroup.com> References: <91b02d46ce89bb91cd4ad96b2e434e2a.squirrel@webmail01.one.com> <68A57CCFD4005646957BD2D18E60667B0F31D053@milexchmb1.mil.tagmclarengroup.com> Message-ID: >> When googling for "pxe boot over infiniband" it seems this is not >> possible >> for all types of hardware. Is this correct? And if so, are there other >> solutions (except connecting all nodes to a gigabit switch as well)? > > Go back to basics - get a keyboard/monitor and a USB CD drive. well, maybe a usb flash stick. I think that's what I'd do, and it would be pretty cheap, reliable, fast, etc. the only thing on the flash stick would be to fetch and boot the usual kernel/initrd configuration, so the flash image would never need to change (and would be read-only, so presumably not prone to wear. might need kexec if syslinux/etc can't be persuaded to behave quite like this. > Then again this means running network cables for your BMC/IPMI cards... virtual media via BMC is a good idea, but sounds fragile and vendor-specific to me. From hahn at mcmaster.ca Thu Feb 4 11:10:06 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 4 Feb 2010 14:10:06 -0500 (EST) Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: <20100204183612.GA18535@nlxdcldnl2.cl.intel.com> References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> <20100204183612.GA18535@nlxdcldnl2.cl.intel.com> Message-ID: >> but if you do want passwordless ssh, IMO the only sane solution is to >> configure hostbased trust. having an unencrypted private key in your >> home directory is hideous (moral equivalent of putting your password >> in a file, in the clear...) > > Completely agree that host-based passwordless SSH is the best approach, > especially when jobs are submitted via a resource manager.. > > Also agree that an empty passphrase is a particularly bad approach. > > But, when done via ssh-agent, I don't see partiularly onerous security issues > for a usage where you're manually launching jobs from an interactive session > unless you have no faith in the system's integrity at all... absolutely. I spoke sloppily - I use agent-based PK logins myself, and only wanted to badmouth password and unencrypted PK logins. I think it's really important even for end-users to understand the basics of ssh: - first stage is mutual authentication of _machines_. this is what all that "hostkey of xxx has changed; maybe a hack!". once this is done, hosts have an encrypted channel between authentic hosts. - second stage is user PK authentication: the client is challenged to prove knowlege of the private key, which can happen by an un-encrypted private key in ~/.ssh, or by prompting the user for the passphrase to an encrypted privkey, or by interacting with ssh-agent. 
- finally, as a last resort, username/password can be used - basically the worst case security-wise: maximal exposure to clocal keyboard logging and remote daemon compromise. A QUESTION: how many clusters used/managed by people on this list mandate the use of PK login (ie, rule out passwords)? I know some do, but we haven't, figuring there would be an outcry (not to mention making our systems harder to use for the technically weaker users.) we've thought of providing users with a customized package of windows ssh client with a unique encrypted PK preinstalled. might work... if you think of threat models, it's interesting to note that if an sshable account is attacked through windows-based clients, keylogging is probably the more likley issue. if compromise is of clients on a *nix system, I'm guessing the main risk is unencrypted PKs in /home/*/.ssh. server-side compromise seems to usually be of the daemon, which simply logs password-based logins (not outgoing connections in the versions I've seen, and no compromise of ssh-agent to collect passphrase+key combos...) From gus at ldeo.columbia.edu Thu Feb 4 12:09:43 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 04 Feb 2010 15:09:43 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> Message-ID: <4B6B2987.2090407@ldeo.columbia.edu> Yes, Mark, you are right. Passwordless ssh is not a strict requirement, although it is a simple way to make things work. Yes, host based trust is better than unencrypted private keys. I don't even know if the cluster in question is connected to the Internet, though. My suggestions were directed to someone who is still using MPICH-1, claimed to have trouble with it, not to be familiar to clusters and MPI, and to have a pressing deadline. Therefore, I thought it would be more helpful to give him simple and focused suggestions, rather than a full gamut of possibilities. I guess it would be a great help to him if you post simple instructions (or a link) on how to setup passwordless ssh through host based trust. Regards, Gus Correa Mark Hahn wrote: >> In MPICH2 you need to establish passwordless ssh (not rsh!) >> connection across your machines. > > it should be said that mpich2 doesn't strictly require this. > all mpi flavors can operate with other spawning methods - some have > hooks to be spawned by the resource manager, for instance, which > bypasses ssh/rsh type access. > >> In OpenMPI you also need to establish passwordless ssh connection > > same for OpenMPI: doesn't actually require passwordless ssh. > > but if you do want passwordless ssh, IMO the only sane solution is to > configure hostbased trust. having an unencrypted private key in your > home directory is hideous (moral equivalent of putting your password in > a file, in the clear...) 
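For reference, the agent-based approach discussed above (an encrypted key, unlocked once per session) is only a few commands. A sketch, with the host name as a placeholder and standard OpenSSH tools assumed:

ssh-keygen -t rsa                 # choose a real passphrase when prompted
ssh-copy-id user@node01           # or append ~/.ssh/id_rsa.pub to authorized_keys on each node
eval `ssh-agent`                  # start an agent for this shell
ssh-add                           # unlock the key once; the passphrase stays with the agent
ssh node01 hostname               # later logins go through the agent, with no prompts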
> _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From gus at ldeo.columbia.edu Thu Feb 4 12:30:48 2010 From: gus at ldeo.columbia.edu (Gus Correa) Date: Thu, 04 Feb 2010 15:30:48 -0500 Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: References: Message-ID: <4B6B2E78.1080404@ldeo.columbia.edu> Hi David Sorry to hear that OpenMPI is a troublemaker in your Open Solaris machines. Have you questioned the OpenMPI list about that? I installed OpenMPI on Linux (Fedora, CentOS) without problems, on clusters with Infiniband, Gigabit Ethernet, old Ethernet 100, and on standalone machines, using Gnu, PGI, and Intel compilers. In my experience it installs easily and works well on Linux. Among other reasons, I recommended it to the person who asked for help because he said his is a Linux cluster (Ubuntu). We also have and use MPICH2 and MVAPICH2 here, though. Gus Correa David Mathog wrote: > Gus Correa wrote >> If you already set up passwordless ssh across the nodes >> OpenMPI will probably get you up and running faster than MPICH2. >> >> OpenMPI is very easy to install, say, with gcc, g++, and >> gfortran (make sure you have them installed on your main machine, >> use Ubuntu apt-get, if you don't have them). > > Well on Linux maybe, but since OpenMPI has been soundly kicking my butt > trying to get it installed and working on a Solaris 5.8 Sparc system for > the last day, I can't let that slide as a general statement. > > OpenMPI 1.4.1 needed a few minor code mods to build at all using gcc on > this system (it expects some defines that aren't present, this is with > the sunfreeware gcc versions), and those mods were just about counting > CPUs, which wasn't an issue in this case because it is a single CPU > system. These same issues were also reported by another fellow for 1.3.1 > on a Solaris 8 system: > > http://www.open-mpi.org/community/lists/users/2009/02/7994.php > > The gcc version works so long as mpirun only sends jobs to itself. > Sadly, try to send ANYTHING to a remote machine (linux Intel, in case > that matters) and it treats one to: > > mca_oob_tcp_msg_send_handler: writev failed: Bad file descriptor > > This on a build with no warnings or errors. Definitely a problem on the > Solaris side, since any of the linux machines can initiate an mpirun to > another node, or all other nodes, that works with the example programs. > So with gcc, OpenMPI not too useful for the front end of an MPI cluster. > > Today I'm trying again using Sun's Forte 7 tools, which requires a > fairly complex configure line: > > ./configure --with-sge --prefix=/opt/ompi141 CFLAGS="-xarch=v8plusa" > CXXFLAGS="-xarch=v8plusa" FFLAGS="-xarch=v8plusa" > FCFLAGS="-xarch=v8plusa" CC=/opt/SUNWspro/bin/cc > CXX=/opt/SUNWspro/bin/CC F77=/opt/SUNWspro/bin/f77 > FC=/opt/SUNWspro/bin/f95 CCAS=/opt/SUNWspro/bin/cc > CCASFLAGS="-xarch=v8plusa" >configure_4.log 2>&1 & > > Not sure yet if that is sufficient, as none of the preceding configure > variants resulted in a set of Makefiles which would actually run to > completion, and this one is still building. 
> > Regards, > > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hahn at mcmaster.ca Thu Feb 4 12:43:35 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 4 Feb 2010 15:43:35 -0500 (EST) Subject: [Beowulf] problem of mpich-1.2.7p1 In-Reply-To: <4B6B2987.2090407@ldeo.columbia.edu> References: <4B68D5D7.1010907@ldeo.columbia.edu> <4B68D829.4010604@ldeo.columbia.edu> <4B68E015.6040202@ldeo.columbia.edu> <4B69C83D.2020301@ldeo.columbia.edu> <4B6A387D.4000900@ldeo.columbia.edu> <4B6B2987.2090407@ldeo.columbia.edu> Message-ID: > simple instructions (or a link) on how to setup passwordless ssh > through host based trust. it's fairly simple. hosts need to know each other (ie, host keys in /etc/ssh/ssh_known_hosts), and each machine needs a list of trusted hosts in /etc/ssh/shosts.equiv. target machines need sshd_config to contain "HostbasedAuthentication yes". source machines need ssh_config to contain "EnableSSHKeysign yes" (I don't remember whether clients can do this via "ssh -oEnableSSHKeysign=yes" or not.) one nice thing about hostbased trust is that it can (and probably should be) asymmetric. to be useful, compute nodes probably need to trust admin and/or login nodes, but your login node doesn't have to trust compute nodes. of course, you should never use this for machines you don't, well, "trust" (such as random client machines outside your admin control...) unencrypted public keys are very easy, and they work - the problem is that it's like putting your password into a file called ".hacker.please.take" ;) From mdidomenico4 at gmail.com Thu Feb 4 14:11:52 2010 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Thu, 4 Feb 2010 17:11:52 -0500 Subject: [Beowulf] 48/96 disk jbods? Message-ID: Can anyone recommend a decent 48 or 96 (if they exist) disk jbod on the market? I don't need the everything that goes along with a raid system or anything that's network attached. I just need to hang a lot of storage from a machine, sas/sata/fc doesn't matter I found this one on the web, but i'm curious if anyone knows of any others that might be out there. http://www.xtore-es.com/downloads/datasheet/XJ2000%20SAS_PUB-00036-B_Final.pdf thanks From landman at scalableinformatics.com Thu Feb 4 14:30:25 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 04 Feb 2010 17:30:25 -0500 Subject: [Beowulf] 48/96 disk jbods? In-Reply-To: References: Message-ID: <4B6B4A81.5020008@scalableinformatics.com> Michael Di Domenico wrote: > Can anyone recommend a decent 48 or 96 (if they exist) disk jbod on > the market? I don't need the everything that goes along with a raid > system or anything that's network attached. I just need to hang a lot > of storage from a machine, sas/sata/fc doesn't matter Self built or pre-built? If the latter, you can use our delta V units as this: http://www.scalableinformatics.com/delta-v . JBOD over iSCSI or similar. We software RAID them by default, though there is no reason we couldn't have many iSCSI targets over 10GbE/IB/ethernet. You don't see it there, but there is a DV5 ... 48 bay top load. The unit you pointed to has the unfortunate problem of requiring you take two devices out to get at one failed drive. 
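To make Mark Hahn's host-based recipe above concrete, a minimal sketch (run as root; the host names are placeholders and the paths assume a stock OpenSSH layout):

# every machine learns the others' host keys
ssh-keyscan -t rsa login01 node01 node02 >> /etc/ssh/ssh_known_hosts

# on the compute nodes (the targets): trust the login/admin node and enable the method
echo "login01" >> /etc/ssh/shosts.equiv
echo "HostbasedAuthentication yes" >> /etc/ssh/sshd_config    # then restart sshd

# on the login/admin node (the source): have the client attempt host-based auth
echo "HostbasedAuthentication yes" >> /etc/ssh/ssh_config
echo "EnableSSHKeysign yes" >> /etc/ssh/ssh_config

As Mark notes, the trust can stay asymmetric: only the compute nodes need the shosts.equiv and sshd_config changes.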
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From farmantrout at burnsmcd.com Fri Feb 5 05:06:27 2010 From: farmantrout at burnsmcd.com (Armantrout, Fred) Date: Fri, 5 Feb 2010 07:06:27 -0600 Subject: [Beowulf] 48/96 disk jbods? In-Reply-To: <4B6B4A81.5020008@scalableinformatics.com> References: <4B6B4A81.5020008@scalableinformatics.com> Message-ID: <7F611EB6D6C2064883F59190F87FFD620BF58E58EE@BMCDMAIL01.burnsmcd.com> I have seen a large drive setup from HP called HP StorageWorks 600 Modular Disk System Info includes 5U rackmount form factor Supports seventy 3.5" LFF Universal hot pluggable SAS or SATA drives Two pull-out drive drawers support hot plug large form factor dual-ported SAS or archival-class SATA drives in just 5U of rack space (35 hot plug drives per drawer) -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Joe Landman Sent: Thursday, February 04, 2010 4:30 PM To: Michael Di Domenico Cc: Beowulf Mailing List Subject: Re: [Beowulf] 48/96 disk jbods? Michael Di Domenico wrote: > Can anyone recommend a decent 48 or 96 (if they exist) disk jbod on > the market? I don't need the everything that goes along with a raid > system or anything that's network attached. I just need to hang a lot > of storage from a machine, sas/sata/fc doesn't matter Self built or pre-built? If the latter, you can use our delta V units as this: http://www.scalableinformatics.com/delta-v . JBOD over iSCSI or similar. We software RAID them by default, though there is no reason we couldn't have many iSCSI targets over 10GbE/IB/ethernet. You don't see it there, but there is a DV5 ... 48 bay top load. The unit you pointed to has the unfortunate problem of requiring you take two devices out to get at one failed drive. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From hearnsj at googlemail.com Sat Feb 6 02:55:54 2010 From: hearnsj at googlemail.com (John Hearns) Date: Sat, 6 Feb 2010 10:55:54 +0000 Subject: [Beowulf] Cisco OTV Message-ID: <9f8092cc1002060255t4d130c60sf3d2ffee7755d10@mail.gmail.com> http://www.theregister.co.uk/2010/02/05/cisco_report_otv_nexus_7000/ This looks interesting from the point of view of clusters in a cloud computing environment. Then again, your application is going to see much greater latencies if half your virtual machines get shifted to another data centre! (Having said that, AFAIK Amazon has an option where you can specify all the machins you rent are spatially together). From hearnsj at googlemail.com Sat Feb 6 13:03:32 2010 From: hearnsj at googlemail.com (John Hearns) Date: Sat, 6 Feb 2010 21:03:32 +0000 Subject: [Beowulf] Low cost IB cards Message-ID: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> There was discussion on this list on lwo cost IB cards. Can someone remind me of the vendor? Similarly for switches. 
From brockp at umich.edu Sat Feb 6 13:45:15 2010 From: brockp at umich.edu (Brock Palen) Date: Sat, 6 Feb 2010 16:45:15 -0500 Subject: [Beowulf] Low cost IB cards In-Reply-To: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> References: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> Message-ID: <00DF270B-AF99-4017-A07F-6EEDE752B77A@umich.edu> Colfax Direct: http://www.colfaxdirect.com/store/pc/home.asp Though had cases where switches with 'integrated subnet manager' really means 'no subnet manager' Brock Palen www.umich.edu/~brockp Center for Advanced Computing brockp at umich.edu (734)936-1985 On Feb 6, 2010, at 4:03 PM, John Hearns wrote: > There was discussion on this list on lwo cost IB cards. > Can someone remind me of the vendor? Similarly for switches. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From Michael.Frese at NumerEx-LLC.com Sat Feb 6 16:13:18 2010 From: Michael.Frese at NumerEx-LLC.com (Michael H. Frese) Date: Sat, 06 Feb 2010 17:13:18 -0700 Subject: [Beowulf] Low cost IB cards In-Reply-To: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.co m> References: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> Message-ID: <6.2.5.6.2.20100206170128.06d5db38@NumerEx-LLC.com> Be sure your OS -- as in "cat /etc/issue" -- has all the bells and whistles for IB. We bought those 4X SDR cards and an 8 port switch and tried and tried and tried for over a year off and on and finally failed on various flavors of Fedora to bring up OFA/OFED on IB. We finally succeeded with CentOS 5.3 or 4. Then we brought up CentOS on a mixed cluster and now its NFS is failing miserably to communicate with some of the older OS's, on which problem you hear more later.... Ugh! There didn't seem to be any hardware problems, though. Mike At 02:03 PM 2/6/2010, you wrote: >There was discussion on this list on lwo cost IB cards. >Can someone remind me of the vendor? Similarly for switches. >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf From sabujp at gmail.com Sat Feb 6 16:23:13 2010 From: sabujp at gmail.com (Sabuj Pattanayek) Date: Sat, 6 Feb 2010 18:23:13 -0600 Subject: [Beowulf] Low cost IB cards In-Reply-To: <00DF270B-AF99-4017-A07F-6EEDE752B77A@umich.edu> References: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> <00DF270B-AF99-4017-A07F-6EEDE752B77A@umich.edu> Message-ID: Dell. We got a Mellanox 36 port unmanaged infiniscale iv QDR IB switch with a single psu for ~$4.8k and an extra PSU for $640. We looked at all the turnkey vendors. Not a single one could compete with their prices. On Sat, Feb 6, 2010 at 3:45 PM, Brock Palen wrote: > Colfax Direct: > http://www.colfaxdirect.com/store/pc/home.asp From landman at scalableinformatics.com Sat Feb 6 16:46:47 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Sat, 06 Feb 2010 19:46:47 -0500 Subject: [Beowulf] Low cost IB cards In-Reply-To: <6.2.5.6.2.20100206170128.06d5db38@NumerEx-LLC.com> References: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> <6.2.5.6.2.20100206170128.06d5db38@NumerEx-LLC.com> Message-ID: <4B6E0D77.8000103@scalableinformatics.com> Michael H. 
Frese wrote: > Be sure your OS -- as in "cat /etc/issue" -- has all the bells and > whistles for IB. We bought those 4X SDR cards and an 8 port switch and > tried and tried and tried for over a year off and on and finally failed > on various flavors of Fedora to bring up OFA/OFED on IB. We finally Last I checked, OFED isn't supported by Fedora. Understand, Fedora *is* a rapidly moving target. I don't have anything against it, I just used it to replace a failing Ubuntu 9.10 load on a home franken-machine. But I know that lots of things won't work on it. Right now I am struggling with their ideology ... nouveau versus Nvidia. I want to remove the former and install the latter. And it ain't easy. OFED is picky about its kernels. No matter which distro you use, kernels matter. > succeeded with CentOS 5.3 or 4. Then we brought up CentOS on a mixed > cluster and now its NFS is failing miserably to communicate with some of > the older OS's, on which problem you hear more later.... Older RHEL kernels that Centos are based upon aren't terribly good at NFS. > > Ugh! > > There didn't seem to be any hardware problems, though. We typically build and install our own kernels. Its hard to stabilize RHEL kernels at high data rates from IO systems (and networks, but thats another story). This said, there are lots of crappy hardware bits out there. IB tends to be pretty good. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From hahn at mcmaster.ca Sat Feb 6 18:41:22 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 6 Feb 2010 21:41:22 -0500 (EST) Subject: [Beowulf] Low cost IB cards In-Reply-To: <4B6E0D77.8000103@scalableinformatics.com> References: <9f8092cc1002061303u3c510a9al3296e65dc7ede011@mail.gmail.com> <6.2.5.6.2.20100206170128.06d5db38@NumerEx-LLC.com> <4B6E0D77.8000103@scalableinformatics.com> Message-ID: > Last I checked, OFED isn't supported by Fedora. Understand, Fedora *is* a > rapidly moving target. fedora is great for desktops and other less "entangled" machines; centos is more appropriate for the latter. I have no opinion about debian derivatives except that they seem redundant and often come with unwelcome and unwarranted attitude... > work on it. Right now I am struggling with their ideology ... nouveau versus > Nvidia. I want to remove the former and install the latter. And it ain't > easy. it _is_ easy with akmods. until vendors do what it takes for open-source drivers, getting along with binary blobs is good. as far as I can see, akmods are the right way to do it: just recompile the shim when necessary. regards, mark hahn. From gerry.creager at tamu.edu Mon Feb 8 13:05:52 2010 From: gerry.creager at tamu.edu (Gerald Creager) Date: Mon, 08 Feb 2010 15:05:52 -0600 Subject: [Beowulf] UPS signaling scripts? Message-ID: <4B707CB0.1000808@tamu.edu> Looking for a usable script that will allow me to listen to an APC data center UPS and power down a cluster when we go to UPS power. Anyone got a solution? My last experience with PowerChute was pretty sad, I'm afraid. 
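The replies below converge on apcupsd or NUT doing the listening; either way the cluster side usually reduces to a hook script the daemon calls when the UPS reports it is on battery. A rough sketch only; the node list, timing, and the use of root ssh from the head node are assumptions to adapt:

#!/bin/bash
# called by the UPS monitor (e.g. apcupsd's onbattery event, or NUT's upsmon SHUTDOWNCMD)
NODES="node01 node02 node03 node04"

logger -t ups-shutdown "UPS on battery: shutting down compute nodes"
for n in $NODES; do
    ssh root@$n 'shutdown -h now' &
done
wait
shutdown -h +2 "UPS on battery, head node going down"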
gerry -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From orion at cora.nwra.com Mon Feb 8 13:28:17 2010 From: orion at cora.nwra.com (Orion Poplawski) Date: Mon, 08 Feb 2010 14:28:17 -0700 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <4B707CB0.1000808@tamu.edu> References: <4B707CB0.1000808@tamu.edu> Message-ID: <4B7081F1.5050803@cora.nwra.com> On 2/8/2010 2:05 PM, Gerald Creager wrote: > Looking for a usable script that will allow me to listen to an APC > data center UPS and power down a cluster when we go to UPS power. > Anyone got a solution? My last experience with PowerChute was pretty > sad, I'm afraid. > > gerry Have you tried apcupsd http://www.apcupsd.com/ ? From dnlombar at ichips.intel.com Mon Feb 8 13:32:44 2010 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Mon, 8 Feb 2010 13:32:44 -0800 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <4B707CB0.1000808@tamu.edu> References: <4B707CB0.1000808@tamu.edu> Message-ID: <20100208213244.GB8320@nlxcldnl2.cl.intel.com> On Mon, Feb 08, 2010 at 01:05:52PM -0800, Gerald Creager wrote: > Looking for a usable script that will allow me to listen to an APC data > center UPS and power down a cluster when we go to UPS power. Anyone got > a solution? My last experience with PowerChute was pretty sad, I'm afraid. $ yum info apcupsd Loaded plugins: refresh-packagekit Available Packages Name : apcupsd Arch : x86_64 Version : 3.14.8 Release : 1.fc12 Size : 283 k Repo : updates Summary : APC UPS Power Control Daemon for Linux URL : http://www.apcupsd.com License : GPLv2 Description: Apcupsd can be used for controlling most APC UPSes. During a : power failure, apcupsd will inform the users about the power : failure and that a shutdown may occur. If power is not restored, : a system shutdown will follow when the battery is exausted, a : timeout (seconds) expires, or the battery runtime expires based : on internal APC calculations determined by power consumption : rates. If the power is restored before one of the above shutdown : conditions is met, apcupsd will inform users about this fact. : Some features depend on what UPS model you have (simple or smart). $ -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. From brs at usf.edu Mon Feb 8 13:43:09 2010 From: brs at usf.edu (Brian Smith) Date: Mon, 08 Feb 2010 16:43:09 -0500 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <4B707CB0.1000808@tamu.edu> References: <4B707CB0.1000808@tamu.edu> Message-ID: <1265665389.2478.26.camel@localhost.localdomain> http://www.networkupstools.org/ This worked for me back in the day. Otherwise, bash+snmpget+cron is the alternative. -- Brian Smith Senior Systems Administrator IT Research Computing, University of South Florida 4202 E. Fowler Ave. ENB204 Office Phone: +1 813 974-1467 Organization URL: http://rc.usf.edu On Mon, 2010-02-08 at 15:05 -0600, Gerald Creager wrote: > Looking for a usable script that will allow me to listen to an APC data > center UPS and power down a cluster when we go to UPS power. Anyone got > a solution? My last experience with PowerChute was pretty sad, I'm afraid. > > gerry From smulcahy at atlanticlinux.ie Mon Feb 8 13:57:58 2010 From: smulcahy at atlanticlinux.ie (stephen mulcahy) Date: Mon, 08 Feb 2010 21:57:58 +0000 Subject: [Beowulf] UPS signaling scripts? 
In-Reply-To: <4B707CB0.1000808@tamu.edu> References: <4B707CB0.1000808@tamu.edu> Message-ID: <4B7088E6.1060905@atlanticlinux.ie> On 08/02/2010 21:05, Gerald Creager wrote: > Looking for a usable script that will allow me to listen to an APC data > center UPS and power down a cluster when we go to UPS power. Anyone got > a solution? My last experience with PowerChute was pretty sad, I'm afraid. > > gerry I've been using apcupsd for a few months and have found it to work well - I think I tested the shutdown part once during initial install, thankfully haven't needed it since. -stephen -- Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway) From prentice at ias.edu Mon Feb 8 14:07:59 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Mon, 08 Feb 2010 17:07:59 -0500 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <4B7088E6.1060905@atlanticlinux.ie> References: <4B707CB0.1000808@tamu.edu> <4B7088E6.1060905@atlanticlinux.ie> Message-ID: <4B708B3F.3050800@ias.edu> stephen mulcahy wrote: > On 08/02/2010 21:05, Gerald Creager wrote: >> Looking for a usable script that will allow me to listen to an APC data >> center UPS and power down a cluster when we go to UPS power. Anyone got >> a solution? My last experience with PowerChute was pretty sad, I'm >> afraid. >> >> gerry > > I've been using apcupsd for a few months and have found it to work well > - I think I tested the shutdown part once during initial install, > thankfully haven't needed it since. > > -stephen > I've also had excellent experience with apcupsd in the past. (I'm not using it now, since we're not using APC UPSes here). -- Prentice From henning.fehrmann at aei.mpg.de Mon Feb 8 22:14:14 2010 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Tue, 9 Feb 2010 07:14:14 +0100 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <4B707CB0.1000808@tamu.edu> References: <4B707CB0.1000808@tamu.edu> Message-ID: <20100209061414.GA3585@gretchen.aei.mpg.de> Hi Gerry, On Mon, Feb 08, 2010 at 03:05:52PM -0600, Gerald Creager wrote: > Looking for a usable script that will allow me to listen to an APC > data center UPS and power down a cluster when we go to UPS power. > Anyone got a solution? My last experience with PowerChute was pretty > sad, I'm afraid. There is NUT - Network UPS Tools. It works fine for at least ~ 1700 clients. http://www.networkupstools.org/ The NUT server provides the information whether a UPS is running on battery and how much battery charge is approximately left. The clients get the information and start scripts - e.g. a shutdown routine. Cheers, Henning From forum.san at gmail.com Mon Feb 8 23:12:49 2010 From: forum.san at gmail.com (Sangamesh B) Date: Tue, 9 Feb 2010 12:42:49 +0530 Subject: [Beowulf] UPS signaling scripts? In-Reply-To: <20100209061414.GA3585@gretchen.aei.mpg.de> References: <4B707CB0.1000808@tamu.edu> <20100209061414.GA3585@gretchen.aei.mpg.de> Message-ID: Hi Gerry, What problem are you facing with APC Powerchute software? Because we also have started using this. As of now its working fine. Thank you, Sangamesh On Tue, Feb 9, 2010 at 11:44 AM, Henning Fehrmann < henning.fehrmann at aei.mpg.de> wrote: > Hi Gerry, > > > On Mon, Feb 08, 2010 at 03:05:52PM -0600, Gerald Creager wrote: > > Looking for a usable script that will allow me to listen to an APC > > data center UPS and power down a cluster when we go to UPS power. > > Anyone got a solution? My last experience with PowerChute was pretty > > sad, I'm afraid. 
> > There is NUT - Network UPS Tools. It works fine for at least > ~ 1700 clients. > > http://www.networkupstools.org/ > > The NUT server provides the information whether a UPS is running on > battery and how much battery charge is approximately left. > The clients get the information and start scripts - e.g. a shutdown > routine. > > Cheers, > Henning > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathog at caltech.edu Thu Feb 11 13:51:24 2010 From: mathog at caltech.edu (David Mathog) Date: Thu, 11 Feb 2010 13:51:24 -0800 Subject: [Beowulf] Thinking about going used Message-ID: There are a lot of rack mount servers showing up on ebay and elsewhere after they come out of service at data centers after a few years. These might not cut it for the cutting edge folks, but would be fine for us to replace our existing (ancient) cluster. Any suggestions for models to look at for a good price/performance point for say 2-4 cores in the box, 1 or 2 SATA disks, and at least one 1000baseT interface, and a good reliability history? (The disks would probably be bought separately as a lot of these are sold diskless, and new disks aren't that expensive.) For instance, there have recently been a lot of Arima/Rioworks HDAMA based dual dual core Opteron systems advertised. Possibly because they are old enough that nobody wants to buy them ;-). Those would be fast enough for us, but the SATA controller seems to be just for RAID usage. Might work though if these support "raid 0" on a single disk. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From gerry.creager at tamu.edu Thu Feb 11 14:52:39 2010 From: gerry.creager at tamu.edu (Gerald Creager) Date: Thu, 11 Feb 2010 16:52:39 -0600 Subject: [Beowulf] Thinking about going used In-Reply-To: References: Message-ID: <4B748A37.6060005@tamu.edu> I've been getting Dell 1425's for <$100 and adding $150 worth of memory to make 'em a usable small server. I think they're OK for low-end compute nodes, but watching the list there are other machines more suitable. gerry David Mathog wrote: > There are a lot of rack mount servers showing up on ebay and elsewhere > after they come out of service at data centers after a few years. These > might not cut it for the cutting edge folks, but would be fine for us to > replace our existing (ancient) cluster. > > Any suggestions for models to look at for a good price/performance point > for say 2-4 cores in the box, 1 or 2 SATA disks, and at least one > 1000baseT interface, and a good reliability history? (The disks would > probably be bought separately as a lot of these are sold diskless, and > new disks aren't that expensive.) > > For instance, there have recently been a lot of Arima/Rioworks HDAMA > based dual dual core Opteron systems advertised. Possibly because they > are old enough that nobody wants to buy them ;-). Those would be fast > enough for us, but the SATA controller seems to be just for RAID usage. > Might work though if these support "raid 0" on a single disk. 
> > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From james.p.lux at jpl.nasa.gov Thu Feb 11 16:18:55 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Thu, 11 Feb 2010 16:18:55 -0800 Subject: [Beowulf] Thinking about going used In-Reply-To: <4B748A37.6060005@tamu.edu> References: <4B748A37.6060005@tamu.edu> Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Gerald Creager > Sent: Thursday, February 11, 2010 2:53 PM > To: David Mathog > Cc: beowulf at beowulf.org > Subject: Re: [Beowulf] Thinking about going used > > I've been getting Dell 1425's for <$100 and adding $150 worth of memory > to make 'em a usable small server. I think they're OK for low-end > compute nodes, but watching the list there are other machines more suitable. > > gerry > At first, I thought you were referring to the inspiron 1425s, which are a laptop of sorts.. then I found the SC1425. For a demo cluster or fooling around, at $200-250/each (4GB RAM, 80GB disk, etc.) this kind of thing seems pretty attractive. Makes the under $2K non-trivial cluster doable. From rpnabar at gmail.com Thu Feb 11 23:41:36 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 12 Feb 2010 01:41:36 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? Message-ID: I came across this very interesting thread on a related mailing list that I thought would be quite relevant to those of us using Dell hardware. Apparently Dell has started hardware-blocking hard-drives that are not "Dell certified". http://lists.us.dell.com/pipermail/linux-poweredge/2010-February/041274.html Don't get mad at me, Dell; I believe the larger public interest outweighs any reluctance on my part to name vendor names. Conflict of interest statement: I've been very annoyed by Dell tech-support last week. I'm prejudiced. -- Rahul From rpnabar at gmail.com Fri Feb 12 00:06:31 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 12 Feb 2010 02:06:31 -0600 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: On Fri, Feb 12, 2010 at 1:41 AM, Rahul Nabar wrote: > I came across this very interesting thread on a related mailing ?list > that I thought would be quite relevant to those of us using Dell > hardware. Apparently Dell has started hardware-blocking hard-drives > that are not "Dell certified". I guess I should qualify the original statement in the interest of fairness: So far only with the Gen11 Dell servers. R710 etc.with H700 and a couple of other PERC RAID controllers seem to be the ones that have this policy. So, it seems only their new and cutting edge hardware has this. I still feel this isn't the right approach. -- Rahul From kilian.cavalotti.work at gmail.com Fri Feb 12 00:51:09 2010 From: kilian.cavalotti.work at gmail.com (Kilian CAVALOTTI) Date: Fri, 12 Feb 2010 09:51:09 +0100 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? 
In-Reply-To: References: Message-ID: On Fri, Feb 12, 2010 at 9:06 AM, Rahul Nabar wrote: > On Fri, Feb 12, 2010 at 1:41 AM, Rahul Nabar wrote: >> I came across this very interesting thread on a related mailing ?list >> that I thought would be quite relevant to those of us using Dell >> hardware. Apparently Dell has started hardware-blocking hard-drives >> that are not "Dell certified". http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/ Cheers, -- Kilian From smulcahy at atlanticlinux.ie Fri Feb 12 02:21:21 2010 From: smulcahy at atlanticlinux.ie (stephen mulcahy) Date: Fri, 12 Feb 2010 10:21:21 +0000 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: <4B752BA1.2040901@atlanticlinux.ie> Kilian CAVALOTTI wrote: > On Fri, Feb 12, 2010 at 9:06 AM, Rahul Nabar wrote: >> On Fri, Feb 12, 2010 at 1:41 AM, Rahul Nabar wrote: >>> I came across this very interesting thread on a related mailing list >>> that I thought would be quite relevant to those of us using Dell >>> hardware. Apparently Dell has started hardware-blocking hard-drives >>> that are not "Dell certified". > > http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/ > > Cheers, I purchased some cheap Dell servers a few years ago Poweredge 1600SC's I think. I went to upgrade their CDROM drives with some DVDROM drives we had spare and the systems refused to boot. When I phoned Dell support they told me the servers wouldn't operate with non-Dell drives so I don't think this is a new policy - but maybe they had different policies for different servers. Anyways, I haven't had much interest in Dell servers since then - I'm ok with your support entitlement being degraded if the BIOS detects non-qualified parts in the system but refusing to operate a commodity PC with other commodity PC components isn't what I want from my system vendor. -stephen -- Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway) From reuti at staff.uni-marburg.de Fri Feb 12 02:21:43 2010 From: reuti at staff.uni-marburg.de (Reuti) Date: Fri, 12 Feb 2010 11:21:43 +0100 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> Hi, Am 12.02.2010 um 09:51 schrieb Kilian CAVALOTTI: > On Fri, Feb 12, 2010 at 9:06 AM, Rahul Nabar > wrote: >> On Fri, Feb 12, 2010 at 1:41 AM, Rahul Nabar >> wrote: >>> I came across this very interesting thread on a related mailing >>> list >>> that I thought would be quite relevant to those of us using Dell >>> hardware. Apparently Dell has started hardware-blocking hard-drives >>> that are not "Dell certified". > > http://www.channelregister.co.uk/2010/02/10/ > dell_perc_11th_gen_qualified_hdds_only/ are they just blocking non-qualified drives, but you could still install qualified ones bought not from Dell? And: will their qualified ones work on other controllers? This sounds like ProStor prevents usage of other disks in RDX drives' cartridges (AFAIK they are doing this by ATA passwords*), and you are limited to their cartridges. And in emergency case you can't use the disks with other controllers due to this. 
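For anyone who wants to check whether a given disk has an ATA security password set, hdparm will show the drive's security state. A sketch; the device name is a placeholder, and the disk has to sit on a plain HBA the OS can see rather than hidden behind a RAID volume:

# print the Security section of the drive's IDENTIFY data
sudo hdparm -I /dev/sdb | grep -A12 'Security:'
# "not enabled" / "not locked"  -> no ATA password is set
# "enabled" and "locked"        -> the drive is password protected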
-- Reuti * http://www.heise.de/foren/S-Re-Erfahrungen-Tandberg-RDX-QuikStor/ forum-7273/msg-16513128/read/ > Cheers, > -- > Kilian > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From gerry.creager at tamu.edu Fri Feb 12 08:52:03 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Fri, 12 Feb 2010 10:52:03 -0600 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> References: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> Message-ID: <4B758733.3040101@tamu.edu> We discussed this in our HPC group meeting yesterday. I've long been dissatisfied with PERC controllers, but this is now a show-stopper for me. I might order Dell, but not the PERC, ever again. Who do they think they are? NetApp? gerry Reuti wrote: > Hi, > > Am 12.02.2010 um 09:51 schrieb Kilian CAVALOTTI: > >> On Fri, Feb 12, 2010 at 9:06 AM, Rahul Nabar wrote: >>> On Fri, Feb 12, 2010 at 1:41 AM, Rahul Nabar wrote: >>>> I came across this very interesting thread on a related mailing list >>>> that I thought would be quite relevant to those of us using Dell >>>> hardware. Apparently Dell has started hardware-blocking hard-drives >>>> that are not "Dell certified". >> >> http://www.channelregister.co.uk/2010/02/10/dell_perc_11th_gen_qualified_hdds_only/ >> > > are they just blocking non-qualified drives, but you could still install > qualified ones bought not from Dell? And: will their qualified ones work > on other controllers? > > This sounds like ProStor prevents usage of other disks in RDX drives' > cartridges (AFAIK they are doing this by ATA passwords*), and you are > limited to their cartridges. And in emergency case you can't use the > disks with other controllers due to this. > > -- Reuti > > * > http://www.heise.de/foren/S-Re-Erfahrungen-Tandberg-RDX-QuikStor/forum-7273/msg-16513128/read/ > > > >> Cheers, >> -- >> Kilian From rpnabar at gmail.com Fri Feb 12 09:36:14 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 12 Feb 2010 11:36:14 -0600 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: <4B758733.3040101@tamu.edu> References: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> <4B758733.3040101@tamu.edu> Message-ID: On Fri, Feb 12, 2010 at 10:52 AM, Gerry Creager wrote: > We discussed this in our HPC group meeting yesterday. I've long been > dissatisfied with PERC controllers, but this is now a show-stopper for me. I > might order Dell, but not the PERC, ever again. ?Who do they think they are? > NetApp? Don't you *have* to use the PERC? Will the HDDs talk with other controllers? Can I hear some more about your PERC dissatisfaction, Gerry? I just bought a few and might be better knowing what I'm up against! 
-- Rahul From john.hearns at mclaren.com Fri Feb 12 09:56:56 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Fri, 12 Feb 2010 17:56:56 -0000 Subject: [Beowulf] Register survey on HPC Message-ID: <68A57CCFD4005646957BD2D18E60667B0F4FDB76@milexchmb1.mil.tagmclarengroup.com> http://www.theregister.co.uk/2010/02/12/hpc_for_the_masses/ It's as close as the IT industry will ever get to "2 Fast, 2 Furious" - gangs of highly technical experts pushing their custom-built computers to the limit with an aim to win that ultimate prize, a place in the world supercomputing rankings. That's me that is! I'm suddenly young and happening. Yayyyy..... Challenge yer all to a race - 11pm tonight, car park behind the Mall. How many Tflops you got under the hood then? The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From lindahl at pbm.com Fri Feb 12 10:12:24 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 12 Feb 2010 10:12:24 -0800 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: <4B758733.3040101@tamu.edu> References: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> <4B758733.3040101@tamu.edu> Message-ID: <20100212181224.GC5965@bx9.net> On Fri, Feb 12, 2010 at 10:52:03AM -0600, Gerry Creager wrote: > Who do they think they are? NetApp? That's what you get for buying commodity stuff that's not marketed to you. If you have a business-critical database that needs a lot of 9's of uptime, and to never, ever lose data, it makes sense to only buy disks which are the exact hardware/firmware that's been tested with that controller. Note that this is what Dell considers a high end controller. If you're an HPC shop, you don't want to pay for that kind of reliability. I'm having a similar fun situation with HP. They don't sell any MLC SSDs, because they aren't reliable enough to run a traditional database that gets lots of writes. Well, I know I don't write that much. Also, their support organization no longer understands the concept of parts that wear out. I managed to talk them into selling me caddies for MLC SSDs, but only because it was the end of their fiscal year. And don't get me started about the fun of getting support when I'm using their "high end" P410 RAID card as a JBOD... -- greg From kuenching at gmail.com Thu Feb 11 10:43:12 2010 From: kuenching at gmail.com (Tsz Kuen Ching) Date: Thu, 11 Feb 2010 13:43:12 -0500 Subject: [Beowulf] PVM 3.4.5-12 terminates when adding Host on Ubuntu 9.10 Message-ID: Whenever I attempt to add a host in PVM it ends up terminating the process in the master program. The process does run in the slave node, however because the PVM terminates I do not get access to the node. I'm currently using Ubuntu 9.10, and I used apt-get to install pvm ( pvmlib, pvmdev, pvm). Thus $PVM_ROOT is set automatically, and so is $PVM_ARCH As for the other variables, I have not looked for them. I can ssh into the the slave without the need of a password. Any Ideas or suggestions? This is what happens: user at laptop> pvm pvm> add slave-slave add slave-slave Terminated user at laptop> ... 
The logs are as follows:

Laptop log
---
[t80040000] 02/11 10:23:32 laptop (127.0.1.1:55884) LINUX 3.4.5
[t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010
[t80040000] 02/11 10:23:32 netoutput() sendto: errno=22
[t80040000] 02/11 10:23:32 em=0x2c24f0
[t80040000] 02/11 10:23:32 [49/?][6e/?][76/?][61/?][6c/?][69/?][64/?][20/?][61/?][72/?]
[t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument
[t80040000] 02/11 10:23:32 pvmbailout(0)

slave-log
---
[t80080000] 02/11 10:23:25 slave-slave (xxx.x.x.xxx:57344) LINUX64 3.4.5
[t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010
[t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for master
[t80080000] 02/11 10:28:26 pvmbailout(0)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From himikehawaii1 at yahoo.com Fri Feb 12 10:51:28 2010
From: himikehawaii1 at yahoo.com (MDG)
Date: Fri, 12 Feb 2010 10:51:28 -0800 (PST)
Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers?
In-Reply-To:
Message-ID: <708266.89474.qm@web54104.mail.re2.yahoo.com>

Well, here is a small taste of problems with Dell and PERC controllers. I have used Dell PERC controllers all the way back to the PERC 2 quad-channel. I needed new batteries for the backup cache memory, ordered them using Dell's own part number, and they sent the wrong part. I returned it - I sent them the ACTUAL BATTERY - and THEY SENT ME ANOTHER WRONG BATTERY, this after dozens of phone calls. Will I ever buy another Dell product? Not on your life. When they cannot even get a battery right while they have the actual battery in hand, that is very bad service.

They may make a good product, but once you need service, forget it. You can also forget stripping working parts out of a dead server for reuse, because Dell matches parts to the original case's serial number. If you want to keep a legacy system running but need to change something like the CPU box, Dell not only password-protects the use of non-Dell hard drives but also refuses to service Dell equipment - such as selling you that backup battery - unless you have the old case serial number, and you had better be the original buyer, otherwise their database says you are not the customer and you get another rejection. That assumes they can even identify the part when they have the old one in hand, with the part number on its tag, so it should be impossible to get wrong; yet after weeks of calls the "correct" part they pulled from stock and sent back with my old one was obviously wrong - not even the right size or connector to plug into their own RAID controller. That controller was actually made by Adaptec, whom I called after giving up on Dell. Adaptec, whom I generally find excellent, said the part was made solely for Dell so they could not help; it turns out it was also made for HP, but with firmware changes, as I later found out. Note that Dell will call it a PERC, but underneath it may be an AMI or an Adaptec, with totally different parts, yet both are sold as the same PERC X model. So you may not even be buying what you think you are, and their performance tests may have been run on the higher-performance unit while you end up with the lower-end part. You can forget their "white paper" performance claims, since the part tested may not be the part you bought.

So my comment: forget buying Dell computers if you expect them to be cooperative.

Also, many of the so-called non-certified disks and other parts are really third-party parts with Dell's label on the carrier; pull one out of the hard disk cage and it is the exact same disk - a top-of-the-line large-capacity Seagate, for example - once you remove it from the Dell caddy. So why would a Seagate of the same size and model number, bought from Seagate, not work when the exact same drive sold by Dell at a huge markup does? Both are high-speed 10-15,000 RPM SCSI/SAS or SATA drives with the same model number, and there is no way Dell tests every disk beyond a basic check. Dell's original philosophy was to buy off-the-shelf parts and assemble them for low cost, good performance and good service; Michael Dell started that in his dorm room at college and grew it into a major computer company, then left, then came back to "fix" Dell's problems. I guess locking you in for over-priced fixes - assuming customer service can even locate the fix - is their idea of how to fix things. I think I will just order my own parts, from motherboards to controllers; at least then I know what is in them and where to get replacements, often at a lower price and with better performance. Their philosophy since seems to be: over-charge, under-service, and make it so customers have no choice but to stay with us. It starts to sound like addiction - get a customer, then make sure they have no choice but to buy their fix from Dell (assuming Dell can find the part) at an inflated price, with service so poor they cannot read their own part numbers. So I will assume that if you ever have a drive problem you will end up as I did, in an endless loop of calls and wrong parts. Good luck.

Mike

--- On Fri, 2/12/10, Rahul Nabar wrote:

From: Rahul Nabar
Subject: Re: [Beowulf] Re: Third-party drives not permitted on new Dell servers?
To: "Gerry Creager"
Cc: "Beowulf ML"
Date: Friday, February 12, 2010, 7:36 AM

On Fri, Feb 12, 2010 at 10:52 AM, Gerry Creager wrote:
> We discussed this in our HPC group meeting yesterday. I've long been
> dissatisfied with PERC controllers, but this is now a show-stopper for me. I
> might order Dell, but not the PERC, ever again. Who do they think they are?
> NetApp?

Don't you *have* to use the PERC? Will the HDDs talk with other controllers?

Can I hear some more about your PERC dissatisfaction, Gerry? I just
bought a few and might be better knowing what I'm up against!

--
Rahul
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From douglas.guptill at dal.ca Fri Feb 12 17:49:37 2010
From: douglas.guptill at dal.ca (Douglas Guptill)
Date: Fri, 12 Feb 2010 21:49:37 -0400
Subject: [Beowulf] Register survey on HPC
In-Reply-To: <68A57CCFD4005646957BD2D18E60667B0F4FDB76@milexchmb1.mil.tagmclarengroup.com>
References: <68A57CCFD4005646957BD2D18E60667B0F4FDB76@milexchmb1.mil.tagmclarengroup.com>
Message-ID: <20100213014937.GB15187@sopalepc>

On Fri, Feb 12, 2010 at 05:56:56PM -0000, Hearns, John wrote:
> http://www.theregister.co.uk/2010/02/12/hpc_for_the_masses/

There is a sentence in there which, taken out of context and suitably
edited, really tickles me:

"Microsoft's Windows is ... seen as potentially useful by almost half
of respondents..."

Douglas.
From gerry.creager at tamu.edu Sat Feb 13 08:09:48 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Sat, 13 Feb 2010 10:09:48 -0600 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: <6B51FF25-AFF1-42F3-9829-1E127D6CFD2E@staff.uni-marburg.de> <4B758733.3040101@tamu.edu> Message-ID: <4B76CECC.3020002@tamu.edu> Rahul Nabar wrote: > On Fri, Feb 12, 2010 at 10:52 AM, Gerry Creager wrote: >> We discussed this in our HPC group meeting yesterday. I've long been >> dissatisfied with PERC controllers, but this is now a show-stopper for me. I >> might order Dell, but not the PERC, ever again. Who do they think they are? >> NetApp? > > > Don't you *have* to use the PERC? Will the HDDs talk with other controllers? > > Can I hear some more about your PERC dissatisfaction, Gerry? I just > bought a few and might be better knowing what I'm up against! You don't have to use PERC but you'll get the full guilt trip if you try ordering without them. At least in the past, PERC was a firmware-tweaked LSI RAID controller. I've learned how the LSI RAID controllers work, and I'm very happy with them: They are, among other things, one of a small set of real hardware RAID controllers. My second choice is 3Ware. PERCs have some tuning that makes sense to the Dell guys who did the tweaks, but which don't tend to make a lot of sense to me. I can't create multiple LUNs on a PERC array (or at least haven't figured out how, yet) and I've had problems with mounting root file systems on them. They don't gracefully give up one or two drives for boot access in, say, a RAID 0/1 config. Having said this, I'm not the most facile when configuring the PERC controllers are discussed, partly because I was scarred while young. Rather than learning about them, I just order around 'em now. gerry From hahn at mcmaster.ca Sat Feb 13 11:05:55 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Sat, 13 Feb 2010 14:05:55 -0500 (EST) Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: > hardware. Apparently Dell has started hardware-blocking hard-drives > that are not "Dell certified". I believe this should be met head-on: if a controller claiming to support SATA does not permit the use of any conforming SATA disk, then the controller is not conforming. we need to lobby the standards organizations to consistently apply their trademark. the situation is easy to understand: this is just one step beyond "warranty void if opened". but it's a step too far, since it's not merely a warranty situation (where the device will continue to work even if opened), but rather standard non-conformity for lock-in. I think the issue of standards needs to be talked about more in the industry press, actually. there's too much damaging fuzziness about "defacto" standards, what interop really means to the customer, and how you should run away when a vendor lists "supported" models. defacto standards means "not a standard, but the way X does it, and everyone thinks X is unchallengable." this completely negates the actual meaning of standard, which is that if you have two conforming devices, they will interoperate. a defacto standard means nothing more than "has worked with X in the tests we've done". no promise for any non-tested configuration. no promises if X decides to change. interop is what the customer desires: instead of an N^2 problem of deciding whether option A is compatible with option B, each option merely has to conform to the standard. 
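to put rough (made-up) numbers on that, say a vendor carries 20 controller models and 20 drive models; qualifying pairs means a 400-cell test matrix, while conformance testing means 40 tests against the spec:

controllers = 20                       # made-up counts, only to show the scaling
drives = 20
pairwise = controllers * drives        # every controller tested against every drive
conformance = controllers + drives     # each device tested against the standard once
print(pairwise, conformance)           # 400 vs 40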
2N tests less work than N^2 (for sufficiently large N ;) when a vendor lists supported configs, they are implicitly saying that they have no faith in standard conformity, and are instead merely going to mark out a few places in the N^2 grid where they will take the blame for failure. this is profoundly anti-customer. we expect standards in most places (just think: tires, roads, gasoline - imagine if Ford only "supported" driving with Ford-brand gas, on "OEM" tires, on Ford-approved roads where the only other cars are Fords...) I don't know whether having standards organizations police their brands would be good enough. obviously, there is some conflict of interest, since most of the support for, say, SATA-IO is let by vendors (incl Dell), and is probably less arms-length than INCITS T10/13 committees. but I think this could be dealt with as a criminal matter as well, since claiming standard conformance is clearly a product-liability issue. my personal experience is with HP products: nearly all HP disk controllers refuse to work with products not bought through HP channels. there is some escape at the lowest end where HP doesn't bother to break any of the chipset-integrated controllers (afaik). to me, this difference indicts. as MAKERs say, if you can't open it, you don't own it. I think as customers we should demand standard-conformity, even though vendors have often gotten away with it in the past. the same vendors _do_ actually support standards- based interop for some products (ethernet, power cables, vga/dvi/hdmi, pci/pcie, usually even dimms). if a product doesn't conform to its specs, then it's broken. how many class-action suits would it take to get vendors to recognize this? From reuti at staff.uni-marburg.de Sun Feb 14 15:18:00 2010 From: reuti at staff.uni-marburg.de (Reuti) Date: Mon, 15 Feb 2010 00:18:00 +0100 Subject: [Beowulf] PVM 3.4.5-12 terminates when adding Host on Ubuntu 9.10 In-Reply-To: References: Message-ID: <0D2F92CE-AAEB-4D9E-9AC6-F591C9AE1773@staff.uni-marburg.de> Am 11.02.2010 um 19:43 schrieb Tsz Kuen Ching: > Whenever I attempt to add a host in PVM it ends up terminating the > process in the master program. The process does run in the slave > node, however because the PVM terminates I do not get access to the > node. > > I'm currently using Ubuntu 9.10, and I used apt-get to install pvm > ( pvmlib, pvmdev, pvm). > Thus $PVM_ROOT is set automatically, and so is $PVM_ARCH > As for the other variables, I have not looked for them. > > I can ssh into the the slave without the need of a password. Do you have any firwall on the machines which blocks certain ports? -- Reuti > > Any Ideas or suggestions? > > This is what happens: > > user at laptop> pvm > pvm> add slave-slave > add slave-slave > Terminated > user at laptop> ... > > The logs are as followed: > > Laptop log > --- > [t80040000] 02/11 10:23:32 laptop (127.0.1.1:55884) LINUX 3.4.5 > [t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010 > [t80040000] 02/11 10:23:32 netoutput() sendto: errno=22 > [t80040000] 02/11 10:23:32 em=0x2c24f0 > [t80040000] 02/11 10:23:32 [49/?][6e/?][76/?][61/?][6c/?][69/?][64/ > ?][20/?][61/?][72/?] 
> [t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument
> [t80040000] 02/11 10:23:32 pvmbailout(0)
>
> slave-log
> ---
> [t80080000] 02/11 10:23:25 slave-slave (xxx.x.x.xxx:57344) LINUX64 3.4.5
> [t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010
> [t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for master
> [t80080000] 02/11 10:28:26 pvmbailout(0)
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From kus at free.net Mon Feb 15 09:32:40 2010
From: kus at free.net (Mikhail Kuzminsky)
Date: Mon, 15 Feb 2010 20:32:40 +0300
Subject: [Beowulf] Third-party drives not permitted on new Dell servers?
In-Reply-To:
Message-ID:

And what is known about other vendors' (Sun, HP, IBM) standard x86 1U/2U servers? SGI hardware, even in their "big" UNIX SMPs like the Power Challenge, allowed the use of 3rd-party drives - although they were not supported officially.

Mikhail Kuzminsky
Computer Assistance to Chemical Research Center
Zelinsky Institute of Organic Chemistry RAS
Moscow

From rpnabar at gmail.com Mon Feb 15 09:51:28 2010
From: rpnabar at gmail.com (Rahul Nabar)
Date: Mon, 15 Feb 2010 11:51:28 -0600
Subject: [Beowulf] Third-party drives not permitted on new Dell servers?
In-Reply-To:
References:
Message-ID:

This was the response from Dell; I especially like the analogy:

[snip]
> There are a number of benefits for using Dell qualified drives, in particular ensuring a ***positive experience*** and protecting ***our data***.
> While SAS and SATA are industry standards, there are differences which occur in implementation. An analogy is that English is spoken in the UK, US and Australia. While the language is generally the same, there are subtle differences in word usage which can lead to confusion. This exists in storage subsystems as well. As these subsystems become more capable, faster and more complex, these differences in implementation can have greater impact.
[snip]

I added the emphasis. I am in love with the Dell-disks that get me "the positive experience".
:)

--
Rahul

From deadline at eadline.org Mon Feb 15 10:56:38 2010
From: deadline at eadline.org (Douglas Eadline)
Date: Mon, 15 Feb 2010 13:56:38 -0500 (EST)
Subject: [Beowulf] Third-party drives not permitted on new Dell servers?
In-Reply-To:
References:
Message-ID: <39109.192.168.1.1.1266260198.squirrel@mail.eadline.org>

There are two "ISO standard" English words I have for this kind of marketing response.

--
Doug

> This was the response from Dell; I especially like the analogy:
>
> [snip]
>> There are a number of benefits for using Dell qualified drives, in
>> particular ensuring a ***positive experience*** and protecting ***our
>> data***.
>> While SAS and SATA are industry standards, there are differences which
>> occur in implementation. An analogy is that English is spoken in the UK,
>> US and Australia. While the language is generally the same, there are
>> subtle differences in word usage which can lead to confusion. This exists
>> in storage subsystems as well. As these subsystems become more capable,
>> faster and more complex, these differences in implementation can have
>> greater impact.
> [snip]
>
> I added the emphasis. I am in love with the Dell-disks that get me "the
> positive experience".
:) > > -- > Rahul > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Doug From rpnabar at gmail.com Mon Feb 15 13:56:28 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 15 Feb 2010 15:56:28 -0600 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance Message-ID: Are there any generic "scheduler visualization" tools out there? Sometimes I feel it'd be nice if I had a way to find out how my sceduling was performing. i.e. blocks of empty procs; how fragmented the job assignment was; large / small job split, utilization efficiency, backfill status; etc. I use openpbs (torque) + maui and it does have some text mode accounting reports. But sometimes they are hard to digest and a birds eye view might be easier via a visualization. I haven't found any toolkits yet. Of course, I could parse and plot myself with a bunch of sed / awk / gnuplot but I don't want to unnecessarily reinvent the wheel if I can avoid it. Also, I remembered seeing some cool visualizations (quite animated at that) at one of the supercomputing agencies a while ago but just can't seem to find which one it was now that I need it. Admittedly, some of the visualizations can sway more towards the "coolness" factor than actual insights but still it's worth a shot. Any pointers or scripts other Beowulfers might have are greatly appreciated. -- Rahul From hearnsj at googlemail.com Mon Feb 15 15:02:01 2010 From: hearnsj at googlemail.com (John Hearns) Date: Mon, 15 Feb 2010 23:02:01 +0000 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: Message-ID: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> That's a good question. PBS are promoting PBS Analytics http://www.pbsgridworks.com/Product.aspx?id=7 and SGE has Arco http://wikis.sun.com/display/GridEngine/Installing+ARCo If I have it right, both of these work by accumulating information about completed jobs. I may be wrong, and would like to see Analytics working (hint - I have tried). Your question about displaying blocks of empty nodes is interesting. I guess in PBS you would run 'pbsnodes' and look for the free ones! From rpnabar at gmail.com Mon Feb 15 15:07:36 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 15 Feb 2010 17:07:36 -0600 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> References: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> Message-ID: On Mon, Feb 15, 2010 at 5:02 PM, John Hearns wrote: > That's a good question. > > PBS are promoting PBS Analytics http://www.pbsgridworks.com/Product.aspx?id=7 > and SGE has Arco http://wikis.sun.com/display/GridEngine/Installing+ARCo > If I have it right, both of these work by accumulating information > about completed jobs. > I may be wrong, and would like to see Analytics working (hint - I have tried). Thanks John! THose sound interesting. > Your question about displaying blocks of empty nodes is interesting. > I guess in PBS you would run 'pbsnodes' and look for the free ones! pbsnodes does have more than sufficient info. It's only the scripting around it that needs to be in place. 
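Something like the rough sketch below is probably all it would take (assuming Torque's plain "pbsnodes -a" output, where each stanza starts with the node name at column zero followed by indented "state = ..." and "np = ..." lines; untested beyond that assumption):

import subprocess
from collections import Counter

def node_summary():
    """Tally node states and core counts from `pbsnodes -a` output."""
    out = subprocess.run(["pbsnodes", "-a"], capture_output=True,
                         text=True, check=True).stdout
    nodes, current = [], None
    for line in out.splitlines():
        if line and not line[0].isspace():          # non-indented line starts a new node stanza
            current = {"state": "unknown", "np": 0}
            nodes.append(current)
        elif current is not None and "=" in line:
            key, _, value = line.strip().partition("=")
            key, value = key.strip(), value.strip()
            if key == "state":
                current["state"] = value            # e.g. "free", "job-exclusive", "down,offline"
            elif key == "np":
                current["np"] = int(value)
    states, cores = Counter(), Counter()
    for n in nodes:
        states[n["state"]] += 1
        cores[n["state"]] += n["np"]
    return states, cores

if __name__ == "__main__":
    states, cores = node_summary()
    for state in sorted(states):
        print("%-25s %4d nodes %6d cores" % (state, states[state], cores[state]))

Dumping those per-state counts to a file every few minutes and plotting them with gnuplot would already give the bird's-eye trend view; fragmentation and backfill behaviour would need the per-job info from qstat -f on top of that.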
-- Rahul From rpnabar at gmail.com Mon Feb 15 15:10:57 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 15 Feb 2010 17:10:57 -0600 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> References: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> Message-ID: On Mon, Feb 15, 2010 at 5:02 PM, John Hearns wrote: > That's a good question. > > PBS are promoting PBS Analytics http://www.pbsgridworks.com/Product.aspx?id=7 > and SGE has Arco http://wikis.sun.com/display/GridEngine/Installing+ARCo Too bad they both seem "paid". I was hoping to find something in the "free" domain. I doubt I can justify the $$ for a scheduler-visualizer especially if (like most things in the scheduling universe) the licenses tend to be stubbornly per-core. The per-core licensing has been the single biggest factor that prevents me from even evaluating out any of the "paid" schedulers. -- Rahul From beckerjes at mail.nih.gov Mon Feb 15 15:43:46 2010 From: beckerjes at mail.nih.gov (Jesse Becker) Date: Mon, 15 Feb 2010 18:43:46 -0500 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> Message-ID: <20100215234346.GJ12997@mail.nih.gov> On Mon, Feb 15, 2010 at 06:10:57PM -0500, Rahul Nabar wrote: >On Mon, Feb 15, 2010 at 5:02 PM, John Hearns wrote: >> That's a good question. >> >> PBS are promoting PBS Analytics http://www.pbsgridworks.com/Product.aspx?id=7 >> and SGE has Arco http://wikis.sun.com/display/GridEngine/Installing+ARCo > >Too bad they both seem "paid". I was hoping to find something in the >"free" domain. I doubt I can justify the $$ for a scheduler-visualizer ARCo can be be had for zero $ cost. It will, however, cost something in time and effort to configure, and requires a moderately beefy box on which to run. -- Jesse Becker NHGRI Linux support (Digicon Contractor) From landman at scalableinformatics.com Mon Feb 15 17:41:08 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 15 Feb 2010 20:41:08 -0500 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: <4B79F7B4.9020808@scalableinformatics.com> Rahul Nabar wrote: > This was the response from Dell, I especially like the analogy: > > [snip] >> There are a number of benefits for using Dell qualified drives in >> particular ensuring a ***positive experience*** and protecting >> ***our data***. While SAS and SATA are industry standards there are >> differences which occur in implementation. An analogy is that >> English is spoken in the UK, US >and Australia. While the language >> is generally the same, there are subtle differences in word usage >> which can lead to confusion. This exists in >storage subsystems as >> well. As these subsystems become more capable, faster and more >> complex, these differences in implementation can have >greater >> impact. > [snip] > > I added the emphasis. I am in love Dell-disks that get me "the > positive experience". :) Please indulge my taking a contrarian view based upon the products we sell/support/ship. I see significant derision heaped upon these decisions, which are called "marketing decisions" by Dell and others. It couldn't be possible, in most commenter's minds that they might actually have a point ... ... 
I am not defending Dell's language (I wouldn't use this or allow this to be used in our outgoing marketing/customer communications). Let me share an anecdote. I have elided the disk manufacturers name to protect the guilty. I will not give hints as to whom they are, though some may be able to guess ... I will not confirm. We ship units with 2TB (and 1.5TB) drives among others. We burn in and test these drives. We work very hard to insure compatibility, and to make sure that when users get the units, that the things work. We aren't perfect, and we do occasionally mess up. When we do, we own up to it and fix it right away. Its a different style of support. The buck stops with us. Period. So along comes a drive manufacturer, with some nice looking specs on 2TB (and some 1.5 and 1 TB) drives. They look great on paper. We get them into our labs, and play with them, and they seem to run really well. Occasional hiccup on building RAIDs, but you get that in large batches of drives. So now they are out in the field for months, under various loads. Some in our DeltaV's, some in our JackRabbits. The units in the DeltaV's seem to have a ridiculously high failure rate. This is not something we see in the lab. Even with constant stress, horrific sustained workloads ... they don't fail in ou testing. But get these same drives out into the users hands ... and whammo. Slightly different drives in our JackRabbit units, with a variety of RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc. This is not something we see in the lab in our testing. We try emulating their environments, and we can't generate the failures. Worse, we get the drives back after exchanging them at our cost with new replacements, only to find out, upon running diagnostics, that the drives haven't failed according to the test tool. This failing drive vendor refuses to acknowledge firmware bugs, effectively refuses to release patches/fixes. Our other main drive vendor, while not currently with a 2TB drive unit, doesn't have anything like this manufacturers failure rate in the field. When drives die in the field, they really ... really die in the field. And they do fix their firmware. So we are now moving off this failing manufacturer (its a shame as they used to produce quality parts for RAID several years ago), and we are evaluating replacements for them. Firmware updates are a critical aspect of a replacement. If the vendor won't allow for a firmware update, we won't use them. So ... this anecdote complete, if someone called me up and said "Joe, I really want you to build us an siCluster for our storage, and I want you to use [insert failing manufacturer's name here] drives because we like them", what do you think my reaction should be? Should it be "sure, no problem, whatever you want" ... with the subsequent problems and pain, for which we would be blamed ... or should it be "no, these drives don't work well ... deep and painful experience at customer sites shows that they have bugs in their firmware which are problematic for RAID users ... we are attempting to get them to give us the updated firmware to help the existing users, but we would not consider shipping more units with these drives due to their issues." Is that latter answer, which is the correct answer, a marketing answer? Yeah, SATA and SAS are standards. Yeah, in theory, they all do work together. In reality, they really don't, and you have to test. 
Everyone does some aspect slightly different and usually in software, so they can fix it if they messed up. If their is a RAID timeout bug due to head settling timing, yeah, this is fixable. But if the disk manufacturer doesn't want to fix it ... its your companies name on the outside of that box. You are going to take the heat for their problems. Note: This isn't just SATA/SAS drives, there are a whole mess of things that *should* work well together, but do not. We had some exciting times in the recent past with SAS backplanes that refused to work with SAS RAID cards. We've had some excitment from 10GbE cards, IB cards, etc. that we shouldn't have had. I can't and won't sanction their tone to you ... they should have explained things correctly. Given that PERC are rebadged LSI, yeah, I know perfectly well a whole mess of drives that *do not* work correctly with them. So please don't take Dell to task for trying to help you avoid making what they consider a bad decision on specific components. There could be a marketing aspect to it, but support is a cost, and they want to minimize costs. Look at failure rates, and toss the suppliers who have very high ones. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From hahn at mcmaster.ca Mon Feb 15 17:44:06 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 15 Feb 2010 20:44:06 -0500 (EST) Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: > And what is known about other vendors (Sun, HP, IBM) x86 standard 1u/2u > servers ? as I mentioned, vendors don't seem to bother rewriting third-party disk controller bioses to enforce brand lock-in. for instance, most 1U's with 1-4 SATA will drive those bays from the AMD/Intel chipset controller, and so not be crippled. supporting SAS or hardware raid5/6 in those slots would also be a tip-off that a "higher end" controller is in use (which means "less functional" and "non-conforming" in this context.) > SGI hardware even in their "big" UNIX SMPs like Power Challenge allowed using > of 3rd party drives - although they were not supported officially. well, those would be SCSI and FC drives, where vendors seem to be more timid in doing nonconformity-for-lock-in. I guess they figure that if you're going to pay for gold-plated drives, you'll probably buy them from the original vendor. the lock-in is focused on more commoditized drives, where the vendor typically charges a >100% premium to relabel a generic drive from wd/seagate/etc. From rpnabar at gmail.com Mon Feb 15 18:12:08 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Mon, 15 Feb 2010 20:12:08 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B79F7B4.9020808@scalableinformatics.com> References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman wrote: > Please indulge my taking a contrarian view based upon the products we > sell/support/ship. > > I can't and won't sanction their tone to you ... they should have explained > things correctly. ?Given that PERC are rebadged LSI, yeah, I know perfectly > well a whole mess of drives that *do not* work correctly with them. 
> > So please don't take Dell to task for trying to help you avoid making what > they consider a bad decision on specific components. ?There could be a > marketing aspect to it, but support is a cost, and they want to minimize > costs. ?Look at failure rates, and toss the suppliers who have very high > ones. To me the test is: Is there a price-markup on the specific part recommended. If a vendor just said "Drive X is compatible and tested; please use it" and then I peg Drive X against competing drives and see a significant price markup without commensurate observable statistics improvement then I smell a rat. I feel further that a Vendor could make itself more neutral in this exercise by just naming one or more compatible, validated drive-models rather than trying to sell those themselves after re-branding. That creates an obvious conflict of interest. It makes it difficult to deconvolute monopoly-pricing from a genuine desire to promote reliability. I'm not sure how much of a price markup there is on the approved Dell drives. -- Rahul From landman at scalableinformatics.com Mon Feb 15 18:30:04 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 15 Feb 2010 21:30:04 -0500 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: <4B7A032C.2080207@scalableinformatics.com> Rahul Nabar wrote: > On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman > wrote: >> Please indulge my taking a contrarian view based upon the products we >> sell/support/ship. >> >> I can't and won't sanction their tone to you ... they should have explained >> things correctly. Given that PERC are rebadged LSI, yeah, I know perfectly >> well a whole mess of drives that *do not* work correctly with them. >> >> So please don't take Dell to task for trying to help you avoid making what >> they consider a bad decision on specific components. There could be a >> marketing aspect to it, but support is a cost, and they want to minimize >> costs. Look at failure rates, and toss the suppliers who have very high >> ones. > > To me the test is: Is there a price-markup on the specific part > recommended. If a vendor just said "Drive X is compatible and tested; I can't speak to Dell's comments. I can speak to ours. If a customer asks us if we have tested a drive, we look it up and see if we have. If they want to try it, we offer them help. We have an interest in making sure that we work well with the drives in our units. This is part of the reason we make various decisions on configurations. Drive markup isn't a factor in configurations. Stuff working correctly is. Suppose Dell buys 50M drives per year. Shaving $1 per drive will net them $50M more to their bottom line. Which, in the larger scheme of things, doesn't do much to their bottom line. Far less than 1% motion on their P&L. > please use it" and then I peg Drive X against competing drives and see > a significant price markup without commensurate observable statistics > improvement then I smell a rat. I feel further that a Vendor could Hmmm.... if it won't impact their pricing that much to begin with, even if they could get it to a 1% cost of good sold reduction, it gets very hard to make an argument that they will perform these actions for economic reasons that simply won't have a significant economic impact upon their bottom line. 
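Back of the envelope (the 50M drives is my round number from above; the roughly $50B of annual revenue is also just an assumed order of magnitude, not a Dell figure):

drives_per_year = 50_000_000        # assumed round number
saving_per_drive = 1.00             # dollars
annual_revenue = 50_000_000_000     # assumed order of magnitude only
print(100.0 * drives_per_year * saving_per_drive / annual_revenue)   # ~0.1 (percent)

So even a dollar a drive moves the needle by something like a tenth of a percent.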
OTOH, if you have manufacturer X with drive X.X with a known failure rate 10x that of manufacturer Y with a drive Y.Y, and your liabilities column on your balance sheet is drastically negatively impacted by support issues ... yeah, you are going to do all you can to minimize this liability side. You can't impact the drive costs much, but by careful selection of drive units you sure can reduce your support liabilities. > make itself more neutral in this exercise by just naming one or more > compatible, validated drive-models rather than trying to sell those > themselves after re-branding. That creates an obvious conflict of They do make margin on drives. If you object, you can always buy the unit bare, and perform your own validation. Which means you buy your own test drives, and spend your own time and effort to do this. Which means spending your own money to do this. Support costs money, and they are seeking to keep those costs under control. > interest. It makes it difficult to deconvolute monopoly-pricing from a > genuine desire to promote reliability. Hmmm .... Dell only has a monopoly if you let them. If you want to buy servers from other companies, by all means, buy them from other companies. Many universities I am aware of have signed agreements with Dell, HP, Sun, etc to buy exclusively from them. Whether or not these are legal in the face of universities requirements on maximizing value on their purchase is a completely separate discussion, one you ought to have with your purchasing departments if you feel that you are not getting the value you need from their actions. My old research group used to write sole source memos for every purchase, so that we could get what we wanted, and not what our purchasing department wanted to buy. > > I'm not sure how much of a price markup there is on the approved Dell drives. > You should be able to calculate it by configuring units with various numbers of drives. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From hahn at mcmaster.ca Mon Feb 15 23:08:32 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 16 Feb 2010 02:08:32 -0500 (EST) Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B7A032C.2080207@scalableinformatics.com> References: <4B79F7B4.9020808@scalableinformatics.com> <4B7A032C.2080207@scalableinformatics.com> Message-ID: > Drive markup isn't a factor in configurations. Stuff working correctly is. > Suppose Dell buys 50M drives per year. Shaving $1 per drive will net them > $50M more to their bottom line. Which, in the larger scheme of things, > doesn't do much to their bottom line. Far less than 1% motion on their P&L. "working correctly" is, as you point out, difficult to prove (proving a negative versus "so far so good".) the real issue here is several-fold: 1. the markup is vastly more than $1/drive. I don't know Dell prices as well as another vendor where the markup is O(300%) (es.2 750G 408 $Cdn after public-sector discount, versus $135 from newegg.ca. ironically Seagate's warranty on this is 5 years; the vendor's is 1 year...) 2. big-name vendors are incredibly slow on their feet: usually lagging at least one whole disk generation. I understand that the bigger the supply chain, the more momentum and reluctance to stock new products. 
and testing takes time, but this is a huge disadvantage to customers - especially in a domain like storage which is on as steep price-performance curve as GPUs. 3. the main issue is still whether it's defensible to cripple a controller to refuse to interact with products that don't come through the vendor's supply chain (even if identical in make, model, fw rev) (yes, vendor-specific fw is a way to wiggle out of this...) > balance sheet is drastically negatively impacted by support issues ... yeah, > you are going to do all you can to minimize this liability side. You can't > impact the drive costs much, but by careful selection of drive units you sure > can reduce your support liabilities. I'm curious: you imply that vendors have no recourse to force the _disk_ vendors to supply parts which work right (as defined by standard). is that really true? I'd be surprised if Dell doesn't get pretty emphatic cooperation from their disk vendor(s). >> make itself more neutral in this exercise by just naming one or more >> compatible, validated drive-models rather than trying to sell those >> themselves after re-branding. That creates an obvious conflict of > > They do make margin on drives. If you object, you can always buy the unit > bare, and perform your own validation. Which means you buy your own test > drives, and spend your own time and effort to do this. Which means spending > your own money to do this. the fact is that disks are incredibly cheap and getting moreso. sure, replacing a bunch of them is noticable, but as a fraction of the the systems they support, they're a pittance. even as a fraction of the storage subsystem, they're very small (reading off the same public-sector price-list, I see a 10 TB SATA-based storage system for over $30k - that's not a premium product from this vendor and I can't see any way that the disks would cost more than $5k.) I think the real paradigm shift is that disks have become a consumable which you want to be able to replace in 1-2 product generations (2-3 years). along with this, disks just aren't that important, individually - even something _huge_ like seagate's firmware problem, for instance, only drove up random failures, no? but it also begs the question: what's really so different about disks? is the disk protocol really that much more subtle and prone to problems than, say, PCI-E? > Support costs money, and they are seeking to keep those costs under control. sure, the free market is all about finding ways to make money; that doesn't imply that customers, the market and the society as a whole has to permit every possible way ;) >> interest. It makes it difficult to deconvolute monopoly-pricing from a >> genuine desire to promote reliability. > > Hmmm .... Dell only has a monopoly if you let them. If you want to buy > servers from other companies, by all means, buy them from other companies. few markets are free; any individual purchasing decision is almost certainly under a massive market asymmetry. no project/consortium/university can actually require change in policy from an entity as large as Dell/HP/IBM/etc. > Many universities I am aware of have signed agreements with Dell, HP, Sun, > etc to buy exclusively from them. Whether or not these are legal in the face > of universities requirements on maximizing value on their purchase is a > completely separate discussion, one you ought to have with your purchasing > departments if you feel that you are not getting the value you need from hah. 
purchasing departments are interested primarily in survival like any organism. unfortunately, they tend not to have natural predators. but funding organizations often effectively require use of BigCo; even if not, there's always the no-one-got-fired-for-buying-IBM sort of institutional conservativism. > their actions. My old research group used to write sole source memos for > every purchase, so that we could get what we wanted, and not what our > purchasing department wanted to buy. right, but did you succeed in changing how BigCo handles its supply chain? >> I'm not sure how much of a price markup there is on the approved Dell >> drives. > > You should be able to calculate it by configuring units with various numbers > of drives. many vendors also publish price lists (though the cost will be different when calculated these two ways.) From lynesh at Cardiff.ac.uk Tue Feb 16 03:31:26 2010 From: lynesh at Cardiff.ac.uk (Huw Lynes) Date: Tue, 16 Feb 2010 11:31:26 +0000 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: <9f8092cc1002151502i7f17c32dk9db4aaa14619e2c@mail.gmail.com> Message-ID: <1266319886.2321.9.camel@w609.insrv.cf.ac.uk> On Mon, 2010-02-15 at 17:10 -0600, Rahul Nabar wrote: > On Mon, Feb 15, 2010 at 5:02 PM, John Hearns wrote: > > That's a good question. > > > > PBS are promoting PBS Analytics http://www.pbsgridworks.com/Product.aspx?id=7 > > and SGE has Arco http://wikis.sun.com/display/GridEngine/Installing+ARCo > > Too bad they both seem "paid". I was hoping to find something in the > "free" domain. I doubt I can justify the $$ for a scheduler-visualizer > especially if (like most things in the scheduling universe) the > licenses tend to be stubbornly per-core. The per-core licensing has > been the single biggest factor that prevents me from even evaluating > out any of the "paid" schedulers. We are also looking at this area. We've looked at PBS Analytics (expensive for extra licenses on top of our existing PBSPro) and Cluster Resources MOAB Access Portal (expensive, with a problematic authentication model). Uni Dusseldorf have an in-house tool called myJAM which we are hoping to test in the near future. The very first script I wrote when we took delivery of our PBSPro cluster was a tool to output a summary of node use. It will probably work fine on Torque too. It's not a GUI, but at least it's something. http://webdocs.arcca.cf.ac.uk/external/scripts/qstate Thanks, Huw -- Huw Lynes | Advanced Research Computing HEC Sysadmin | Cardiff University | Redwood Building, Tel: +44 (0) 29208 70626 | King Edward VII Avenue, CF10 3NB From michf at post.tau.ac.il Tue Feb 16 04:53:28 2010 From: michf at post.tau.ac.il (Micha Feigin) Date: Tue, 16 Feb 2010 14:53:28 +0200 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B79F7B4.9020808@scalableinformatics.com> References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: <20100216145328.52f048a9@math018-24.tau.ac.il> On Mon, 15 Feb 2010 20:41:08 -0500 Joe Landman wrote: > Rahul Nabar wrote: > > This was the response from Dell, I especially like the analogy: > > > > [snip] > >> There are a number of benefits for using Dell qualified drives in > >> particular ensuring a ***positive experience*** and protecting > >> ***our data***. While SAS and SATA are industry standards there are > >> differences which occur in implementation. An analogy is that > >> English is spoken in the UK, US >and Australia. 
While the language > >> is generally the same, there are subtle differences in word usage > >> which can lead to confusion. This exists in >storage subsystems as > >> well. As these subsystems become more capable, faster and more > >> complex, these differences in implementation can have >greater > >> impact. > > [snip] > > > > I added the emphasis. I am in love Dell-disks that get me "the > > positive experience". :) > > Please indulge my taking a contrarian view based upon the products we > sell/support/ship. > > I see significant derision heaped upon these decisions, which are called > "marketing decisions" by Dell and others. It couldn't be possible, in > most commenter's minds that they might actually have a point ... > > ... I am not defending Dell's language (I wouldn't use this or allow > this to be used in our outgoing marketing/customer communications). > > Let me share an anecdote. I have elided the disk manufacturers name to > protect the guilty. I will not give hints as to whom they are, though > some may be able to guess ... I will not confirm. > > We ship units with 2TB (and 1.5TB) drives among others. We burn in and > test these drives. We work very hard to insure compatibility, and to > make sure that when users get the units, that the things work. We > aren't perfect, and we do occasionally mess up. When we do, we own up > to it and fix it right away. Its a different style of support. The > buck stops with us. Period. > > So along comes a drive manufacturer, with some nice looking specs on 2TB > (and some 1.5 and 1 TB) drives. They look great on paper. We get them > into our labs, and play with them, and they seem to run really well. > Occasional hiccup on building RAIDs, but you get that in large batches > of drives. > > So now they are out in the field for months, under various loads. Some > in our DeltaV's, some in our JackRabbits. The units in the DeltaV's > seem to have a ridiculously high failure rate. This is not something we > see in the lab. Even with constant stress, horrific sustained workloads > ... they don't fail in ou testing. But get these same drives out into > the users hands ... and whammo. > > Slightly different drives in our JackRabbit units, with a variety of > RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc. > > This is not something we see in the lab in our testing. We try > emulating their environments, and we can't generate the failures. > > Worse, we get the drives back after exchanging them at our cost with new > replacements, only to find out, upon running diagnostics, that the > drives haven't failed according to the test tool. This failing drive > vendor refuses to acknowledge firmware bugs, effectively refuses to > release patches/fixes. > > Our other main drive vendor, while not currently with a 2TB drive unit, > doesn't have anything like this manufacturers failure rate in the field. > When drives die in the field, they really ... really die in the field. > And they do fix their firmware. > > So we are now moving off this failing manufacturer (its a shame as they > used to produce quality parts for RAID several years ago), and we are > evaluating replacements for them. Firmware updates are a critical > aspect of a replacement. If the vendor won't allow for a firmware > update, we won't use them. > > So ... 
this anecdote complete, if someone called me up and said "Joe, I > really want you to build us an siCluster for our storage, and I want you > to use [insert failing manufacturer's name here] drives because we > like them", what do you think my reaction should be? Should it be > "sure, no problem, whatever you want" ... with the subsequent problems > and pain, for which we would be blamed ... or should it be "no, these > drives don't work well ... deep and painful experience at customer sites > shows that they have bugs in their firmware which are problematic for > RAID users ... we are attempting to get them to give us the updated > firmware to help the existing users, but we would not consider shipping > more units with these drives due to their issues." > > Is that latter answer, which is the correct answer, a marketing answer? > But what if the customer tells you, ship me your system without a drive, I'll put whatever I want in there so you are not my point of contact for failing drives but you say, no, I won't allow them in my system and I won't even sell you a replacement of what I do allow in the system? > Yeah, SATA and SAS are standards. Yeah, in theory, they all do work > together. In reality, they really don't, and you have to test. > Everyone does some aspect slightly different and usually in software, so > they can fix it if they messed up. If their is a RAID timeout bug due > to head settling timing, yeah, this is fixable. But if the disk > manufacturer doesn't want to fix it ... its your companies name on the > outside of that box. You are going to take the heat for their problems. > > Note: This isn't just SATA/SAS drives, there are a whole mess of things > that *should* work well together, but do not. We had some exciting > times in the recent past with SAS backplanes that refused to work with > SAS RAID cards. We've had some excitment from 10GbE cards, IB cards, > etc. that we shouldn't have had. > > I can't and won't sanction their tone to you ... they should have > explained things correctly. Given that PERC are rebadged LSI, yeah, I > know perfectly well a whole mess of drives that *do not* work correctly > with them. > > So please don't take Dell to task for trying to help you avoid making > what they consider a bad decision on specific components. There could > be a marketing aspect to it, but support is a cost, and they want to > minimize costs. Look at failure rates, and toss the suppliers who have > very high ones. > > > From landman at scalableinformatics.com Tue Feb 16 05:32:30 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 16 Feb 2010 08:32:30 -0500 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <20100216145328.52f048a9@math018-24.tau.ac.il> References: <4B79F7B4.9020808@scalableinformatics.com> <20100216145328.52f048a9@math018-24.tau.ac.il> Message-ID: <4B7A9E6E.9010500@scalableinformatics.com> On 2/16/2010 7:53 AM, Micha Feigin wrote: >> So ... this anecdote complete, if someone called me up and said "Joe, I >> really want you to build us an siCluster for our storage, and I want you >> to use [insert failing manufacturer's name here] drives because we >> like them", what do you think my reaction should be? Should it be >> "sure, no problem, whatever you want" ... with the subsequent problems >> and pain, for which we would be blamed ... or should it be "no, these >> drives don't work well ... 
deep and painful experience at customer sites >> shows that they have bugs in their firmware which are problematic for >> RAID users ... we are attempting to get them to give us the updated >> firmware to help the existing users, but we would not consider shipping >> more units with these drives due to their issues." >> >> Is that latter answer, which is the correct answer, a marketing answer? >> >> > But what if the customer tells you, ship me your system without a drive, I'll > put whatever I want in there so you are not my point of contact for failing > drives but you say, no, I won't allow them in my system and I won't even sell > you a replacement of what I do allow in the system? > We have tried this before. Invariably they set up the disks wrong. And then we get the blame and a bad rap from the customer for selling them a slow unit. Even though it was their own fault it was slow. Or we got a number of different flavors of drive, no guarantees on same batch or firmware revision, or even same brand. And yes, we see and deal with those issues. We simply don't sell bare chassis any longer, because it is our name on the box. When you get it, it works and is fast. If you make changes, and you are welcome to, it will likely slow down. Several of our customers ignored our warnings on this, reloaded the unit and yelled at us over their slower performance. One of our earliest prospective customers, absolutely convinced they knew more than us about how to configure/build fast storage, reconfigured a working unit with fast IO system to be a much slower unit, and then proceeded to benchmark us versus one of our competitors. Yeah, we are sensitive to this. I won't defend Dell's position. I will point out that they have a point with some of their restrictions. You want to service the branded drives we ship, you are welcome to do this yourself. We have no issues with that. We do ensure consistent firmware revs between disks, so you'd need to do this yourself. You want to wipe the OS and the install and set it up, you are welcome to do this, though we caution that you are going to throw lots of performance away by doing so. My other point about this is that if you don't like Dell's policies, you have freedom to choose other vendors. But giving them heat over reducing their exposure to support liability strikes me as not a reasonable gripe. But then again, I am on the vendor side of the equation. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics Inc., email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From john.hearns at mclaren.com Tue Feb 16 05:34:46 2010 From: john.hearns at mclaren.com (Hearns, John) Date: Tue, 16 Feb 2010 13:34:46 -0000 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <20100216145328.52f048a9@math018-24.tau.ac.il> References: <4B79F7B4.9020808@scalableinformatics.com> <20100216145328.52f048a9@math018-24.tau.ac.il> Message-ID: <68A57CCFD4005646957BD2D18E60667B0F573CE4@milexchmb1.mil.tagmclarengroup.com> > > But what if the customer tells you, ship me your system without a > drive, I'll > put whatever I want in there so you are not my point of contact for > failing > drives but you say, no, I won't allow them in my system and I won't > even sell > you a replacement of what I do allow in the system? I have been in a situation like this, but not with disks. Company X sells them a system, integrates it on site. 
However hardware maintenance SHOULD come from Company Y which is the supplier of the hardware. Customers never see it that way, and Company X gets the phone calls and is expected to book service calls, take hardware apart and run tests on behalf of Company Y. That's why people like Joe are very wary of letting customers have a free-for-all in choosing $high-performance-component to save a few dollars. The contents of this email are confidential and for the exclusive use of the intended recipient. If you receive this email in error you should not copy it, retransmit it, use it or disclose its contents but should return it to the sender immediately and delete your copy. From deadline at eadline.org Tue Feb 16 07:28:09 2010 From: deadline at eadline.org (Douglas Eadline) Date: Tue, 16 Feb 2010 10:28:09 -0500 (EST) Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: Message-ID: <58062.192.168.1.1.1266334089.squirrel@mail.eadline.org> For SGE: http://xml-qstat.org/index.html -- Doug > Are there any generic "scheduler visualization" tools out there? > Sometimes I feel it'd be nice if I had a way to find out how my > sceduling was performing. i.e. blocks of empty procs; how fragmented > the job assignment was; large / small job split, utilization > efficiency, backfill status; etc. I use openpbs (torque) + maui and it > does have some text mode accounting reports. But sometimes they are > hard to digest and a birds eye view might be easier via a > visualization. > > I haven't found any toolkits yet. Of course, I could parse and plot > myself with a bunch of sed / awk / gnuplot but I don't want to > unnecessarily reinvent the wheel if I can avoid it. Also, I remembered > seeing some cool visualizations (quite animated at that) at one of the > supercomputing agencies a while ago but just can't seem to find which > one it was now that I need it. Admittedly, some of the visualizations > can sway more towards the "coolness" factor than actual insights but > still it's worth a shot. > > Any pointers or scripts other Beowulfers might have are greatly > appreciated. > > -- > Rahul > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Doug From mathog at caltech.edu Tue Feb 16 08:49:07 2010 From: mathog at caltech.edu (David Mathog) Date: Tue, 16 Feb 2010 08:49:07 -0800 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? Message-ID: Joe Landman wrote > So along comes a drive manufacturer, with some nice looking specs on 2TB > (and some 1.5 and 1 TB) drives. They look great on paper. We get them > into our labs, and play with them, and they seem to run really well. > Occasional hiccup on building RAIDs, but you get that in large batches > of drives. > > So now they are out in the field for months, under various loads. Some > in our DeltaV's, some in our JackRabbits. The units in the DeltaV's > seem to have a ridiculously high failure rate. This is not something we > see in the lab. Even with constant stress, horrific sustained workloads > ... they don't fail in ou testing. But get these same drives out into > the users hands ... and whammo. > > Slightly different drives in our JackRabbit units, with a variety of > RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc. > > This is not something we see in the lab in our testing. 
We try > emulating their environments, and we can't generate the failures. > > Worse, we get the drives back after exchanging them at our cost with new > replacements, only to find out, upon running diagnostics, that the > drives haven't failed according to the test tool. This failing drive > vendor refuses to acknowledge firmware bugs, effectively refuses to > release patches/fixes. While there is no doubting that these drives didn't work reliably in your arrays, that doesn't necessarily mean they were "defective". Just playing devil's advocate here, but it could be the array controller is using some feature where there is a bit of wiggle room in the standard, so that both the disk and the controller are "conforming", but they still won't work together reliably. In a situation like that I would expect the vendor to disclose the issue, so it would be clear why the disks had to come from A and not B. As long as the vendor explained the problem clearly most customers would be fine buying the preferred disks. It's when the vendor says "you have to use OUR disks" and doesn't tell you why, and when, as far as you can tell, these are the same devices that you could buy directly from the manufacturer without the 5X markup, that things smell bad. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From landman at scalableinformatics.com Tue Feb 16 09:09:45 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 16 Feb 2010 12:09:45 -0500 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: Message-ID: <4B7AD159.50000@scalableinformatics.com> David Mathog wrote: > Joe Landman wrote > >> So along comes a drive manufacturer, with some nice looking specs on 2TB >> (and some 1.5 and 1 TB) drives. They look great on paper. We get them >> into our labs, and play with them, and they seem to run really well. >> Occasional hiccup on building RAIDs, but you get that in large batches >> of drives. >> >> So now they are out in the field for months, under various loads. Some >> in our DeltaV's, some in our JackRabbits. The units in the DeltaV's >> seem to have a ridiculously high failure rate. This is not something we >> see in the lab. Even with constant stress, horrific sustained workloads >> ... they don't fail in ou testing. But get these same drives out into >> the users hands ... and whammo. >> >> Slightly different drives in our JackRabbit units, with a variety of >> RAID controllers. Same types of issues. Timeouts, RAID fall outs, etc. >> >> This is not something we see in the lab in our testing. We try >> emulating their environments, and we can't generate the failures. >> >> Worse, we get the drives back after exchanging them at our cost with new >> replacements, only to find out, upon running diagnostics, that the >> drives haven't failed according to the test tool. This failing drive >> vendor refuses to acknowledge firmware bugs, effectively refuses to >> release patches/fixes. > > While there is no doubting that these drives didn't work reliably in > your arrays, that doesn't necessarily mean they were "defective". Just > playing devil's advocate here, but it could be the array controller is > using some feature where there is a bit of wiggle room in the standard, > so that both the disk and the controller are "conforming", but they > still won't work together reliably. 
In a situation like that I would > expect the vendor to disclose the issue, so it would be clear why the > disks had to come from A and not B. As long as the vendor explained the > problem clearly most customers would be fine buying the preferred disks. I agree that some devices work well with others. This is what we see. Some do not. We have a few boxful's of 1TB drives that don't play well with others. And yes, standards do leave wiggle room. Interop testing days are critical. A connect-a-thon very helpful. But the point is, just because it says SATA, you shouldn't expect that it will work with all SATA controllers. No ... seriously. Likewise this is true with many other components. Some stuff doesn't play well with others. I didn't sanction the language used, I thought it wrong. But from a support scenario, it can be (and often is) a nightmare. We take ownership of as little or as much of what our customers want us to do. If your name is on the box, no-one appreciates a finger pointing exercise rather than a path to solution. > It's when the vendor says "you have to use OUR disks" and doesn't tell > you why, and when, as far as you can tell, these are the same devices > that you could buy directly from the manufacturer without the 5X markup, > that things smell bad. I agree with this paragraph. We won't name specific names in public, we do speak about our drive issues in private with our customers. 5X markup? We must be doing something wrong :/ > > Regards, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From rpnabar at gmail.com Tue Feb 16 09:38:06 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 16 Feb 2010 11:38:06 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B79F7B4.9020808@scalableinformatics.com> References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman wrote: > Please indulge my taking a contrarian view based upon the products we > sell/support/ship. > I can't and won't sanction their tone to you ... they should have explained > things correctly. ?Given that PERC are rebadged LSI, yeah, I know perfectly > well a whole mess of drives that *do not* work correctly with them. > > So please don't take Dell to task for trying to help you avoid making what > they consider a bad decision on specific components. ?There could be a > marketing aspect to it, but support is a cost, and they want to minimize > costs. ?Look at failure rates, and toss the suppliers who have very high > ones. Another worry is what happens in the long run if the vendor either folds shop or stops selling and / or supporting that particular model of drive. Frequently the lifecycle of these devices is longer than the warranty. The inability to shop around for drives could be an issue. Especially with this rigid approach of firmware rejecting a foreign component and not just a warning. 
Perhaps this is a paranoid scenario since these are big vendors and not likely to go bankrupt. -- Rahul From landman at scalableinformatics.com Tue Feb 16 09:44:59 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Tue, 16 Feb 2010 12:44:59 -0500 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: <4B7AD99B.90605@scalableinformatics.com> Rahul Nabar wrote: > On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman > wrote: > >> Please indulge my taking a contrarian view based upon the products we >> sell/support/ship. >> I can't and won't sanction their tone to you ... they should have explained >> things correctly. Given that PERC are rebadged LSI, yeah, I know perfectly >> well a whole mess of drives that *do not* work correctly with them. >> >> So please don't take Dell to task for trying to help you avoid making what >> they consider a bad decision on specific components. There could be a >> marketing aspect to it, but support is a cost, and they want to minimize >> costs. Look at failure rates, and toss the suppliers who have very high >> ones. > > Another worry is what happens in the long run if the vendor either > folds shop or stops selling and / or supporting that particular model > of drive. Frequently the lifecycle of these devices is longer than the > warranty. The inability to shop around for drives could be an issue. > Especially with this rigid approach of firmware rejecting a foreign > component and not just a warning. This is an issue with any proprietary technology. We talk about this in terms of "freedom from bricking". For example, with Sun, there are quite a few (now quite nervous) Thumper/Thor owners. Thumper has been EOLed, and the future of Thor is uncertain at best. We have customers ask us constantly if we take trade-ins, and others asking us if they can buy the trade-ins for spares. This is a real issue. But it is tangential to the specific issue as initially discussed. > > Perhaps this is a paranoid scenario since these are big vendors and > not likely to go bankrupt. Erm ... uh ... Sun, SGI, LNXI, ... Big != Safe. > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From james.p.lux at jpl.nasa.gov Tue Feb 16 09:52:02 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 16 Feb 2010 09:52:02 -0800 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: <4B7AD159.50000@scalableinformatics.com> Message-ID: On 2/16/10 9:09 AM, "Joe Landman" wrote: > > > 5X markup? We must be doing something wrong :/ > Depends on what the price includes. I could easily see a commodity drive in a case lot being dropped on the loading dock at, say, $100 each, and the drive with installation, system integrator testing, downstream support, etc. being $500. Doesn't take many hours on the phone tracking down an idiosyncracy or setup to cost $500 in labor. A lot of times, a company will price things to spread the NRE (all that testing of drives in the lab in various configurations, working out the driver parameters that make it work best, etc... Which can easily be in the tens, if not hundreds, of $K range) across the sell price of the many boxes. 
Then you wind up with folks asking about "how come the *same drive* from X costs twice as much from Y?", while conveniently forgetting that X isn't charging you for the $100K NRE or the $1000/incident support fee or the... From james.p.lux at jpl.nasa.gov Tue Feb 16 10:32:01 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 16 Feb 2010 10:32:01 -0800 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Rahul Nabar > Sent: Tuesday, February 16, 2010 9:38 AM > To: landman at scalableinformatics.com > Cc: Mikhail Kuzminsky; beowulf at beowulf.org > Subject: Re: [Beowulf] Third-party drives not permitted on new Dell servers? > > On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman > wrote: > > > Please indulge my taking a contrarian view based upon the products we > > sell/support/ship. > > I can't and won't sanction their tone to you ... they should have explained > > things correctly. ?Given that PERC are rebadged LSI, yeah, I know perfectly > > well a whole mess of drives that *do not* work correctly with them. > > > > So please don't take Dell to task for trying to help you avoid making what > > they consider a bad decision on specific components. ?There could be a > > marketing aspect to it, but support is a cost, and they want to minimize > > costs. ?Look at failure rates, and toss the suppliers who have very high > > ones. > > Another worry is what happens in the long run if the vendor either > folds shop or stops selling and / or supporting that particular model > of drive. Frequently the lifecycle of these devices is longer than the > warranty. The inability to shop around for drives could be an issue. > Especially with this rigid approach of firmware rejecting a foreign > component and not just a warning. > "lifecycle" has a lot of meanings, and I think that's where the problems arise, in some cases. For the most part, PC hardware these days is designed based on a 3 year replacement schedule, so a vendor will set up their warranty terms, model introduction and retirement schedule based on that, regardless of whether the actual life is much (sometimes much, much) longer (as all of us with old NT4.0 and DOS 3.x machines lurking in the lab know). Unfortunately, the HPC (Beowulf) world is driven by the economics of the ordinary consumer/office desktop computer. That's what lets you build a teraflop machine without incurring the debt of a small country: you can leverage the mass production for consumers which drives the prices down, but also has very short product cycles. The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment a "5-year" piece of gear, but MACRS means that by the end of year 3, you've already depreciated over 70% of the purchase price. Considering that purchase price is about half the "total cost of ownership" (at most), it makes sense to buy new gear that often (since the help-desk, configuration management, networking, etc, support costs remain fixed per month). The hardware cost is often a small fraction of the overall cost to put a computer on someone's desk. If you look at a typical desktop PC scenario, you might have a $2500 computer, where the first year depreciation is $42/mo, the second year is $67/mo, and the third year is $40/mo. On top of that, support costs might be $100-200/mo. 
In the fourth year, under MACRS, the depreciation is $24/mo. So you could get a brand new computer (which will be easier to support, is faster, etc.) for a big $22/month hit on your budget(which is <10% of the total monthly tab, counting the support costs). It's a no brainer to turn over the computers that fast, especially if you are trying to save on support costs by having a limited number of different models in the installed base at any given time (which is what large companies do). From lindahl at pbm.com Tue Feb 16 10:57:15 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 16 Feb 2010 10:57:15 -0800 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> <4B7A032C.2080207@scalableinformatics.com> Message-ID: <20100216185715.GA16132@bx9.net> On Tue, Feb 16, 2010 at 02:08:32AM -0500, Mark Hahn wrote: > I'm curious: you imply that vendors have no recourse to force the _disk_ > vendors to supply parts which work right (as defined by standard). > is that really true? I'd be surprised if Dell doesn't get pretty emphatic > cooperation from their disk vendor(s). If you recall our past discussions here about "raid duty" disks, there are a couple of things not in the standard which are significant: vibration resistance, and the disk's maximum retry time vs. raid controllers deciding there's a timout. Most standards have problems like that. You can't imagine the interesting time PathScale had getting our InfiniBand products to cooperate well with Mellanox parts. -- greg From spandey at csse.unimelb.edu.au Sun Feb 14 14:33:28 2010 From: spandey at csse.unimelb.edu.au (Suraj Pandey) Date: Mon, 15 Feb 2010 09:33:28 +1100 Subject: [Beowulf] [hpc-announce] CCGrid 2010: Call for Research/Product Demos Message-ID: <623657FD-64B6-4146-A186-4ADB6E59647F@csse.unimelb.edu.au> [Please accept our apologies if you receive multiple copies of this email] ------------------------------------------------------------------------ CCGrid 2010: Call for Research/Product Demos ------------------------------------------------------------------------ The 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010) May 17-20, Melbourne, Victoria, Australia. http://www.manjrasoft.com/ccgrid2010/callfordemos.html We invite research demonstrations from laboratories or research groups (academic, government, or industrial) to showcase new innovations and technologies in Cloud Computing and HPC/scientific applications at the CCGrid 2010 conference. CCGrid is a highly successful and well-recognized International conference for presenting the latest breakthroughs in Cluster, Cloud and Grid technologies. Although we are looking for demos on the emerging Cloud Computing and GPU-based computing areas, we welcome demos from all areas within the scope of CCGrid 2010. The topics of interest including the following, but not limited to are: - Scientific, Engineering, Commercial or e-Science Applications using Cloud Computing - Middleware - Demonstrable Open-Challenges - Resource Management - Scheduling and Load Balancing - Programming Models, Tools, and Environments - Performance Evaluation and Modeling The proposal should include: - Up to 2 page description of the demo and the set up, and - A short (half-page) description of the research lab Accepted research demonstrations will be invited to present in the conference. 
These abstracts will not be included in the conference proceedings, but will be published on the conference website. Live demos are expected to be up and running during the Research Demonstration Session. We strongly discourage recorded presentations. In addition, authors of accepted demonstrations are expected to communicate more general information about the specific demo and other work being performed at the lab using their own posters (poster boards are provided). We will be providing wireless Internet access ONLY. This means, the hardware (compute and data resources) and software needed for the demo should reside either in public Clouds such as Amazon or private infrastructure of research labs/enterprises. A Best Research Demo Award will be presented to the winning demo/team selected by the award committee at the CCGrid 2010 conference. Please submit a pdf version of the proposal to: spandey at csse.unimelb.edu.au by the due date. Demo Co-ordinators -------------------- Pavan Balaji, Argonne National Laboratory A,B.M. Russel, VeRSI Suraj Pandey, University of Melbourne Important Dates -------------------- Deadline for submissions: March 31st, 2010 Notification of Acceptance: April 5th, 2010 ------------------------------------------------------------------------ Sincerely, Suraj Pandey Phd Candidate, Cloud Computing and Distributed Systems (CLOUDS) Lab Dept. of Computer Science and Software Engineering The University of Melbourne ICT Building, 111 Barry Street, Carlton, Melbourne, VIC 3053, Australia Phone: +61-3-8344-1355 (Off) Fax: +61-3-9348-1184 email: spandey at csse.unimelb.edu.au url: http://www.csse.unimelb.edu.au/~spandey ----------------------------------------------------------- -------------- next part -------------- An HTML attachment was scrubbed... URL: From Z.Wu at leeds.ac.uk Tue Feb 16 03:53:59 2010 From: Z.Wu at leeds.ac.uk (Zhili Wu) Date: Tue, 16 Feb 2010 11:53:59 +0000 Subject: [Beowulf] SMLA'10 Workshop CFP: Submission Deadline Extended to March 10, 2010 In-Reply-To: <15AB31C404448F4A918D077CA7E74C6E012B1989ADE7@HERMES7.ds.leeds.ac.uk> References: <15AB31C404448F4A918D077CA7E74C6E012B1989ADE7@HERMES7.ds.leeds.ac.uk> Message-ID: <15AB31C404448F4A918D077CA7E74C6E012B1BC626EB@HERMES7.ds.leeds.ac.uk> [Please accept our apologies if you receive multiple copies of this email] CALL FOR PAPERS The 2010 International Workshop on Scalable Machine Learning and Applications (SMLA-10) To be held in conjunction with CIT'10 (Supported by IEEE Computer Society), June 29 - July 1, 2010, Bradford, UK http://smlc09.leeds.ac.uk/smla/ http://www.scim.brad.ac.uk/~ylwu/CIT2010/ SCOPE: Machine learning and data mining have been playing an increasing role in many real scenarios, such as web mining, language processing, image search, financial engineering, etc. In these application domains, data are surpassing the scale of terabyte in an ever faster pace, but the techniques for processing and mining them often lag behind in far too many aspects. To deal with billions of web pages, images, transaction records and capacity-intensive audio and video data stream, machine learning and data mining techniques and their underlying computing infrastructure are facing great challenges. In this SMLA workshop we are willing to bring together researchers and practitioners for getting advancement in scalable machine learning and applications. On one hand we expect works on how to dramatically empower existing machine learning and data mining methods via grid/cloud or other novel computing models. 
On the other hand we value the effort of building or extending machine learning and data mining methods that are scalable to huge datasets. Papers can be related to any subset of the following topics, or any unconventional direction to scale up machine learning and data mining methods: -- Cloud Computing -- Large Scale Data Mining -- Fast Support Vector Machines -- Data Abstraction, Dimension Reduction -- User Personalization and Recommendation -- Natural Language Processing -- Ontology and Semantic Technologies -- Parallelization of Machine Learning Methods -- Fast Machine Learning Model Tuning and Selection -- Large Scale Webpage Topic, Genre, Sentiment Classification -- Financial Engineering STEERING COMMITTEE Chih-Jen Lin, National Taiwan University, Taiwan Serge Sharoff, University of Leeds, UK Katja Markert, University of Leeds, UK Ivor Wai-Hung Tsang, Nanyang Technological University, Singapore PROGRAM CHAIRS Zhili Wu, University of Leeds, UK Xiaolong Jin, University of Bradford, UK PUBLICITY CHAIRS Evi Syukur, University of New South Wales, Australia Lei Liu, University of Bradford, UK PROGRAM COMMITTEE Please refer to http://smlc09.leeds.ac.uk/smla/committee.htm for a complete list of program committee PAPER SUBMISSION: Authors are invited to submit manuscripts reporting original unpublished research and recent developments in the topics related to the workshop. The length of the papers should not exceed 6 pages + 2 pages for overlength charges (IEEE Computer Society Proceedings Manuscripts style: two columns, single-spaced), including figures and references, using 10pt fonts, and number each page. Papers should be submitted electronically in PDF format (or postscript) by sending it as an e-mail attachment to Zhili Wu (z.wu at leeds.ac.uk). All papers will be peer reviewed and the comments will be provided to the authors. The accepted papers will be published together with those of other CIT'10 workshops by the IEEE Computer Society Press. *********************************************************************** Distinguished selected papers, after further extensions, will be published in CIT 2010's special issues of the following prestigious SCI-indexed journals: -- The Journal of Supercomputing - Springer -- Journal of Computer and System Sciences - Elsevier -- Concurrency and Computation: Practice and Experience - John Wiley & Sons *********************************************************************** IMPORTANT DATES: Paper submission: March 10, 2010, GMT-11 Notification of Acceptance: April 01, 2010 Camera-ready due: April 18, 2010 Author registration: April 18, 2010 Conference: June 29 - July 1, 2010 *********************************************************************** From vallard at benincosa.com Tue Feb 16 07:43:22 2010 From: vallard at benincosa.com (Vallard Benincosa) Date: Tue, 16 Feb 2010 07:43:22 -0800 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: <58062.192.168.1.1.1266334089.squirrel@mail.eadline.org> References: <58062.192.168.1.1.1266334089.squirrel@mail.eadline.org> Message-ID: <56E58A53-65F9-4177-95FA-54F35064C24C@benincosa.com> pbstop is a great curses tool for TORQUE to get an idea of where jobs are allocated in a visual way On Feb 16, 2010, at 7:28 AM, "Douglas Eadline" wrote: > For SGE: > > http://xml-qstat.org/index.html > > -- > Doug > >> Are there any generic "scheduler visualization" tools out there? >> Sometimes I feel it'd be nice if I had a way to find out how my >> sceduling was performing. i.e.
blocks of empty procs; how fragmented >> the job assignment was; large / small job split, utilization >> efficiency, backfill status; etc. I use openpbs (torque) + maui and >> it >> does have some text mode accounting reports. But sometimes they are >> hard to digest and a birds eye view might be easier via a >> visualization. >> >> I haven't found any toolkits yet. Of course, I could parse and plot >> myself with a bunch of sed / awk / gnuplot but I don't want to >> unnecessarily reinvent the wheel if I can avoid it. Also, I >> remembered >> seeing some cool visualizations (quite animated at that) at one of >> the >> supercomputing agencies a while ago but just can't seem to find which >> one it was now that I need it. Admittedly, some of the visualizations >> can sway more towards the "coolness" factor than actual insights but >> still it's worth a shot. >> >> Any pointers or scripts other Beowulfers might have are greatly >> appreciated. >> >> -- >> Rahul >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin >> Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From oneal at dbi.udel.edu Tue Feb 16 10:20:39 2010 From: oneal at dbi.udel.edu (Doug O'Neal) Date: Tue, 16 Feb 2010 13:20:39 -0500 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B7AD159.50000@scalableinformatics.com> Message-ID: On 02/16/2010 12:52 PM, Lux, Jim (337C) wrote: > > > On 2/16/10 9:09 AM, "Joe Landman" wrote: >> >> 5X markup? We must be doing something wrong :/ >> > > > Depends on what the price includes. I could easily see a commodity drive in > a case lot being dropped on the loading dock at, say, $100 each, and the > drive with installation, system integrator testing, downstream support, etc. > being $500. Doesn't take many hours on the phone tracking down an > idiosyncracy or setup to cost $500 in labor. But when you're installing anywhere from eight to forty-eight drives in a single system the required hours to make up that $400/drive overhead does get larger. And if you spread the system integrator testing over eight drives per unit and hundreds to thousands of units the cost per drive shouldn't be measured in hundreds of dollars. From prentice at ias.edu Tue Feb 16 11:18:55 2010 From: prentice at ias.edu (Prentice Bisbal) Date: Tue, 16 Feb 2010 14:18:55 -0500 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <39109.192.168.1.1.1266260198.squirrel@mail.eadline.org> References: <39109.192.168.1.1.1266260198.squirrel@mail.eadline.org> Message-ID: <4B7AEF9F.8050407@ias.edu> Actually, I think the ISO standard calls for concatenating those two words into one. Douglas Eadline wrote: > There are two "ISO standard" English words I have for this kind of > marketing response. > > -- > Doug > > >> This was the response from Dell, I especially like the analogy: >> >> [snip] >>> There are a number of benefits for using Dell qualified drives in >>> particular ensuring a ***positive experience*** and protecting ***our >>> data***. >>> While SAS and SATA are industry standards there are differences which >>> occur in implementation. 
An analogy is that English is spoken in the UK, >>> US >and Australia. While the language is generally the same, there are >>> subtle differences in word usage which can lead to confusion. This exists >>> in >storage subsystems as well. As these subsystems become more capable, >>> faster and more complex, these differences in implementation can have >>>> greater impact. >> [snip] >> >> I added the emphasis. I am in love Dell-disks that get me "the >> positive experience". :) >> >> -- >> Rahul -- Prentice From gerry.creager at tamu.edu Tue Feb 16 11:33:35 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue, 16 Feb 2010 13:33:35 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: <4B7AF30F.1010804@tamu.edu> On 2/16/10 12:32 PM, Lux, Jim (337C) wrote: >> -----Original Message----- >> From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Rahul Nabar >> Sent: Tuesday, February 16, 2010 9:38 AM >> To: landman at scalableinformatics.com >> Cc: Mikhail Kuzminsky; beowulf at beowulf.org >> Subject: Re: [Beowulf] Third-party drives not permitted on new Dell servers? >> >> On Mon, Feb 15, 2010 at 7:41 PM, Joe Landman >> wrote: >> >>> Please indulge my taking a contrarian view based upon the products we >>> sell/support/ship. >>> I can't and won't sanction their tone to you ... they should have explained >>> things correctly. Given that PERC are rebadged LSI, yeah, I know perfectly >>> well a whole mess of drives that *do not* work correctly with them. >>> >>> So please don't take Dell to task for trying to help you avoid making what >>> they consider a bad decision on specific components. There could be a >>> marketing aspect to it, but support is a cost, and they want to minimize >>> costs. Look at failure rates, and toss the suppliers who have very high >>> ones. >> >> Another worry is what happens in the long run if the vendor either >> folds shop or stops selling and / or supporting that particular model >> of drive. Frequently the lifecycle of these devices is longer than the >> warranty. The inability to shop around for drives could be an issue. >> Especially with this rigid approach of firmware rejecting a foreign >> component and not just a warning. >> > > "lifecycle" has a lot of meanings, and I think that's where the problems arise, in some cases. > For the most part, PC hardware these days is designed based on a 3 year replacement schedule, so a vendor will set up their warranty terms, model introduction and retirement schedule based on that, regardless of whether the actual life is much (sometimes much, much) longer (as all of us with old NT4.0 and DOS 3.x machines lurking in the lab know). > > Unfortunately, the HPC (Beowulf) world is driven by the economics of the ordinary consumer/office desktop computer. That's what lets you build a teraflop machine without incurring the debt of a small country: you can leverage the mass production for consumers which drives the prices down, but also has very short product cycles. > > The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment a "5-year" piece of gear, but MACRS means that by the end of year 3, you've already depreciated over 70% of the purchase price. 
Considering that purchase price is about half the "total cost of ownership" (at most), it makes sense to buy new gear that often (since the help-desk, configuration management, networking, etc, support costs remain fixed per month). The hardware cost is often a small fraction of the overall cost to put a computer on someone's desk. If you look at a typical desktop PC scenario, you might have a $2500 computer, where the first year depreciation is $42/mo, the second year is $67/mo, and the third year is $40/mo. On top of that, support costs might be $100-200/mo. In the fourth year, under MACRS, the depreciation is $24/mo. So you could get a brand new computer (which will be easier to support, is faster, etc.) for a big $22/month hit on your budget (! > which is<10% of the total monthly tab, counting the support costs). It's a no brainer to turn over the computers that fast, especially if you are trying to save on support costs by having a limited number of different models in the installed base at any given time (which is what large companies do). > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf OK, my brain hurts, now. You've been in Management too long, Jim! From rpnabar at gmail.com Tue Feb 16 11:35:52 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 16 Feb 2010 13:35:52 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B7AEF9F.8050407@ias.edu> References: <39109.192.168.1.1.1266260198.squirrel@mail.eadline.org> <4B7AEF9F.8050407@ias.edu> Message-ID: On Tue, Feb 16, 2010 at 1:18 PM, Prentice Bisbal wrote: > Actually, I think the ISO standard calls for concatenating those two > words into one. > > Douglas Eadline wrote: >> There are two "ISO standard" English words I have for this kind of >> marketing response. >> I think Doug was utilizing the wiggle-room in the ISO standards for the word(s). ;) -- Rahul From rpnabar at gmail.com Tue Feb 16 11:42:49 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 16 Feb 2010 13:42:49 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: On Tue, Feb 16, 2010 at 12:32 PM, Lux, Jim (337C) wrote: > Unfortunately, the HPC (Beowulf) world is driven by the economics of the ordinary consumer/office desktop computer. ?That's what lets you build a teraflop machine without incurring the debt of a small country: you can leverage the mass production for consumers which drives the prices down, but also has very short product cycles. > > The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment a "5-year" piece of gear, but On the other hand many of the Beowulfers are in the govt. / university / higher-ed. domain where things run somewhat "tax free"? Not sure if then these IRS writeoffs then factor much into decision making or not. All the more reason to avoid getting locked-in to 3-year vendor cycles. -- Rahul From james.p.lux at jpl.nasa.gov Tue Feb 16 11:48:51 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 16 Feb 2010 11:48:51 -0800 Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B7AD159.50000@scalableinformatics.com> Message-ID: James Lux, P.E. 
Task Manager, SOMD Software Defined Radios Flight Communications Systems Section Jet Propulsion Laboratory 4800 Oak Grove Drive, Mail Stop 161-213 Pasadena, CA, 91109 +1(818)354-2075 phone +1(818)393-6875 fax > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Doug O'Neal > Sent: Tuesday, February 16, 2010 10:21 AM > To: beowulf at beowulf.org > Subject: [Beowulf] Re: Third-party drives not permitted on new Dell servers? > > On 02/16/2010 12:52 PM, Lux, Jim (337C) wrote: > > > > > > On 2/16/10 9:09 AM, "Joe Landman" wrote: > >> > >> 5X markup? We must be doing something wrong :/ > >> > > > > > > Depends on what the price includes. I could easily see a commodity drive in > > a case lot being dropped on the loading dock at, say, $100 each, and the > > drive with installation, system integrator testing, downstream support, etc. > > being $500. Doesn't take many hours on the phone tracking down an > > idiosyncracy or setup to cost $500 in labor. > > But when you're installing anywhere from eight to forty-eight drives in a > single system the required hours to make up that $400/drive overhead does > get larger. And if you spread the system integrator testing over eight > drives > per unit and hundreds to thousands of units the cost per drive shouldn't be > measured in hundreds of dollars. > True, IFF the costing strategy is based on that sort of approach. Various companies can and do price the NRE and support tail cost in a variety of ways. They might have a "notional" system size and base the pricing model on that: Say they, through research, find that most customers are buying, say, 32 systems at a crack. Now the support tail (which is basically "per system") is spread across only 32 drives, not thousands. If you happen to buy 64 systems, then you basically are paying twice. Most companies don't have infinite granularity in this sort of thing, and try to pick a few breakpoints that make sense. (NRE = non recurring engineering) As far as the NRE goes, say they get a batch of a dozen drives each of half a dozen kinds. They have to set up half a dozen test systems (either in parallel or sequentially), run the tests on all of them, and wind up with maybe 2 or 3 leading candidates that they decide to list on their "approved disk" list. The cost of testing the disks that didn't make the cut has to be added to the cost of the disks that did. There's a lot that goes into pricing that isn't obvious at first glance, or even second glance, especially if you're looking at a single instance (your own purchase) and trying to work backwards from there. There are weird anomalies that crop up in supposedly commodity items from things like fuel prices (e.g. you happened to buy that container load of disks when fuel prices were high, so shipping cost more). A couple years ago, there were huge fluctuations in the price of copper, so there would be 2:1 differences in the retail cost of copper wire and tubing at the local Home Depot and Lowes, basically depending on when they happened to have bought the stuff wholesale. (this is the kind of thing that arbitrageurs look for, of course) Some of it is "paying for convenience", too. Rather than do all the testing yourself, or writing a detailed requirements and procurement document for a third party, both of which cost you some non-zero amount of time and money, you just pay the increased price to a vendor who's done it for you. It's like eating sausage. 
You can buy already made sausage, and the sausage maker has done the experimenting with seasoning and process controls to come out with something that people like. Or, you can spend the time to make it yourself, potentially saving some money and getting a more customized sausage taste, BUT, you're most likely going to have some less-than-ideal sausage in the process. The more computers or sausage you're consuming, the more likely it is that you could do better with a customized approach, but, even there, you may be faced with resource limits (e.g. you could spend your time getting a better deal on the disks or you could spend your time doing research with the disks. Ultimately, the research MUST get done, so you have to trade off how much you're willing to spend.) From james.p.lux at jpl.nasa.gov Tue Feb 16 11:52:23 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 16 Feb 2010 11:52:23 -0800 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: > -----Original Message----- > From: Rahul Nabar [mailto:rpnabar at gmail.com] > Sent: Tuesday, February 16, 2010 11:43 AM > To: Lux, Jim (337C) > Cc: landman at scalableinformatics.com; Mikhail Kuzminsky; beowulf at beowulf.org > Subject: Re: [Beowulf] Third-party drives not permitted on new Dell servers? > > On Tue, Feb 16, 2010 at 12:32 PM, Lux, Jim (337C) > wrote: > > Unfortunately, the HPC (Beowulf) world is driven by the economics of the ordinary consumer/office > desktop computer. ?That's what lets you build a teraflop machine without incurring the debt of a small > country: you can leverage the mass production for consumers which drives the prices down, but also has > very short product cycles. > > > > The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment > a "5-year" piece of gear, but > > On the other hand many of the Beowulfers are in the govt. / university > / higher-ed. domain where things run somewhat "tax free"? Not sure if > then these IRS writeoffs then factor much into decision making or not. > All the more reason to avoid getting locked-in to 3-year vendor > cycles. > Yes, but my point was that Beowulfers are a relatively small consumer of what the industry produces, and what *industry* decides to produce is driven by what most people face in terms of lifecycle practices. So, even if you don't live with the IRS rules (and even if you're tax free, that doesn't mean that your accountants don't follow Generally Accepted Accounting Practices, which map pretty much one to one), you wind up being affected by them. From gerry.creager at tamu.edu Tue Feb 16 11:57:07 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue, 16 Feb 2010 13:57:07 -0600 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: References: <4B79F7B4.9020808@scalableinformatics.com> Message-ID: <4B7AF893.5000103@tamu.edu> On 2/16/10 1:42 PM, Rahul Nabar wrote: > On Tue, Feb 16, 2010 at 12:32 PM, Lux, Jim (337C) > wrote: >> Unfortunately, the HPC (Beowulf) world is driven by the economics of the ordinary consumer/office desktop computer. That's what lets you build a teraflop machine without incurring the debt of a small country: you can leverage the mass production for consumers which drives the prices down, but also has very short product cycles. 
>> >> The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment a "5-year" piece of gear, but > > On the other hand many of the Beowulfers are in the govt. / university > / higher-ed. domain where things run somewhat "tax free"? Not sure if > then these IRS writeoffs then factor much into decision making or not. > All the more reason to avoid getting locked-in to 3-year vendor > cycles. At MY University, and in all my grants, I plan a 3-year depreciation cycle (regardless of the grief I gave Jim privately). You've gotta have a plan. That said, however, I also get to beat the inventory folk senseless when they tell me I've got a $100k computer on inventory that's 8 years old and is listed as a '386. I tend to look for best price/performance ratios. I don't want something that will perform poorly because it impacts my research as well as that of the folks who use our clusters. I don't have to go for the cheapest, especially if it's incompatible. But, I don't go for the most expensive, without due diligence, just 'cause it costs more than every one else. gerry From james.p.lux at jpl.nasa.gov Tue Feb 16 13:59:27 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Tue, 16 Feb 2010 13:59:27 -0800 Subject: [Beowulf] Third-party drives not permitted on new Dell servers? In-Reply-To: <4B7AF893.5000103@tamu.edu> References: <4B79F7B4.9020808@scalableinformatics.com> <4B7AF893.5000103@tamu.edu> Message-ID: > >> The 3 year cycle is driven by in large part by IRS depreciation rules which call computer equipment > a "5-year" piece of gear, but > > > > On the other hand many of the Beowulfers are in the govt. / university > > / higher-ed. domain where things run somewhat "tax free"? Not sure if > > then these IRS writeoffs then factor much into decision making or not. > > All the more reason to avoid getting locked-in to 3-year vendor > > cycles. > > At MY University, and in all my grants, I plan a 3-year depreciation > cycle (regardless of the grief I gave Jim privately). You've gotta have > a plan. That said, however, I also get to beat the inventory folk > senseless when they tell me I've got a $100k computer on inventory > that's 8 years old and is listed as a '386. How else would you be able to find ads for surplus equipment: widget $5; original government cost $50,000. (note carefully government cost != government value) And yes.. every year, we get to fight that same battle. What do you mean you can't find that 8" floppy drive! It's in the property database as costing over $5000 (when we acquired it in 1981). I'll bet we have some Apple IIs around still in inventory. Jim From brian.ropers.huilman at gmail.com Wed Feb 17 07:25:34 2010 From: brian.ropers.huilman at gmail.com (Brian D. Ropers-Huilman) Date: Wed, 17 Feb 2010 09:25:34 -0600 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: Message-ID: On Mon, Feb 15, 2010 at 15:56, Rahul Nabar wrote: > Are there any generic "scheduler visualization" tools out there? We've been developing in-house tools for this (both representing usage overall, but recently focusing on a scheduler view) for a while. We have a set of command-line tools to produce reports and graphs. Here's one graph for a snapshot of a view for the month of December on one of our clusters with 256 nodes and 2048 cores: http://www.msi.umn.edu/~bropers/calhoun_december.png The top is each job as it fits on the various nodes and time and the bottom is utilization. 
We have recently added the ability to show these graphs per user or group with their jobs highlighted and the rest dimmed and we've also added queue wait, number of jobs, and several other graphs, similar to the utilization, at the bottom: http://www.msi.umn.edu/~bropers/group_calhoun_jan2010.png We run torque with Moab and this is a result of parsing the torque logs. We are still going through and validating the code and adding features and may be at a point, somewhere in the future, where we'd be comfortable releasing it. If you want to discuss more, please contact me off-list. -- Brian D. Ropers-Huilman, Director Systems Administration and Technical Operations Minnesota Supercomputing Institute 599 Walter Library +1 612-626-5948 (V) 117 Pleasant Street S.E. +1 612-624-8861 (F) University of Minnesota Twin Cities Campus Minneapolis, MN 55455-0255 http://www.msi.umn.edu/ From hahn at mcmaster.ca Wed Feb 17 10:52:20 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 17 Feb 2010 13:52:20 -0500 (EST) Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: Message-ID: > http://www.msi.umn.edu/~bropers/calhoun_december.png we've done this kind of color-job band before, and found that it was difficult to read. another approach is to show jobs as logical blocks, rather than cpus mapped directly to y-axis: https://www.sharcnet.ca/dynamic_images/clusterJobsPlot.saw.png admittedly, that's not terribly pretty. and MPI implementations that busy-wait make the %cpu report less useful than it might be. > We run torque with Moab and this is a result of parsing the torque > logs. We are still going through and validating the code and adding we run LSF, a home-grown scheduler and Maui on ~21 clusters, and feed job data into a central DB which permanently records all history. graphs like above (and others that show various usage metrics by user/group/cluster/jobsize/jobtype) are derived from the DB. -mark hahn. From jac67 at georgetown.edu Wed Feb 17 11:18:41 2010 From: jac67 at georgetown.edu (Jess Cannata) Date: Wed, 17 Feb 2010 14:18:41 -0500 Subject: [Beowulf] Upcoming GPU Programming Seminar at Georgetown University Message-ID: <4B7C4111.8010008@georgetown.edu> For those of you in the D.C. area: *** Next week Wednesday (24 Feb) we are hosting a free, three hour GPU programming seminar at Georgetown University. Details can be found at http://training.arc.georgetown.edu/gpu_seminar-feb2010.html It is open to anyone interested in GPU programming. Feel free to forward to anyone who may be interested. Please RSVP to me if you plan on attending. Thanks. -- Jess Cannata High Performance Computing Georgetown University 202-687-3661 From kuenching at gmail.com Wed Feb 17 11:23:57 2010 From: kuenching at gmail.com (Tsz Kuen Ching) Date: Wed, 17 Feb 2010 14:23:57 -0500 Subject: [Beowulf] PVM 3.4.5-12 terminates when adding Host on Ubuntu 9.10 In-Reply-To: <0D2F92CE-AAEB-4D9E-9AC6-F591C9AE1773@staff.uni-marburg.de> References: <0D2F92CE-AAEB-4D9E-9AC6-F591C9AE1773@staff.uni-marburg.de> Message-ID: Hello, Thanks for the reply, I have asked around and found out that there are no firewall on the machine which blocks certain ports. Does anyone else have an idea or answer? On Sun, Feb 14, 2010 at 6:18 PM, Reuti wrote: > Am 11.02.2010 um 19:43 schrieb Tsz Kuen Ching: > > > Whenever I attempt to add a host in PVM it ends up terminating the process >> in the master program. 
The process does run in the slave node, however >> because the PVM terminates I do not get access to the node. >> >> I'm currently using Ubuntu 9.10, and I used apt-get to install pvm ( >> pvmlib, pvmdev, pvm). >> Thus $PVM_ROOT is set automatically, and so is $PVM_ARCH >> As for the other variables, I have not looked for them. >> >> I can ssh into the slave without the need of a password. >> > Do you have any firewall on the machines which blocks certain ports? > > -- Reuti > > > >> Any Ideas or suggestions? >> >> This is what happens: >> >> user at laptop> pvm >> pvm> add slave-slave >> add slave-slave >> Terminated >> user at laptop> ... >> >> The logs are as follows: >> >> Laptop log >> --- >> [t80040000] 02/11 10:23:32 laptop (127.0.1.1:55884) LINUX 3.4.5 >> [t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010 >> [t80040000] 02/11 10:23:32 netoutput() sendto: errno=22 >> [t80040000] 02/11 10:23:32 em=0x2c24f0 >> [t80040000] 02/11 10:23:32 >> [49/?][6e/?][76/?][61/?][6c/?][69/?][64/?][20/?][61/?][72/?] >> [t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument >> [t80040000] 02/11 10:23:32 pvmbailout(0) >> >> slave-log >> --- >> [t80080000] 02/11 10:23:25 slave-slave (xxx.x.x.xxx:57344) LINUX64 3.4.5 >> [t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010 >> [t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for >> master >> [t80080000] 02/11 10:28:26 pvmbailout(0) >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hearnsj at googlemail.com Wed Feb 17 22:57:12 2010 From: hearnsj at googlemail.com (John Hearns) Date: Thu, 18 Feb 2010 06:57:12 +0000 Subject: [Beowulf] Top500 power consumption Message-ID: <9f8092cc1002172257m4dfa8022s9f47a7502cbead4@mail.gmail.com> As I remember, the Top500 site now lists power consumption of systems; there certainly is a section on the site from a few years ago discussing this. However I could not extract any figures. Does anyone know the magic buttons to press? I did find the Green500 site, which isn't very well populated with systems. From hearnsj at googlemail.com Wed Feb 17 23:11:08 2010 From: hearnsj at googlemail.com (John Hearns) Date: Thu, 18 Feb 2010 07:11:08 +0000 Subject: [Beowulf] Connecting QSFP to SFP+ Message-ID: <9f8092cc1002172311p6c39aea8ga99cc6ac8093b7fd@mail.gmail.com> I realise this is a forlorn quest. Can anyone think of a way of converting between a QSFP plug and an SFP+ socket? Ie. if I wanted to connect a QSFP cable into a 10gigabit ethernet switch which has an SFP port do I stand a cat's chance in hell? The gentleman from Mellanox who gave me a lot of help here may be permitted to have a small smile. And yes, the Q is QSFP = quad, so there are four of the things.
I guess I should just bite the bullet and go 40Gbps http://www.networkworld.com/news/2009/021009-voltaire-switch.html (no, I'm not being really serious here From Shainer at mellanox.com Wed Feb 17 23:19:57 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Wed, 17 Feb 2010 23:19:57 -0800 Subject: [Beowulf] Connecting QSFP to SFP+ References: <9f8092cc1002172311p6c39aea8ga99cc6ac8093b7fd@mail.gmail.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F026629CB@mtiexch01.mti.com> There are several options, one for example is to use a hybrid cable - QSFP to SFP+, which you can find in the market today. Ping me for more info or other options. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of John Hearns Sent: Wednesday, February 17, 2010 11:11 PM To: Beowulf Mailing List Subject: [Beowulf] Connecting QSFP to SFP+ I realise this is a forlorn quest. Can anyone think of a way of converting between a QSFP plug and an SFP+ socket? Ie. if I wanted to connect a QSFP cable into a 10gigabit ethernet switch which has an SFP port do I stand a cat's chance in hell? The gentleman from Mellanox who gave me a lot of help here may be permitted to have a small smile. And yes, the Q is QSFP = quad, so there are four of the things. I guess I should just bite the bullet and go 40Gbps http://www.networkworld.com/news/2009/021009-voltaire-switch.html (no, I'm not being really serious here _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From dnlombar at ichips.intel.com Thu Feb 18 06:47:54 2010 From: dnlombar at ichips.intel.com (David N. Lombard) Date: Thu, 18 Feb 2010 06:47:54 -0800 Subject: [Beowulf] Visualization toolkit to monitor scheduler performance In-Reply-To: References: Message-ID: <20100218144754.GB15298@nlxcldnl2.cl.intel.com> On Wed, Feb 17, 2010 at 07:25:34AM -0800, Brian D. Ropers-Huilman wrote: > On Mon, Feb 15, 2010 at 15:56, Rahul Nabar wrote: > > Are there any generic "scheduler visualization" tools out there? > > We've been developing in-house tools for this (both representing usage > overall, but recently focusing on a scheduler view) for a while. We > have a set of command-line tools to produce reports and graphs. Here's > one graph for a snapshot of a view for the month of December on one of > our clusters with 256 nodes and 2048 cores: > > http://www.msi.umn.edu/~bropers/calhoun_december.png ... > http://www.msi.umn.edu/~bropers/group_calhoun_jan2010.png > > We run torque with Moab and this is a result of parsing the torque > logs. We are still going through and validating the code and adding > features and may be at a point, somewhere in the future, where we'd be > comfortable releasing it. Brian, That's an outstanding job! There's a tremendous amount of information that's quite easy to understand. I do hope you release it soon. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. 
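[An illustrative aside, not part of the original thread: for readers who want to experiment with the kind of torque accounting-log analysis discussed in the scheduler-visualization messages above, here is a minimal Python sketch. It is not Brian's tool or any other package mentioned in the thread; it assumes the stock torque accounting files (one file per day under server_priv/accounting, semicolon-separated records, key=value fields, record type "E" for completed jobs) and the exec_host / resources_used.walltime field names, all of which should be checked against your own torque version. It simply totals core-hours per user, which is the raw material behind utilization plots like the ones linked above.]

#!/usr/bin/env python
# Sketch: sum core-hours per user from torque accounting logs.
# Assumed record layout (verify against your installation):
#   02/16/2010 10:21:05;E;1234.master;user=alice ... exec_host=n001/0+n001/1 ... resources_used.walltime=01:23:45
import glob
import sys
from collections import defaultdict

def hms_to_hours(hms):
    # "HH:MM:SS" -> hours as a float
    h, m, s = (int(x) for x in hms.split(":"))
    return h + m / 60.0 + s / 3600.0

usage = defaultdict(float)
pattern = sys.argv[1] if len(sys.argv) > 1 else "/var/spool/torque/server_priv/accounting/*"

for path in glob.glob(pattern):
    for line in open(path):
        parts = line.strip().split(";", 3)
        if len(parts) != 4 or parts[1] != "E":   # keep only job-end records
            continue
        fields = dict(kv.split("=", 1) for kv in parts[3].split() if "=" in kv)
        try:
            cores = len(fields["exec_host"].split("+"))   # one entry per allocated core slot
            hours = hms_to_hours(fields["resources_used.walltime"])
        except (KeyError, ValueError):
            continue                                      # skip malformed records
        usage[fields.get("user", "unknown")] += cores * hours

for user, core_hours in sorted(usage.items(), key=lambda t: -t[1]):
    print("%-12s %10.1f core-hours" % (user, core_hours))

[Bucketing the same records by day instead of by user gives a crude version of the utilization panel in the graphs above; the per-node job map additionally needs the start/end timestamps and the exec_host list, which come out of the same "E" records.]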
From reuti at staff.uni-marburg.de Thu Feb 18 08:33:37 2010 From: reuti at staff.uni-marburg.de (Reuti) Date: Thu, 18 Feb 2010 17:33:37 +0100 Subject: [Beowulf] PVM 3.4.5-12 terminates when adding Host on Ubuntu 9.10 In-Reply-To: References: <0D2F92CE-AAEB-4D9E-9AC6-F591C9AE1773@staff.uni-marburg.de> Message-ID: <74775D56-4E92-4846-B9B4-F5E3B70D759A@staff.uni-marburg.de> Hi, Am 17.02.2010 um 20:23 schrieb Tsz Kuen Ching: > Thanks for the reply, I have asked around and found out that there > are no firewall on the machine which blocks certain ports. > > Does anyone else have an idea or answer? > > On Sun, Feb 14, 2010 at 6:18 PM, Reuti > wrote: > Am 11.02.2010 um 19:43 schrieb Tsz Kuen Ching: > > > Whenever I attempt to add a host in PVM it ends up terminating the > process in the master program. The process does run in the slave > node, however because the PVM terminates I do not get access to the > node. > > I'm currently using Ubuntu 9.10, and I used apt-get to install pvm > ( pvmlib, pvmdev, pvm). > Thus $PVM_ROOT is set automatically, and so is $PVM_ARCH > As for the other variables, I have not looked for them. > > I can ssh into the the slave without the need of a password. > > Do you have any firwall on the machines which blocks certain ports? > > -- Reuti > > > > Any Ideas or suggestions? > > This is what happens: > > user at laptop> pvm > pvm> add slave-slave > add slave-slave > Terminated > user at laptop> ... > > The logs are as followed: > > Laptop log > --- > [t80040000] 02/11 10:23:32 laptop (127.0.1.1:55884) LINUX 3.4.5 Does the laptop have a real address instead of 127.0.1.1, from which it can be accessed from slave-slave? Instead of using ssh, you can also startup pvm without any rsh/ssh by specifying: so=ms in the hostfile for this particular slave-slave and type a command by hand on slave-slave. -- Reuto > [t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010 > [t80040000] 02/11 10:23:32 netoutput() sendto: errno=22 > [t80040000] 02/11 10:23:32 em=0x2c24f0 > [t80040000] 02/11 10:23:32 [49/?][6e/?][76/?][61/?][6c/?][69/?][64/ > ?][20/?][61/?][72/?] > [t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument > [t80040000] 02/11 10:23:32 pvmbailout(0) > > slave-log > --- > [t80080000] 02/11 10:23:25 slave-slave (xxx.x.x.xxx:57344) LINUX64 > 3.4.5 > [t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010 > [t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting > for master > [t80080000] 02/11 10:28:26 pvmbailout(0) > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From rpnabar at gmail.com Thu Feb 18 09:37:36 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 18 Feb 2010 11:37:36 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? Message-ID: Discussions that I read on this list in the last couple of months tempt me to do away with hardware RAID entirely for a new mini-storage-project I have to do. I am thinking of going for a JBOD with Linux Software RAID via mdadm. Hardware RAID just doesn't have the original awesomeness that it had me mesmerized with. Any recommendations for a good JBOD? The requirements are simple. 5 Terabytes total capacity. SATA drives. Don't need high performance: these are for archival home dirs. No active jobs run from this storage. Reliability and low price are key. Some kind of Direct-Attached Storage box. 
RAID5 or RAID6 maybe. Already have a pretty fast 8 core server with lots of RAM that I can hook this up to. Neither bandwidth nor IOPS need to be terribly high. Most of the data here is pretty static and not often moved around. One of the things I notice is that 5 Terabytes seems too low-end these days. Can't find many solutions tailored to this size. Most come with 12 or 16 bays etc. which seems excessive for this application. -- Rahul From gerry.creager at tamu.edu Thu Feb 18 10:12:05 2010 From: gerry.creager at tamu.edu (Gerald Creager) Date: Thu, 18 Feb 2010 12:12:05 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: Message-ID: <4B7D82F5.9050503@tamu.edu> For what you're describing, I'd consider CoRAID's AoE technology and system, and use their RAID6 capability. Otherwise, get yourself a box with up to 8 slots, preferably with hot-swap capability, and forge ahead. gerry Rahul Nabar wrote: > Discussions that I read on this list in the last couple of months > tempt me to do away with hardware RAID entirely for a new > mini-storage-project I have to do. I am thinking of going for a JBOD > with Linux Software RAID via mdadm. Hardware RAID just doesn't have > the original awesomeness that it had me mesmerized with. > > Any recommendations for a good JBOD? The requirements are simple. 5 > Terabytes total capacity. SATA drives. Don't need high performance: > these are for archival home dirs. No active jobs run from this > storage. Reliability and low price are key. Some kind of > Direct-Attached Storage box. RAID5 or RAID6 maybe. Already have a > pretty fast 8 core server with lots of RAM that I can hook this up to. > Neither bandwidth nor IOPS need to be terribly high. Most of the data > here is pretty static and not often moved around. > > One of the things I notice is that 5 Terabytes seems too low-end these > days. Can't find many solutions tailored to this size. Most come with > 12 or 16 bays etc. which seems excessive for this application. > -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From chekh at pcbi.upenn.edu Thu Feb 18 10:14:01 2010 From: chekh at pcbi.upenn.edu (Alex Chekholko) Date: Thu, 18 Feb 2010 13:14:01 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: Message-ID: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> On Thu, 18 Feb 2010 11:37:36 -0600 Rahul Nabar wrote: > Discussions that I read on this list in the last couple of months > tempt me to do away with hardware RAID entirely for a new > mini-storage-project I have to do. I am thinking of going for a JBOD > with Linux Software RAID via mdadm. Hardware RAID just doesn't have > the original awesomeness that it had me mesmerized with. > > Any recommendations for a good JBOD? The requirements are simple. 5 > Terabytes total capacity. SATA drives. Don't need high performance: > these are for archival home dirs. No active jobs run from this > storage. Reliability and low price are key. Some kind of > Direct-Attached Storage box. RAID5 or RAID6 maybe. Already have a > pretty fast 8 core server with lots of RAM that I can hook this up to. > Neither bandwidth nor IOPS need to be terribly high. Most of the data > here is pretty static and not often moved around. > > One of the things I notice is that 5 Terabytes seems too low-end these > days. 
Can't find many solutions tailored to this size. Most come with > 12 or 16 bays etc. which seems excessive for this application. > Does it need to be rack-mount? What kind of interface? If performance is really not an issue, a consumer-level NAS box is probably your cheapest option. A QNap TS-410 is ~$450 plus 4 x 2TB drives... -- Alex Chekholko chekh at pcbi.upenn.edu From fly at anydata.co.uk Thu Feb 18 10:28:42 2010 From: fly at anydata.co.uk (Fred Youhanaie) Date: Thu, 18 Feb 2010 18:28:42 +0000 Subject: [Beowulf] Top500 power consumption In-Reply-To: <9f8092cc1002172257m4dfa8022s9f47a7502cbead4@mail.gmail.com> References: <9f8092cc1002172257m4dfa8022s9f47a7502cbead4@mail.gmail.com> Message-ID: <4B7D86DA.9000506@anydata.co.uk> On 23/12/42 20:59, John Hearns wrote: > As I remember, the Top500 site now lists power consumption of systems, > there cenrtainly is an section on the site from a few years ago > discussing this. However I could not extract any figures. Does anyone > know the magic buttons to press? > I did find the Green500 site, which isn't very well populated with systems. > Hi John Try the XML or Excel link from any of the specific lists, e.g. from http://top500.org/lists/2009/11 use http://top500.org/static/lists/xml/TOP500_200911_all.xml or http://top500.org/static/lists/2009/11/TOP500_200911.xls HTH Cheers f. From rpnabar at gmail.com Thu Feb 18 10:53:22 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 18 Feb 2010 12:53:22 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> Message-ID: On Thu, Feb 18, 2010 at 12:14 PM, Alex Chekholko wrote: > On Thu, 18 Feb 2010 11:37:36 -0600 > Does it need to be rack-mount? ?What kind of interface? Preferably rack-mount. But cost is a compelling argument . I could be convinced if a non-rack unit was significantly cheaper. I was thinking SAS / SCSI / iSCSI is probably easiest and cheapest. > If performance is really not an issue, a consumer-level NAS box is > probably your cheapest option. ?A QNap TS-410 is ~$450 plus 4 x 2TB > drives... Hmm...NAS. I was more thinking in terms of a DAS. Don't the NAS's come with their own CPU's / RAM and stuff? (Like the Sun Thumper) -- Rahul From rpnabar at gmail.com Thu Feb 18 11:00:51 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Thu, 18 Feb 2010 13:00:51 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7D82F5.9050503@tamu.edu> References: <4B7D82F5.9050503@tamu.edu> Message-ID: On Thu, Feb 18, 2010 at 12:12 PM, Gerald Creager wrote: > For what you're describing, I'd consider CoRAID's AoE technology and system, > and use their RAID6 capability. Otherwise, get yourself a box with up to 8 > slots, preferably with hot-swap capability, and forge ahead. > Thanks Gerry! That one looks promising. One thing that always confuses me: Let's say I do software RAID. So just buy a JBOD device. It still has some "controller" on it, correct? Is the quality of this controller very critical or not so much? And if a unit advertises itself as Hardware RAID can I still use it as JBOD mode? There seems more hits for "RAID" out there than for "JBOD". Maybe I am looking up the wrong keyword. -- Rahul From mwill at penguincomputing.com Thu Feb 18 11:07:16 2010 From: mwill at penguincomputing.com (Michael Will) Date: Thu, 18 Feb 2010 11:07:16 -0800 Subject: [Beowulf] Any recommendations for a good JBOD? 
References: <4B7D82F5.9050503@tamu.edu> Message-ID: <433093DF7AD7444DA65EFAFE3987879CCBD88B@orca.penguincomputing.com> Often a jbod is just sas/sata attached, and the real controller is in the host that attaches to it. It could then be a hardware raid controller from adaptec, or one of the very fast lsi sas/sata hca's which you could then use with software raid and/or LVM... Michael -----Original Message----- From: beowulf-bounces at beowulf.org on behalf of Rahul Nabar Sent: Thu 2/18/2010 11:00 AM To: gerry.creager at tamu.edu Cc: Beowulf Mailing List Subject: Re: [Beowulf] Any recommendations for a good JBOD? On Thu, Feb 18, 2010 at 12:12 PM, Gerald Creager wrote: > For what you're describing, I'd consider CoRAID's AoE technology and system, > and use their RAID6 capability. Otherwise, get yourself a box with up to 8 > slots, preferably with hot-swap capability, and forge ahead. > Thanks Gerry! That one looks promising. One thing that always confuses me: Let's say I do software RAID. So just buy a JBOD device. It still has some "controller" on it, correct? Is the quality of this controller very critical or not so much? And if a unit advertises itself as Hardware RAID can I still use it as JBOD mode? There seems more hits for "RAID" out there than for "JBOD". Maybe I am looking up the wrong keyword. -- Rahul _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -------------- next part -------------- An HTML attachment was scrubbed... URL: From landman at scalableinformatics.com Thu Feb 18 11:21:35 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Thu, 18 Feb 2010 14:21:35 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: Message-ID: <86E799F6-FA24-4B70-8BFD-1E907026DD9F@scalableinformatics.com> 5TB is fairly low end. Our 6 and 9 TB DV units do this with 12 drives. Uses mdadm and our tools atop it. Don't have pricing in front of me but they are quite inexpensive. Iscsi nfs cifs yadda yadda. 2 gbe ports you can drive at full speed. Please pardon brevity and typos ... Sent from my iPhone On Feb 18, 2010, at 12:37 PM, Rahul Nabar wrote: > Discussions that I read on this list in the last couple of months > tempt me to do away with hardware RAID entirely for a new > mini-storage-project I have to do. I am thinking of going for a JBOD > with Linux Software RAID via mdadm. Hardware RAID just doesn't have > the original awesomeness that it had me mesmerized with. > > Any recommendations for a good JBOD? The requirements are simple. 5 > Terabytes total capacity. SATA drives. Don't need high performance: > these are for archival home dirs. No active jobs run from this > storage. Reliability and low price are key. Some kind of > Direct-Attached Storage box. RAID5 or RAID6 maybe. Already have a > pretty fast 8 core server with lots of RAM that I can hook this up to. > Neither bandwidth nor IOPS need to be terribly high. Most of the data > here is pretty static and not often moved around. > > One of the things I notice is that 5 Terabytes seems too low-end these > days. Can't find many solutions tailored to this size. Most come with > 12 or 16 bays etc. which seems excessive for this application. 
> > -- > Rahul > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From beckerjes at mail.nih.gov Thu Feb 18 11:26:04 2010 From: beckerjes at mail.nih.gov (Jesse Becker) Date: Thu, 18 Feb 2010 14:26:04 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7D82F5.9050503@tamu.edu> References: <4B7D82F5.9050503@tamu.edu> Message-ID: <20100218192604.GP15788@mail.nih.gov> On Thu, Feb 18, 2010 at 01:12:05PM -0500, Gerald Creager wrote: >For what you're describing, I'd consider CoRAID's AoE technology and >system, and use their RAID6 capability. Otherwise, get yourself a box >with up to 8 slots, preferably with hot-swap capability, and forge ahead. I'll second this recommendation. The Coraid servers are fairly inexpensive, variously support 4, 16 or 24 drives depending on model, and will accept any drives you care to throw in it. Coraid has been very good about this in the past, although they do maintain a list of problematic drives they recommend against using. That said, they will sell you a 'certified' drive if you want one. Performance is decent, especially given the price/capacity ratio. It does not need to be fully populated either, so you can grow into the system over time. The AoE protocol is well supported in Linux, (and theoretically other OSes, but I've not tested those). I also agree with using the built-in RAID abilties instead of using it as a JBOD--the rebuild times are murder. Coraid also provide tools to "extract" your data from bare drives in an emergency situation as well. >gerry > >Rahul Nabar wrote: >> Discussions that I read on this list in the last couple of months >> tempt me to do away with hardware RAID entirely for a new >> mini-storage-project I have to do. I am thinking of going for a JBOD >> with Linux Software RAID via mdadm. Hardware RAID just doesn't have >> the original awesomeness that it had me mesmerized with. >> >> Any recommendations for a good JBOD? The requirements are simple. 5 >> Terabytes total capacity. SATA drives. Don't need high performance: >> these are for archival home dirs. No active jobs run from this >> storage. Reliability and low price are key. Some kind of >> Direct-Attached Storage box. RAID5 or RAID6 maybe. Already have a >> pretty fast 8 core server with lots of RAM that I can hook this up to. >> Neither bandwidth nor IOPS need to be terribly high. Most of the data >> here is pretty static and not often moved around. >> >> One of the things I notice is that 5 Terabytes seems too low-end these >> days. Can't find many solutions tailored to this size. Most come with >> 12 or 16 bays etc. which seems excessive for this application. 
>> > >-- >Gerry Creager -- gerry.creager at tamu.edu >Texas Mesonet -- AATLT, Texas A&M University >Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 >Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 >_______________________________________________ >Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Jesse Becker NHGRI Linux support (Digicon Contractor) From chekh at pcbi.upenn.edu Thu Feb 18 11:55:37 2010 From: chekh at pcbi.upenn.edu (Alex Chekholko) Date: Thu, 18 Feb 2010 14:55:37 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> Message-ID: <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> On Thu, 18 Feb 2010 12:53:22 -0600 Rahul Nabar wrote: > On Thu, Feb 18, 2010 at 12:14 PM, Alex Chekholko wrote: > > On Thu, 18 Feb 2010 11:37:36 -0600 > > Does it need to be rack-mount? ?What kind of interface? > > Preferably rack-mount. But cost is a compelling argument . I could be > convinced if a non-rack unit was significantly cheaper. > > I was thinking SAS / SCSI / iSCSI is probably easiest and cheapest. > Do you already have a suitable SAS or SCSI controller in the host machine? If not, then you have to factor in the cost of the controller. If you want iSCSI, then you're looking at a low-end SAN as opposed to a DAS. But the SAN/NAS distinction is blurry these days, as many devices can give you either block or file-level access. > > If performance is really not an issue, a consumer-level NAS box is > > probably your cheapest option. ?A QNap TS-410 is ~$450 plus 4 x 2TB > > drives... > > Hmm...NAS. I was more thinking in terms of a DAS. Don't the NAS's come > with their own CPU's / RAM and stuff? (Like the Sun Thumper) Yes, they do. But if you want to access 5TB via iSCSI (or NFS), that's likely the cheapest option. The cheapest 4-bay NAS I can find via the comparison charts here: http://www.smallnetbuilder.com/nas is only $281 (plus drives). -- Alex Chekholko chekh at pcbi.upenn.edu From gerry.creager at tamu.edu Thu Feb 18 12:15:45 2010 From: gerry.creager at tamu.edu (Gerald Creager) Date: Thu, 18 Feb 2010 14:15:45 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <433093DF7AD7444DA65EFAFE3987879CCBD88B@orca.penguincomputing.com> References: <4B7D82F5.9050503@tamu.edu> <433093DF7AD7444DA65EFAFE3987879CCBD88B@orca.penguincomputing.com> Message-ID: <4B7D9FF1.6040202@tamu.edu> Friends don't let friends use Adaptec controllers if they really want RAID. gerry Michael Will wrote: > Often a jbod is just sas/sata attached, and the real controller is in > the host that attaches to it. It could then be a hardware raid controller > from adaptec, or one of the very fast lsi sas/sata hca's which you could > then use with software raid and/or LVM... > > Michael > > > -----Original Message----- > From: beowulf-bounces at beowulf.org on behalf of Rahul Nabar > Sent: Thu 2/18/2010 11:00 AM > To: gerry.creager at tamu.edu > Cc: Beowulf Mailing List > Subject: Re: [Beowulf] Any recommendations for a good JBOD? > > On Thu, Feb 18, 2010 at 12:12 PM, Gerald Creager > wrote: > > For what you're describing, I'd consider CoRAID's AoE technology and > system, > > and use their RAID6 capability. Otherwise, get yourself a box with up > to 8 > > slots, preferably with hot-swap capability, and forge ahead. > > > > Thanks Gerry! 
That one looks promising. > > One thing that always confuses me: > > Let's say I do software RAID. So just buy a JBOD device. It still has > some "controller" on it, correct? Is the quality of this controller > very critical or not so much? And if a unit advertises itself as > Hardware RAID can I still use it as JBOD mode? There seems more hits > for "RAID" out there than for "JBOD". Maybe I am looking up the wrong > keyword. > > -- > Rahul > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Gerry Creager -- gerry.creager at tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From kuenching at gmail.com Thu Feb 18 10:43:29 2010 From: kuenching at gmail.com (Tsz Kuen Ching) Date: Thu, 18 Feb 2010 13:43:29 -0500 Subject: [Beowulf] PVM 3.4.5-12 terminates when adding Host on Ubuntu 9.10 In-Reply-To: <74775D56-4E92-4846-B9B4-F5E3B70D759A@staff.uni-marburg.de> References: <0D2F92CE-AAEB-4D9E-9AC6-F591C9AE1773@staff.uni-marburg.de> <74775D56-4E92-4846-B9B4-F5E3B70D759A@staff.uni-marburg.de> Message-ID: Hello, Thanks for your help! It works now, after changing the host file to point at it's own IP address instead of the default localhost, things worked fine. - Kuen On Thu, Feb 18, 2010 at 11:33 AM, Reuti wrote: > Hi, > > Am 17.02.2010 um 20:23 schrieb Tsz Kuen Ching: > > > Thanks for the reply, I have asked around and found out that there are no >> firewall on the machine which blocks certain ports. >> >> Does anyone else have an idea or answer? >> >> On Sun, Feb 14, 2010 at 6:18 PM, Reuti >> wrote: >> Am 11.02.2010 um 19:43 schrieb Tsz Kuen Ching: >> >> >> Whenever I attempt to add a host in PVM it ends up terminating the process >> in the master program. The process does run in the slave node, however >> because the PVM terminates I do not get access to the node. >> >> I'm currently using Ubuntu 9.10, and I used apt-get to install pvm ( >> pvmlib, pvmdev, pvm). >> Thus $PVM_ROOT is set automatically, and so is $PVM_ARCH >> As for the other variables, I have not looked for them. >> >> I can ssh into the the slave without the need of a password. >> >> Do you have any firwall on the machines which blocks certain ports? >> >> -- Reuti >> >> >> >> Any Ideas or suggestions? >> >> This is what happens: >> >> user at laptop> pvm >> pvm> add slave-slave >> add slave-slave >> Terminated >> user at laptop> ... >> >> The logs are as followed: >> >> Laptop log >> --- >> [t80040000] 02/11 10:23:32 laptop (127.0.1.1:55884) LINUX 3.4.5 >> > > Does the laptop have a real address instead of 127.0.1.1, from which it can > be accessed from slave-slave? Instead of using ssh, you can also startup pvm > without any rsh/ssh by specifying: so=ms in the hostfile for this particular > slave-slave and type a command by hand on slave-slave. > > -- Reuto > > > > [t80040000] 02/11 10:23:32 ready Thu Feb 11 10:23:32 2010 >> [t80040000] 02/11 10:23:32 netoutput() sendto: errno=22 >> [t80040000] 02/11 10:23:32 em=0x2c24f0 >> [t80040000] 02/11 10:23:32 >> [49/?][6e/?][76/?][61/?][6c/?][69/?][64/?][20/?][61/?][72/?] 
>> [t80040000] 02/11 10:23:32 netoutput() sendto: Invalid argument >> [t80040000] 02/11 10:23:32 pvmbailout(0) >> >> slave-log >> --- >> [t80080000] 02/11 10:23:25 slave-slave (xxx.x.x.xxx:57344) LINUX64 3.4.5 >> [t80080000] 02/11 10:23:25 ready Thu Feb 11 10:23:25 2010 >> [t80080000] 02/11 10:28:26 work() run = STARTUP, timed out waiting for >> master >> [t80080000] 02/11 10:28:26 pvmbailout(0) >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eugen at leitl.org Fri Feb 19 01:30:12 2010 From: eugen at leitl.org (Eugen Leitl) Date: Fri, 19 Feb 2010 10:30:12 +0100 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> Message-ID: <20100219093011.GT17686@leitl.org> On Thu, Feb 18, 2010 at 12:53:22PM -0600, Rahul Nabar wrote: > On Thu, Feb 18, 2010 at 12:14 PM, Alex Chekholko wrote: > > On Thu, 18 Feb 2010 11:37:36 -0600 > > Does it need to be rack-mount? ?What kind of interface? > > Preferably rack-mount. But cost is a compelling argument . I could be > convinced if a non-rack unit was significantly cheaper. > > I was thinking SAS / SCSI / iSCSI is probably easiest and cheapest. > > > > If performance is really not an issue, a consumer-level NAS box is > > probably your cheapest option. ?A QNap TS-410 is ~$450 plus 4 x 2TB > > drives... > > Hmm...NAS. I was more thinking in terms of a DAS. Don't the NAS's come > with their own CPU's / RAM and stuff? (Like the Sun Thumper) I know this isn't your design space, but Matt on zfs-discuss posted the following part list http://www.acmemicro.com/estore/merchant.ihtml?pid=5440&lastcatid=53&step=4 http://www.newegg.com/Product/Product.aspx?Item=N82E16820139043 http://www.acmemicro.com/estore/merchant.ihtml?pid=4518&step=4 http://www.acmemicro.com/estore/merchant.ihtml?pid=6708&step=4 http://www.newegg.com/Product/Product.aspx?Item=N82E16819117187 http://www.newegg.com/Product/Product.aspx?Item=N82E16835203002 later he added I'm just going to use the single 4x SAS. 1200MB/sec should be a great plenty for 24 drives total. I'm going to be mounting 2x SSD for ZIL and 2x SSD for ARC, then 20-2TB drives. I'm guessing that with a random I/O workload, I'll never hit the 1200MB/sec peak that the 4x SAS can sustain. Also - for the ZIL I will be using 2x 32GB Intel X25-E SLC drives, and for the ARC I'll be using 2x 160GB Intel X25M MLC drives. I'm hoping that the cache will allow me to saturate gigabit and eventually infiniband. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From rpnabar at gmail.com Fri Feb 19 06:22:13 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 19 Feb 2010 08:22:13 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: On Thu, Feb 18, 2010 at 1:55 PM, Alex Chekholko wrote: > On Thu, 18 Feb 2010 12:53:22 -0600 > Rahul Nabar wrote: > >> >> I was thinking SAS / SCSI / iSCSI is probably easiest and cheapest. 
> Do you already have a suitable SAS or SCSI controller in the host > machine? ?If not, then you have to factor in the cost of the controller. No. true. I have to factor in that price. But almost any kind of disk array I can think of will need a controller, correct? Or are there any JBOD formats that can be attached without putting in a controller in the server. Are SAS / SCSI controllers generic? Or are they paired to the JBOD one buys? In the past I've only had hardware RAID so the card usually came from te vendor that was selling the storage array. Also those cards did RAID+Controller whereas this time I'll be shopping around for a controller-only type of card since I want to do RAID via mdadm. > If you want iSCSI, then you're looking at a low-end SAN as opposed to a > DAS. ?But the SAN/NAS distinction is blurry these days, as many devices > can give you either block or file-level access. Yes, true. I'm dropping iSCSC entirely. Don't have the $$ to do a SAN with fibre switches etc. >> > If performance is really not an issue, a consumer-level NAS box is >> > probably your cheapest option. ?A QNap TS-410 is ~$450 plus 4 x 2TB >> > drives... >> >> Hmm...NAS. I was more thinking in terms of a DAS. Don't the NAS's come >> with their own CPU's / RAM and stuff? (Like the Sun Thumper) > > Yes, they do. ?But if you want to access 5TB via iSCSI (or NFS), that's > likely the cheapest option. That's quite non-intuitive to me. If it' a NAS they must need procs+RAM+NICs on board. How does that get cheaper than an equivalent "dumb" JBOD which outsources all these 3 functions to the attached host server? Maybe I am missing a part of the argument. I already have a server that the JBOD can be attached to so that cost to me is a sunk cost. I just need to consider the incrementals above that. -- Rahul From hahn at mcmaster.ca Fri Feb 19 09:29:43 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri, 19 Feb 2010 12:29:43 -0500 (EST) Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: >>> I was thinking SAS / SCSI / iSCSI is probably easiest and cheapest. the concept of scsi/sas being cheap is rather amusing. >> Do you already have a suitable SAS or SCSI controller in the host >> machine? ?If not, then you have to factor in the cost of the controller. > > No. true. I have to factor in that price. But almost any kind of disk > array I can think of will need a controller, correct? Or are there any unless it already has the controller, of course. most motherboards these days come with at least 6x 3 Gb sata ports, for instance. > JBOD formats that can be attached without putting in a controller in > the server. I was thinking of esata, and a 5-disk external enclosure with port-multiplier. if your system already has sata, you might need to add an esata header (or possibly a controller if the existing controller doesn't support port multipliers.) 5 disks on PM would be a pretty simple way to add JBOD for a md-based raid5. >> If you want iSCSI, then you're looking at a low-end SAN as opposed to a >> DAS. ?But the SAN/NAS distinction is blurry these days, as many devices >> can give you either block or file-level access. > > Yes, true. I'm dropping iSCSC entirely. Don't have the $$ to do a SAN > with fibre switches etc. iSCSI doesn't require SAN infrastructure, of course. that's kind of the point: you plug it into your existing ethernet fabric. 
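[Editorial sketch] To make the md-based RAID over a dumb JBOD / port-multiplier enclosure described above concrete, a minimal mdadm recipe might look like the following; the device names, RAID level, filesystem and mount point are illustrative assumptions, and the config-file path varies by distribution:

   # assemble five JBOD disks into a software RAID5 (hypothetical /dev/sd[b-f])
   mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]
   cat /proc/mdstat                           # watch the initial resync
   mkfs.ext4 /dev/md0                         # or xfs, to taste
   mdadm --detail --scan >> /etc/mdadm.conf   # (/etc/mdadm/mdadm.conf on Debian/Ubuntu)
   mount /dev/md0 /export/archive

RAID6 (--level=6 plus one more spindle) trades a little capacity for surviving a second failure during the long rebuilds that 1-2 TB drives impose.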
for the low-overhead application you describe, it's a reasonable fit, except that even low-end iSCSI/NAS boxes tend to ramp up in price. that is, comparable to what you'd pay for a cheap uATX system (which would be of about the same speed, power, space and performance, not surprisingly.) >> Yes, they do. ?But if you want to access 5TB via iSCSI (or NFS), that's >> likely the cheapest option. > > That's quite non-intuitive to me. If it' a NAS they must need > procs+RAM+NICs on board. How does that get cheaper than an equivalent > "dumb" JBOD which outsources all these 3 functions to the attached > host server? Maybe I am missing a part of the argument. procs+ram+nic can easily total less than $100; enclosures can be very cheap as well. that's what's so appealing about that approach: it's fully user-servicable, and you don't have to depend on some random vendor to maintain firmware, supported-disk lists, etc. of course, that's also the main downside: you have just adopted another system, albeit embedded, to maintain. > I already have a server that the JBOD can be attached to so that cost > to me is a sunk cost. I just need to consider the incrementals above > that. right - 10 years ago, the cost overhead of the system was larger. nowadays, integration and moore's law has made small systems very cheap. this is good, since disks are incredibly cheap as well. (bad if you're in the storage business, where it looks a little funny to justify thousands of dollars of controller/etc infrastructure when the disks cost $100 or so. disk arrays can still make sense, of course, but availability of useful cheap commodity systems has changed the equation. regards, mark hahn. From bdobbins at gmail.com Fri Feb 19 10:25:07 2010 From: bdobbins at gmail.com (Brian Dobbins) Date: Fri, 19 Feb 2010 13:25:07 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? Message-ID: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Hi guys, I'm beginning to look into configurations for a new cluster and with the AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm curious if anyone has any data on the effects of the messaging rate of the IB cards. With a 4-socket node having between 32 and 48 cores, lots of computing can get done fast, possibly stressing the network. I know Qlogic has made a big deal about the InfiniPath adapter's extremely good message rate in the past... is this still an important issue? How do the latest Mellanox adapters compare? (Qlogic documents a ~30M messages processsed per second rate on its QLE7342, but I didn't see a number on the Mellanox ConnectX-2... and more to the point, do people see this effecting them?) On a similar note, does a dual-port card provide an increase in on-card processing, or 'just' another link? (The increased bandwidth is certainly nice, even in a flat switched network, I'm sure!) I'm primarily concerned with weather and climate models here - WRF, CAM, CCSM, etc., and clearly the communication rate will depend to a large degree on the resolutions used, but any information, even 'gut instincts' people have are welcome. The more info the merrier. Thanks very much, - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From landman at scalableinformatics.com Fri Feb 19 10:47:07 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 19 Feb 2010 13:47:07 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? 
In-Reply-To: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Message-ID: <4B7EDCAB.5090403@scalableinformatics.com> Brian Dobbins wrote: > > Hi guys, > > I'm beginning to look into configurations for a new cluster and with > the AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm > curious if anyone has any data on the effects of the messaging rate of > the IB cards. With a 4-socket node having between 32 and 48 cores, lots > of computing can get done fast, possibly stressing the network. The big issue will be contention for the resource. As you scale up the number of requesters, if the number of resources don't also scale up (even vitualized non-blocking HCA/NICs are good here), you could hit a problem at some point. > I know Qlogic has made a big deal about the InfiniPath adapter's > extremely good message rate in the past... is this still an important > issue? How do the latest Mellanox adapters compare? (Qlogic documents > a ~30M messages processsed per second rate on its QLE7342, but I didn't > see a number on the Mellanox ConnectX-2... and more to the point, do > people see this effecting them?) We see this on the storage side. Massive oversubscription of resources leads to contention issues for links, to ib packet requeue failures among other things. > > On a similar note, does a dual-port card provide an increase in > on-card processing, or 'just' another link? (The increased bandwidth is > certainly nice, even in a flat switched network, I'm sure!) Depends. If the card can talk to the PCIe bus at full speed, you might be able to saturate the link with a single QDR port. If your card is throttled for some reason (we have seen this) then adding the extra port might or might not help. If you are at the design stage, I'd suggest "go wide" as you can ... as many IB HCAs as you can get to keep the number of ports/core as high as reasonable. Of course I'd have to argue the same thing on the storage side :) > I'm primarily concerned with weather and climate models here - WRF, > CAM, CCSM, etc., and clearly the communication rate will depend to a > large degree on the resolutions used, but any information, even 'gut > instincts' people have are welcome. The more info the merrier. > > Thanks very much, > - Brian > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From bdobbins at gmail.com Fri Feb 19 12:23:20 2010 From: bdobbins at gmail.com (Brian Dobbins) Date: Fri, 19 Feb 2010 15:23:20 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <4B7EDCAB.5090403@scalableinformatics.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B7EDCAB.5090403@scalableinformatics.com> Message-ID: <2b5e0c121002191223w4ef0db3cu24816ff671b62abc@mail.gmail.com> Hi Joe, I'm beginning to look into configurations for a new cluster and with the >> AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm curious >> if anyone has any data on the effects of the messaging rate of the IB cards. >> With a 4-socket node having between 32 and 48 cores, lots of computing can >> get done fast, possibly stressing the network. >> > > The big issue will be contention for the resource. 
As you scale up the > number of requesters, if the number of resources don't also scale up (even > vitualized non-blocking HCA/NICs are good here), you could hit a problem at > some point. My knowledge of the latest low-level hardware is sadly out of date - does a virtualized non-blocking HCA mean that I can have one HCA which virtualizes into four (one per socket say), and each of those four has its own memory-mapped buffer so that I don't get cache invalidation / contention on multi-socket boxes, or am I totally off-base here? I'm all for scaling up NICs as I scale up cores, but each additional NIC / HCA port means more switch ports, which adds up fast. In fact, if I have a standard 2-socket node now, with 8 cores in it and a DDR IB port, and then get a 2-socket node with 24 cores in it and a QDR IB port,... how's the math work? I've got 3x the cores, 1x the adapters, but that adapter has 2x the speed. Blah. I know Qlogic has made a big deal about the InfiniPath adapter's extremely >> good message rate in the past... is this still an important issue? How do >> the latest Mellanox adapters compare? (Qlogic documents a ~30M messages >> processsed per second rate on its QLE7342, but I didn't see a number on the >> Mellanox ConnectX-2... and more to the point, do people see this effecting >> them?) >> > We see this on the storage side. Massive oversubscription of resources > leads to contention issues for links, to ib packet requeue failures among > other things. So (ignoring disk latencies and just focusing on link contention), is there any difference between using 2x the storage nodes or the same number of storage nodes, but with 2x the NICs? On a similar note, does a dual-port card provide an increase in on-card >> processing, or 'just' another link? (The increased bandwidth is certainly >> nice, even in a flat switched network, I'm sure!) >> > > Depends. If the card can talk to the PCIe bus at full speed, you might be > able to saturate the link with a single QDR port. If your card is throttled > for some reason (we have seen this) then adding the extra port might or > might not help. If you are at the design stage, I'd suggest "go wide" as > you can ... as many IB HCAs as you can get to keep the number of ports/core > as high as reasonable. > Oh dear. I need to go re-learn a lot of things. So if I want multiple full-speed QDR cards in a node, I need that node to have independent PCIe buses, and each card to be placed on a separate bus. Of course I'd have to argue the same thing on the storage side :) No argument from me there! Thanks again, as always, for your input. - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdidomenico4 at gmail.com Fri Feb 19 12:49:11 2010 From: mdidomenico4 at gmail.com (Michael Di Domenico) Date: Fri, 19 Feb 2010 15:49:11 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Message-ID: the folks on the linux-rdma mailing list can probably share some slides with you about app load over different cards. if you dont get a response, i can drop a few names of people who definitely have the info, but i dont want to do it at large on the list The last set of slides i can (thinking way back when i was still with qlogic) recall, yes the ipath cards could do 30m mesg/sec whereas the mlnx cards were half to a two thirds lower. 
this does have an affect on core count traffic, but only under certain application loads mlnx and qlogic made a trade off with their card designs, the qlogic cards have a real high msg/rate, but the rdma bandwidth performance can suffer in certain cases, where as mlnx has a higher rdma bandwidth but a lower mesg rate getting the balance right is a mastery of art. and the tipping point slides easily and with every application it's been a while since i was at qlogic and have forgotten a lot of the sales mumbo... On Fri, Feb 19, 2010 at 1:25 PM, Brian Dobbins wrote: > > Hi guys, > > ? I'm beginning to look into configurations for a new cluster and with the > AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm curious > if anyone has any data on the effects of the messaging rate of the IB > cards.? With a 4-socket node having between 32 and 48 cores, lots of > computing can get done fast, possibly stressing the network. > > ? I know Qlogic has made a big deal about the InfiniPath adapter's extremely > good message rate in the past... is this still an important issue?? How do > the latest Mellanox adapters compare?? (Qlogic documents a ~30M messages > processsed per second rate on its QLE7342, but I didn't see a number on the > Mellanox ConnectX-2... and more to the point, do people see this effecting > them?) > > ? On a similar note, does a dual-port card provide an increase in on-card > processing, or 'just' another link?? (The increased bandwidth is certainly > nice, even in a flat switched network, I'm sure!) > > ? I'm primarily concerned with weather and climate models here - WRF, CAM, > CCSM, etc., and clearly the communication rate will depend to a large degree > on the resolutions used, but any information, even 'gut instincts' people > have are welcome.? The more info the merrier. > > ? Thanks very much, > ? - Brian > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > From Shainer at mellanox.com Fri Feb 19 13:09:40 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Fri, 19 Feb 2010 13:09:40 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F02662C76@mtiexch01.mti.com> When you look on low level, marketing driven benchmarks, you should be careful. Mellanox latest message rate numbers with ConnectX-2 more than doubled versus the old cards, and are for real message rate - separate messages on the wire. The competitor numbers are with using message coalescing, so it is not real separate messages on the wire, or not really message rate. Gilad -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Michael Di Domenico Sent: Friday, February 19, 2010 12:49 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] Q: IB message rate & large core counts (per node)? the folks on the linux-rdma mailing list can probably share some slides with you about app load over different cards. 
if you dont get a response, i can drop a few names of people who definitely have the info, but i dont want to do it at large on the list The last set of slides i can (thinking way back when i was still with qlogic) recall, yes the ipath cards could do 30m mesg/sec whereas the mlnx cards were half to a two thirds lower. this does have an affect on core count traffic, but only under certain application loads mlnx and qlogic made a trade off with their card designs, the qlogic cards have a real high msg/rate, but the rdma bandwidth performance can suffer in certain cases, where as mlnx has a higher rdma bandwidth but a lower mesg rate getting the balance right is a mastery of art. and the tipping point slides easily and with every application it's been a while since i was at qlogic and have forgotten a lot of the sales mumbo... On Fri, Feb 19, 2010 at 1:25 PM, Brian Dobbins wrote: > > Hi guys, > > ? I'm beginning to look into configurations for a new cluster and with the > AMD 12-core and Intel 8-core chips 'here' (or coming soonish), I'm curious > if anyone has any data on the effects of the messaging rate of the IB > cards.? With a 4-socket node having between 32 and 48 cores, lots of > computing can get done fast, possibly stressing the network. > > ? I know Qlogic has made a big deal about the InfiniPath adapter's extremely > good message rate in the past... is this still an important issue?? How do > the latest Mellanox adapters compare?? (Qlogic documents a ~30M messages > processsed per second rate on its QLE7342, but I didn't see a number on the > Mellanox ConnectX-2... and more to the point, do people see this effecting > them?) > > ? On a similar note, does a dual-port card provide an increase in on-card > processing, or 'just' another link?? (The increased bandwidth is certainly > nice, even in a flat switched network, I'm sure!) > > ? I'm primarily concerned with weather and climate models here - WRF, CAM, > CCSM, etc., and clearly the communication rate will depend to a large degree > on the resolutions used, but any information, even 'gut instincts' people > have are welcome.? The more info the merrier. > > ? Thanks very much, > ? - Brian > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From lindahl at pbm.com Fri Feb 19 13:57:30 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 19 Feb 2010 13:57:30 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Message-ID: <20100219215730.GK2857@bx9.net> On Fri, Feb 19, 2010 at 01:25:07PM -0500, Brian Dobbins wrote: > I know Qlogic has made a big deal about the InfiniPath adapter's extremely > good message rate in the past... is this still an important issue? Yes, for many codes. If I recall stuff I published a while ago, WRF sent a surprising number of short messages. But really, the right approach for you is to do some benchmarking. 
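[Editorial sketch] For anyone who wants to see the message-rate numbers being argued about here on their own hardware, the OSU micro-benchmark suite includes a multi-pair bandwidth/message-rate test. A rough sketch of a run across two nodes follows; the hostnames, paths and launcher flags are assumptions (Open MPI syntax shown), and, as this thread stresses, treat the result as a clue to be checked against real application runs, not a buying criterion:

   # two nodes, 8 ranks each; ranks pair up across the nodes
   cat > hosts <<EOF
   node01 slots=8
   node02 slots=8
   EOF
   mpirun -np 16 --hostfile hosts ./osu_mbw_mr
   # look at the messages-per-second column for the small message sizes,
   # then rerun WRF/CAM/CCSM at your real problem size to confirm
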
Arguing about microbenchmarks is pointless; they only give you clues that help explain your real application results. I believe that both QLogic and Mellanox have test clusters you can borrow. Tom Elken ought to have some WRF data he can share with you, showing message sizes as a function of cluster size for one of the usual WRF benchmark datasets. > On a similar note, does a dual-port card provide an increase in on-card > processing, or 'just' another link? (The increased bandwidth is certainly > nice, even in a flat switched network, I'm sure!) Published microbenchmarks in for Mellanox parts the SDR/DDR generation showed that only large messages got a benefit. I've never seen any application benchmarks comparing 1 and 2 port cards. -- greg (formerly the system architect of InfiniPath's SDR and DDR generations) From lindahl at pbm.com Fri Feb 19 14:05:38 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 19 Feb 2010 14:05:38 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F02662C76@mtiexch01.mti.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <9FA59C95FFCBB34EA5E42C1A8573784F02662C76@mtiexch01.mti.com> Message-ID: <20100219220538.GL2857@bx9.net> > Mellanox latest message rate numbers with ConnectX-2 more than > doubled versus the old cards, and are for real message rate - > separate messages on the wire. The competitor numbers are with using > message coalescing, so it is not real separate messages on the wire, > or not really message rate. Gilad, I think you forgot which side you're supposed to be supporting. The only people I have ever seen publish message rate with coalesced messages are DK Panda (with Mellanox cards) and Mellanox. QLogic always hated coalesced messages, and if you look back in the archive for this mailing list, you'll see me denouncing coalesced messages as meanless about 1 microsecond after the first result was published by Prof. Panda. Looking around the Internet, I don't see any numbers ever published by PathScale/QLogic using coalesced messages. At the end of the day, the only reason microbenchmarks are useful is when they help explain why one interconnect does better than another on real applications. No customer should ever choose which adapter to buy based on microbenchmarks. -- greg (formerly employed by QLogic) From lindahl at pbm.com Fri Feb 19 14:17:21 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 19 Feb 2010 14:17:21 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <4B7EDCAB.5090403@scalableinformatics.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B7EDCAB.5090403@scalableinformatics.com> Message-ID: <20100219221721.GM2857@bx9.net> On Fri, Feb 19, 2010 at 01:47:07PM -0500, Joe Landman wrote: > The big issue will be contention for the resource. Joe, What "the resource" is depends on implementation. All network cards have the limit of the line rate of the network. As far as I can tell, the Mellanox IB cards have a limited number of engines that process messages. For short messages from a lot of CPUs, they don't have enough. For long messages, they have plenty, & hit the line rate. Don't storage systems typically send mostly long messages? The InfiniPath (now True Scale) design uses a pipelined approach. 
You can analytically compute the performance on short messages by knowing 2 numbers: the line rate, and the "dead time" between back-to-back packets, which is determined by the length of the longest pipeline stage. I was thrilled when we figured out that our performance graph was exactly determined by that equation. And the pipeline is a resource that you can't oversubscribe. -- greg (formerly... yada yada) From Shainer at mellanox.com Fri Feb 19 14:36:34 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Fri, 19 Feb 2010 14:36:34 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com><9FA59C95FFCBB34EA5E42C1A8573784F02662C76@mtiexch01.mti.com> <20100219220538.GL2857@bx9.net> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> Nice to hear from you Greg, hope all is well. I don't forget anything, at least for now. OSU has different benchmarks so you can measure message coalescing or real message rate. Funny to read that Q hated coalescing when they created the first benchmark for that ...:-) but lets not argue on that. Nowadays it seems that QLogic promotes the message rate as non coalescing data and I almost got bought by their marketing machine till I looked on at the data on the wire... interesting what the bits and bytes and symbols can tell you... -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Greg Lindahl Sent: Friday, February 19, 2010 2:06 PM To: beowulf at beowulf.org Subject: Re: [Beowulf] Q: IB message rate & large core counts (per node)? > Mellanox latest message rate numbers with ConnectX-2 more than > doubled versus the old cards, and are for real message rate - > separate messages on the wire. The competitor numbers are with using > message coalescing, so it is not real separate messages on the wire, > or not really message rate. Gilad, I think you forgot which side you're supposed to be supporting. The only people I have ever seen publish message rate with coalesced messages are DK Panda (with Mellanox cards) and Mellanox. QLogic always hated coalesced messages, and if you look back in the archive for this mailing list, you'll see me denouncing coalesced messages as meanless about 1 microsecond after the first result was published by Prof. Panda. Looking around the Internet, I don't see any numbers ever published by PathScale/QLogic using coalesced messages. At the end of the day, the only reason microbenchmarks are useful is when they help explain why one interconnect does better than another on real applications. No customer should ever choose which adapter to buy based on microbenchmarks. -- greg (formerly employed by QLogic) _______________________________________________ Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf From patrick at myri.com Fri Feb 19 14:56:18 2010 From: patrick at myri.com (Patrick Geoffray) Date: Fri, 19 Feb 2010 17:56:18 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? 
In-Reply-To: <20100218192604.GP15788@mail.nih.gov> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> Message-ID: <4B7F1712.3040303@myri.com> On 2/18/2010 2:26 PM, Jesse Becker wrote: > On Thu, Feb 18, 2010 at 01:12:05PM -0500, Gerald Creager wrote: >> For what you're describing, I'd consider CoRAID's AoE technology and > I'll second this recommendation. The Coraid servers are fairly +1. The AoE spec is very simple, I wish it would have more traction outside CoRaID. On the opposite, iSCSI is a utter mess with all the bad technical choices. Patrick From rpnabar at gmail.com Fri Feb 19 15:05:01 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 19 Feb 2010 17:05:01 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7F1712.3040303@myri.com> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> Message-ID: On Fri, Feb 19, 2010 at 4:56 PM, Patrick Geoffray wrote: > On 2/18/2010 2:26 PM, Jesse Becker wrote: >> >> On Thu, Feb 18, 2010 at 01:12:05PM -0500, Gerald Creager wrote: >>> >>> For what you're describing, I'd consider CoRAID's AoE technology and > >> I'll second this recommendation. The Coraid servers are fairly > > +1. The AoE spec is very simple, I wish it would have more traction outside > CoRaID. On the opposite, iSCSI is a utter mess with all the bad technical > choices. Thanks for the pointers! I had never heard of AoE before! -- Rahul From rpnabar at gmail.com Fri Feb 19 15:18:17 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Fri, 19 Feb 2010 17:18:17 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: On Fri, Feb 19, 2010 at 11:29 AM, Mark Hahn wrote: Thanks Mark! > right - 10 years ago, the cost overhead of the system was larger. > nowadays, integration and moore's law has made small systems very cheap. > this is good, since disks are incredibly cheap as well. (bad if you're > in the storage business, where it looks a little funny to justify thousands > of dollars of controller/etc infrastructure when the disks cost $100 or so. > disk arrays can still make sense, of course, but availability of useful > cheap commodity systems has changed the equation. I totally agree! Especially for my "tiny" storage capacity (~5 Terabytes) the cost of the non-disk accessories (enclosure+controllers+cables) is turning out to be several fold that of the disks themselves! That was pretty surprising to me! > point: you plug it into your existing ethernet fabric. ?for the low-overhead > application you describe, it's a reasonable fit, except that even low-end > iSCSI/NAS boxes tend to ramp up in price. >?that is, comparable to what you'd > pay for a cheap uATX system (which would be of about the same speed, > power, space and performance, not surprisingly.) I'm curious, what's the selling point for iSCSI then? The prices are quite ramped up and the performance not stellar. Do any of you in the HPC world buy i-SCSI at all? > procs+ram+nic can easily total less than $100; enclosures can be very cheap Really?! Hmm....What kind of "procs+ram+nic" combo can one get for less than $100. That is pretty surprising for me! That again makes me wonder about the performance of these NAS boxes. Standard server + DAS seems safer than a low-end NAS. 
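[Editorial sketch] For anyone else meeting AoE for the first time, the Linux initiator side is just a kernel module plus the aoetools userland. A rough sketch, assuming an AoE target (a CoRAID shelf, or a cheap box exporting a block device with vblade) is already visible on the local Ethernet segment:

   modprobe aoe                      # AoE initiator driver
   aoe-discover                      # broadcast for targets on all up interfaces
   aoe-stat                          # list what answered, e.g. e1.0
   mkfs.ext4 /dev/etherd/e1.0        # device names follow the shelf.slot convention
   mount /dev/etherd/e1.0 /mnt/archive

Since AoE runs directly over Ethernet frames rather than IP, it is not routable and the storage stays on the local broadcast domain, which is part of why the protocol can be so simple.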
-- Rahul From mathog at caltech.edu Fri Feb 19 15:43:39 2010 From: mathog at caltech.edu (David Mathog) Date: Fri, 19 Feb 2010 15:43:39 -0800 Subject: [Beowulf] case (de)construction question Message-ID: Many rack cases have threaded standoff's directly attached to the case metal. On the outside of the case one sees a hexagonal nut, and on the inside the cylindrical standoff - with no sign of the hexagonal nut. We even have one type of case with a removable motherboard tray, which is quite thin, and even here this type of standoff is employed. The question is, how are these things put together, and more specifically, how are they to be taken apart? Some of the standoffs are in the way of a larger power supply that needs to go into one of these cases. Is there a more elegant way of removing these than by grinding them off or drilling them out? I have already tried unscrewing one, on the theory the standoff might be threaded into the hex nut, but it wouldn't budge. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From hahn at mcmaster.ca Fri Feb 19 15:56:11 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Fri, 19 Feb 2010 18:56:11 -0500 (EST) Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: > I'm curious, what's the selling point for iSCSI then? The prices are > quite ramped up and the performance not stellar. Do any of you in the > HPC world buy i-SCSI at all? ease, I suppose. ethernet is omnipresent, so anything which uses ethernet has a big advantage. the sticking point is really that the ethernet phenomenon has to some extent stalled at 1 Gb - that's where the mass market is, and it's not that clear whether or how much drive there will be for adoption above 1Gb. in homes, 1Gb is mostly overkill. non-rich office settings (such as a lot of academia) still have 100bT to the desktop. 1Gb is plenty good enough for a lot of HPC - obviously almost anything serial, small parallel or even large-loose parallel. even though a single modern disk sustains greater BW than 1Gb, I find that most users do not have an expectation of really high storage bandwidth. sure, people who do good checkpointing from large parallel apps cry for it. some specialized fields want bandwidth even for serial jobs. but a typical job in my organization (disparate academic HPC) does IO infrequently and not very much. when our clusters have IO problems, it's more often metadata for shared filesystems, rather than bandwidth. however, I'm not actually claiming iSCSI is prevalent. the protocol is relatively heavy-weight, and it's really only providing SAN access, not shared, file-level access, which is ultimately what most want... >> procs+ram+nic can easily total less than $100; enclosures can be very cheap > > Really?! Hmm....What kind of "procs+ram+nic" combo can one get for > less than $100. That is pretty surprising for me! That again makes me atom motherboards (which include the CPU) start at $58 on newegg, though the lowest-end items aren't all that appealing. the first Gb models start at $80, so add a $20 dimm, and you're done. (admittedly, small boards like those, being ITX, typically have 2x sata ports, not 6x. but adding a cheap sata controller or just starting with a cheap uATX board doesn't really blow the budget...) > wonder about the performance of these NAS boxes. Standard server + DAS > seems safer than a low-end NAS. 
the cheapest NAS boxes are build with arm/mips SOCs much like a wifi router. (third party/free linux firmware for routers often also includes such NAS boxes.) recently a generation of Atom-based NAS has come onto the market, which perform better, somewhat hotter, etc. these appear to be the first cheap NAS to come close to saturating Gb - the embedded cpu models tended to poke along at 20-40 MB/s. regards, mark hahn. From chekh at pcbi.upenn.edu Fri Feb 19 16:07:30 2010 From: chekh at pcbi.upenn.edu (Alex Chekholko) Date: Fri, 19 Feb 2010 19:07:30 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> Message-ID: <20100219190730.8489fb15.chekh@pcbi.upenn.edu> On Fri, 19 Feb 2010 17:05:01 -0600 Rahul Nabar wrote: > On Fri, Feb 19, 2010 at 4:56 PM, Patrick Geoffray wrote: > > On 2/18/2010 2:26 PM, Jesse Becker wrote: > >> > >> On Thu, Feb 18, 2010 at 01:12:05PM -0500, Gerald Creager wrote: > >>> > >>> For what you're describing, I'd consider CoRAID's AoE technology and > > > >> I'll second this recommendation. The Coraid servers are fairly > > > > +1. The AoE spec is very simple, I wish it would have more traction outside > > CoRaID. On the opposite, iSCSI is a utter mess with all the bad technical > > choices. > > Thanks for the pointers! I had never heard of AoE before! This is all well and good until you compare the prices of the respective solutions. E.g. what's the cheapest 5TB (usable) AoE box you can buy? Regards, -- Alex Chekholko chekh at pcbi.upenn.edu From james.p.lux at jpl.nasa.gov Fri Feb 19 16:37:43 2010 From: james.p.lux at jpl.nasa.gov (Lux, Jim (337C)) Date: Fri, 19 Feb 2010 16:37:43 -0800 Subject: [Beowulf] case (de)construction question In-Reply-To: References: Message-ID: > -----Original Message----- > From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of David Mathog > Sent: Friday, February 19, 2010 3:44 PM > To: beowulf at beowulf.org > Subject: [Beowulf] case (de)construction question > > Many rack cases have threaded standoff's directly attached to the case > metal. On the outside of the case one sees a hexagonal nut, and on the > inside the cylindrical standoff - with no sign of the hexagonal nut. We > even have one type of case with a removable motherboard tray, which is > quite thin, and even here this type of standoff is employed. > > The question is, how are these things put together, Those are "self clinching fasteners" of one sort or another. PEMs from Penn Engineering (http://www.pemnet.com/) are the ones I've used " PEM(r) self-clinching concealed-head studs and standoffs install permanently in steel or aluminum sheets as thin as .062" / 1.6mm to provide strong and reusable threads for mating hardware in a wide range of thin-metal assembly applications. Their concealed-head feature contributes particular design benefits by allowing the side of the sheet opposite installation to remain smooth and untouched." http://www.pemnet.com/fastening_products/pdf/chdata.pdf reveals all. Those ones actually install in a blind hole, but there are other ones that insert in a through hole. http://www.pemnet.com/fastening_products/pdf/fhdata.pdf and more > specifically, how are they to be taken apart? They aren't designed to be dismantled. You can sometimes use a suitable press with appropriate dies to press the nut out, but the whole pressing the stud in process deforms the metal. 
Some of the standoffs are > in the way of a larger power supply that needs to go into one of these > cases. Is there a more elegant way of removing these than by grinding > them off or drilling them out? Saw and grind. Dremel tools are your friend. I have already tried unscrewing one, on > the theory the standoff might be threaded into the hex nut, but it > wouldn't budge. No, it's actually one continuous piece of metal. The hexagonal form factor is to allow the part to be clamped in the pressing process without spinning. > > Thanks, > > David Mathog > mathog at caltech.edu > Manager, Sequence Analysis Facility, Biology Division, Caltech > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf From landman at scalableinformatics.com Fri Feb 19 18:52:47 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 19 Feb 2010 21:52:47 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <20100219190730.8489fb15.chekh@pcbi.upenn.edu> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> <20100219190730.8489fb15.chekh@pcbi.upenn.edu> Message-ID: <4B7F4E7F.6000102@scalableinformatics.com> Alex Chekholko wrote: >> Thanks for the pointers! I had never heard of AoE before! > > This is all well and good until you compare the prices of the respective solutions. > > E.g. what's the cheapest 5TB (usable) AoE box you can buy? I believe somewhat more than a relatively fast iSCSI/SRP/NFS/CIFS box with 6.75TB usable (but we are biased). AoE hasn't really found a niche for many reasons. Not the least of which is the paucity of quality software target implementations, a single vendor hardware supplier, and the significant resource consuming initiator. We have lots of experience setting up AoE systems for users and customers, lots of experience fixing systems, and altering designs so that the AoE initiators don't bring down head nodes, login nodes, etc that they are attached to. It is best, when building an AoE system design, to isolate the unit mounting the AoE targets from important services that have to be up. Whether or not the AoE protocol is superior to iSCSI is moot. The AoE initiator for windows isn't terribly stable. Last I checked there was something for Solaris though I don't know the state. iSCSI is everywhere, it is available on pretty much all platforms ... it has achieved ubiquity. Quality targets and initiators exist as software stacks you can use. They interoperate reasonably well, though the Windows 2.08 stack doesn't seem to follow the standard terribly well in terms of reconnecting to a single target. These minor nits aside, it works, reasonably well, and without significant pain. This aside, both AoE and iSCSI provide block device services. Both systems can present a block device with a RAID backing store. Patrick and others will talk about the beauty of the standards, but this is unfortunately irrelevant in the market. The market isn't a meritocracy. Actually, with the advent of USB3 and related devices, I'd expect AoE and lower end raid to be effectively completely subsumed by this. USB3 has ample bandwidth to connect a low end RAID unit, more than GbE. The comments on the Atom based micro itx MBs are quite relevant there. 
Especially if you can get one with multiple SATA and USB3 ports (especially if they can be USB3 targets). If you want the lowest end RAID right now, get an eSATA device and enclosure (not cheap but doable). It will work, albeit not as fast as you might like. Use mdadm, and be done with it. Remember, if you want to use mdadm, you do need block devices to work with. Whether these block devices are gigabit, USB, or eSATA attached is irrelevant. Better/more expensive systems will put the RAID on the unit and export you either a block device, a file system, or both. Even better systems will give you thin provisioning, snapshotting, and all these other very nice features. You make your choices and pay your money. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From landman at scalableinformatics.com Fri Feb 19 18:55:00 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Fri, 19 Feb 2010 21:55:00 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7F1712.3040303@myri.com> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> Message-ID: <4B7F4F04.9010503@scalableinformatics.com> Patrick Geoffray wrote: >> I'll second this recommendation. The Coraid servers are fairly > > +1. The AoE spec is very simple, I wish it would have more traction > outside CoRaID. On the opposite, iSCSI is a utter mess with all the bad -1 on the AoE initiator implementation. Seriously. We have customers using it, and its not pretty. +1 for iSCSI for its ubiquity. Doesn't have to be great, just has to work. It does. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From tjrc at sanger.ac.uk Sat Feb 20 00:09:48 2010 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Sat, 20 Feb 2010 08:09:48 +0000 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: On 19 Feb 2010, at 11:56 pm, Mark Hahn wrote: > however, I'm not actually claiming iSCSI is prevalent. the protocol > is relatively heavy-weight, and it's really only providing SAN access, > not shared, file-level access, which is ultimately what most want... iSCSI seems fairly common in the virtualisation world, especially for lower-performance failover VMware clusters and the like. Fairly commonly see a fibrechannel-connected active cluster, and iSCSI backup for disaster recovery, with some sort of block-level replication going on between the two. We've been looking at iSCSI for our VMware setup in the future. As people have been saying, performance isn't stellar, an to get it decent you still end up paying a lot of money on networking kit, because ideally in that scenario you still want the storage traffic on a separate network from everything else (certainly from the Vmotion and fault tolerance traffic, and probably from the application traffic as well) so you frequently see designs with ESX servers with 6-8 NICs in them. 2 for application traffic, 2 for VMware's internal traffic, at least 2 for storage traffic. But for HPC? 
Nah - can't really see its utility. Tim -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From eugen at leitl.org Sat Feb 20 02:28:09 2010 From: eugen at leitl.org (Eugen Leitl) Date: Sat, 20 Feb 2010 11:28:09 +0100 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7F1712.3040303@myri.com> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> Message-ID: <20100220102809.GR17686@leitl.org> On Fri, Feb 19, 2010 at 05:56:18PM -0500, Patrick Geoffray wrote: > >I'll second this recommendation. The Coraid servers are fairly > > +1. The AoE spec is very simple, I wish it would have more traction > outside CoRaID. On the opposite, iSCSI is a utter mess with all the bad > technical choices. Is it very painful to get a zfs as an AoE target talk to VMWare? For some reason only iSCSI is always being mentioned here. -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From chenyon1 at iit.edu Fri Feb 19 07:45:51 2010 From: chenyon1 at iit.edu (Yong Chen) Date: Fri, 19 Feb 2010 09:45:51 -0600 Subject: [Beowulf] [hpc-announce] Submission due: 3/3/2010 CFP: Intl. Workshop on Parallel Programming Models and Systems Software for HEC (P2S2) Message-ID: [Apologies if you got multiple copies of this email. If you'd like to opt out of these announcements, information on how to unsubscribe is available at the bottom of this email.] CALL FOR PAPERS =============== Third International Workshop on Parallel Programming Models and Systems Software for High-end Computing (P2S2) Sept. 13th, 2010 To be held in conjunction with ICPP-2010: The 39th International Conference on Parallel Processing, Sept. 13-16, 2010, San Diego, CA, USA Website: http://www.mcs.anl.gov/events/workshops/p2s2 SCOPE ----- The goal of this workshop is to bring together researchers and practitioners in parallel programming models and systems software for high-end computing systems. Please join us in a discussion of new ideas, experiences, and the latest trends in these areas at the workshop. TOPICS OF INTEREST ------------------ The focus areas for this workshop include, but are not limited to: * Systems software for high-end scientific and enterprise computing architectures o Communication sub-subsystems for high-end computing o High-performance file and storage systems o Fault-tolerance techniques and implementations o Efficient and high-performance virtualization and other management mechanisms for high-end computing * Programming models and their high-performance implementations o MPI, Sockets, OpenMP, Global Arrays, X10, UPC, Chapel, Fortress and others o Hybrid Programming Models * Tools for Management, Maintenance, Coordination and Synchronization o Software for Enterprise Data-centers using Modern Architectures o Job scheduling libraries o Management libraries for large-scale system o Toolkits for process and task coordination on modern platforms * Performance evaluation, analysis and modeling of emerging computing platforms PROCEEDINGS ----------- Proceedings of this workshop will be published in CD format and will be available at the conference (together with the ICPP conference proceedings) . 
SUBMISSION INSTRUCTIONS ----------------------- Submissions should be in PDF format in U.S. Letter size paper. They should not exceed 8 pages (all inclusive). Submissions will be judged based on relevance, significance, originality, correctness and clarity. Please visit workshop website at: http://www.mcs.anl.gov/events/workshops/p2s2/ for the submission link. JOURNAL SPECIAL ISSUE --------------------- The best papers of P2S2'10 will be included in a special issue of the International Journal of High Performance Computing Applications (IJHPCA) on Programming Models, Software and Tools for High-End Computing. IMPORTANT DATES --------------- Paper Submission: March 3rd, 2010 Author Notification: May 3rd, 2010 Camera Ready: June 14th, 2010 PROGRAM CHAIRS -------------- * Pavan Balaji, Argonne National Laboratory * Abhinav Vishnu, Pacific Northwest National Laboratory PUBLICITY CHAIR --------------- * Yong Chen, Illinois Institute of Technology STEERING COMMITTEE ------------------ * William D. Gropp, University of Illinois Urbana-Champaign * Dhabaleswar K. Panda, Ohio State University * Vijay Saraswat, IBM Research PROGRAM COMMITTEE ----------------- * Ahmad Afsahi, Queen's University * George Almasi, IBM Research * Taisuke Boku, Tsukuba University * Ron Brightwell, Sandia National Laboratory * Franck Cappello, INRIA, France * Yong Chen, Illinois Institute of Technology * Ada Gavrilovska, Georgia Tech * Torsten Hoefler, Indiana University * Zhiyi Huang, University of Otago, New Zealand * Hyun-Wook Jin, Konkuk University, Korea * Zhiling Lan, Illinois Institute of Technology * Doug Lea, State University of New York at Oswego * Jiuxing Liu, IBM Research * Guillaume Mercier, INRIA, France * Scott Pakin, Los Alamos National Laboratory * Fabrizio Petrini, IBM Research * Bronis de Supinksi, Lawrence Livermore National Laboratory * Sayantan Sur, IBM Research * Rajeev Thakur, Argonne National Laboratory * Vinod Tipparaju, Oak Ridge National Laboratory * Jesper Traff, NEC, Europe * Weikuan Yu, Auburn University If you have any questions, please contact us at p2s2-chairs at mcs.anl.gov ======================================================================== You can unsubscribe from the hpc-announce mailing list here: https://lists.mcs.anl.gov/mailman/listinfo/hpc-announce ======================================================================== From eugen at leitl.org Sat Feb 20 02:42:40 2010 From: eugen at leitl.org (Eugen Leitl) Date: Sat, 20 Feb 2010 11:42:40 +0100 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: <20100220104240.GS17686@leitl.org> On Fri, Feb 19, 2010 at 05:18:17PM -0600, Rahul Nabar wrote: > I totally agree! Especially for my "tiny" storage capacity (~5 > Terabytes) the cost of the non-disk accessories > (enclosure+controllers+cables) is turning out to be several fold that > of the disks themselves! That was pretty surprising to me! I presume you're buying things off-the-shelf, but even then a borderline useful empty 4-drive Qnap is around 700 EUR. A much more poweful DIY system with 8 SATA hotplug shouldn't be more than 1 kEUR. In comparison, filling it up with cheap 2 TByte disks is at least twice that. In practice you can just get a quote for a BTO supermicro server or rackmount for only a little more. 
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From landman at scalableinformatics.com Sat Feb 20 03:35:06 2010 From: landman at scalableinformatics.com (Joe Landman) Date: Sat, 20 Feb 2010 06:35:06 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: References: <20100218131401.18a528a0.chekh@pcbi.upenn.edu> <20100218145537.6ee9145b.chekh@pcbi.upenn.edu> Message-ID: <4B7FC8EA.4080409@scalableinformatics.com> Tim Cutts wrote: > on between the two. We've been looking at iSCSI for our VMware setup in > the future. As people have been saying, performance isn't stellar, an > to get it decent you still end up paying a lot of money on networking > kit, because ideally in that scenario you still want the storage traffic Hmmm... Our sub $10k USD units sport 10GbE/IB interfaces and do more than 500 MB/s over iSCSI, sustained. We've been demonstrating this for ~1.25 years. No more expensive than other technologies. Much less expensive than some others. Yeah, I know, the vast majority of iSCSI users use it over GbE. Most of the units on the market from others have trouble keeping that single GbE pipe filled, and they are expensive kit. > But for HPC? Nah - can't really see its utility. It is in use and its use is growing. Look at iSCSI as glue logic between systems. You can use SAS to connect arrays, then you need an expander chip in the backplane for the array (talk about a performance killer ...). You can use FC to connect arrays, with the same types of issues (no expander chip, but a massive oversubscription of bandwidth, and really expensive networking kit). Or you can use point to point iSCSI, or iSCSI over a switch. We've done and seen both the point to point and networked version. Works great for its use cases. For massively oversubscribed SAS/FC networks, the iSCSI systems come in less expensive. For point to point connections (using iSCSI as a wire protocol to directly attach RAIDed arrays or JBOD arrays, or mixtures, to a server) it works great, and you can do multipathing without additional switches ... as well as doing fail-over connections, including background replication over a private iSCSI to iSCSI network to have HA storage. It has lots of utility. Its relatively cheap, and if you get from the right vendors, its as fast as some others "fast" and expensive solutions. Of course, we are biased given what we sell. > > Tim > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics, Inc. email: landman at scalableinformatics.com web : http://scalableinformatics.com http://scalableinformatics.com/jackrabbit phone: +1 734 786 8423 x121 fax : +1 866 888 3112 cell : +1 734 612 4615 From gerry.creager at tamu.edu Sat Feb 20 05:32:04 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Sat, 20 Feb 2010 07:32:04 -0600 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <20100220102809.GR17686@leitl.org> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> <20100220102809.GR17686@leitl.org> Message-ID: <4B7FE454.5020704@tamu.edu> Eugen Leitl wrote: > On Fri, Feb 19, 2010 at 05:56:18PM -0500, Patrick Geoffray wrote: > >>> I'll second this recommendation. The Coraid servers are fairly >> +1. The AoE spec is very simple, I wish it would have more traction >> outside CoRaID. 
On the opposite, iSCSI is a utter mess with all the bad >> technical choices. > > Is it very painful to get a zfs as an AoE target talk to VMWare? > For some reason only iSCSI is always being mentioned here. We've never tried zfs on CoRAID but I've done XFS with no problems. We also have some iSCSI NAS, and I have to disagree with the difficulty of use. We find iSCSI as easy to use as AoE. gerry From henning.fehrmann at aei.mpg.de Mon Feb 22 06:04:14 2010 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Mon, 22 Feb 2010 15:04:14 +0100 Subject: [Beowulf] RAM ECC errors Message-ID: <20100222140414.GA19032@gretchen.aei.mpg.de> Hello, we started monitoring the rate of correctable errors appearing in the RAM. We also observed few uncorrectable errors. The corresponding kernel module 'edac_core' can cause a Kernel Panic when such an event occurs, which makes sense to avoid corrupted results. Is there a way to get some useful information before the kernel panics? In particular are we looking for the process list to find out which user was running what before the UE errors occurred. Thank you. Cheers, Henning From mathog at caltech.edu Mon Feb 22 12:30:38 2010 From: mathog at caltech.edu (David Mathog) Date: Mon, 22 Feb 2010 12:30:38 -0800 Subject: [Beowulf] Re: RAM ECC errors (Henning Fehrmann) Message-ID: Henning Fehrmann wrote: > we started monitoring the rate of correctable errors appearing in the RAM. > We also observed few uncorrectable errors. The corresponding kernel > module 'edac_core' can cause a Kernel Panic when such an event occurs, > which makes sense to avoid corrupted results. Are you saying that now that you are monitoring you are seeing kernel panics which did not appear before? > > Is there a way to get some useful information before the kernel panics? You can get some information through netconsole, but you know that already. > In particular are we looking for the process list to find out which > user was running what before the UE errors occurred. Well, you could log process start/stops and flush them to disk or syslog them, so that at least when the system crashes it would be possible to derive a list of everything that was still running. Doubt this will help much though, since the most likely culprit is a bad stick of memory, in which case the netconsole or IPMI or MCE messages may be enough to figure out which stick is the problem. That is, whichever process triggered it is probably an innocent bystander. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From patrick at myri.com Mon Feb 22 12:58:15 2010 From: patrick at myri.com (Patrick Geoffray) Date: Mon, 22 Feb 2010 15:58:15 -0500 Subject: [Beowulf] Any recommendations for a good JBOD? In-Reply-To: <4B7F4E7F.6000102@scalableinformatics.com> References: <4B7D82F5.9050503@tamu.edu> <20100218192604.GP15788@mail.nih.gov> <4B7F1712.3040303@myri.com> <20100219190730.8489fb15.chekh@pcbi.upenn.edu> <4B7F4E7F.6000102@scalableinformatics.com> Message-ID: <4B82EFE7.1070101@myri.com> Joe, On 2/19/2010 9:52 PM, Joe Landman wrote: > This aside, both AoE and iSCSI provide block device services. Both > systems can present a block device with a RAID backing store. Patrick > and others will talk about the beauty of the standards, but this is > unfortunately irrelevant in the market. The market isn't a meritocracy. Indeed, this is unfortunate. To be clear, I did not comment on the implementations, only the protocols. 
And on that point, iSCSI is definitely one of the worst specs I have ever read. The fact that iSCSI has any traction is a good example that marketers are in command, not engineers. I would not say that the quality of the protocol is irrelevant; it has a direct impact on the robustness and performance of the implementations. For example, any kind of offload requires the iSCSI header to be at the beginning of a packet, so you need framing. However, iSCSI is built on top of TCP, a streaming protocol. So, if you go through a router and the MTU changes, you are toast. The worst part is that the iSCSI standard acknowledges this, but the solution it proposes (PDU markers) is ridiculous. FCoE shares some of the simplicity of AoE (right on top of Ethernet), but then the marketers came and messed it up with all kinds of extensions to justify selling you new (and expensive) gear. > Actually, with the advent of USB3 and related devices, I'd expect AoE > and lower end raid to be effectively completely subsumed by this. USB3 > has ample bandwidth to connect a low end RAID unit, more than GbE. The You can't switch USB the same way you switch Ethernet. Patrick
From mathog at caltech.edu Mon Feb 22 16:25:15 2010 From: mathog at caltech.edu (David Mathog) Date: Mon, 22 Feb 2010 16:25:15 -0800 Subject: [Beowulf] Re: case (de)construction question Message-ID: The support staff at PennEngineering said that it would only take a couple of hundred pounds of force to push out a standoff, and to try tapping it with a hammer. There is no hammer in the machine room, but there is some unistrut, so... 1. On top of a rubber wheeled cart (to cut down on the shock to everything else) put the case flat on 3 parallel pieces of unistrut, with the hexagonal back of the standoff centered on one of the slot holes in one piece of unistrut. 2. Dropped a closed ended 3 ft. piece of unistrut on the standoff side a few times from a height of about 6 inches. 3. When the standoff was punched down flush with the inside, pulled it out from the back with a pair of pliers. (I had put a screw in the standoff first; the blows alone would have shoved the standoff all the way through.) The case did warp out slightly (2mm?) on the bottom around where the fastener had been. To fix that, flipped it over, stood the standoff up over the hole (hex nut side down), taped it in this vertical position with some masking tape, and once again employed the 3ft unistrut as a hammer. A couple of quick taps and it was flat enough so that it would slide into the rack. Drilling the standoff out wouldn't have deformed the case, but at least this way there were no little metal shavings to worry about. Using a couple of fender washers instead of the unistrut might have reduced the size of the deformation. Or maybe not, as the dimple was round and about 2 times wider than the unistrut slot. The standoff seems little the worse for wear. The nut part was chewed up slightly by the pliers, but the cylinder part and the groove appear to be undamaged.
Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From carsten.aulbert at aei.mpg.de Mon Feb 22 22:33:38 2010 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Tue, 23 Feb 2010 07:33:38 +0100 Subject: [Beowulf] Re: RAM ECC errors (Henning Fehrmann) In-Reply-To: References: Message-ID: <201002230733.39676.carsten.aulbert@aei.mpg.de> Hi David replying also on Henning's behalf On Monday 22 February 2010 21:30:38 David Mathog wrote: > > Are you saying that now that you are monitoring you are seeing kernel > panics which did not appear before? > No, but there seem to be a switch in the kernel module that allows to trigger a kernel panic upon discovering uncorrectable errors. > You can get some information through netconsole, but you know that already. > Yup already running, question is if a kernel panic would also be fully visible via netconsole - we are glad that we rarely have those ;) > Well, you could log process start/stops and flush them to disk or syslog > them, so that at least when the system crashes it would be possible to > derive a list of everything that was still running. Doubt this will > help much though, since the most likely culprit is a bad stick of > memory, in which case the netconsole or IPMI or MCE messages may be > enough to figure out which stick is the problem. That is, whichever > process triggered it is probably an innocent bystander. Yes, but the memory of any process might get corrupted, thus this is more to learn which user is currently running jobs. Which in turn enables us to notify these users that this particular machine running these jobs had a problem and the user might need to re-run her jobs to prevent "false" data entering her job. Cheers Carsten -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 1871 bytes Desc: not available URL: From pauljohn32 at gmail.com Sat Feb 20 10:49:45 2010 From: pauljohn32 at gmail.com (Paul Johnson) Date: Sat, 20 Feb 2010 12:49:45 -0600 Subject: [Beowulf] which mpi library should I focus on? Message-ID: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> i've not written MPI programs before. I've written plenty of C and Java, however, and I think I can learn. I'm trying to decide whether to concentrate on OpenMPI or MPICH2 as I get started. In the Internet, I find plenty of people who are fiercely devoted to MPICH2, and I also find plenty of people who say OpenMPI is now the "preferred application". I am afraid I will start a flame war between these two sides, but I do need some advice. My immediate goal is to write programs that can parallelize statistical analysis (mostly medium sized calculations that have to be run 1000s of times from various starting random number seeds). Many of the projects that we do will be in R, which has several packages that can be built for MPI framework (such as SNOW or such), but the installer must select which MPI library to build those packages against. What are the reasons to prefer one or the other? pj -- Paul E. Johnson Professor, Political Science 1541 Lilac Lane, Room 504 University of Kansas From hahn at mcmaster.ca Tue Feb 23 06:09:15 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 23 Feb 2010 09:09:15 -0500 (EST) Subject: [Beowulf] which mpi library should I focus on? 
In-Reply-To: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> References: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> Message-ID: > i've not written MPI programs before. I've written plenty of C and > Java, however, and I think I can learn. I'm trying to decide whether > to concentrate on OpenMPI or MPICH2 as I get started. In the > Internet, I find plenty of people who are fiercely devoted to MPICH2, > and I also find plenty of people who say OpenMPI is now the "preferred > application". why do you think it would make any difference? it's also normally pretty trivial to switch. > What are the reasons to prefer one or the other? none - it's a matter of taste, especially since your application will not be sensitive to minor differences. From atchley at myri.com Tue Feb 23 06:20:34 2010 From: atchley at myri.com (Scott Atchley) Date: Tue, 23 Feb 2010 09:20:34 -0500 Subject: [Beowulf] which mpi library should I focus on? In-Reply-To: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> References: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> Message-ID: On Feb 20, 2010, at 1:49 PM, Paul Johnson wrote: > What are the reasons to prefer one or the other? Why choose? You can install both and test with your application to see if there is a performance difference (be sure to keep your runtime environment paths correct - don't mix libraries and MPI binaries). Your MPI code should adhere to the standard and both should run it correctly. Scott From brockp at umich.edu Tue Feb 23 06:25:45 2010 From: brockp at umich.edu (Brock Palen) Date: Tue, 23 Feb 2010 09:25:45 -0500 Subject: [Beowulf] which mpi library should I focus on? In-Reply-To: References: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> Message-ID: > why do you think it would make any difference? it's also normally > pretty trivial to switch. Very true, we use modules and can swap easily, and rebuild > >> What are the reasons to prefer one or the other? > > none - it's a matter of taste, especially since your application will > not be sensitive to minor differences. (shameless plug) if you want, listen to our podcast on OpenMPI http://www.rce-cast.com/index.php/Podcast/rce01-openmpi.html The MPICH2 show is recorded (edited it last night, almost done!), and will be released this Saturday Midnight Eastern. If you want to hear the rough cut, to compare to OpenMPI, email me and I will send you the unfinished mp3. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin > Computing > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > From douglas.guptill at dal.ca Tue Feb 23 07:46:39 2010 From: douglas.guptill at dal.ca (Douglas Guptill) Date: Tue, 23 Feb 2010 11:46:39 -0400 Subject: [Beowulf] which mpi library should I focus on? In-Reply-To: References: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> Message-ID: <20100223154639.GB695@sopalepc> On Tue, Feb 23, 2010 at 09:25:45AM -0500, Brock Palen wrote: > (shameless plug) if you want, listen to our podcast on OpenMPI > http://www.rce-cast.com/index.php/Podcast/rce01-openmpi.html > > The MPICH2 show is recorded (edited it last night, almost done!), and > will be released this Saturday Midnight Eastern. > If you want to hear the rough cut, to compare to OpenMPI, email me and I > will send you the unfinished mp3. That sounds like a nice pair. OpenMPI vs MPICH2. Douglas. 
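For what it's worth, the embarrassingly parallel seed sweep Paul described needs only a handful of MPI calls, and the same source builds unchanged with the mpicc from either OpenMPI or MPICH2. A minimal sketch follows (the toy Monte Carlo kernel, the seed offset and the sample count are just stand-ins for the real statistical calculation):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the real per-seed calculation: a toy Monte Carlo
   estimate of pi from one independent random stream. */
static double run_one(unsigned int seed, long samples)
{
    long i, hits = 0;
    for (i = 0; i < samples; i++) {
        double x = (double)rand_r(&seed) / RAND_MAX;
        double y = (double)rand_r(&seed) / RAND_MAX;
        if (x * x + y * y <= 1.0)
            hits++;
    }
    return 4.0 * (double)hits / (double)samples;
}

int main(int argc, char **argv)
{
    int rank, size;
    double local, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* one independent replicate per rank, seeded by rank;
       no communication at all until the final reduction */
    local = run_one(12345u + (unsigned int)rank, 1000000L);

    sum = 0.0;
    MPI_Reduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("mean over %d replicates: %f\n", size, sum / size);

    MPI_Finalize();
    return 0;
}

Build and run the same way under either stack, e.g. "mpicc seeds.c -o seeds && mpiexec -n 16 ./seeds"; switching between OpenMPI and MPICH2 is just a matter of which module/wrappers are in your PATH, which is really the point everyone is making. The R packages (Rmpi, SNOW) sit on top of whichever library they were built against in the same way.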
From mathog at caltech.edu Tue Feb 23 09:05:30 2010 From: mathog at caltech.edu (David Mathog) Date: Tue, 23 Feb 2010 09:05:30 -0800 Subject: [Beowulf] Re: RAM ECC errors Message-ID: Carsten Aulbert wrote > > Are you saying that now that you are monitoring you are seeing kernel > > panics which did not appear before? > > > > No, but there seem to be a switch in the kernel module that allows to trigger > a kernel panic upon discovering uncorrectable errors. By "switch" do you mean: A. There is an option that may be set when that module is loaded which will then cause it to panic on an uncorrectable error, where normally it would not. B. There has been a change in the module code between kernel versions that causes it to panic now on events where it formerly did not panic. > > You can get some information through netconsole, but you know that already. > > > > Yup already running, question is if a kernel panic would also be fully visible > via netconsole - we are glad that we rarely have those ;) I have seen one kernel panic since turning on netconsole, and it did log across the network and showed up in /var/log/messages as it was supposed to, with the same information presented as in the tests. Limited data, but it would seem the answer is "at least sometimes". > Yes, but the memory of any process might get corrupted, thus this is more to > learn which user is currently running jobs. Which in turn enables us to notify > these users that this particular machine running these jobs had a problem and > the user might need to re-run her jobs to prevent "false" data entering her > job. If the node blows up presumably the output of all the jobs currently running there will clearly indicate that there was a failure - so you should not have to notify those users since they will see the problem in their results. (Unless MPI, or PVM, or whatever is being used to spread jobs around, ignores fatal errors, which should never be the case.) For jobs which completed earlier on the same node, this would have been before an uncorrectable error took place, so the results should be OK. Or am I missing something? Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From rpnabar at gmail.com Tue Feb 23 11:23:59 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 23 Feb 2010 13:23:59 -0600 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops Message-ID: Over the years I have scrupulously adhered to the conventional wisdom that "spanning tree" is turned off on HPC switches. So that protocols don't time out in the time STP needs to acquire its model of network topology. But that does assume that there are no loops in the switch connectivity that can cause broadcast storms etc. Thereby constraining the network design to a loopless configuration. Most cases this is fine but..... In the interest of latency minimum switch hops make sense and for that loops might sometimes provide the best solution. Just wondering what people think. Does STP enabled have other drawbacks aside from the initial lag on port activation? Or maybe all the latency advantage is always wiped out if the STP being on itself has some massive overhead. Do you always configure switches to not have loops? Or are loops ok and then I turn STP ON but just use PortFast to get away with the best of both worlds. 
-- Rahul From gerry.creager at tamu.edu Tue Feb 23 12:02:56 2010 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue, 23 Feb 2010 14:02:56 -0600 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: References: Message-ID: <4B843470.6010906@tamu.edu> On 2/23/10 1:23 PM, Rahul Nabar wrote: > Over the years I have scrupulously adhered to the conventional wisdom > that "spanning tree" is turned off on HPC switches. So that protocols > don't time out in the time STP needs to acquire its model of network > topology. But that does assume that there are no loops in the switch > connectivity that can cause broadcast storms etc. Thereby constraining > the network design to a loopless configuration. Most cases this is > fine but..... > > In the interest of latency minimum switch hops make sense and for that > loops might sometimes provide the best solution. Just wondering what > people think. Does STP enabled have other drawbacks aside from the > initial lag on port activation? Or maybe all the latency advantage is > always wiped out if the STP being on itself has some massive overhead. > > Do you always configure switches to not have loops? Or are loops ok > and then I turn STP ON but just use PortFast to get away with the best > of both worlds. It's my firm opinion that loops and STP are evil for HPC installations. Period. gerry From hahn at mcmaster.ca Tue Feb 23 12:05:39 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 23 Feb 2010 15:05:39 -0500 (EST) Subject: [Beowulf] Re: RAM ECC errors (Henning Fehrmann) In-Reply-To: <201002230733.39676.carsten.aulbert@aei.mpg.de> References: <201002230733.39676.carsten.aulbert@aei.mpg.de> Message-ID: > No, but there seem to be a switch in the kernel module that allows to trigger > a kernel panic upon discovering uncorrectable errors. I suspect you mean /sys/module/edac_mc/panic_on_ue (ue = uncorrected error). I consider this very much the norm: it would be very strange to run with ECC memory, and ECC enabled, and not actually halt on UE. UE represents a failure of the memory system, not just a transient event, but something which must be physically fixed. even for HA situations, I'd be pretty skeptical about using a memory channel which had any UE's on it. CE (corrected errors) OTOH, are very different. they're almost just a heartbeat of your ECC subsystem. yes, a CE indicates some event that needed correcting, but at a modest rate, CEs are acceptable. there are failure modes, though, where enough CEs eventually cause a UE: tracking CE rate is important for that reason. (other UE modes don't have this warning sign...) you can set CEs to log through kernel->syslog via edac tunables in /sys. > Yes, but the memory of any process might get corrupted, thus this is more to if UE is set to panic, nothing will get corrupted (that's really the point eh?) From rpnabar at gmail.com Tue Feb 23 12:10:10 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 23 Feb 2010 14:10:10 -0600 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: <4B843470.6010906@tamu.edu> References: <4B843470.6010906@tamu.edu> Message-ID: On Tue, Feb 23, 2010 at 2:02 PM, Gerry Creager wrote: > > It's my firm opinion that loops and STP are evil for HPC installations. > Period. Thanks Gerry! This seems like one of the rare HPC-topics where such a clear answer is present! :) "It depends" is more usual for me to hear. 
I bet you have excellent reasons for hating STP so much, but can I know why? Just curious. Bad implementations, performance hits? Pardon my ignorance, if I am asking something that's plain obviously "evil". -- Rahul From lindahl at pbm.com Tue Feb 23 13:10:15 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 13:10:15 -0800 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: References: Message-ID: <20100223211015.GB8195@bx9.net> On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > In the interest of latency minimum switch hops make sense and for that > loops might sometimes provide the best solution. STP disables all loops. All you gain is a bit of redundancy, but the price is high. -- greg From rpnabar at gmail.com Tue Feb 23 13:15:28 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 23 Feb 2010 15:15:28 -0600 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: <20100223211015.GB8195@bx9.net> References: <20100223211015.GB8195@bx9.net> Message-ID: On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > >> In the interest of latency minimum switch hops make sense and for that >> loops might sometimes provide the best solution. > > STP disables all loops. All you gain is a bit of redundancy, but the > price is high. I see! That makes sense. Too bad. I wish there was some non-STP way of dealing with loops then. -- Rahul From lindahl at pbm.com Tue Feb 23 13:29:04 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 13:29:04 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> Message-ID: <20100223212904.GD8195@bx9.net> On Fri, Feb 19, 2010 at 02:36:34PM -0800, Gilad Shainer wrote: > Nice to hear from you Greg, hope all is well. I hope all is well with you, Gilad. From what I can tell, you're again visiting that alternate Universe that you sometimes visit -- is it nice there? > I don't forget anything, at least for now. OSU has different benchmarks > so you can measure message coalescing or real message rate. Funny to > read that Q hated coalescing when they created the first benchmark for > that The benchmark that we created is not a coalescing benchmark. Coalescing produces a meaningless answer from the message rate benchmark. Real apps don't get much of a benefit from message coalescing, but (if they send smallish messages) they get a big benefit from a good non-coalesced message rate. If you look back in the archives of this list, you can find me saying that. And some other people were involved in the discussion, too. Or, we can take your word for what happened, instead of looking at history. I know which method you prefer. 
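To make the distinction concrete, a non-coalesced message-rate measurement is conceptually nothing more than the sketch below (a rough outline, not the actual OSU benchmark; the window and iteration counts are arbitrary): keep a window of small, independent sends in flight to a peer and divide messages by elapsed time. An interface that silently coalesces would pack many of those into one wire packet and report a rate the application never sees for genuinely separate messages.

/* Rough two-rank message-rate sketch (not the OSU code).  Rank 0
   streams windows of tiny independent messages at rank 1; one ack per
   window keeps the receiver from drowning in unexpected messages. */
#include <mpi.h>
#include <stdio.h>

#define WINDOW 64
#define ITERS  10000
#define MSGLEN 8   /* small enough that per-message cost, not bandwidth, dominates */

int main(int argc, char **argv)
{
    static char sbuf[WINDOW][MSGLEN], rbuf[WINDOW][MSGLEN];
    MPI_Request req[WINDOW];
    double t0, t1;
    int rank, i, j, ack = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    t0 = MPI_Wtime();
    for (i = 0; i < ITERS; i++) {
        if (rank == 0) {
            for (j = 0; j < WINDOW; j++)
                MPI_Isend(sbuf[j], MSGLEN, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &req[j]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Recv(&ack, 1, MPI_INT, 1, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            for (j = 0; j < WINDOW; j++)
                MPI_Irecv(rbuf[j], MSGLEN, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &req[j]);
            MPI_Waitall(WINDOW, req, MPI_STATUSES_IGNORE);
            MPI_Send(&ack, 1, MPI_INT, 0, 1, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("%.0f messages/sec\n", (double)ITERS * WINDOW / (t1 - t0));

    MPI_Finalize();
    return 0;
}

On a sane adapter the small-message end of that curve is pinned by the per-packet dead time I mentioned earlier, roughly rate ~ 1 / max(gap, bytes_on_wire / line_rate), which is exactly why inflating it with coalescing tells you nothing about real applications.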
-- greg From lindahl at pbm.com Tue Feb 23 13:33:49 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 13:33:49 -0800 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: References: <20100223211015.GB8195@bx9.net> Message-ID: <20100223213349.GE8195@bx9.net> On Tue, Feb 23, 2010 at 03:15:28PM -0600, Rahul Nabar wrote: > On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: > > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: > > > >> In the interest of latency minimum switch hops make sense and for that > >> loops might sometimes provide the best solution. > > > > STP disables all loops. All you gain is a bit of redundancy, but the > > price is high. > > I see! That makes sense. Too bad. I wish there was some non-STP way > of dealing with loops then. Managed switches often include a non-STP way of finding and suppressing broadcast storms -- I know HP and Cisco have that. I don't know if it's any better than STP, though. In the InfiniBand world loops are encouraged & provide a nice performance benefit -- the routes are worked out globally by the Subnet Manager. Also, there is ethernet switch silicon that has an alternate routing mechanism that's as good as IB -- but I don't remember if it's standardized or compatible between different silicon vendors. -- greg From hahn at mcmaster.ca Tue Feb 23 13:57:23 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 23 Feb 2010 16:57:23 -0500 (EST) Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <20100223212904.GD8195@bx9.net> References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> Message-ID: > Coalescing produces a meaningless answer from the message rate > benchmark. Real apps don't get much of a benefit from message > coalescing, but (if they send smallish messages) they get a big > benefit from a good non-coalesced message rate. in the interests of less personal/posturing/pissing, let me ask: where does the win from coalescing come from? I would have thought that coalescing is mainly a way to reduce interrupts, a technique that's familiar from ethernet interrupt mitigation, NAPI, even basic disk scheduling. to me it looks like the key factor would be "propagation of desire" - when the app sends a message and will do nothing until the reply, it probably doesn't make sense to coalesce that message. otoh it's interesting if user-level can express non-urgency as well. my guess is the other big thing is LogP-like parameters (gap -> piggybacking). assuming MPI is the application-level interface, are there interesting issues related to knowing where to deliver messages? I don't have a good understanding about where things stand WRT things like QP usage (still N*N? is N node count or process count?) or unexpected messages. now that I'm inventorying ignorance, I don't really understand why RDMA always seems to be presented as a big hardware issue. wouldn't it be pretty easy to define an eth or IP-level protocol to do remote puts, gets, even test-and-set or reduce primitives, where the interrupt handler could twiddle registered blobs of user memory on the target side? regards, mark hahn. From patrick at myri.com Tue Feb 23 14:10:29 2010 From: patrick at myri.com (Patrick Geoffray) Date: Tue, 23 Feb 2010 17:10:29 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? 
In-Reply-To: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> Message-ID: <4B845255.9050407@myri.com> Brian, On 2/19/2010 1:25 PM, Brian Dobbins wrote: > the IB cards. With a 4-socket node having between 32 and 48 cores, lots > of computing can get done fast, possibly stressing the network. > > I know Qlogic has made a big deal about the InfiniPath adapter's > extremely good message rate in the past... is this still an important > issue? How do the latest Mellanox adapters compare? I have been quite vocal in the past against the merit of high packet rate, but I have learned to appreciate it. There is a set of applications that can benefit from it, especially at scale. Actually, packet rate is much more important outside of HPC (where application throughput is what money buys). However, I would pay attention to a different problem with many-core machines. Each user-space process uses a dedicated set of NIC resources, and this can be a problem with 48 cores per node (it affects all vendors, even if they swear otherwise). You may want to consider multiple NICs, unless you know that only a subset of the cores are communicating through the network (hybrid MPI/Open-MP model for example) or that the multiplexing overhead is not a big deal for you. > On a similar note, does a dual-port card provide an increase in > on-card processing, or 'just' another link? (The increased bandwidth is > certainly nice, even in a flat switched network, I'm sure!) You need PCIe Gen2 x16 to saturate a 32 Gb/s QDR link. There is no such NIC on the market AFAIK (only Gen1 x16 or Gen2 x8). But even then, you won't have any PCIe bandwidth left to drive a second port on the same NIC. There may be other rationales for a second port, but bandwidth is not one of them. Patrick From Shainer at mellanox.com Tue Feb 23 14:24:09 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Tue, 23 Feb 2010 14:24:09 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F0266305A@mtiexch01.mti.com> > On a similar note, does a dual-port card provide an increase in > on-card processing, or 'just' another link? (The increased bandwidth is > certainly nice, even in a flat switched network, I'm sure!) Today one port IB (assuming QDR) can saturate the PCIe Gen2 interface that is supported. Using 2 ports can give you high availability or fail over. Some folks are using it to increase the message rate as well, but I have not seen numbers to confirm. Other folks are using dual one-port adapters, and this gives them 2x the BW. Gilad From lindahl at pbm.com Tue Feb 23 14:32:21 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 14:32:21 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> Message-ID: <20100223223221.GA24604@bx9.net> On Tue, Feb 23, 2010 at 04:57:23PM -0500, Mark Hahn wrote: > in the interests of less personal/posturing/pissing, let me ask: > where does the win from coalescing come from? I would have thought > that coalescing is mainly a way to reduce interrupts, a technique > that's familiar from ethernet interrupt mitigation, NAPI, even basic disk > scheduling. 
The coalescing we're talking about here is more like TCP's Nagle algorithm: The sending side defers sending a packet so that it can send a single larger packet instead of several small ones. In HPC we mostly hate the Nagle algorithm, because it isn't omniscient: it tends to always delay our messages hoping to get a 2nd one to the same target, but we rarely send a 2nd message to the same target that could be combined. People don't write much MPI code that works like that; it's always better to do the combining yourself. > to me it looks like the key factor would be "propagation of desire" - > when the app sends a message and will do nothing until the reply, > it probably doesn't make sense to coalesce that message. Yes, that's one way to think about it. > assuming MPI is the application-level interface, are there interesting > issues related to knowing where to deliver messages? I don't have a > good understanding about where things stand WRT things like QP usage > (still N*N? is N node count or process count?) or unexpected messages. A traditional MPI implementation uses N QPs x N processes, so the global number of QPs is N^2. InfiniPath's pm library for MPI uses a much smaller endpoint than a QP. Using a ton of QPs does slow down things (hurts scaling), and that's why SRQ (shared receive queues) was added to IB. MVAPICH has several different ways it can handle messages, configured (last I looked) at compile time: checking memory for delivered messages for tiny clusters, ordinary QPs at medium size, SRQ at large cluster sizes. The reason it switches is scalability; SQRs scale better but are fairly expensive in the Mellanox silicon. Since latency/bandwidth benchmarks are generally run at only 2 nodes, well, you can fill in the rest of this paragraph. InfiniPath's pm library uses a lighter-weight thing that's somewhat like an SRQ -- at all cluster sizes. This is why it scales so nicely. It wasn't a novel invention -- the T3E MPI implementation used a similar gizmo. > now that I'm inventorying ignorance, I don't really understand why RDMA > always seems to be presented as a big hardware issue. wouldn't it be > pretty easy to define an eth or IP-level protocol to do remote puts, > gets, even test-and-set or reduce primitives, where the interrupt handler > could twiddle registered blobs of user memory on the target side? That approach is called Active Messages, and can be bolted on to pretty much every messaging implementation. Doesn't OpenMX provide that kind of interface? The NoSQL distributed computing thingie we built for Blekko's search engine uses active messages. -- greg From bdobbins at gmail.com Tue Feb 23 14:35:41 2010 From: bdobbins at gmail.com (Brian Dobbins) Date: Tue, 23 Feb 2010 17:35:41 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <4B845255.9050407@myri.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> Message-ID: <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> Hi Patrick, I have been quite vocal in the past against the merit of high packet rate, > but I have learned to appreciate it. There is a set of applications that can > benefit from it, especially at scale. Actually, packet rate is much more > important outside of HPC (where application throughput is what money buys). 
> The 'especially at scale' bit seems to me to be the critical issue - weighing the price/performance as the ratio of small-scale to large-scale runs changes, assuming that an adapter with better large-scale performance has a significant cost differential. If only we knew what that ratio would be ahead of time, this would be easier. :-) However, I would pay attention to a different problem with many-core > machines. Each user-space process uses a dedicated set of NIC resources, and > this can be a problem with 48 cores per node (it affects all vendors, even > if they swear otherwise). You may want to consider multiple NICs, unless you > know that only a subset of the cores are communicating through the network > (hybrid MPI/Open-MP model for example) or that the multiplexing overhead is > not a big deal for you. Well, clearly we hope to move more towards hybrid methods -all that's old is new again?- but, again, it's *currently* hard to quantify the variables involved. Time to transition, performance differences, user effort, etc. But getting back to a technical vein, is the multiplexing an issue due to atomic locks on mapped memory pages? Or just because each copy reserves its own independent buffers? What are the critical issues? > You need PCIe Gen2 x16 to saturate a 32 Gb/s QDR link. There is no such NIC > on the market AFAIK (only Gen1 x16 or Gen2 x8). But even then, you won't > have any PCIe bandwidth left to drive a second port on the same NIC. There > may be other rationales for a second port, but bandwidth is not one of them. > I thought PCIe Gen2 x 8 @ 500 Mhz gives 8GB/s? I know there are 250 and 500 Mhz variants in addition to the lane sizes, so while a 250 Mhz x8 link wouldn't provide enough bandwidth to a dual-port card, the 500 Mhz one should. But I'm woefully out of date on my hardware knowledge, it seems. Of course, EDR (eight data-rate) IB is on the roadmap for 2011, so if we're in no rush that could help, too. Cheers, - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From bdobbins at gmail.com Tue Feb 23 14:42:11 2010 From: bdobbins at gmail.com (Brian Dobbins) Date: Tue, 23 Feb 2010 17:42:11 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> Message-ID: <2b5e0c121002231442g75c44504p88a35314994d412b@mail.gmail.com> > I thought PCIe Gen2 x 8 @ 500 Mhz gives 8GB/s? I know there are 250 and > 500 Mhz variants in addition to the lane sizes, so while a 250 Mhz x8 link > wouldn't provide enough bandwidth to a dual-port card, the 500 Mhz one > should. But I'm woefully out of date on my hardware knowledge, it seems. > Of course, EDR (eight data-rate) IB is on the roadmap for 2011, so if we're > in no rush that could help, too. > Er,.. I thought about this a half-second after I sent it. First, that's 5.0 Ghz, not 500 Mhz, and second, that's probably aggregate bi-directional throughout at 8 GB/s, isn't it? So, I got it, 4 GB/s per direction, which means dual-port won't get us much in terms of unidirectional throughput. - Brian -------------- next part -------------- An HTML attachment was scrubbed... 
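To keep the link-rate arithmetic straight, here is a back-of-envelope sketch (Python, illustrative only): it assumes 8b/10b encoding on both PCIe 1.x/2.0 and SDR/DDR/QDR InfiniBand and ignores PCIe packet/TLP overhead, which is roughly what pulls measured throughput down toward the ~3.3 GB/s figure quoted below.

def pcie_gbytes_per_dir(lanes, gen):
    """Per-direction payload rate in GB/s after 8b/10b encoding (PCIe 1.x/2.0)."""
    gt_per_lane = {1: 2.5, 2: 5.0}[gen]            # GT/s per lane
    return lanes * gt_per_lane * 8.0 / 10.0 / 8.0  # strip 8b/10b, bits -> bytes

def ib_gbytes_per_dir(lanes, gbps_per_lane):
    """Per-direction payload rate in GB/s after 8b/10b (SDR=2.5, DDR=5, QDR=10 Gb/s/lane)."""
    return lanes * gbps_per_lane * 8.0 / 10.0 / 8.0

print("PCIe 2.0 x8 :", pcie_gbytes_per_dir(8, 2), "GB/s per direction")    # 4.0
print("PCIe 2.0 x16:", pcie_gbytes_per_dir(16, 2), "GB/s per direction")   # 8.0
print("IB 4x QDR   :", ib_gbytes_per_dir(4, 10.0), "GB/s per direction")   # 4.0

So a Gen2 x8 slot and a 4x QDR port are both about 4 GB/s of raw payload per direction before protocol overhead, which is why a single QDR port is well matched to a Gen2 x8 slot and why a second port on the same NIC buys little extra unidirectional bandwidth.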
URL: From lindahl at pbm.com Tue Feb 23 14:55:13 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 14:55:13 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> Message-ID: <20100223225513.GC24604@bx9.net> On Tue, Feb 23, 2010 at 05:35:41PM -0500, Brian Dobbins wrote: > Well, clearly we hope to move more towards hybrid methods -all that's old > is new again?- If you want bad performance, sure. If you want good performance, you want a device which supports talking to a lot of cores, and then multiple devices per node, before you go hybrid. The first two don't require changing your code. The last does. The main reason to use hybrid is if there isn't enough parallelism in your code/dataset to use the cores independently. > But getting back to a technical vein, is the multiplexing an issue due to > atomic locks on mapped memory pages? Or just because each copy reserves its > own independent buffers? What are the critical issues? It's all implementation-dependent. A card might have an on-board memory limit, or a limited number of "engines" which process messages. Even if it has a option to store some data in main memory, often that results in a scalability hit. -- greg From Shainer at mellanox.com Tue Feb 23 14:51:48 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Tue, 23 Feb 2010 14:51:48 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com><4B845255.9050407@myri.com><2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> <2b5e0c121002231442g75c44504p88a35314994d412b@mail.gmail.com> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F0266305F@mtiexch01.mti.com> PCIe Gen2 at 5GT, and 8/10 bit encoding and current chipsets efficiencies gives you around 3.3GB/s per direction, so one IB QDR port can handle that. For more BW out of the host, you can use more adapters (single port ones are the cost effective solution for that). Gilad From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Brian Dobbins Sent: Tuesday, February 23, 2010 2:42 PM To: Patrick Geoffray Cc: beowulf at beowulf.org Subject: Re: [Beowulf] Q: IB message rate & large core counts (per node)? I thought PCIe Gen2 x 8 @ 500 Mhz gives 8GB/s? I know there are 250 and 500 Mhz variants in addition to the lane sizes, so while a 250 Mhz x8 link wouldn't provide enough bandwidth to a dual-port card, the 500 Mhz one should. But I'm woefully out of date on my hardware knowledge, it seems. Of course, EDR (eight data-rate) IB is on the roadmap for 2011, so if we're in no rush that could help, too. Er,.. I thought about this a half-second after I sent it. First, that's 5.0 Ghz, not 500 Mhz, and second, that's probably aggregate bi-directional throughout at 8 GB/s, isn't it? So, I got it, 4 GB/s per direction, which means dual-port won't get us much in terms of unidirectional throughput. - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From Shainer at mellanox.com Tue Feb 23 15:08:48 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Tue, 23 Feb 2010 15:08:48 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? 
References: <20100219220538.GL2857@bx9.net><9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F02663069@mtiexch01.mti.com> > The benchmark that we created is not a coalescing benchmark. > > Coalescing produces a meaningless answer from the message rate > benchmark. Real apps don't get much of a benefit from message > coalescing, but (if they send smallish messages) they get a big > benefit from a good non-coalesced message rate. Agree, most benefit will come from non-coalesced message rate - and for the broader audience - from the ability to send one MPI message within a single network packet. Message coalescing is when you incorporate multiple MPI messages in a single network packet. > If you look back in the archives of this list, you can find me saying > that. And some other people were involved in the discussion, too. > > Or, we can take your word for what happened, instead of looking at > history. I know which method you prefer. You can look at history or not, your decision. My claim (and more important for the present) is that the so called "non-coalescing" results published on latest InfiniPath based products are actually with coalescing. The other universe you called it? From bdobbins at gmail.com Tue Feb 23 15:23:59 2010 From: bdobbins at gmail.com (Brian Dobbins) Date: Tue, 23 Feb 2010 18:23:59 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <20100223225513.GC24604@bx9.net> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> <20100223225513.GC24604@bx9.net> Message-ID: <2b5e0c121002231523w7789d3cqc1fc03c4f0563a04@mail.gmail.com> Hi Greg, > Well, clearly we hope to move more towards hybrid methods -all that's > old > > is new again?- > > If you want bad performance, sure. If you want good performance, you > want a device which supports talking to a lot of cores, and then > multiple devices per node, before you go hybrid. The first two don't > require changing your code. The last does. > > The main reason to use hybrid is if there isn't enough parallelism in > your code/dataset to use the cores independently. > Actually, it's often *for* performance that we look towards hybrid methods, albeit in an indirect way - with RAM amounts per node increasing at the same or lesser rate than cores, and with each MPI task on *some* of our codes having a pretty hefty memory footprint, using fewer MPI processes and more threads per task lets us fully utilize nodes that would otherwise have cores sitting idle due to a lack of available memory. Sure, we could rewrite the code to tackle this, too, but in general it seems easier to add threading in than to rework a complicated parallel decomposition, shared buffers, etc. In a nutshell, even if a hybrid mode *costs* me 10-20% over a direct mode with an equal number of processors, if it allows me to use 50% more cores in a node, it works out well for us. But yes, ignoring RAM constraints, non-hybrid parallelism tends to be nicer at the moment. > > But getting back to a technical vein, is the multiplexing an issue due to > > atomic locks on mapped memory pages? Or just because each copy reserves > its > > own independent buffers? What are the critical issues? > > It's all implementation-dependent. A card might have an on-board > memory limit, or a limited number of "engines" which process > messages. 
Even if it has a option to store some data in main memory, > often that results in a scalability hit. > Thanks. I guess I need to read up on quite a bit more and set up some tests. Cheers, - Brian -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathog at caltech.edu Tue Feb 23 16:40:01 2010 From: mathog at caltech.edu (David Mathog) Date: Tue, 23 Feb 2010 16:40:01 -0800 Subject: [Beowulf] Arima motherboards with SATA2 drives Message-ID: Have any of you seen a patched BIOS for the Arima HDAM* motherboards that resolves the issue of the Sil 3114 SATA controller locking up when it sees a SATA II disk? (Even a disk jumpered to Sata I speeds.) Silicon Image released a BIOS fix for this, but since all of these motherboards use a Phoenix BIOS, it is not like an AMI or Award BIOS, where there are published methods for swapping out the broken chunk of BIOS (5.0.49) for the one with the fix (5.4.0.3). Sure, one could work around this on a single disk system, at least, with an IDE to SATA2 converter, or a PCI(X) Sata(2) controller, but reflashing the BIOS would be easier. Or it would be if Flextronics, who bought this product line from Arima, would issue another BIOS update :-(. Thanks, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From lindahl at pbm.com Tue Feb 23 20:42:30 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 23 Feb 2010 20:42:30 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <2b5e0c121002231523w7789d3cqc1fc03c4f0563a04@mail.gmail.com> References: <2b5e0c121002191025p521196bdm941cd3f018e8b305@mail.gmail.com> <4B845255.9050407@myri.com> <2b5e0c121002231435q4f6275cbs9ad010b3a50e8c86@mail.gmail.com> <20100223225513.GC24604@bx9.net> <2b5e0c121002231523w7789d3cqc1fc03c4f0563a04@mail.gmail.com> Message-ID: <20100224044230.GA19953@bx9.net> On Tue, Feb 23, 2010 at 06:23:59PM -0500, Brian Dobbins wrote: > Actually, it's often *for* performance that we look towards hybrid > methods, albeit in an indirect way - with RAM amounts per node increasing at > the same or lesser rate than cores, and with each MPI task on *some* of our > codes having a pretty hefty memory footprint, using fewer MPI processes and > more threads per task lets us fully utilize nodes that would otherwise have > cores sitting idle due to a lack of available memory. If you have data structures which are identical in every process, you might be able to mmap them shared, reducing your memory usage even for the hybrid case. I have only once before heard this kind of thing reported, and perhaps it was you (can't remember who). I don't think it's a big driver for hybrid programming. Anyone else? -- greg From ebiederm at xmission.com Tue Feb 23 21:03:33 2010 From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 23 Feb 2010 21:03:33 -0800 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: <20100223213349.GE8195@bx9.net> (Greg Lindahl's message of "Tue\, 23 Feb 2010 13\:33\:49 -0800") References: <20100223211015.GB8195@bx9.net> <20100223213349.GE8195@bx9.net> Message-ID: Greg Lindahl writes: > On Tue, Feb 23, 2010 at 03:15:28PM -0600, Rahul Nabar wrote: >> On Tue, Feb 23, 2010 at 3:10 PM, Greg Lindahl wrote: >> > On Tue, Feb 23, 2010 at 01:23:59PM -0600, Rahul Nabar wrote: >> > >> >> In the interest of latency minimum switch hops make sense and for that >> >> loops might sometimes provide the best solution. 
>> > >> > STP disables all loops. All you gain is a bit of redundancy, but the >> > price is high. >> >> I see! That makes sense. Too bad. I wish there was some non-STP way >> of dealing with loops then. > > Managed switches often include a non-STP way of finding and > suppressing broadcast storms -- I know HP and Cisco have that. > I don't know if it's any better than STP, though. > > In the InfiniBand world loops are encouraged & provide a nice > performance benefit -- the routes are worked out globally by the > Subnet Manager. Also, there is ethernet switch silicon that has an > alternate routing mechanism that's as good as IB -- but I don't > remember if it's standardized or compatible between different silicon > vendors. For the most trivial of loops there is link aggregation. For more interesting loops you can run many ethernet switches as wire speed ip routers talking a routing protocol like ospf. Eric From rpnabar at gmail.com Tue Feb 23 21:32:02 2010 From: rpnabar at gmail.com (Rahul Nabar) Date: Tue, 23 Feb 2010 23:32:02 -0600 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: References: <20100223211015.GB8195@bx9.net> <20100223213349.GE8195@bx9.net> Message-ID: On Tue, Feb 23, 2010 at 11:03 PM, Eric W. Biederman wrote: > > For the most trivial of loops there is link aggregation. Yup, that's true. Am already using link aggregation but never thought of that as a loop before. But makes sense. > For more interesting loops you can run many ethernet switches > as wire speed ip routers talking a routing protocol like ospf. Thanks! I will look that up. Maybe it serves my need. -- Rahul From forum.san at gmail.com Tue Feb 23 21:40:12 2010 From: forum.san at gmail.com (Sangamesh B) Date: Wed, 24 Feb 2010 11:10:12 +0530 Subject: [Beowulf] which mpi library should I focus on? In-Reply-To: <20100223154639.GB695@sopalepc> References: <13e802631002201049j59e06a9vd8e1e7a05e8a47e5@mail.gmail.com> <20100223154639.GB695@sopalepc> Message-ID: Hi, I hope you are developing MPI codes and wants to run in cluster environment. If so, I prefer you to use Open MPI. Because, Open MPI is well developed and its stable Has a very good FAQ section, where you will get clear your doubts easily. It has a in-built tight-integration method with cluster schedulers- SGE, PBS, LSF etc. It has an option to choose ETHERNET or INFINIBAND network connectivity during run-time. Thanks, Sangamesh On Tue, Feb 23, 2010 at 9:16 PM, Douglas Guptill wrote: > On Tue, Feb 23, 2010 at 09:25:45AM -0500, Brock Palen wrote: > > > (shameless plug) if you want, listen to our podcast on OpenMPI > > http://www.rce-cast.com/index.php/Podcast/rce01-openmpi.html > > > > The MPICH2 show is recorded (edited it last night, almost done!), and > > will be released this Saturday Midnight Eastern. > > If you want to hear the rough cut, to compare to OpenMPI, email me and I > > will send you the unfinished mp3. > > That sounds like a nice pair. OpenMPI vs MPICH2. > > Douglas. > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From henning.fehrmann at aei.mpg.de Tue Feb 23 23:20:43 2010 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Wed, 24 Feb 2010 08:20:43 +0100 Subject: [Beowulf] Re: RAM ECC errors In-Reply-To: References: Message-ID: <20100224072043.GA7998@gretchen.aei.mpg.de> Hi David, Thank you for the response. > Carsten Aulbert wrote > > > Are you saying that now that you are monitoring you are seeing kernel > > > panics which did not appear before? > > > > > > > No, but there seem to be a switch in the kernel module that allows to > trigger > > a kernel panic upon discovering uncorrectable errors. > > By "switch" do you mean: > A. There is an option that may be set when that module is loaded which > will then cause it to panic on an uncorrectable error, where normally it > would not. > B. There has been a change in the module code between kernel versions > that causes it to panic now on events where it formerly did not panic. It is A. There is a module parameter for edac_core: edac_mc_panic_on_ue=1. We have not tested it yet since uncorrectable errors rarely occur. > > > > You can get some information through netconsole, but you know that > already. > > > > > > > Yup already running, question is if a kernel panic would also be fully > visible > > via netconsole - we are glad that we rarely have those ;) > > I have seen one kernel panic since turning on netconsole, and it did log > across the network and showed up in /var/log/messages as it was supposed > to, with the same information presented as in the tests. Limited data, > but it would seem the answer is "at least sometimes". I got a hint from one of the kernel developer. Including the show show_state() function into panic.c right before dump_stack() should give process information via printk which could be collected with netconsole. We are still waiting for an UE event. > > > Yes, but the memory of any process might get corrupted, thus this is > more to > > learn which user is currently running jobs. Which in turn enables us > to notify > > these users that this particular machine running these jobs had a > problem and > > the user might need to re-run her jobs to prevent "false" data > entering her > > job. > > If the node blows up presumably the output of all the jobs currently > running there will clearly indicate that there was a failure - so you > should not have to notify those users since they will see the problem in > their results. (Unless MPI, or PVM, or whatever is being used to spread > jobs around, ignores fatal errors, which should never be the case.) For > jobs which completed earlier on the same node, this would have been > before an uncorrectable error took place, so the results should be OK. Yes, this is correct. A panic should be enough to avoid corrupted data. Often, jobs are failing for other reasons. A process list might help us to exclude other possibilities for job failure. It makes the work a bit more convenient. 
Cheers, Henning From henning.fehrmann at aei.mpg.de Tue Feb 23 23:30:31 2010 From: henning.fehrmann at aei.mpg.de (Henning Fehrmann) Date: Wed, 24 Feb 2010 08:30:31 +0100 Subject: [Beowulf] Re: RAM ECC errors (Henning Fehrmann) In-Reply-To: <22487_1266955822_4B84362E_22487_708_1_Pine.LNX.4.64.1002231419410.15558@coffee.psychology.mcmaster.ca> References: <201002230733.39676.carsten.aulbert@aei.mpg.de> <22487_1266955822_4B84362E_22487_708_1_Pine.LNX.4.64.1002231419410.15558@coffee.psychology.mcmaster.ca> Message-ID: <20100224073031.GB7998@gretchen.aei.mpg.de> Hi Mark, On Tue, Feb 23, 2010 at 03:05:39PM -0500, Mark Hahn wrote: > >No, but there seem to be a switch in the kernel module that allows to trigger > >a kernel panic upon discovering uncorrectable errors. > > I suspect you mean /sys/module/edac_mc/panic_on_ue > (ue = uncorrected error). I consider this very much the norm: > it would be very strange to run with ECC memory, and ECC enabled, > and not actually halt on UE. UE represents a failure of the memory > system, not just a transient event, but something which must be > physically fixed. even for HA situations, I'd be pretty skeptical > about using a memory channel which had any UE's on it. Strangely enough, panic_on_ue is off by default. > > CE (corrected errors) OTOH, are very different. they're almost just > a heartbeat of your ECC subsystem. yes, a CE indicates some event > that needed correcting, but at a modest rate, CEs are acceptable. > there are failure modes, though, where enough CEs eventually cause a > UE: tracking CE rate is important for that reason. (other UE modes > don't have this warning sign...) On some apparently broken hardware we have a rate of nearly one event per second. I assume the probability of having uncorrectable errors is few orders of magnitude smaller than the rate of correctable errors since more event have to occur simultaneously. And hopefully, the rate of a silent corruption is still smaller. > > you can set CEs to log through kernel->syslog via edac tunables in /sys. > > >Yes, but the memory of any process might get corrupted, thus this is more to > > if UE is set to panic, nothing will get corrupted (that's really the point eh?) Correct, but it helps rule out other reasons for job failures. Cheers, Henning From robh at dongle.org.uk Wed Feb 24 04:12:07 2010 From: robh at dongle.org.uk (Robert Horton) Date: Wed, 24 Feb 2010 12:12:07 +0000 Subject: [Beowulf] Spanning Tree Protocol and latency: allowing loops in switching networks for minimizing switch hops In-Reply-To: References: Message-ID: <1267013527.7577.35.camel@moelwyn.maths.qmul.ac.uk> On Tue, 2010-02-23 at 13:23 -0600, Rahul Nabar wrote: > In the interest of latency minimum switch hops make sense and for that > loops might sometimes provide the best solution. Using STP won't give you a latency advantage; it just disables some links in a network with loops so you have a single spanning tree. Rob From h-bugge at online.no Wed Feb 24 05:00:27 2010 From: h-bugge at online.no (=?iso-8859-1?Q?H=E5kon_Bugge?=) Date: Wed, 24 Feb 2010 14:00:27 +0100 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? 
In-Reply-To: <20100223223221.GA24604@bx9.net> References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> <20100223223221.GA24604@bx9.net> Message-ID: <1564E081-759B-44BA-A482-519AB5D0DF14@online.no>

Hi Greg, On Feb 23, 2010, at 23:32, Greg Lindahl wrote: > A traditional MPI implementation uses N QPs x N processes, so the > global number of QPs is N^2. InfiniPath's pm library for MPI uses a > much smaller endpoint than a QP. Using a ton of QPs does slow down > things (hurts scaling), and that's why SRQ !!:gs/SRQ/XRC/ SRQ reduces the number of receive queues used, and thereby reduces the footprint of the receive buffers. As such, SRQ does not change the number of QPs used. Actually, by using bucketed receive queues (one receive queue per bucket size), you need more QPs (I believe Open MPI uses 4 QPs per connection using SRQ). XRC, on the other hand, reduces the number of QPs per node from NxPPN to N. I have seen impacts of this running only 8 processes per node. In that particular case, the application ran faster using IPoIB for only 128 processes. I assume XRC would have alleviated this effect, but I had no opportunity to evaluate XRC at the time. Hence, I would advise anyone performing benchmarking around this issue to also include XRC and/or lazy connection establishment to see if the number of QPs in use affects performance. Thanks, Håkon

From hahn at mcmaster.ca Wed Feb 24 07:36:17 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Wed, 24 Feb 2010 10:36:17 -0500 (EST) Subject: [Beowulf] Re: RAM ECC errors (Henning Fehrmann) In-Reply-To: <14923_1266996853_o1O7Y1Vi014830_20100224073031.GB7998@gretchen.aei.mpg.de> References: <201002230733.39676.carsten.aulbert@aei.mpg.de> <22487_1266955822_4B84362E_22487_708_1_Pine.LNX.4.64.1002231419410.15558@coffee.psychology.mcmaster.ca> <14923_1266996853_o1O7Y1Vi014830_20100224073031.GB7998@gretchen.aei.mpg.de> Message-ID: > Strangely enough, panic_on_ue is off by default. this seems to be version-dependent (we have a bunch of HP XC clusters that have panic_on_ue (and log_ce) enabled by default). I didn't check the sources to see whether HP had patched this, though. > On some apparently broken hardware we have a rate of nearly one event > per second. I assume the probability of having uncorrectable errors is it's certainly possible to have periodic CEs (some page that gets accessed by a periodic timer, etc). but more likely, this is just the edac module's polling rate (/sys/devices/system/edac/mc/poll_msec = 1000, right?) my experience is that if you're getting CE logs at 1 Hz, then your actual CE rate is potentially a lot higher: there's a ce_noinfo_count control which indicates when there are too many CEs per poll. if you really wanted to find the rate, you could crank up poll_msec, but my experience is that >1 Hz probably calls for a physical fix. OTOH, I do observe machines where a reboot seems to make the CEs go away. that's worrisome. on other machines, reseating dimms does the trick (also a bit worrisome, or at least annoying.) I have a script that I run to summarize this stuff and decode the channel/row to dimm numbers. for a cluster, I normally run "pdsh -a collect_edac_stats | sort" to look for problem nodes...

From brice.goglin at gmail.com Tue Feb 23 15:16:39 2010 From: brice.goglin at gmail.com (Brice Goglin) Date: Wed, 24 Feb 2010 00:16:39 +0100 Subject: [Beowulf] Q: IB message rate & large core counts (per node)?
In-Reply-To: <20100223223221.GA24604@bx9.net> References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> <20100223223221.GA24604@bx9.net> Message-ID: <4B8461D7.1000505@gmail.com> Greg Lindahl wrote: >> now that I'm inventorying ignorance, I don't really understand why RDMA >> always seems to be presented as a big hardware issue. wouldn't it be >> pretty easy to define an eth or IP-level protocol to do remote puts, >> gets, even test-and-set or reduce primitives, where the interrupt handler >> could twiddle registered blobs of user memory on the target side? >> > > That approach is called Active Messages, and can be bolted on to > pretty much every messaging implementation. Doesn't OpenMX provide > that kind of interface? > Open-MX offers what MX offers: no explicit RDMA interface, only 2-sided. But something similar to a remote get is used internally for large messages. It wouldn't be hard to mplement some RDMA-like features in such a software-only model like Mark said above. Brice From atchley at myri.com Wed Feb 24 18:49:38 2010 From: atchley at myri.com (Scott Atchley) Date: Wed, 24 Feb 2010 21:49:38 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <4B8461D7.1000505@gmail.com> References: <20100219220538.GL2857@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02662CA3@mtiexch01.mti.com> <20100223212904.GD8195@bx9.net> <20100223223221.GA24604@bx9.net> <4B8461D7.1000505@gmail.com> Message-ID: On Feb 23, 2010, at 6:16 PM, Brice Goglin wrote: > Greg Lindahl wrote: >>> now that I'm inventorying ignorance, I don't really understand why RDMA >>> always seems to be presented as a big hardware issue. wouldn't it be >>> pretty easy to define an eth or IP-level protocol to do remote puts, >>> gets, even test-and-set or reduce primitives, where the interrupt handler >>> could twiddle registered blobs of user memory on the target side? >>> >> >> That approach is called Active Messages, and can be bolted on to >> pretty much every messaging implementation. Doesn't OpenMX provide >> that kind of interface? >> > > Open-MX offers what MX offers: no explicit RDMA interface, only 2-sided. > But something similar to a remote get is used internally for large > messages. It wouldn't be hard to mplement some RDMA-like features in > such a software-only model like Mark said above. > > Brice Don't forget the unexpected handler which can provide some Active Message behavior. Scott From kus at free.net Thu Feb 25 10:19:20 2010 From: kus at free.net (Mikhail Kuzminsky) Date: Thu, 25 Feb 2010 21:19:20 +0300 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: Message-ID: BTW, is Cray SeaStar2+ better than IB - for nodes w/many cores ? And I didn't see latencies comparison for SeaStar vs IB. Mikhail From lindahl at pbm.com Thu Feb 25 12:50:44 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 25 Feb 2010 12:50:44 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F02663069@mtiexch01.mti.com> References: <20100223212904.GD8195@bx9.net> <9FA59C95FFCBB34EA5E42C1A8573784F02663069@mtiexch01.mti.com> Message-ID: <20100225205044.GC9879@bx9.net> On Tue, Feb 23, 2010 at 03:08:48PM -0800, Gilad Shainer wrote: > My claim (and more important for the present) is that the so called > "non-coalescing" results published on latest InfiniPath based > products are actually with coalescing. 
> The other universe you called it? You claimed several things, but fine, I'm happy just talking about only this one. As I said before, I can't speak about the latest True Scale numbers because I don't work for QLogic anymore. But I am willing to accept your pledge that you'll quit your job if you're wrong. -- greg

From hahn at mcmaster.ca Thu Feb 25 13:53:59 2010 From: hahn at mcmaster.ca (Mark Hahn) Date: Thu, 25 Feb 2010 16:53:59 -0500 (EST) Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: References: Message-ID: > BTW, is Cray SeaStar2+ better than IB - for nodes w/many cores ? BW seems like a SeaStar2+ advantage, though I haven't seen any discussion of latency or message rate. > And I didn't see latencies comparison for SeaStar vs IB. my guess is that for short routes, seastar is competitive, but can fall behind on large machines (long routes). regardless of how tight the seastar per-hop latency is, IB has 2.5x the per-hop fanout (2 or 3 outgoing 9.6 GB/s links versus 18 outgoing 4 GB/s links). higher radix means an advantage that increases with size.
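To make the "advantage that increases with size" point concrete, here is a rough sketch (Python, illustrative only): it compares average hop count between random node pairs on a k x k x k torus against worst-case switch hops in a folded-Clos/fat-tree fabric, and it deliberately ignores per-hop latency, adaptive routing, congestion and link speeds, all of which matter in practice. The torus and tree sizes below are hypothetical, not any particular XT or IB installation.

def ring_avg_hops(k):
    # average shortest-path distance between two random nodes on a ring of k
    return sum(min(d, k - d) for d in range(k)) / float(k)

def torus_avg_hops(k):
    # k x k x k 3D torus: per-dimension ring distances simply add
    return 3 * ring_avg_hops(k)

def fat_tree_max_hops(tiers):
    # folded-Clos fabric: worst case is up to the top tier and back down
    return 2 * tiers - 1

for k in (4, 8, 16, 24):
    print("%2d^3 torus (%6d nodes): avg ~%.1f hops" % (k, k ** 3, torus_avg_hops(k)))
print("2-tier fat tree: <= %d switch hops" % fat_tree_max_hops(2))
print("3-tier fat tree: <= %d switch hops" % fat_tree_max_hops(3))

Average distance on the torus grows like N^(1/3) while the switched fabric stays at a handful of hops; whether that matters for a given code is exactly the nearest-neighbour question being debated in the next few messages.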
From plegresl at gmail.com Thu Feb 25 22:25:22 2010 From: plegresl at gmail.com (Patrick LeGresley) Date: Thu, 25 Feb 2010 22:25:22 -0800 Subject: [Beowulf] Computational / GPGPU Engineer at Life Technologies Message-ID: <1423F812-0234-4CB2-9324-483EF3358BBA@gmail.com> We have an opening in our genetic systems division for someone with HPC and especially GPU experience. The requisition number is 2793BR and you can get to the jobsite here: http://www.lifetechnologies.com/careers.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From richard.walsh at comcast.net Fri Feb 26 09:36:33 2010 From: richard.walsh at comcast.net (richard.walsh at comcast.net) Date: Fri, 26 Feb 2010 17:36:33 +0000 (UTC) Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: Message-ID: <265537950.7749891267205793764.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Mark Hahn wrote: >> Doesn't this assume worst case all-to-all type communication >> patterns. > >I'm assuming random point-to-point communication, actually. A sub-case of all-to-all (possibly all-to-all). So you are assuming random point-to-point is a common pattern in HPC ... mmm ... I would call it a worse case pattern, something more typical of graph searching codes like they run at the NSA. Sure a high radix switch (or better yet a global memory address space, Cray X1E) is good and designed for this worst-case, but not sure this is the common case data reference pattern in HPC ... if it were they would be selling more global memory systems at Cray and SGI (not just to the NSA). There you might also want a machine like the Cray XMT where the memory is flat and stalled threads can be switched out for another thread. >> If you are just trading ghost cell data with your neighbors >> and you have placed your job smartly on the torus the fan out >> advantage mentioned is irrelevant. No? > >if your comms are nearest-neighbor, then yes, a nearest-neighbor >fabric is your friend ;) I think that if you look at the HPC space globally there is still a lot of locality that you can rely on. Familiar with the "7 dwarves" paper from Berkeley? >how often does that actually happen? to work out so neatly would >preclude, for instance, adaptive meshes, right? it seems like mostly >I see jobs with no obvious regular structure to their communication. Really ... must be doing a lot of turbulent flow simulations with shedding vortices, crash simulations with self-penetrating meshes ... tough stuff for your average cluster or even your above average cluster. Even AMR codes usually attempt to discover new neighbors and localize them. Not disrespecting switches, but they are in a sense designed for worse case scenarios (the design asserts that "there are no neighborhoods") ... a torus design appeals to the middle ground were locality is not banished. rbw -------------- next part -------------- An HTML attachment was scrubbed... URL: From kus at free.net Fri Feb 26 10:29:12 2010 From: kus at free.net (Mikhail Kuzminsky) Date: Fri, 26 Feb 2010 21:29:12 +0300 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: Message-ID: In message from Mark Hahn (Thu, 25 Feb 2010 16:53:59 -0500 (EST)): >BW seems like a SeaStar2+ advantage ... >(2or 3 outgoing 9.6 GB links versus 18 >outgoing 4 GB/s links). 
Really, the SeaStar2+ BW advantage isn't that impressive :-) 1) 9.6 GB/s is the peak value; sustained is 6 GB/s. 2) And these values are for bisectional bandwidth, therefore, as I understand, it's only 3 GB/s for transmission of one message. Mikhail

From richard.walsh at comcast.net Fri Feb 26 11:16:07 2010 From: richard.walsh at comcast.net (richard.walsh at comcast.net) Date: Fri, 26 Feb 2010 19:16:07 +0000 (UTC) Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: <4A1DA2D8-E75F-46C0-9CDA-64BD204A0CCA@gmail.com> Message-ID: <187918651.7793881267211767624.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Larry Stewart wrote: >Designing the communications network for this worst-case pattern has a >number of benefits: > > * it makes the machine less sensitive to the actual communications pattern > * it makes performance less variable run-to-run, when the job controller > chooses different subsets of the system I agree with this and pretty much all of your other comments, but wanted to make the point that a worst-case, hardware-only solution is not required or necessarily where all of the research and development effort should be placed for HPC as a whole. And let's not forget that unless they are supported by some coincidental volume requirement in another non-HPC market, they will cost more (sometimes a lot). If worst-case hardware solutions were required then clusters would not have pushed out their HPC predecessors, and novel high-end designs would not find it so hard to break into the market. Lower cost hardware solutions often stimulate the more software-intelligent use of the additional resources that come along for the ride. With clusters you paid less for interconnects, memory interfaces, and packaged software, and got to spend the savings on more memory, more memory bandwidth (aggregate), and more processing power. This in turn had an effect on the problems tackled: weak scaling an application was an approach to use the memory while managing the impact of a cheaper interconnect. So, yes, let's try to banish latency with cool state-of-the-art interconnects engineered for worst-case, not common-case, scenarios (we have been hearing about the benefits of high radix switches), but remember that interconnect cost and data locality and partitioning will always matter and may make the worst-case interconnect unnecessary. >There's a paper in the IBM Journal of Research and Development about this, >they wound up using simulated annealing to find good placement on the most >regular machine around, because the "obvious" assignments weren't optimal. Can you point me at this paper ... sounds very interesting ... ?? >Personally, I believe our thinking about interconnects has been poisoned by thinking >that NICs are I/O devices. We would be better off if they were coprocessors. Threads >should be able to send messages by writing to registers, and arriving packets should >activate a hyperthread that has full core capabilities for acting on them, and with the >ability to interact coherently with the memory hierarchy from the same end as other >processors. We had started kicking this around for the SiCortex gen-3 chip, but were >overtaken by events. Yes to all this ... now that everyone has made the memory controller an integral part of the processor. We can move on to the NIC ... ;-) ... rbw -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Shainer at mellanox.com Fri Feb 26 17:00:37 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Fri, 26 Feb 2010 17:00:37 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? References: Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F02663413@mtiexch01.mti.com> > > > BTW, is Cray SeaStar2+ better than IB - for nodes w/many cores ? > > BW seems like a SeaStar2+ advantage, though I haven't seen any > discussion of latency or message rate. > SeaStar2+ BW to the node is 6.5GB bi-dir, similar to IB QDR 4x port BW through PCIe Gen2. SeaStar BW to the network is 9.6GB bi-dir, little bit higher than the 8GB IB BW between switches, but less than IB 12x port which deliver 24GB/s. I have not see latency numbers out there as well, but some indications showed that IB latency is lower than SeaStar. > > And I didn't see latencies comparison for SeaStar vs IB. > > my guess is that for short routes, seastar is competitive, > but can fall behind on large machines (long routes). > > regardless of how tight the seastar per-hop latency is, > IB has 2.5x the per-hop fanout (2or 3 outgoing 9.6 GB links > versus 18 outgoing 4 GB/s links). higher radix means an > advantage that increases with size. From lindahl at pbm.com Fri Feb 26 17:31:20 2010 From: lindahl at pbm.com (Greg Lindahl) Date: Fri, 26 Feb 2010 17:31:20 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: <9FA59C95FFCBB34EA5E42C1A8573784F02663413@mtiexch01.mti.com> References: <9FA59C95FFCBB34EA5E42C1A8573784F02663413@mtiexch01.mti.com> Message-ID: <20100227013120.GI13186@bx9.net> On Fri, Feb 26, 2010 at 05:00:37PM -0800, Gilad Shainer wrote: > I have not see latency numbers out there as well, > but some indications showed that IB latency is lower than SeaStar. I'm surprised that you aren't familiar with http://icl.cs.utk.edu/hpcc/hpcc_results.cgi It has plenty of latency results. I think this latency measurement could be improved, but at least it involves all the cores on a node, unlike the 1-core-per-node number usually quoted. -- greg p.s. I'm still waiting for confirmation of your offer to resign if you're wrong about QLogic's message rate. From Shainer at mellanox.com Fri Feb 26 22:43:22 2010 From: Shainer at mellanox.com (Gilad Shainer) Date: Fri, 26 Feb 2010 22:43:22 -0800 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? References: <9FA59C95FFCBB34EA5E42C1A8573784F02663413@mtiexch01.mti.com> <20100227013120.GI13186@bx9.net> Message-ID: <9FA59C95FFCBB34EA5E42C1A8573784F02663423@mtiexch01.mti.com> > > I have not see latency numbers out there as well, > > but some indications showed that IB latency is lower than SeaStar. > > I'm surprised that you aren't familiar with > > http://icl.cs.utk.edu/hpcc/hpcc_results.cgi > > It has plenty of latency results. I think this latency measurement > could be improved, but at least it involves all the cores on a node, > unlike the 1-core-per-node number usually quoted. Typical answer. Why not to compare different settings and create claims from that. If you continue to search you might find a paper or two which will give the node to node latency, but since it is not on Cray web site, I will leave this out of my response. > > -- greg > > p.s. I'm still waiting for confirmation of your offer to resign if > you're wrong about QLogic's message rate. > p.s I am still waiting for you to grow up but it more and more seems like an impossible mission. I can assure you that it does not hurt, so don't be afraid. 
It could have helped you in the past, you know... From richard.walsh at comcast.net Sat Feb 27 20:17:17 2010 From: richard.walsh at comcast.net (richard.walsh at comcast.net) Date: Sun, 28 Feb 2010 04:17:17 +0000 (UTC) Subject: [Beowulf] Q: IB message rate & large core counts (per node)? In-Reply-To: <532126883.8209551267330513688.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Message-ID: <2043745298.8209961267330637770.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> All, In case anyone else has trouble keeping the numbers straight between IB (SDR, DDR, QDR, EDR) and PCI-Express, (1.0, 2.0, 30) here are a couple of tables in Excel I just worked up to help me remember. If anyone finds errors in it please let me know so that I can fix them. Regards, rbw -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: IBvsPCIE.xlsx Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet Size: 40695 bytes Desc: not available URL: From lstewart2 at gmail.com Fri Feb 26 10:20:49 2010 From: lstewart2 at gmail.com (Lawrence Stewart) Date: Fri, 26 Feb 2010 13:20:49 -0500 Subject: [Beowulf] Q: IB message rate & large core counts (per node) ? In-Reply-To: <265537950.7749891267205793764.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> References: <265537950.7749891267205793764.JavaMail.root@sz0135a.emeryville.ca.mail.comcast.net> Message-ID: <4A1DA2D8-E75F-46C0-9CDA-64BD204A0CCA@gmail.com> On Feb 26, 2010, at 12:36 PM, richard.walsh at comcast.net wrote: > > Mark Hahn wrote: > > >> Doesn't this assume worst case all-to-all type communication > >> patterns. > > > >I'm assuming random point-to-point communication, actually. > > A sub-case of all-to-all (possibly all-to-all). So you are assuming > random point-to-point is a common pattern in HPC ... mmm ... I > would call it a worse case pattern, something more typical of > graph searching codes like they run at the NSA. Sure a high > radix switch (or better yet a global memory address space, Cray > X1E) is good and designed for this worst-case, but not sure this > is the common case data reference pattern in HPC ... if it were > they would be selling more global memory systems at Cray and > SGI (not just to the NSA). Designing the communications network for this worst-case pattern has a number of benefits: * it makes the machine less sensitive to the actual communications pattern * it makes performance less variable run-to-run, when the job controller chooses different subsets of the system > > There you might also want a machine like the Cray XMT where > the memory is flat and stalled threads can be switched out for > another thread. > > >> If you are just trading ghost cell data with your neighbors > >> and you have placed your job smartly on the torus the fan out > >> advantage mentioned is irrelevant. No? Smart placement is a lot harder than it appears. * The actual communications pattern often doesn't match preconceptions * Communications from concurrently running applications can interfere. There's a paper in the IBM Journal of Research and Development about this, they wound up using simulated annealing to find good placement on the most regular machine around, because the "obvious" assignments weren't optimal. ... 
In addition to this stuff, the quality of the interconnect has other effects * a fast, low latency interconnect lets the application scale effectively to larger numbers of nodes before performance rolls off * an interconnect with low latency short messages provides a decent base for PGAS languages like UPC and CoArray Fortran or for lightweight communications APIs like SHMEM or active messages. Personally, I believe our thinking about interconnects has been poisoned by thinking that NICs are I/O devices. We would be better off if they were coprocessors. Threads should be able to send messages by writing to registers, and arriving packets should activate a hyperthread that has full core capabilities for acting on them, and with the ability to interact coherently with the memory hierarchy from the same end as other processors. We had started kicking this around for the SiCortex gen-3 chip, but were overtaken by events. -Larry -------------- next part -------------- An HTML attachment was scrubbed... URL:
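As a purely software illustration of the active-message / NIC-as-coprocessor idea discussed above: the sketch below (Python over UDP, entirely hypothetical, not the API of MX, InfiniPath or any real NIC) shows the shape of it, where a handler id travels with each packet and a registered handler runs on arrival, twiddling registered user memory. The thread's point is that with hardware help this dispatch would run on a NIC-adjacent hardware thread instead of after a trip through the kernel socket stack.

import socket
import struct

# Toy active-message receiver: each datagram carries a 32-bit handler id
# followed by a payload; the matching handler runs immediately on arrival
# and may update "registered" user memory (a dict here stands in for
# pinned buffers).  Illustrative only: no registration protocol, no flow
# control, no reliability.

HANDLERS = {}
MEM = {"counter": 0, "blob": bytearray(64)}

def am_put(payload):
    # remote "put": first byte is an offset, the rest is data
    off = payload[0]
    MEM["blob"][off:off + len(payload) - 1] = payload[1:]

def am_add(payload):
    # remote add of one 64-bit integer to a counter
    (val,) = struct.unpack("!q", payload)
    MEM["counter"] += val

HANDLERS[1] = am_put
HANDLERS[2] = am_add

def serve(port=9999, max_msgs=4):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    for _ in range(max_msgs):
        data, _addr = sock.recvfrom(2048)
        (am_id,) = struct.unpack("!I", data[:4])
        HANDLERS[am_id](data[4:])        # dispatch on arrival
    print(MEM["counter"], bytes(MEM["blob"][:8]))

if __name__ == "__main__":
    serve()

A sender only needs something like sock.sendto(struct.pack('!I', 2) + struct.pack('!q', 5), (host, 9999)); everything interesting is on the receive side, which is exactly the part the posts above would like pushed into, or next to, the NIC.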