From tjrc at sanger.ac.uk Tue May 1 01:16:56 2007 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Tue, 1 May 2007 09:16:56 +0100 Subject: [Beowulf] Sorry sorry sorry Message-ID: <53A48808-E518-4C0B-B003-821790E4599E@sanger.ac.uk> Ouch. I am now the colour of a beetroot/cranberry/red-fruit-of-your- choice. No idea how that happened. I suspect incorrect automatic mail client address autocompletion by Apple Mail, coupled with idiotic user (me) sending mail too late at night. Sorry about that folks. At least it wasn't anything incriminating... Tim From j.boyle at manchester.ac.uk Tue May 1 07:32:26 2007 From: j.boyle at manchester.ac.uk (Jonathan Boyle) Date: Tue, 1 May 2007 15:32:26 +0100 Subject: Fwd: Re: [Beowulf] Why is communication so expensive for very small messages? Message-ID: <200704261216.37229.j.boyle@manchester.ac.uk> Thanks, we're using 1.2.6, so we'll have to look into upgrading. ---------- Forwarded Message ---------- Subject: Re: [Beowulf] Why is communication so expensive for very small messages? Date: Tue, 24 Apr 2007 18:03:51 -0600 From: "Michael H. Frese" To: Jonathan Boyle Cc: beowulf at beowul.org Sorry, the most recent version of mpich1 is 1.2.7. The older version that was doing the message aggregation was 1.2.1. >You don't say which version of mpich you are using, but we found small >messages taking 1 ms last fall. Upgrading from an old version of mpich1 >(ca. 2001) to the most recent version (ca. 2005, 1.2.27?) fixed the >problem. The problem was probably one of the OS holding the teensy little >messages hoping for more data to send it with -- message aggregation, I >suppose it is called. The newer version of mpich must have set the OS >flags properly to prevent that. > >I can't tell you about mpich2, as we have no experience with that yet. Mike Frese At 09:54 AM 4/24/2007, you wrote: >I apologise if this is a naive question, but I'm new to this world of >beowulfs. 
>
>I'm using C++/mpi, to get a feel for communication costs I ran tests using
>mpptest and my own programs.
>
>For 2 processor blocking calls, mpptest indicates a latency of about 30
>microseconds.
>
>However when I measure communication times in my own program using a loop as
>follows....
>
>MPI_Barrier(MPI_COMM_WORLD);
>start = MPI_Wtime();
>for (unsigned t=1; t<=5000; t++)
>{
>  if (my_rank==0)
>  {
>    MPI_Send(data, size, MPI_INT, 1, tag, MPI_COMM_WORLD);
>  }
>  else
>  {
>    MPI_Recv(data, size, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
>  }
>}
>end = MPI_Wtime();
>
>for size>=4, I get a latency of about 30 microseconds as expected, however
>for size<4, communication costs increase massively, and latency now appears
>to be 1ms!
>
>Firstly, I assume this isn't normal?
>
>Secondly, can anyone suggest what's going on, or where I can go for more
>information.
>
>Many thanks.
>
>We're using mpich.
>
>Processors are Intel(R) Xeon(TM) CPU 3.60GHz.
>
>Interconnects are Dell PowerConnect 5324 24-port gigabit switches.
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf

-------------------------------------------------------

From mathog at caltech.edu Tue May 1 12:27:46 2007
From: mathog at caltech.edu (David Mathog)
Date: Tue, 01 May 2007 12:27:46 -0700
Subject: [Beowulf] Strange hardware? problems
Message-ID:

"Robert G. Brown" wrote

> I've been coding, one way or another, for coming up on 35 years or
> thereabouts, starting with paper tape, going through cards (lots of
> cards), and up the evolutionary ladder. In all of that time, I've
> encountered one -- count it, one -- time that a consistent error in code
> I was running was due to a real failure in the hardware I was running on
> and not a bug in my own code.

RGB has an extra 5 years on me, but my experience has been similar: only very, very, very rarely is a program fault the result of a true hardware issue. (This excludes anything that runs from one box to another over a cable or fiber, where hardware issues are more common.) We once tracked a bug in an FFT subroutine running on an array processor to faulty memory, and right down to a memory pattern suggesting two address pins were shorted together. On opening the beast up, sure enough, the short was right where it had to be, and it was repaired with a scalpel. This was around 1982. Anyway, one caveat. With the proliferation of x86 variants I now on occasion hit a binary which has been compiled for some other processor variant that blows up when it tries to use an instruction which is not supported on the processor it is actually running on. As I mentioned previously, valgrind can catch these for you. Or recompile using switches you know are supported on the target processor. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From orion at cora.nwra.com Tue May 1 14:41:28 2007 From: orion at cora.nwra.com (Orion Poplawski) Date: Tue, 01 May 2007 15:41:28 -0600 Subject: [Beowulf] Re:Strange hardware? problems In-Reply-To: References: Message-ID: <4637B408.8070706@cora.nwra.com> David Mathog wrote: > Since the same code runs differently on two different Opteron models > it's probably either a memory access issue or the use of a compiler > flag that enables some feature on one model that is not present > on the other. For instance, SSE3 vs. SSE2, although I don't know > enough about these models to tell you what the most likely flag would > be. (The fact that it runs ok on the newer one and blows up on the > older one is consistent with this type of error.) > > Assuming gcc, recompile with: > > -O0 -g -std=c99 -Wall > > and clean up any warnings that result until you get a clean build. 
> Repeat with -O3 and -O2, as for strange reasons that sometimes uncovers > logic problems not seen at -O0. Then run the resulting binary > within valgrind. Fix any memory access violations which are found. > Valgrind can also alert you to the use of unsupported operations. > The code compiles quite cleanly, but I am seeing different behavior with different compiler flags and different compilers. We'll see if I can bisect the problem into a small enough box. Thanks for the poke in that direction... -- Orion Poplawski Technical Manager 303-415-9701 x222 NWRA/CoRA Division FAX: 303-415-9702 3380 Mitchell Lane orion at cora.nwra.com Boulder, CO 80301 http://www.cora.nwra.com From rgb at phy.duke.edu Tue May 1 16:28:08 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 1 May 2007 19:28:08 -0400 (EDT) Subject: [Beowulf] Strange hardware? problems In-Reply-To: References: Message-ID: On Tue, 1 May 2007, David Mathog wrote: > Anyway, one caveat. With the proliferation of x86 variants I now > on occasion hit a binary which has been compiled for some other > processor variant that blows up when it tries to use an instruction > which is not supported on the processor it is actually running on. As I > mentioned previously, valgrind can catch these for you. Or recompile > using switches you know are supported on the target processor. Completely agreed. In fact in my case the problem was a mix of compiler and the fact that I was using an Intel CPU which had an obscure multiplication bug that was eventually worked around in the compiler. The point is that EVEN if the problem turns out to be hardware or compiler or something nominally "beyond your control", the solution is always the same. 
Instrument the hell out of your code, run it to failure, accumulate data on the failure thereby, reinstrument to catch the failure more tightly, iterate until you know that this run failed between these two instructions and that the values of all possible loop indices and variables at that time were the following vector of numbers and that they all are/are not what they should be and if not they began to diverge here, and here are the values of everything around THAT point.

Then you may have to literally single step through the logic to see where either you asked the machine to multiply six (and the variable indeed contained six) times seven (and the variable indeed contained seven) and the stupid computer returned forty-ONE. Or where it returned 42 but three lines further on your index went out of bounds on the dynamic array pointer and you overwrote 42 with 0xA321FD07 (random garbage).

The worst bug I can recall ever having to squash in my own code was back in my fortran days, where I had strong type checking and everything. On a single line in a program of several thousand lines I had typed an N instead of an M in a program that used both (in fact N was absolute value of M). Where I did this both made sense, but N was not in fact initialized and hence had a fixed value of zero. Zero was a possible value -- these were angular momentum indices -- and the function values returned were not only plausible, they were correct for certain values of the input parameters to the overall program. They were just wrong, usually by a fixed factor, for others (and fortunately I had a pretty strong idea of what right was). EVEN instrumenting the code to where I literally single stepped through the fortran -- and this was code on cards, so there was nothing like an interactive debugger, mind -- I swear I stared at the code for close to a week before the N/M typo finally jumped out of all those lines and smacked me right between the eyes.

The point being don't expect the process to be easy. At least you have the advantage of having a high probability of failure. The worst that can happen to you is that as you instrument the code with output lines the problem will disappear. This, too, has happened to me on more than one occasion -- changing the alignment of the code even in small ways sometimes causes a failure to be missed if the problem is with pointers and memory or a failure like the one David describes.

I actually debugged a "heisenbug" like this at a different level in scripted code on Friday. A long and complex program was being started up on a dedicated server as the last step in the boot process. The program itself wrote extensively to output during startup and was initiated via a fairly standard init.d script that backgrounded the startup call. The system was booting fine, you could see the application startup occurring "successfully", but post-boot it wasn't running! Consistently. If you started it by hand, it came up perfectly. When I started to instrument the startup script to "watch" it start up, it suddenly started to come up during the boot.

It turned out to be a race condition between the program and the completion of the rc boot process by init. The program took five seconds to start and wrote to stdout the whole way in the background. When the rc script finished in the foreground, it went away taking the backgrounded script's tty with it, so it crashed without a trace. Running it by hand from a tty obviously worked fine as long as you watched. Running it by script in the boot worked fine as long as you watched. To get it to work WITHOUT watching, one had to either add a 6 second sleep to the startup script to wait for it to finish writing to stdout (and hence get a /var/log/messages trace) or redirect stdout and stderr to /dev/null (and lose it). Or rewrite the code itself that was being backgrounded to log directly and not write to stdout except in debug mode...
but it wasn't my code. Or maybe even move the code up to where at least five seconds worth of other startup remained before rc boot completed. Bugs can be SUBTLE. Bugs can be heisenbugs like this that are other people's "fault" (but your problem). Be patient, be systematic, be meditative and await Enlightenment. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From pxrist at gmail.com Tue May 1 03:34:57 2007 From: pxrist at gmail.com (Panagiotis Christopoulos) Date: Tue, 1 May 2007 13:34:57 +0300 Subject: [Beowulf] Problem while booting diskless node. In-Reply-To: <82C63820-6368-481D-BE08-74B573EC3E30@ulb.ac.be> References: <82C63820-6368-481D-BE08-74B573EC3E30@ulb.ac.be> Message-ID: <200705011334.57546.pxrist@gmail.com> On Monday 30 April 2007 13:04, Maxime Kinet wrote: > Hi, > > I'm trying to set up my first cluster with diskless nodes. To achieve > that, I'm using PXElinux on a server, running Fedora Core 6, and a > NFS-mounted root partition on the node. > Everything works perfectly > (getting the IP address, loading the kernel and mounting the > filesytem) until the node has to run some binaries located into /sbin > during the boot process. Apparently it's unable to execute them > because they have been compiled with dynamically linked libraries and > not statically. The /sbin directory of the node is a simple copy of > the one of the server. I suppose, that you have done something like, creating a /diskless directory inside your nfs,tftp,dhcp etc... server, copying files(eg. /usr/* ) from the server, inside that /diskless dir with the same hierarchy, and this resulted in a structure which would home, your nfs exported, root fs of your nodes. I'm not an expert, but because you said about dynamic libraries, I cannot understand, why this is a problem, you copied /sbin inside /diskless, but you didn't copy /lib or /usr/lib? 
The problem with dynamic libraries arises because your /diskless does not have these libraries.

> I tried to avoid the problem using the busybox
> tools, and it worked a bit better but then it couldn't execute bash
> scripts such as rc.sysinit.

Have you created an init in your busybox, to chroot (exec switch_root) inside your nfs root fs after mounting it?

> As anybody ever encountered such problems and what should I do to
> solve it? recompile the kernel of the node or of the server? change
> the distribution? Are there any other simpler method to proceed than
> using PXE?

There are two things you can do. As Douglas Eadline said, it starts by thinking about whether you want to reinvent the wheel or not. If you have time and machines, if you know that your teachers won't get annoyed, and you can work in a university lab, so you will not pay for the power supply yourself :p, continue with fedora and all these brainstorming things. You will learn linux administration and probably you will do amazing things. If you don't have time etc., the guys in warewulf have been doing the same job for about 7 years, and they provide you all the knowledge they gained in a simple installation process.

Back to the technical stuff: from what you describe, I think that something is wrong with your /diskless dir (if you have one, of course). I cannot understand why you want to use busybox. We use busybox when we want an initramfs to do specific jobs (such as unlocking and mounting encrypted partitions; yes, I know this is not the best example I could give) before chrooting inside our real root fs and exec'ing init, as Mark Hahn said, or if we are running embedded. Also, I don't think that you have to change your distribution, and if you don't like PXE, you can see how the guys in LTSP boot (I think they use both PXE and "etherboot" and you can make a choice), but for me, syslinux is fine!

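On the missing-library point above, one way to check is to compare what ldd reports for a binary on the server against what actually exists under the exported root. The following is only a minimal Python sketch: the helper names parse_ldd and missing_under_root are made up for illustration, the sample ldd text is invented, and /diskless is the directory name assumed earlier in this mail, not necessarily yours.

```python
import os

# Invented sample of what `ldd /sbin/some-binary` might print on the server.
SAMPLE_LDD_OUTPUT = """
    linux-gate.so.1 =>  (0xffffe000)
    libc.so.6 => /lib/libc.so.6 (0x00b1c000)
    /lib/ld-linux.so.2 (0x00af9000)
"""

def parse_ldd(ldd_output):
    """Return the absolute library paths named in ldd-style output."""
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "=>" in fields:
            # Form: "libc.so.6 => /lib/libc.so.6 (0x...)"
            idx = fields.index("=>")
            target = fields[idx + 1] if idx + 1 < len(fields) else ""
        else:
            # Form: "/lib/ld-linux.so.2 (0x...)"
            target = fields[0]
        if target.startswith("/"):
            paths.append(target)
    return paths

def missing_under_root(root, lib_paths):
    """List the libraries that do not exist beneath the exported root."""
    return [p for p in lib_paths
            if not os.path.exists(os.path.join(root, p.lstrip("/")))]

if __name__ == "__main__":
    libs = parse_ldd(SAMPLE_LDD_OUTPUT)
    # "/diskless" is an assumed export directory; adjust to your layout.
    print(missing_under_root("/diskless", libs))
```

Anything this prints is a library the node will not find at boot. Note that os.path.exists follows symlinks, so absolute symlinks inside the exported tree are resolved against the server's own root and may be reported wrongly.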
This was my point of view. I hope I helped, and if you want to ask something, feel free to send me a mail or, of course, ask again on the list.

Panagiotis Christopoulos
System Administrator
Technological Institute of Athens
Department of Informatics

From supercomputer at gmail.com Tue May 1 05:34:52 2007
From: supercomputer at gmail.com (Chris Vaughan)
Date: Tue, 1 May 2007 13:34:52 +0100
Subject: [Beowulf] Syslog Server-Traffic
Message-ID: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com>

Hello,

I'm researching setting up a cluster and I'm curious as to whether or not it's a good idea to set up a syslog server. The question I have is whether the traffic created from logging is going to slow down my network to the point of poor performance? What are people's experiences with logging? The cluster will have a management network and a computational network.

This is what I'm thinking:

Greater than 64 Nodes, Yes.
Greater than 64 and less than 128, Maybe?
Greater than 128 No

Any input would be great,

Thanks!

--
------------------------------
Christopher Vaughan

From lbickley at bickleywest.com Tue May 1 08:00:29 2007
From: lbickley at bickleywest.com (Lyle Bickley)
Date: Tue, 1 May 2007 08:00:29 -0700
Subject: [Beowulf] Sorry sorry sorry
In-Reply-To: <53A48808-E518-4C0B-B003-821790E4599E@sanger.ac.uk>
References: <53A48808-E518-4C0B-B003-821790E4599E@sanger.ac.uk>
Message-ID: <200705010800.29733.lbickley@bickleywest.com>

On Tuesday 01 May 2007 01:16, Tim Cutts wrote:
> Ouch. I am now the colour of a beetroot/cranberry/red-fruit-of-your-
> choice.
>
> No idea how that happened. I suspect incorrect automatic mail client
> address autocompletion by Apple Mail, coupled with idiotic user (me)
> sending mail too late at night.
>
> Sorry about that folks. At least it wasn't anything incriminating...

Gosh, I thought you included the Beowulf list because you were suggesting that any of us visiting the UK (and Cambridge in particular) were welcome in your home ;-)

Lyle
--
Lyle Bickley
Bickley Consulting West Inc.
Mountain View, CA
http://bickleywest.com
"Black holes are where God is dividing by zero"

From dkondo at lri.fr Wed May 2 06:33:43 2007
From: dkondo at lri.fr (Derrick Kondo)
Date: Wed, 2 May 2007 15:33:43 +0200
Subject: [Beowulf] [CFP] EuroPVM/MPI'07 -- submission deadline extended to May 14th
Message-ID: <60ec14620705020633r2ed22633k773717e55c1df862@mail.gmail.com>

The full paper submission deadline for EuroPVM/MPI 2007 has been extended to May 14th at 11:59 AM (noon) UTC

***However, please submit paper abstracts by May 7th.***

************************************************************************
***                                                                  ***
***                         CALL FOR PAPERS                          ***
***                                                                  ***
************************************************************************

EuroPVM/MPI 2007
14th European PVM/MPI Users' Group Meeting
Paris, France, September 30 - October 3, 2007

web: http://www.pvmmpi07.org
e-mail: chairs at pvmmpi07.org

submission deadline for paper abstracts: May 7th, 2007
submission deadline for full papers and poster abstracts: extended to May 14th, 2007 at 11:59 AM (noon) UTC
submission site: http://pvmmpi07.lri.fr/submissions

organized by Project Grand-Large (http://grand-large.lri.fr/index.php/Accueil) from INRIA Futurs (http://www-futurs.inria.fr)

-------------------------------------------------------------------------------------------

BACKGROUND AND TOPICS

PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) have evolved into the standard interfaces for high-performance parallel programming in the message-passing paradigm. EuroPVM/MPI is the most prominent meeting dedicated to the latest developments of PVM and MPI such as new support tools, implementation and applications using these interfaces.
The EuroPVM/MPI meeting naturally encourages discussions of new message-passing and other parallel and distributed programming paradigms beyond MPI and PVM.

The 14th European PVM/MPI Users' Group Meeting will be a forum for users and developers of PVM, MPI, and other message-passing programming environments. Through the presentation of contributed papers, vendor presentations, poster presentations and invited talks, attendees will have the opportunity to share ideas and experiences to contribute to the improvement and furthering of message-passing and related parallel programming paradigms.

Topics of interest for the meeting include, but are not limited to:

* PVM and MPI implementation issues and improvements
* Latest extensions to PVM and MPI
* PVM and MPI for high-performance computing, clusters and grid environments
* New message-passing and hybrid parallel programming paradigms
* Interaction between message-passing software and hardware
* Fault tolerance in message-passing programs
* Performance evaluation of PVM and MPI applications
* Tools and environments for PVM and MPI
* Algorithms using the message-passing paradigm
* Applications in science and engineering based on message-passing

This year special emphasis will be put on large-scale issues, such as those related to hardware and interconnect technologies, or the potential or demonstrated shortcomings of PVM or MPI.

As in the preceding years, the special session 'ParSim' will focus on numerical simulation for parallel engineering environments. EuroPVM/MPI 2007 will also hold the new 'Outstanding Papers' session introduced in 2006, where the best papers selected by the program committee will be presented.

SUBMISSION INFORMATION

Submission site: http://pvmmpi07.lri.fr/submissions

Contributors are invited to submit a full paper as a PDF (or Postscript) document not exceeding 8 pages in English (2 pages for poster abstracts and Late and Breaking Results).
The title page should contain an abstract of at most 100 words and five specific keywords. The paper needs to be formatted according to the Springer LNCS guidelines [2]. The usage of LaTeX for preparation of the contribution as well as the submission in camera-ready format is strongly recommended. Style files can be found at the URL [2].

New work that is not yet mature for a full paper, short observations, and similar brief announcements are invited for the poster session. Contributions to the poster session should be submitted in the form of a two-page abstract. All these contributions will be fully peer reviewed by the program committee.

Submissions to the special session 'Current Trends in Numerical Simulation for Parallel Engineering Environments' (ParSim 2007) are handled and reviewed by the respective session chairs. For more information please refer to the ParSim website [1].

All accepted submissions are expected to be presented at the conference by one of the authors, which requires registration for the conference.

IMPORTANT DATES

Submission of paper abstracts                    May 7th, 2007
Submission of full papers and poster abstracts   May 14th, 2007 at 11:59 AM (noon) UTC
Notification of authors                          June 19th, 2007
Camera-ready papers                              July 9th, 2007
Submission of Late and Breaking Results          September 15th, 2007
Tutorials                                        September 30th, 2007
Conference                                       October 1st-3rd, 2007

For up-to-date information, visit the conference web site at http://www.pvmmpi07.org.

PROCEEDINGS

In addition, selected papers of the conference, including those from the 'Outstanding Papers' session, will be considered for publication in a special issue of Parallel Computing in an extended format.

GENERAL CHAIR

* Jack Dongarra (University of Tennessee)

PROGRAM CHAIRS

* Franck Cappello (INRIA Futurs)
* Thomas Herault (Universite Paris Sud-XI / INRIA Futurs)

CONFERENCE VENUE

The conference will be held in the historical, cultural and economic center of Paris, the capital of France.
The city, which is renowned for its neo-classical architecture, hosts many museums and galleries and has an active nightlife. The symbol of Paris is the 324 metre (1,063 ft) Eiffel Tower on the banks of the Seine. Dubbed "the City of Light" (la Ville Lumiere) since the 19th century, Paris is regarded by many as one of the most beautiful and romantic cities in the world. It is also the most visited city in the world with more than 30 million foreign visitors per year. Paris is easily reachable from any European capital and most of the large European, American and Asian cities. It is an ideal starting point for visiting European institutes and cities.

REFERENCES

[1] ParSim 2007: http://wwwbode.in.tum.de/Par/arch/events/parsim07/
[2] Springer Guidelines: http://www.springer.de/comp/lncs/authors.html

From landman at scalableinformatics.com Wed May 2 07:16:14 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Wed, 02 May 2007 10:16:14 -0400
Subject: [Beowulf] fast file copying
In-Reply-To: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
Message-ID: <46389D2E.7040305@scalableinformatics.com>

Geoff Galitz wrote:
>
> Hi folks,
>
> During an HPC talk some years ago, I recall someone mentioned a tool
> which can copy large datasets across a cluster using a ring topology.
> Perhaps someone here knows of this tool?

There are a few, commercial and open source.

On the commercial side are exludus, xcp, and maybe one or two others. Exludus is basically a file pre-caching mechanism. Java based. xcp (by Scalable) is MPI based. It does a pretty good job of moving data.

On the open source side, I haven't seen things other than the udp broadcast based tools (we had written one several years ago, named mcp), but anyone using a cluster will tell you that udp broadcast can be very detrimental to non-udp broadcast usage of the switch (say for logins, NFS, command and control, ...)

>
> More to the point, we are pushing around datasets that are about
> 1Gbyte. The datasets are pushed out to dozens of nodes all at once and
> we foresee saturating the I/O system on our cluster as we grow. We are
> limited to using just the available disks and are looking for a
> reasonable solution that can support this kind of simultaneous access.

xcp might help.

> Currently we push the data out using rsync, but if I don't get any
> better ideas I may simply move to a pull system where the data is
> fetched by HTTP. I can get better throttling that way, at least.

For a few dozen nodes, this might work.

Joe

>
> -geoff
>
>
> Geoff Galitz
> geoff at galitz.org

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

From scheinin at crs4.it Wed May 2 07:30:09 2007
From: scheinin at crs4.it (Alan Louis Scheinine)
Date: Wed, 02 May 2007 16:30:09 +0200
Subject: [Beowulf] fast file copying
In-Reply-To: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
Message-ID: <4638A071.1080008@crs4.it>

One possibility is nettee.
http://saf.bio.caltech.edu/nettee.html

The current version of nettee is 0.1.7, from July 20, 2005.

nettee is a network "tee" program. It can typically transfer data between N nodes at (nearly) the full bandwidth provided by the switch which connects them. It is handy for cloning nodes or moving large database files.

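To see why a tee-style chain can help with the 1 Gbyte datasets discussed in this thread, a toy timing model may be useful. This is a rough Python sketch under stated assumptions, not a description of how nettee or xcp is implemented: it assumes 1 Gbit/s links at raw line rate, 24 receiving nodes, 1 MB forwarding chunks, and no disk or protocol overhead. None of those figures come from the thread except the dataset size.

```python
# Toy timing model for pushing one dataset to many nodes.
# Assumptions (not from the thread): 1 Gbit/s links at raw line rate,
# 24 receiving nodes, 1 MB forwarding chunks, no disk or protocol overhead.
DATASET_BYTES = 1_000_000_000      # "about 1Gbyte"
LINK_BYTES_PER_SEC = 125_000_000   # 1 Gbit/s expressed in bytes per second
NODES = 24
CHUNK_BYTES = 1_000_000

def fan_out_seconds(size, bandwidth, nodes):
    """Server sends a full copy to every node; its single uplink is shared."""
    return nodes * size / bandwidth

def chain_seconds(size, bandwidth, nodes, chunk):
    """Each node forwards chunks to the next node while still receiving."""
    # One full transfer time, plus one chunk of delay for each extra hop.
    return size / bandwidth + (nodes - 1) * chunk / bandwidth

star = fan_out_seconds(DATASET_BYTES, LINK_BYTES_PER_SEC, NODES)
chain = chain_seconds(DATASET_BYTES, LINK_BYTES_PER_SEC, NODES, CHUNK_BYTES)

print(f"server pushes to all nodes: {star:.1f} s")
print(f"pipelined chain:            {chain:.3f} s")
```

In this model the chain finishes in roughly the time of a single copy, because every switch port carries the data only once in each direction. Real numbers will be worse once disks and protocol overhead are included.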
From eugen at leitl.org Wed May 2 07:50:39 2007 From: eugen at leitl.org (Eugen Leitl) Date: Wed, 2 May 2007 16:50:39 +0200 Subject: [Beowulf] EuroPVM/MPI'07 -- submission deadline extended to May 14th Message-ID: <20070502145039.GK17691@leitl.org> ----- Forwarded message from Rolf Rabenseifner ----- From: Rolf Rabenseifner Date: Wed, 2 May 2007 16:50:28 +0200 (CEST) To: eugen at leitl.org Subject: EuroPVM/MPI'07 -- submission deadline extended to May 14th Dear HLRS User or member of my course-invitation-list, as member of the program committee, I'm sending you the CFP for the EuroPVM/MPI 2007. The deadline for full paper and poster abstract submissions is May 14th, 2007 (with deadline for abstracts already on May 7th). Best regards Rolf Rabenseifner ------------------------------------------------------------------------- The full paper submission deadline for EuroPVM/MPI 2007 has been extended to May 14th at 11:59 AM (noon) UTC ***However, please submit paper abstracts by May 7th.*** ************************************************************************ *** *** *** CALL FOR PAPERS *** *** *** ************************************************************************ EuroPVM/MPI 2007 14th European PVMMPI Users' Group Meeting Paris, France, September 30 - October 3, 2007 web: http://www.pvmmpi07.org e-mail: chairs at pvmmpi07.org submission deadline for papers abstracts: May 7th, 2007 submission deadline for full papers and poster abstracts: extended to May 14th, 2007 at 11:59 AM (noon) UTC submission site: http://pvmmpi07.lri.fr/submissions organized by Project Grand-Large (http://grand-large.lri.fr/index.php/Accueil) from INRIA Futurs (http://www-futurs.inria.fr) ------------------------------------------------------------------------------------------- BACKGROUND AND TOPICS PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) have evolved into the standard interfaces for high-performance parallel programming in the message-passing paradigm. 
EuroPVM/MPI is the most prominent meeting dedicated to the latest developments of PVM and MPI such as new support tools, implementation and applications using these interfaces. The EuroPVM/MPI meeting naturally encourages discussions of new message-passing and other parallel and distributed programming paradigms beyond MPI and PVM. The 14th European PVM/MPI Users' Group Meeting will be a forum for users and developers of PVM, MPI, and other message-passing programming environments. Through the presentation of contributed papers, vendor presentations, poster presentations and invited talks, attendees will have the opportunity to share ideas and experiences to contribute to the improvement and furthering of message-passing and related parallel programming paradigms. Topics of interest for the meeting include, but are not limited to: * PVM and MPI implementation issues and improvements * Latest extensions to PVM and MPI * PVM and MPI for high-performance computing, clusters and grid environments * New message-passing and hybrid parallel programming paradigms * Interaction between message-passing software and hardware * Fault tolerance in message-passing programs * Performance evaluation of PVM and MPI applications * Tools and environments for PVM and MPI * Algorithms using the message-passing paradigm * Applications in science and engineering based on message-passing This year special emphasis will be put on large-scale issues, such as those related to hardware and interconnect techologies, or the potential or demonstrated shortcomings of PVM or MPI. As in the preceding years, the special session 'ParSim' will focus on numerical simulation for parallel engineering environments. EuroPVM/MPI 2007 will also hold the new 'Outstanding Papers' session introduced in 2006, where the best papers selected by the program committee will be presented. 
SUBMISSION INFORMATION Submission site: http://pvmmpi07.lri.fr/submissions Contributors are invited to submit a full paper as a PDF (or Postscript) document not exceeding 8 pages in English (2 pages for poster abstracts and Late and Breaking Results). The title page should contain an abstract of at most 100 words and five specific keywords. The paper needs to be formatted according to the Springer LNCS guidelines [2]. The usage of LaTeX for preparation of the contribution as well as the submission in camera ready format is strongly recommended. Style files can be found at the URL [2]. New work that is not yet mature for a full paper, short observations, and similar brief announcements are invited for the poster session. Contributions to the poster session should be submitted in the form of a two-page abstract. All these contributions will be fully peer reviewed by the program committee. Submissions to the special session 'Current Trends in Numerical Simulation for Parallel Engineering Environments' (ParSim 2007) are handled and reviewed by the respective session chairs. For more information please refer to the ParSim website [1]. All accepted submissions are expected to be presented at the conference by one of the authors, which requires registration for the conference. IMPORTANT DATES Submission of paper abstracts May 7th, 2007 Submission of full papers and poster abstracts May 14th, 2007 at 11:59 AM (noon) UTC Notification of authors June 19th, 2007 Camera-ready papers July 9th, 2007 Submission of Late and Breaking Results September 15th, 2007 Tutorials September 30th, 2007 Conference October 1st-3rd, 2007 For up-to-date information, visit the conference web site at http//www.pvmmpi07.org. PROCEEDINGS In addition, selected papers of the conference, including those from the 'Outstanding Papers' session, will be considered for publication in a special issue of Parallel Computing in an extended format. 
GENERAL CHAIR

* Jack Dongarra (University of Tennessee)

PROGRAM CHAIRS

* Franck Cappello (INRIA Futurs)
* Thomas Herault (Universite Paris Sud-XI / INRIA Futurs)

CONFERENCE VENUE

The conference will be held in the historical, cultural and economic center of Paris, the capital of France. The city, which is renowned for its neo-classical architecture, hosts many museums and galleries and has an active nightlife. The symbol of Paris is the 324 metre (1,063 ft) Eiffel Tower on the banks of the Seine. Dubbed "the City of Light" (la Ville Lumiere) since the 19th century, Paris is regarded by many as one of the most beautiful and romantic cities in the world. It is also the most visited city in the world with more than 30 million foreign visitors per year. Paris is easily reachable from any European capital and most of the large European, American and Asian cities. It is an ideal starting point for visiting European institutes and cities.

REFERENCES

[1] ParSim 2007: http://wwwbode.in.tum.de/Par/arch/events/parsim07/
[2] Springer Guidelines: http://www.springer.de/comp/lncs/authors.html

+++ We apologize if you receive this CfP more than once +++

----- End forwarded message -----
--
Eugen* Leitl leitl http://leitl.org
______________________________________________________________
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE

From hahn at mcmaster.ca Wed May 2 21:54:54 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Thu, 3 May 2007 00:54:54 -0400 (EDT)
Subject: [Beowulf] Syslog Server-Traffic
In-Reply-To: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com>
References: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com>
Message-ID:

> I'm researching setting up a cluster and I'm curious as to whether or
> not it's a good idea to set up a syslog server. The question I have

depends on how much syslog activity you have, and whether you care to look at it (in one spot).
there _are_ scalable and more robust system/event logging approaches, but plain old syslog is pretty good. > is whether the traffic created from logging is going to slow down my > network to the point of poor performance? a syslog message is normally a smallish UDP packet - say 100 bytes. if you have 200 nodes each doing 5 per second, that's still only 100KB/s - a pretty small fraction of a server's gigabit bandwidth. and if you actually have 5/s, something's probably wrong... > experiences with logging. The cluster will have a management network > and a computational network. I'm always skeptical about this advice - it's obvious that it might be good in cases where a node sustains a nontrivial stream of management traffic (say, NFS traffic using jumbo frames) which would interfere with possible latency-sensitive/small MPI packets. but how often does that happen? consider that with a non-jumbo gigabit net, a full packet is only 15 us more than the ~40 or so for a minimal one. further, I observe MPI codes mostly getting packed into full nodes, and not interfering with themselves much (distinct MPI and file IO phases to the program.). I could more readily imagine segregating traffic to two nets based on packet size or TOS. or bonding them in the first place. in any case, I think you'd have to work pretty hard to generate enough syslog traffic to matter much. > Greater than 64 Nodes, Yes. > Greater than 64 and less than 128, Maybe? > Greater than 128 No for a smallish cluster like 64 nodes, I don't think I'd worry about syslog even if the net were just 100bT. for going above 200 nodes, I'd probably try to do some measurements and extrapolation, but the numbers above make it look minor. 
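
The back-of-envelope estimate in the message above can be reproduced with a short Python sketch. The function name syslog_load and its parameters are made up for illustration; the 100-byte message size and the rate of 5 messages per second per node are the rough figures from the message, and the calculation counts payload bytes only, ignoring UDP/IP/Ethernet framing overhead.

```python
def syslog_load(nodes, msgs_per_sec, bytes_per_msg=100,
                link_bits_per_sec=1_000_000_000):
    """Return (bytes_per_second, fraction_of_link) for aggregate syslog traffic.

    Assumption: every node sends msgs_per_sec messages of bytes_per_msg bytes
    to a single log server whose link runs at link_bits_per_sec.
    """
    bytes_per_second = nodes * msgs_per_sec * bytes_per_msg
    fraction = bytes_per_second * 8 / link_bits_per_sec
    return bytes_per_second, fraction


# Figures from the message: 200 nodes, 5 messages/s each, ~100 bytes/message.
rate, share = syslog_load(nodes=200, msgs_per_sec=5)
print(f"{rate / 1000:.0f} KB/s, {share:.2%} of a gigabit link")
```

With the figures from the message this gives 100 KB/s, which is under 0.1% of a gigabit link; passing link_bits_per_sec=100_000_000 shows that the same load would still be under 1% of a 100 Mbit/s link.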
From felix.rauch.valenti at gmail.com Thu May 3 01:06:04 2007 From: felix.rauch.valenti at gmail.com (Felix Rauch Valenti) Date: Thu, 3 May 2007 18:06:04 +1000 Subject: [Beowulf] fast file copying In-Reply-To: <4638A071.1080008@crs4.it> References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <4638A071.1080008@crs4.it> Message-ID: <4eafc81b0705030106g72471b29u8344516d559e24f7@mail.gmail.com> On 03/05/07, Alan Louis Scheinine wrote: > One possibility is nettee. > http://saf.bio.caltech.edu/nettee.html > > The current version of nettee is 0.1.7, from July 20,2005. > > nettee is a network "tee" program. It can typically transfer > data between N nodes at (nearly) the full bandwidth provided by the switch > which connects them. It is handy for cloning nodes or moving large > database files. As a related side note: If the bandwidth you get is not what you expect, it may well be that your switch is bad (or that your disks are slow). That was my experience a couple of years ago, so we implemented a switch benchmark called "Switchbench", that helps to identify the bandwidth bottleneck in a network. - Felix From erwan at seanodes.com Thu May 3 03:14:14 2007 From: erwan at seanodes.com (Erwan Velu) Date: Thu, 03 May 2007 12:14:14 +0200 Subject: [Beowulf] Archives looks like broken Message-ID: <4639B5F6.30203@seanodes.com> Hey folks, I just realize the archive website looks like broken : http://www.scyld.com/pipermail/beowulf/ Can anyone fix it ? Thanks, Erwan, From orion at cora.nwra.com Thu May 3 15:48:46 2007 From: orion at cora.nwra.com (Orion Poplawski) Date: Thu, 03 May 2007 16:48:46 -0600 Subject: [Beowulf] Please help test compiler/hardware issue Message-ID: <463A66CE.3040100@cora.nwra.com> Okay, I have a test case for the problem I reported before that I've attached. 

We have two pairs of identical machines:

- 2 Tyan S2882 Dual Processor 244 stepping 10
- 2 Tyan S2882-D Dual processor dual core Opteron 275 stepping 2

The attached code when compiled with the Portland Group Fortran compiler with -O2 and run on either of the 244's will abort in random locations:

[orion at coop00 rams.debug]$ pgf95 -O2 -o testatob testatob.f90
[orion at coop00 rams.debug]$ ./testatob
checkatob abort n= 246500 , i= 4685 a(i)= 8712085.
b(i)= 8465585.
Abort
[orion at coop00 rams.debug]$ ./testatob
checkatob abort n= 246500 , i= 145817 a(i)= 9592717.
b(i)= 8853217.
Abort

[orion at coop01 rams.debug]$ time ./testatob
checkatob abort n= 246500 , i= 118169 a(i)= 9565069.
b(i)= 8825569.
Aborted

real 0m31.842s
user 0m16.476s
sys 0m0.060s

Haven't seen it run longer than 1 minute yet.

However, it runs fine on the 275's (or at least I haven't seen it crash yet). It also runs fine on the 244's when compiled with -O1.

So, I guess this points to a hardware issue, but it may be a somewhat generalized hardware issue. I'd love to hear reports on other (particularly other Tyan S2882 dual 244's) systems.

--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane orion at cora.nwra.com
Boulder, CO 80301 http://www.cora.nwra.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testatob.f90
Type: text/x-fortran
Size: 844 bytes
Desc: not available
URL:

From rgb at phy.duke.edu Thu May 3 17:11:45 2007
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 3 May 2007 20:11:45 -0400 (EDT)
Subject: [Beowulf] Please help test compiler/hardware issue
In-Reply-To: <463A66CE.3040100@cora.nwra.com>
References: <463A66CE.3040100@cora.nwra.com>
Message-ID:

On Thu, 3 May 2007, Orion Poplawski wrote:

>
> Okay, I have a test case for the problem I reported before that I've
> attached.
>
> We have two pairs of identical machines:
>
> - 2 Tyan S2882 Dual Processor 244 stepping 10
> - 2 Tyan S2882-D Dual processor dual core Opteron 275 stepping 2
>
> The attached code when compiled with the Portland Group Fortran compiler with
> -O2 and run on either of the 244's will abort in random locations:

What about gfortran? Or pathscale?

Mind you, I made myself actually look at the code below (shudder) in spite of it being fortran, and it looks ok as far as >>I<< can tell after not doing fortran unless my life depends on it for twenty years or so. To me it is weird to use a(1) both as the address of a(1) (as an argument to the subroutines) and as the contents of a(1) = 1, but hey.

It seems really really odd that any compiler or any program would fail on this piece of code, though. I wonder if a C memcpy would fail? Or what does stream (with a check) do? Stream's copy isn't much more than this.

Maybe somebody who has used fortran more recently than the mid-eighties can comment further on the code, but to me it looks like a very odd compiler bug.

rgb

>
> [orion at coop00 rams.debug]$ pgf95 -O2 -o testatob testatob.f90
> [orion at coop00 rams.debug]$ ./testatob
> checkatob abort n= 246500 , i= 4685 a(i)= 8712085.
> b(i)= 8465585.
> Abort
> [orion at coop00 rams.debug]$ ./testatob
> checkatob abort n= 246500 , i= 145817 a(i)= 9592717.
> b(i)= 8853217.
> Abort
>
> [orion at coop01 rams.debug]$ time ./testatob
> checkatob abort n= 246500 , i= 118169 a(i)= 9565069.
> b(i)= 8825569.
> Aborted
>
> real 0m31.842s
> user 0m16.476s
> sys 0m0.060s
>
>
> Haven't seen it run longer than 1 minute yet.
>
> However, it runs fine on the 275's (or at least I haven't seen it crash yet).
> It also runs fine on the 244's when compiled with -O1.
>
> So, I guess this points to a hardware issue, but it may be a somewhat
> generalized hardware issue. I'd love to hear reports on other (particularly
> other Tyan S2882 dual 244's) systems.
>
>

--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu

From orion at cora.nwra.com Thu May 3 17:34:37 2007
From: orion at cora.nwra.com (orion at cora.nwra.com)
Date: Thu, 3 May 2007 18:34:37 -0600 (MDT)
Subject: [Beowulf] Please help test compiler/hardware issue
In-Reply-To: <463A66CE.3040100@cora.nwra.com>
References: <463A66CE.3040100@cora.nwra.com>
Message-ID: <4949.71.208.238.171.1178238877.squirrel@www.cora.nwra.com>

>
> Okay, I have a test case for the problem I reported before

Statically compiled binary at http://www.cora.nwra.com/~orion/testatob.bz2 for those of you without the PGF compiler to try.

--
Orion Poplawski
Technical Manager 303-415-9701 x222
NWRA/CoRA Division FAX: 303-415-9702
3380 Mitchell Lane orion at cora.nwra.com
Boulder, CO 80301 http://www.cora.nwra.com

From bill at cse.ucdavis.edu Fri May 4 01:48:50 2007
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Fri, 04 May 2007 01:48:50 -0700
Subject: [Beowulf] fast file copying
In-Reply-To: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org>
Message-ID: <463AF372.1010107@cse.ucdavis.edu>

Geoff Galitz wrote:
>
> Hi folks,
>
> During an HPC talk some years ago, I recall someone mentioned a tool
> which can copy large datasets across a cluster using a ring topology.
> Perhaps someone here knows of this tool?

Not sure about a ring topology, seems kinda silly... why not bit-torrent? It's open source, extremely common, and already integrated into at least one cluster distribution. There are a zillion implementations; your favorite language is likely to have a few (at least for python, c, and java).

I've installed a 170+ node rocks cluster in 10 minutes or so, the RPMs are distributed by bit-torrent so that it doesn't matter if one node dies as part of the install.
Nor does it matter if your node list has some strange mapping to your physical network (which is often what you get when you ask a batch queue for 5% of a cluster).

> More to the point, we are pushing around datasets that are about
> 1Gbyte. The datasets are pushed out to dozens of nodes all at once and

How often? I just bit-torrented a 1GB file to 165 nodes in 3 minutes; 1.5 minutes of that was the lazy way I launched it (the last node didn't start until 1.5 minutes into the run). BTW, 140 or so of those nodes already had 1 job per CPU running.

> we foresee saturating the I/O system on our cluster as we grow. We are
> limited to using just the available disks and are looking for a
> reasonable solution that can support this kind of simultaneous access.

There are various ways to maximize I/O with bit-torrent. Various seeders allow uploading each block only once (usually called super seeder mode). Assuming you have a few GB ram on the file server you could even prefetch the file before torrenting (i.e. dd if=file_to_server of=/dev/null) since the limit on bit-torrent bandwidth is often how quickly you can seek. Additionally you can make the chunk size larger to reduce the number of seeks. On the client side preallocation can greatly reduce the number of seeks.

> Currently we push the data out using rsync, but if I don't get any
> better ideas I may simply move to a pull system where the data is
> fetched by HTTP. I can get better throttling that way, at least.
>

If you have a low churn rate you could generate a diff (with rsync) and distribute that via bit-torrent.

What kind of per node bandwidths are you hoping for? 1GB sounds really easy unless you have to do it rather often.

From mkinet at ulb.ac.be Wed May 2 08:11:20 2007
From: mkinet at ulb.ac.be (Maxime Kinet)
Date: Wed, 2 May 2007 17:11:20 +0200
Subject: [Beowulf] Problem while booting diskless node.
In-Reply-To: <200705011334.57546.pxrist@gmail.com>
References: <82C63820-6368-481D-BE08-74B573EC3E30@ulb.ac.be> <200705011334.57546.pxrist@gmail.com>
Message-ID:

ok, I succeeded. Thanks to all for helpful comments.

------------------
Maxime Kinet
Université Libre de Bruxelles
Physique Statistique et Plasmas, CP 231
Campus Plaine - Boulevard du Triomphe, 1050 Bruxelles.
Tel. : +32-2-650.59.08
e-mail : mkinet at ulb.ac.be

On 01 May 2007, at 12:34, Panagiotis Christopoulos wrote:

> On Monday 30 April 2007 13:04, Maxime Kinet wrote:
>> Hi,
>>
>> I'm trying to set up my first cluster with diskless nodes. To achieve
>> that, I'm using PXElinux on a server, running Fedora Core 6, and a
>> NFS-mounted root partition on the node.
>> Everything works perfectly
>> (getting the IP address, loading the kernel and mounting the
>> filesytem) until the node has to run some binaries located into /sbin
>> during the boot process. Apparently it's unable to execute them
>> because they have been compiled with dynamically linked libraries and
>> not statically. The /sbin directory of the node is a simple copy of
>> the one of the server.
> I suppose, that you have done something like, creating a /diskless
> directory
> inside your nfs,tftp,dhcp etc... server, copying files(eg. /usr/* )
> from the
> server, inside that /diskless dir with the same hierarchy, and this
> resulted
> in a structure which would home, your nfs exported, root fs of your
> nodes.
> I'm not an expert, but because you said about dynamic libraries, I
> cannot
> understand, why this is a problem, you copied /sbin inside /
> diskless, but you
> didn't copy /lib or /usr/lib? The problem with dynamic libraries,
> starts
> because your /diskless does not have these libraries.
>> I tried to avoid the problem using the busybox
>> tools, and it worked a bit better but then it couldn't execute bash
>> scripts such as rc.sysinit.
> Have you created an init in your busybox, to chroot(exec > switch_root) inside > your nfs root fs after mounting it? >> As anybody ever encountered such problems and what should I do to >> solve it? recompile the kernel of the node or of the server? change >> the distribution? Are there any other simpler method to proceed than >> using PXE? > There are two things you can do. As Douglas Eadline said, it starts by > thinking if you want to reinvent the wheel or not. If you have time, > machines, if you know that your teachers won't get annoyed and you > can work > in a university lab, so you will not pay for the power supply > yourself:p > continue with fedora and all these brainstorming things. You will > learn linux > administration and propably you will do amazing things. If you > don't have > time etc. the guys in warewulf, are doing the same job for about 7 > years, and > they provide you all this knowledge they gained, in a simple > installation > process. > Back in the technical stuff, from your sayings, I think that > something is > wrong with your /diskless dir(if you have one, of course). I cannot > understand why you want to use busybox. We use busybox when why > want an > initramfs to do specific jobs(such as unlocking and mounting > encrypted > partitions,yes, I know this is not the best example I could give) > before > chrooting inside our real root fs and exec init as Mark Hahn said > or if we > are running embedded. Also, I don't think that you have to change your > distribution, and if you don't like PXE, you can see how the guys > in LTSP > boot( I think they use both PXE and "etherboot" and you can make a > choice), > but for me, syslinux is fine! 
> > This was my point of view, I hope I helped and if you want to ask > something, > feel free to send me a mail, or of course, ask again in the list, > > Panagiotis Christopoulos > System Administrator > Technological Institute of Athens > Department of Informatics From gmkurtzer at gmail.com Wed May 2 09:56:11 2007 From: gmkurtzer at gmail.com (Greg Kurtzer) Date: Wed, 2 May 2007 09:56:11 -0700 Subject: [Beowulf] fast file copying In-Reply-To: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> Message-ID: <2AF0822D-6311-4004-ACB8-75ECE6E0A4A0@gmail.com> I gave a talk where I referred to a "ring" boot mechanism for Warewulf using "Dolly". http://www.cs.inf.ethz.ch/CoPs/patagonia/dolly.html Thinking back now, I don't remember you being at that talk, so you are probably thinking of something else. ;) On Apr 30, 2007, at 11:35 AM, Geoff Galitz wrote: > > Hi folks, > > During an HPC talk some years ago, I recall someone mentioned a > tool which can copy large datasets across a cluster using a ring > topology. Perhaps someone here knows of this tool? > > More to the point, we are pushing around datasets that are about > 1Gbyte. The datasets are pushed out to dozens of nodes all at once > and we foresee saturating the I/O system on our cluster as we > grow. We are limited to using just the available disks and are > looking for a reasonable solution that can support this kind of > simultaneous access. Currently we push the data out using rsync, > but if I don't get any better ideas I may simply move to a pull > system where the data is fetched by HTTP. I can get better > throttling that way, at least. 
> > -geoff > > > Geoff Galitz > geoff at galitz.org > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf -- Greg Kurtzer I believe the world would be a better place if people didn't believe in their beliefs. -- gmk From estair at ilm.com Wed May 2 13:30:41 2007 From: estair at ilm.com (Eli Stair) Date: Wed, 02 May 2007 13:30:41 -0700 Subject: [Beowulf] Syslog Server-Traffic In-Reply-To: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com> References: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com> Message-ID: <4638F4F1.9060705@ilm.com> If you engineer the config well, you won't have any significant amount of "insignificant" traffic. Whether your situation is vulnerable to minute amounts of traffic and client-side processing of packets sent is site-specific. Using syslog-ng (or several other options), you can configure the compute nodes to only send log messages you want to be aware of (or restrict from sending known messages you don't want logged), as well as doing rate-limiting to avoid spamming your network/logserver in the event of a typical freak-out event. /eli Chris Vaughan wrote: > Hello, > > I'm researching setting up a cluster and I'm curious as to whether or > not it's a good idea to set up a syslog server. The question I have > is whether the traffic created from logging is going to slow down my > network to the point of poor performance? What are peoples > experiences with logging. The cluster will have a management network > and a computational network. > > This is what I'm thinking: > > Greater than 64 Nodes, Yes. > Greater than 64 and less than 128, Maybe? > Greater than 128 No > > Any input would be great, Thanks! 
>
> --
> ------------------------------
> Christopher Vaughan
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From victux at gmail.com Wed May 2 15:57:23 2007
From: victux at gmail.com (Victor Gomez)
Date: Wed, 2 May 2007 17:57:23 -0500
Subject: [Beowulf] SSH without login in nodes
Message-ID: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com>

Hi,

I'm configuring a cluster with passwordless ssh, but that means the users can log in to the nodes. I want the users to go through the queue system (Torque, which works fine) without being able to log in to the nodes. In the past, I used rlogin, without rlogin.

Thanks.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vaughanc at gmail.com Thu May 3 05:22:51 2007
From: vaughanc at gmail.com (Chris Vaughan)
Date: Thu, 3 May 2007 13:22:51 +0100
Subject: [Beowulf] Syslog Server-Traffic
In-Reply-To:
References: <216ee070705010534r7cc4eddr9328b8f8ea223925@mail.gmail.com>
Message-ID: <216ee070705030522n770e2ef2ka54479449c286194@mail.gmail.com>

Thank you all for the input, it's been very helpful in making my clustering decisions.

On 5/3/07, Mark Hahn wrote:
> > I'm researching setting up a cluster and I'm curious as to whether or
> > not it's a good idea to set up a syslog server. The question I have
>
> depends on how much syslog activity you have, and whether you care
> to look at it (in one spot). there _are_ scalable and more robust
> system/event logging approaches, but plain old syslog is pretty good.
>
> > is whether the traffic created from logging is going to slow down my
> > network to the point of poor performance?
>
> a syslog message is normally a smallish UDP packet - say 100 bytes.
> if you have 200 nodes each doing 5 per second, that's still only
> 100KB/s - a pretty small fraction of a server's gigabit bandwidth.
> and if you actually have 5/s, something's probably wrong... > > > experiences with logging. The cluster will have a management network > > and a computational network. > > I'm always skeptical about this advice - it's obvious that it might be good > in cases where a node sustains a nontrivial stream of management traffic > (say, NFS traffic using jumbo frames) which would interfere with possible > latency-sensitive/small MPI packets. > > but how often does that happen? consider that with a non-jumbo gigabit net, > a full packet is only 15 us more than the ~40 or so for a minimal one. > further, I observe MPI codes mostly getting packed into full nodes, > and not interfering with themselves much (distinct MPI and file IO phases > to the program.). > > I could more readily imagine segregating traffic to two nets based on > packet size or TOS. or bonding them in the first place. in any case, > I think you'd have to work pretty hard to generate enough syslog traffic > to matter much. > > > Greater than 64 Nodes, Yes. > > Greater than 64 and less than 128, Maybe? > > Greater than 128 No > > for a smallish cluster like 64 nodes, I don't think I'd worry about > syslog even if the net were just 100bT. for going above 200 nodes, > I'd probably try to do some measurements and extrapolation, but the > numbers above make it look minor. > -- ------------------------------ Christopher Vaughan From james_cuff at harvard.edu Thu May 3 17:52:02 2007 From: james_cuff at harvard.edu (James Cuff) Date: Thu, 3 May 2007 20:52:02 -0400 Subject: [Beowulf] Please help test compiler/hardware issue In-Reply-To: <4949.71.208.238.171.1178238877.squirrel@www.cora.nwra.com> References: <463A66CE.3040100@cora.nwra.com> <4949.71.208.238.171.1178238877.squirrel@www.cora.nwra.com> Message-ID: <8F859709-B371-4B03-86D2-ACC7E9DB1E5C@harvard.edu> Hi Orion, I'm thinking you may have bad memory/hardware on one of those nodes here mate... 

Compiles and runs fine here in 32 bit Ubuntu Feisty:

jcuff at harold:~$ uname -a
Linux harold 2.6.20-15-386 #2 Sun Apr 15 07:34:00 UTC 2007 i686 GNU/Linux

jcuff at harold:~$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Pentium(R) 4 CPU 2.53GHz

jcuff at harold:~$ gfortran -O3 -o tt test.f90
jcuff at harold:~$ time ./tt
^C
real 8m11.766s
user 8m5.862s
sys 0m4.280s

Also your 64 bit static compiled version runs fine even on a rather crappy "64 bit" Celeron I have on FC 5:

[jcuff at gw ~]$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Celeron(R) CPU 2.93GHz

[jcuff at gw ~]$ uname -a
Linux 2.6.20-1.2312.fc5 #1 SMP Tue Apr 10 15:14:58 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

[jcuff at gw ~]$ time ./testatob
^C
real 5m5.794s
user 3m38.785s
sys 0m9.890s

Hope this helps.

Best,

j.

--
James Cuff, D. Phil.
Director of Research Computing, Life Sciences Division.
Bauer Laboratory, 7 Divinity Avenue, Cambridge, MA. 02138
Tel: 617-384-5065 Direct Dial: 617-384-7647

On May 3, 2007, at 8:34 PM, wrote:

> >
> > Okay, I have a test case for the problem I reported before
>
> Statically compiled binary at http://www.cora.nwra.com/~orion/
> testatob.bz2
> for those of you without the PGF compiler to try.
>
> --
> Orion Poplawski
> Technical Manager 303-415-9701 x222
> NWRA/CoRA Division FAX: 303-415-9702
> 3380 Mitchell Lane orion at cora.nwra.com
> Boulder, CO 80301 http://www.cora.nwra.com
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

From jtracy at ist.ucf.edu Fri May 4 06:40:37 2007
From: jtracy at ist.ucf.edu (Judd Tracy)
Date: Fri, 04 May 2007 09:40:37 -0400
Subject: [Beowulf] LVM performance problems
Message-ID: <463B37D5.6020008@ist.ucf.edu>

I am trying to bring up a small file server and am noticing some serious performance issues when using LVM.
I created a software raid /dev/md0 which I can read at ~195MB/s using the raw raid device, but as soon as I put LVM on top of it the read speeds drop to ~95MB/s using a raw lvm partition without a filesystem. When I use and xfs filesystem on top of the lvm partition it drops down to ~40MB/s. This seems like a processing power issue, but the machine is a dual processor opteron system with 4GB of ram in it. Does anyone have some insight into why I might be having such problems with the system. Judd Tracy Institute for Simulation and Training University of Central Florida From peter.st.john at gmail.com Fri May 4 07:53:33 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Fri, 4 May 2007 10:53:33 -0400 Subject: [Beowulf] Please help test compiler/hardware issue In-Reply-To: <8F859709-B371-4B03-86D2-ACC7E9DB1E5C@harvard.edu> References: <463A66CE.3040100@cora.nwra.com> <4949.71.208.238.171.1178238877.squirrel@www.cora.nwra.com> <8F859709-B371-4B03-86D2-ACC7E9DB1E5C@harvard.edu> Message-ID: I don't understand what "allocatable" and "allocate" do. It would seem that atob writes an integer (assigned by a(i) = i) to an address which had also been specified by the a(i)=i assignment, and was not necessarily allocated to a. That would be expected to generate random errors, and since the example has hardcoded numbers like 8460901, it could write to a range that **normally** is writeable in user space, but which is not guaranteed to be by the allocation. If it were C like this: int *a; a = malloc(10); for(i = 0; i< 10; i++) a[i] = i; *(a[5]) = 5; that is, I'm presuming that the contents at the address "5" can be written with the value 5, but "5" is not necessarily in the address space allocated by malloc. I'm thinking this must not be what the subroutine ATOB does, maybe a call by reference instead of call by value confusion (to me). However, the example looks like it was written to show up a compiler dependency and not to stress test a CPU. 
In fact, it looks like it was written by a malicious C programmer :-) but I have an alibi. Peter On 5/3/07, James Cuff wrote: > > > Hi Orion, > > I'm thinking you may have bad memory/hardware on one of those nodes > here mate... > > Compiles and runs fine here in 32 bit ubuntu fiesty: > > jcuff at harold:~$ uname -a > Linux harold 2.6.20-15-386 #2 Sun Apr 15 07:34:00 UTC 2007 i686 GNU/ > Linux > > jcuff at harold:~$ cat /proc/cpuinfo | grep "model name" > model name : Intel(R) Pentium(R) 4 CPU 2.53GHz > > > jcuff at harold:~$ gfortran -O3 -o tt test.f90 > jcuff at harold:~$ time ./tt > ^C > real 8m11.766s > user 8m5.862s > sys 0m4.280s > > > Also your 64 bit static compiled version runs fine even on a rather > crappy "64 bit" Celeron I have on FC 5: > > [jcuff at gw ~]$ cat /proc/cpuinfo | grep "model name" > model name : Intel(R) Celeron(R) CPU 2.93GHz > > [jcuff at gw ~]$ uname -a > Linux 2.6.20-1.2312.fc5 #1 SMP Tue Apr 10 15:14:58 EDT 2007 x86_64 > x86_64 x86_64 GNU/Linux > > > [jcuff at gw ~]$ time ./testatob > ^C > real 5m5.794s > user 3m38.785s > sys 0m9.890s > > > Hope this helps. > > Best, > > j. > > -- > James Cuff, D. Phil. > Director of Research Computing, Life Sciences Division. > Bauer Laboratory, 7 Divinity Avenue, Cambridge, MA. 02138 > Tel: 617-384-5065 Direct Dial: 617-384-7647 > > > On May 3, 2007, at 8:34 PM, wrote: > > > > > > > Okay, I have a test case for the problem I reported before > > > > Statically compiled binary at http://www.cora.nwra.com/~orion/ > > testatob.bz2 > > for those of you without the PGF compiler to try. 
> > > > -- > > Orion Poplawski > > Technical Manager 303-415-9701 x222 > > NWRA/CoRA Division FAX: 303-415-9702 > > 3380 Mitchell Lane orion at cora.nwra.com > > Boulder, CO 80301 http://www.cora.nwra.com > > > > _______________________________________________ > > Beowulf mailing list, Beowulf at beowulf.org > > To change your subscription (digest mode or unsubscribe) visit > > http://www.beowulf.org/mailman/listinfo/beowulf > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathog at caltech.edu Fri May 4 08:24:28 2007 From: mathog at caltech.edu (David Mathog) Date: Fri, 04 May 2007 08:24:28 -0700 Subject: [Beowulf] Re: fast file copying Message-ID: Felix Rauch Valenti wrote: > On 03/05/07, Alan Louis Scheinine wrote: > > One possibility is nettee. > > http://saf.bio.caltech.edu/nettee.html > > As a related side note: If the bandwidth you get is not what you > expect, it may well be that your switch is bad (or that your disks are > slow). That was my experience a couple of years ago, so we implemented > a switch benchmark called "Switchbench", that helps to identify the > bandwidth bottleneck in a network. Since both dolly and nettee have been mentioned in this thread, it's probably appropriate to point out that they are very closely related. Felix wrote Dolly, and I forked his code and modified it a bit to arrive at nettee. Felix's point about nettee and slow disks is especially relevant when imaging nodes because historically some of the small linux environments used for such installations did not set DMA on the disks by default, and that slowed down the write speed on the disks dramatically. 
When file transfers are being carried out on busy systems it's also a good idea to put "buffer" or an equivalent program in the output stream on each node, so that a brief contention for IO writes to the local disk doesn't slow down the entire chain. Always use "buffer" or an equivalent if the local write stream is being piped through "tar -xf -" or an equivalent dearchiving command, as the dearchiving step can slow down the write speed if it has to create a large number of small files in a directory. Regards, David Mathog mathog at caltech.edu Manager, Sequence Analysis Facility, Biology Division, Caltech From orion at cora.nwra.com Fri May 4 08:43:37 2007 From: orion at cora.nwra.com (Orion Poplawski) Date: Fri, 04 May 2007 09:43:37 -0600 Subject: [Beowulf] compiler/hardware issue fixed In-Reply-To: <463A66CE.3040100@cora.nwra.com> References: <463A66CE.3040100@cora.nwra.com> Message-ID: <463B54A9.9060707@cora.nwra.com> Orion Poplawski wrote: > So, I guess this points to a hardware issue, but it may be a somewhat > generalized hardware issue. I'd love to hear reports on other > (particularly other Tyan S2882 dual 244's) systems. I updated the BIOS on the 244's and the problem appears to have gone away. I should have done this long ago, but I had mistakenly thought that I couldn't PXE boot the flash update utility. It's also somewhat gratifying to understand a bit more what the issue was. So, those of you with Tyan S2882s, update your bios! 
-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

From orion at cora.nwra.com Fri May 4 08:48:35 2007
From: orion at cora.nwra.com (Orion Poplawski)
Date: Fri, 04 May 2007 09:48:35 -0600
Subject: [Beowulf] Please help test compiler/hardware issue
In-Reply-To:
References: <463A66CE.3040100@cora.nwra.com> <4949.71.208.238.171.1178238877.squirrel@www.cora.nwra.com> <8F859709-B371-4B03-86D2-ACC7E9DB1E5C@harvard.edu>
Message-ID: <463B55D3.906@cora.nwra.com>

Peter St. John wrote:
> I'm thinking this must not be what the subroutine ATOB does, maybe a
> call by reference instead of call by value confusion (to me).

Yup. All Fortran calls are by reference. So the routine is just copying
one section of A to another section of A. Should be fine as long as your
hardware works.

-- 
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion at cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

From mwill at penguincomputing.com Fri May 4 09:04:50 2007
From: mwill at penguincomputing.com (Michael Will)
Date: Fri, 4 May 2007 09:04:50 -0700
Subject: [Beowulf] LVM performance problems
In-Reply-To: <463B37D5.6020008@ist.ucf.edu>
Message-ID: <433093DF7AD7444DA65EFAFE3987879C418346@orca.penguincomputing.com>

What model server is this? What level raid is the software raid volume,
how many drives and what type? How exactly did you establish the
performance number?

I have done performance benchmarking with LVM over hardware raid luns
and have not been able to show a significant performance degradation,
and it is actually faster when you stripe across several raid-protected
luns.

Some general disk benchmarking rules:

Write files at least twice as large as your RAM (so that's 8G minimum in
your case) so caching does not inflate the numbers.
Also don't forget to measure the time for it to be synced to disk.

Example for large block I/O large file benchmarking on a 4G system:

    time dd if=/dev/zero of=/mnt/targetfilesystem/largefile bs=1M count=8192
    time sync

If you divide 8192 by the aggregate of the seconds used by both commands
then you have a good MB/s estimate.

For read you can just do:

    time dd of=/dev/null if=/mnt/targetfilesystem/largefile bs=1M count=8192

To do some more intense testing including mixed read/write, I use
bonnie++. If you want to turn it into a real science and get very
specific data for specific access patterns, you might want to look at
iozone.

Michael

-----Original Message-----
From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org] On Behalf Of Judd Tracy
Sent: Friday, May 04, 2007 6:41 AM
To: beowulf at beowulf.org
Subject: [Beowulf] LVM performance problems

I am trying to bring up a small file server and am noticing some serious
performance issues when using LVM. I created a software raid /dev/md0
which I can read at ~195MB/s using the raw raid device, but as soon as I
put LVM on top of it the read speeds drop to ~95MB/s using a raw lvm
partition without a filesystem. When I use an xfs filesystem on top of
the lvm partition it drops down to ~40MB/s. This seems like a
processing power issue, but the machine is a dual processor opteron
system with 4GB of ram in it. Does anyone have some insight into why I
might be having such problems with the system?
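Going back to Michael's dd rule of thumb earlier in this message (8192 MB divided by the combined dd and sync time), the arithmetic can be sketched as below. The helper name mb_per_sec and the timings are invented for illustration; the seconds would be read off the "real" lines that time prints.

```shell
#!/bin/bash
# mb_per_sec TOTAL_MB DD_SECONDS SYNC_SECONDS
# Prints TOTAL_MB divided by the combined elapsed time, to one decimal place.
# Hypothetical helper; the inputs are copied by hand from `time` output.
mb_per_sec() {
    awk -v mb="$1" -v t1="$2" -v t2="$3" \
        'BEGIN { printf "%.1f\n", mb / (t1 + t2) }'
}

# Made-up timings: dd took 38.2 s and the following sync took 2.8 s.
mb_per_sec 8192 38.2 2.8
```

Leaving the sync time out would overstate the write rate, since part of the data may still be sitting in the page cache when dd returns.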
Judd Tracy
Institute for Simulation and Training
University of Central Florida

From bill at cse.ucdavis.edu Fri May 4 11:41:21 2007
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Fri, 04 May 2007 11:41:21 -0700
Subject: [Beowulf] LVM performance problems
In-Reply-To: <463B37D5.6020008@ist.ucf.edu>
References: <463B37D5.6020008@ist.ucf.edu>
Message-ID: <463B7E51.5020305@cse.ucdavis.edu>

Judd Tracy wrote:
> I am trying to bring up a small file server and am noticing some serious
> performance issues when using LVM. I created a software raid /dev/md0

How many drives? Which raid level? What stripesize? What RAID
controller?

> which I can read at ~195MB/s using the raw raid device, but as soon as I

You did benchmark reads using accesses significantly larger than ram...
right?

> put LVM on top of it the read speeds drop to ~95MB/s using a raw lvm

Exactly what config in LVM? What stripesize?

> partition without a filesystem. When I use an xfs filesystem on top of
> the lvm partition it drops down to ~40MB/s. This seems like a

Exactly what mkfs.xfs parameters did you use? In particular swidth,
sunit, and related parameters.

> processing power issue, but the machine is a dual processor opteron
> system with 4GB of ram in it. Does anyone have some insight into why I
> might be having such problems with the system.

I've not seen many differences in my testing. Drives are capable of
30-60MB/sec each for sequential reads/writes; the trick is to balance
the load across all disks for your workload.

Beware, optimizing your setup for dd and bonnie might lead to worse
performance if your workload doesn't use similar access patterns.

The interaction between N disks, reads/writes of M size, and various
stripe sizes (in the raid and in the filesystem) is rather complex.
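To make the disks-versus-request-size interaction a little more concrete, here is a rough sketch under a deliberately simplified model: plain RAID0 striping where chunk i lives on disk (i mod N), with no parity, readahead or filesystem layer. The function name and the 4-disk, 64 KB-chunk figures are illustrative assumptions, not anything measured on the system in question.

```shell
#!/bin/bash
# disks_touched OFFSET_KB SIZE_KB CHUNK_KB NDISKS
# Simplified RAID0 model: chunk i lives on disk (i mod NDISKS).
# Prints how many distinct disks one contiguous request touches.
disks_touched() {
    local offset=$1 size=$2 chunk=$3 ndisks=$4
    local first=$(( offset / chunk ))
    local last=$(( (offset + size - 1) / chunk ))
    local nchunks=$(( last - first + 1 ))
    # Consecutive chunks sit on consecutive disks, so the count is capped
    # at the number of disks in the array.
    if (( nchunks < ndisks )); then
        echo "$nchunks"
    else
        echo "$ndisks"
    fi
}

# Four disks, 64 KB chunks (both values are illustrative):
disks_touched 0 64 64 4      # aligned 64 KB request
disks_touched 32 64 64 4     # same size, misaligned by 32 KB
disks_touched 0 1024 64 4    # 1 MB request
```

In this model a small aligned request keeps a single spindle busy while a large one spreads over all of them, which is one way a change in request size or alignment between layers could show up as a drop in throughput.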
So my guess is that your access pattern used to hit many of your disks
in parallel but is now bottlenecked by a single disk, not that you are
somehow CPU limited.

From peter.st.john at gmail.com Fri May 4 13:06:51 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Fri, 4 May 2007 16:06:51 -0400
Subject: [Beowulf] SSH without login in nodes
In-Reply-To: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com>
References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com>
Message-ID:

There was a typographical error in the question. I had a brief exchange
with señor Gomez and he confirmed this translation:

I am configuring a cluster with ssh (but without passwords) and
currently the users can log in to compute nodes.
I wish the clients to use the queue system (Torque, it works fine)
without being able to access the compute nodes.
In the past, we used rsh without allowing rlogin.

Unfortunately my Spanish seems to be better than my Beowulfry so I can't
help much :-)

Peter

On 5/2/07, Victor Gomez wrote:
>
> Hi,
>
> Im config a cluster with ssh password less, but the users can login into
> nodes.
>
> I want the clients, use de queue system (Torque, its works fine), without
> access into nodes.
>
> In the past, use rlogin, without rlogin.
>
> Thanks.

From kilian at stanford.edu Fri May 4 13:40:50 2007
From: kilian at stanford.edu (Kilian CAVALOTTI)
Date: Fri, 4 May 2007 13:40:50 -0700
Subject: [Beowulf] SSH without login in nodes
In-Reply-To:
References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com>
Message-ID: <200705041340.50461.kilian@stanford.edu>

Hi,

On Friday 04 May 2007 01:06:51 pm Peter St.
John wrote:
> There was a typographical error in the question. I had a brief exchange
> with señor Gomez and he confirmed this translation:
>
> I am configuring a cluster with ssh (but without passwords) and
> currently the users can log in to compute nodes.
> I wish the clients to use the queue system (Torque, it works fine)
> without being able to access the compute nodes.
> In the past, we used rsh without allowing rlogin.

What you can do is configure PAM on the nodes to only allow login for a
specific set of users, if any. It should come with any modern distro.

Be sure your /etc/pam.d/authconfig contains a reference to pam_access,
like:

    account    required     /lib/security/$ISA/pam_access.so

And configure /etc/security/access.conf to match your needs, like:

    # Allow administrative login from everywhere
    +:wheel staff:ALL
    # Prevent user logins
    -:users:ALL

You can have a look at
http://www.informit.com/articles/article.asp?p=165226&seqNum=12&rl=1
for more info.

Cheers,
-- 
Kilian

From siegert at sfu.ca Fri May 4 15:53:49 2007
From: siegert at sfu.ca (Martin Siegert)
Date: Fri, 4 May 2007 15:53:49 -0700
Subject: [Beowulf] MPI application benchmarks
Message-ID: <20070504225349.GC17163@stikine.ucs.sfu.ca>

Hi,

this is partially triggered by the mention of the SPEC MPI2007
benchmark: what are people using as a benchmark suite for RFP purposes?

We will be purchasing a shared cluster for a wide community (currently
more than 1000 users). Thus, the common response on this list to evaluate
hardware - "use your own application as benchmark" - does not work:
users change, users' applications change, etc., etc. Thus, I need a
benchmark suite that tests a wide spectrum of properties.

In that respect the SPEC MPI2007 benchmark appears to be ideal. Alas,
it does not appear to be open source (please correct me if I am wrong) -
so far I have not even been able to figure out which applications are
being used in that benchmark suite.
Thus, although RFP evaluations are mentioned first under likely uses for
SPEC MPI2007, not being open source seems to contradict that statement.

Thus: what are people using? I have seen the HPC Challenge Benchmark.
We almost certainly will include gromacs as an application benchmark.
Which other applications do you suggest?

Thanks in advance!

Cheers,
Martin

-- 
Martin Siegert
Head, HPC at SFU
WestGrid Site Lead
Academic Computing Services                phone: (604) 291-4691
Simon Fraser University                    fax:   (604) 291-4242
Burnaby, British Columbia                  email: siegert at sfu.ca
Canada  V5A 1S6

From csamuel at vpac.org Fri May 4 21:42:58 2007
From: csamuel at vpac.org (Chris Samuel)
Date: Sat, 5 May 2007 14:42:58 +1000
Subject: [Beowulf] SSH without login in nodes
In-Reply-To:
References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com>
Message-ID: <200705051442.59044.csamuel@vpac.org>

On Sat, 5 May 2007, Peter St. John wrote:

> I am configuring a cluster with ssh (but without passwords) and currently
> the users can log in to compute nodes. I wish the clients to use the queue
> system (Torque, it works fine) without being able to access the compute
> nodes. In the past, we used rsh without allowing rlogin.

We use a very ugly hack (this was already in place when I arrived) which
has been very effective over the past few years at doing that and doesn't
prevent people using SSH based MPI launchers (though we don't recommend
them being used).

Basically it's just the following in /etc/profile on our compute nodes.

    if echo $HOSTNAME | egrep -q '^node' ; then
        if [ ! $PBS_ENVIRONMENT ]; then
            if [ $USER != "root" ]; then
                if [ "$GROUP" != "systems" ]; then
                    exit;
                fi;
            fi;
        fi;
    fi;

How's that?
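For comparison, the same decision can be written as a single function with quoted tests, which also makes it easy to try out away from a live node. This is only a sketch of the logic in the snippet above: should_kick is a made-up name, and GROUP is not a variable bash sets by itself, so it is assumed (as in the original) to be defined earlier in the profile.

```shell
#!/bin/bash
# should_kick HOST PBS_ENV USER GROUP
# Returns 0 (true) when an interactive login ought to be ended: the host
# is a compute node, no Torque job environment is present, and the user is
# neither root nor in the "systems" group. Hypothetical helper name.
should_kick() {
    local host=$1 pbs_env=$2 user=$3 group=$4
    case $host in
        node*) ;;                        # compute nodes only
        *) return 1 ;;
    esac
    [ -z "$pbs_env" ] || return 1        # inside a Torque job: allow
    [ "$user" != "root" ] || return 1    # root may always log in
    [ "$group" != "systems" ] || return 1
    return 0
}

# In /etc/profile this might be wired up as follows (assumes GROUP has
# been set earlier in the profile):
#   should_kick "$HOSTNAME" "$PBS_ENVIRONMENT" "$USER" "$GROUP" && exit
```

Like the original, this only takes effect for shells that actually read /etc/profile.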
cheers, Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia From kilian at stanford.edu Sat May 5 09:24:37 2007 From: kilian at stanford.edu (Kilian CAVALOTTI) Date: Sat, 5 May 2007 09:24:37 -0700 Subject: [Beowulf] SSH without login in nodes In-Reply-To: <200705051442.59044.csamuel@vpac.org> References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com> <200705051442.59044.csamuel@vpac.org> Message-ID: <200705050924.37869.kilian@stanford.edu> On Friday 04 May 2007 21:42:58 Chris Samuel wrote: > We use a very ugly hack (this was already in place when I arrived) which > has been very effective over the past few years at doing that and > doesn't prevent people using SSH based MPI launchers (though we don't > recommend them being used). > > Basically it's just the following in /etc/profile on our compute nodes. > > if echo $HOSTNAME | egrep -q '^node' ; then > if [ ! $PBS_ENVIRONMENT ]; > then if [ $USER != "root" ]; > then if [ "$GROUP" != "systems" ]; > then exit; > fi; > fi; > fi; > fi; > > > How's that ? Not that ugly, actually. But what if users do a ssh node -t "bash --noprofile"? ;) To handle of SSH based MPI launchers, we've disabled user logins from our frontend node to the compute nodes, but allowed them between compute nodes. So that the scheduler takes care of dispatching the initial process on a first node (no SSH involved), and then SSH connections can be used to dispatch the MPI daemons on the other nodes, from the initial one. Cheers, -- Kilian From brian.ropers.huilman at gmail.com Sat May 5 10:10:03 2007 From: brian.ropers.huilman at gmail.com (Brian D. 
Ropers-Huilman) Date: Sat, 5 May 2007 12:10:03 -0500 Subject: [Beowulf] MPI application benchmarks In-Reply-To: <20070504225349.GC17163@stikine.ucs.sfu.ca> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> Message-ID: On 5/4/07, Martin Siegert wrote: > We will be purchasing a shared cluster for a wide community (currently > more than 1000 users). Thus, the common response on this list to evaluate > hardware - "use your own application as benchmark" - does not work: > users change, users' applications change, etc., etc. Thus, I need a > benchmark suite that tests a wide spectrum of properties. My answer is still to "use your own application(s)." Poll your users and find out what they have and what they are going to run. Find some who already have codes that scale well (>1000 cores) and ask them to participate. Many vendors will allow you to run your own codes on systems they have at their own sites before you decide to purchase. These vendor-hosted systems are typically only 256 cores or less, but it gives you some idea as to how your codes might run. I also suggest picking some representative synthetic benchmarks to test floating point and integer operations, memory bandwidth, MPI ping-pongs (the SPEC MPI2007, among others, would fit here), the HPC Challenge codes, and the like. Many sites will then take all of these results (synthetic + their own applications) and aggregate the results, possibly with weighting factors, into a single number. If you do this over a number of years and number of systems, with the same benchmarks, you can even start to normalize against a "base" system and take things like different core counts and costs into account. -- Brian D. Ropers-Huilman, Director Systems Administration and Technical Operations Supercomputing Institute 599 Walter Library +1 612-626-5948 (V) 117 Pleasant Street S.E. 
+1 612-624-8861 (F) University of Minnesota Twin Cities Campus Minneapolis, MN 55455-0255 http://www.msi.umn.edu/ From csamuel at vpac.org Sun May 6 00:58:54 2007 From: csamuel at vpac.org (Chris Samuel) Date: Sun, 6 May 2007 17:58:54 +1000 Subject: [Beowulf] SSH without login in nodes In-Reply-To: <200705050924.37869.kilian@stanford.edu> References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com> <200705051442.59044.csamuel@vpac.org> <200705050924.37869.kilian@stanford.edu> Message-ID: <200705061758.54756.csamuel@vpac.org> On Sun, 6 May 2007, Kilian CAVALOTTI wrote: > Not that ugly, actually. But what if users do a > ssh node -t "bash --noprofile"? ;) Then if any of the 500 odd tried we would spot them with some other scripts and chase them about it. We've not had to do that yet, though, fortunately! > To handle of SSH based MPI launchers, we've disabled user logins from our > frontend node to the compute nodes, but allowed them between compute > nodes. So that the scheduler takes care of dispatching the initial process > on a first node (no SSH involved), and then SSH connections can be used to > dispatch the MPI daemons on the other nodes, from the initial one. Now that there's the Torque PAM module (pam_pbssimpleauth) that Garrick wrote I'm tempted to set that up, but given our current system works I haven't dared break it. :-) cheers! 
Chris -- Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager Victorian Partnership for Advanced Computing http://www.vpac.org/ Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia From bernd-schubert at gmx.de Fri May 4 08:12:48 2007 From: bernd-schubert at gmx.de (Bernd Schubert) Date: Fri, 4 May 2007 17:12:48 +0200 Subject: [Beowulf] LVM performance problems In-Reply-To: <463B37D5.6020008@ist.ucf.edu> References: <463B37D5.6020008@ist.ucf.edu> Message-ID: <200705041712.48667.bernd-schubert@gmx.de> On Friday 04 May 2007 15:40:37 Judd Tracy wrote: > I am trying to bring up a small file server and am noticing some serious > performance issues when using LVM. I created a software raid /dev/md0 > which I can read at ~195MB/s using the raw raid device, but as soon as I > put LVM on top of it the read speeds drop to ~95MB/s using a raw lvm > partition without a filesystem. When I use and xfs filesystem on top of > the lvm partition it drops down to ~40MB/s. This seems like a > processing power issue, but the machine is a dual processor opteron > system with 4GB of ram in it. Does anyone have some insight into why I > might be having such problems with the system. try something like this: blockdev --setra 8192 /dev/{volumegroup}/{logical_volume} You should also try smaller and larger numbers. Hope it helps, Bernd From jlforrest at berkeley.edu Fri May 4 08:17:08 2007 From: jlforrest at berkeley.edu (Jon Forrest) Date: Fri, 04 May 2007 08:17:08 -0700 Subject: [Beowulf] fast file copying In-Reply-To: <4eafc81b0705030106g72471b29u8344516d559e24f7@mail.gmail.com> References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <4638A071.1080008@crs4.it> <4eafc81b0705030106g72471b29u8344516d559e24f7@mail.gmail.com> Message-ID: <463B4E74.2020604@berkeley.edu> Felix Rauch Valenti wrote: > As a related side note: If the bandwidth you get is not what you > expect, it may well be that your switch is bad (or that your disks are > slow). 
That was my experience a couple of years ago, so we implemented > a switch benchmark called "Switchbench", that helps to identify the > bandwidth bottleneck in a network. I've downloaded this program and it looks very promising. One suggestion - many of us have set up our clusters to allow "rsh" access without requiring passwords. Your program is written to use "ssh" for accessing the cluster nodes. It would be nice if your program could let the user choose which of these methods to use. Cordially, -- Jon Forrest Unix Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From jtracy at ist.ucf.edu Fri May 4 08:37:20 2007 From: jtracy at ist.ucf.edu (Judd Tracy) Date: Fri, 04 May 2007 11:37:20 -0400 Subject: [Beowulf] LVM performance problems In-Reply-To: <200705041712.48667.bernd-schubert@gmx.de> References: <463B37D5.6020008@ist.ucf.edu> <200705041712.48667.bernd-schubert@gmx.de> Message-ID: <463B5330.9080000@ist.ucf.edu> That seemed to work just fine, it now averages ~205MB/s read thanks. Judd Bernd Schubert wrote: > On Friday 04 May 2007 15:40:37 Judd Tracy wrote: > >> I am trying to bring up a small file server and am noticing some serious >> performance issues when using LVM. I created a software raid /dev/md0 >> which I can read at ~195MB/s using the raw raid device, but as soon as I >> put LVM on top of it the read speeds drop to ~95MB/s using a raw lvm >> partition without a filesystem. When I use and xfs filesystem on top of >> the lvm partition it drops down to ~40MB/s. This seems like a >> processing power issue, but the machine is a dual processor opteron >> system with 4GB of ram in it. Does anyone have some insight into why I >> might be having such problems with the system. >> > > > try something like this: > blockdev --setra 8192 /dev/{volumegroup}/{logical_volume} > > You should also try smaller and larger numbers. 
> > Hope it helps, > Bernd > > From rosing at peakfive.com Fri May 4 08:49:36 2007 From: rosing at peakfive.com (Matt) Date: Fri, 4 May 2007 09:49:36 -0600 Subject: [Beowulf] programming questions here? Message-ID: <17979.22032.204709.28791@lala.site> Hi, I'm looking for a mailing list, newsgroup, or whatever, where people are interested in talking about practical programming issues on parallel machines. Optimizing and maintaining code, software engineering for high performance code, or debugging, are the types of things I'm interested in. I'm not really interested in research so much as practical information that can be put to use now. Are there many programmers lurking on this list? It looks mostly to be OS issues (which makes sense, given the subject). Comp.parallel originally used to be a place for what I'm looking for but, I'm not sure how to say this, it's basically just a place to announce conferences anymore. Thanks, Matt From victux at gmail.com Fri May 4 10:57:12 2007 From: victux at gmail.com (Victor Gomez) Date: Fri, 4 May 2007 12:57:12 -0500 Subject: [Beowulf] SSH without log in to nodes Message-ID: <48f9d1380705041057nc8bf5dejad97d6a253783909@mail.gmail.com> Hi, I am configuring a cluster with ssh (but without passwords) and currently the users can log in to compute nodes. I wish the clients to use the queue system (Torque, it works fine) without being able to access the compute nodes. In the past, we used rsh without allowing rlogin. -------------- next part -------------- An HTML attachment was scrubbed... URL: From openlinuxsource at gmail.com Sat May 5 08:16:33 2007 From: openlinuxsource at gmail.com (Amy Lee) Date: Sat, 05 May 2007 23:16:33 +0800 Subject: [Beowulf] Help: About EMBOSS Message-ID: <463C9FD1.7020700@gmail.com> Hello, I'm a Chinese student in an agricultural university. And I built a Linux Beowulf cluster for Bioinformatics Department. According to plan, we should use EMBOSS in this cluster. 
I'd like to know, whether I can use EMBOSS to do a big problem on these nodes? I mean the whole nodes can solve one problem together at the same time, not arrange different jobs to nodes. And how about Sun Grid Engine? Is it useful in this plan? Thanks in advance! Amy Lee From mark.t.79 at gmail.com Sat May 5 23:48:02 2007 From: mark.t.79 at gmail.com (Mark Thompson) Date: Sun, 6 May 2007 02:48:02 -0400 Subject: [Beowulf] new to clusters... Message-ID: <213386ce0705052348q6c1bd4eatb78fb307c7c6ff38@mail.gmail.com> I am new to clusters and I am trying to learn as much as possible. I currently run the new ubuntu feisty fawn and am learning as much as I can about networking and such. I have several older computers ranging from a pentium 3 500 mhz to a athlon 1.8 ghz and even have a dual athlon 1.6 ghz server. I am working on getting more second hand computers for purposes of learning clustering and would like to know others input on my idea as well as suggestions for doing this. Any input would be helpful. Also, I would like suggestions for what to do with it once I have it up and running. I wouldn't mind providing services for others....would give me good experience with administration and such. thanks for your time. Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From toon at moene.indiv.nluug.nl Sun May 6 05:15:56 2007 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Sun, 06 May 2007 14:15:56 +0200 Subject: [Beowulf] MPI application benchmarks In-Reply-To: <20070504225349.GC17163@stikine.ucs.sfu.ca> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> Message-ID: <463DC6FC.4060202@moene.indiv.nluug.nl> Martin Siegert wrote: > Hi, > > this is partially triggered by the mentioniong of the SPEC MPI2007 > benchmark: what are people using as a benchmark suite for RFP purposes? Our own code. 
Anyone who does something different for serious RFP purposes is playing with their lives (at least in our surroundings - civil servants are heavily watched as far as fraud / or attempted fraud in these case go). -- Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.indiv.nluug.nl/~toon/ Who's working on GNU Fortran: http://gcc.gnu.org/ml/gcc/2007-01/msg00059.html From tjrc at sanger.ac.uk Sun May 6 08:08:26 2007 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Sun, 6 May 2007 16:08:26 +0100 Subject: [Beowulf] Help: About EMBOSS In-Reply-To: <463C9FD1.7020700@gmail.com> References: <463C9FD1.7020700@gmail.com> Message-ID: <748A4510-5AB4-45A4-A7E4-49E0C58D3AC2@sanger.ac.uk> On 5 May 2007, at 4:16 pm, Amy Lee wrote: > Hello, > > I'm a Chinese student in an agricultural university. And I built a > Linux Beowulf cluster for Bioinformatics Department. According to > plan, we should use EMBOSS in this cluster. > > I'd like to know, whether I can use EMBOSS to do a big problem on > these nodes? I mean the whole nodes can solve one problem together > at the same time, not arrange different jobs to nodes. > > And how about Sun Grid Engine? Is it useful in this plan? As far as I am aware, EMBOSS consists almost entirely of single threaded programs, so its role in a cluster environment is going to be for solving embarrassingly parallel problems where you can split the workload into many small independent jobs. Without knowing what the actual problem you are trying to solve is, I can't really advise on whether any of the programs in EMBOSS are what you want. Tim From rgb at phy.duke.edu Sun May 6 08:40:04 2007 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Sun, 6 May 2007 11:40:04 -0400 (EDT) Subject: [Beowulf] SSH without login in nodes In-Reply-To: <200705061758.54756.csamuel@vpac.org> References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com> <200705051442.59044.csamuel@vpac.org> <200705050924.37869.kilian@stanford.edu> <200705061758.54756.csamuel@vpac.org> Message-ID: On Sun, 6 May 2007, Chris Samuel wrote: > On Sun, 6 May 2007, Kilian CAVALOTTI wrote: > >> Not that ugly, actually. But what if users do a >> ssh node -t "bash --noprofile"? ;) > > Then if any of the 500 odd tried we would spot them with some other scripts > and chase them about it. We've not had to do that yet, though, fortunately! Yes, this is the other solution. Do nothing fancy in script-land. Just tell your user base "Do Not Login To The Nodes Directly And Run Jobs". Put up a TRIVIAL script to monitor and mail admin if someone should do so. Then keep a sucker rod handy to punish offenders (with the direct support and authorization to chasten from the cluster's owner(s)). In most cases with a moderate size user base, you'll have at most one or two offenses, will whack the offenders upside the head mouthing phrases like "loss of privileges to use the cluster at all", word will get out, and things will be just fine. If you organize the cluster on an isolated network so that the nodes are only visible "through" the head node, most users will never even bother to work out "how" they can login to nodes directly, especially if you tell them that You Will Be Watching and They'd Better Not If They Know What Is Good For Them. This MIGHT not work for a cluster with a very large, very dynamic, user base -- a Grid-like environment or a large public cluster with 1000 potential users. I would bet that one could make it work even then with minimal effort, but there is no doubt that you'd be bopping folks more often as a large population is bound to have a wise-ass would-be hacker in it. Find them, bop them, offer them a job. 
rgb > >> To handle of SSH based MPI launchers, we've disabled user logins from our >> frontend node to the compute nodes, but allowed them between compute >> nodes. So that the scheduler takes care of dispatching the initial process >> on a first node (no SSH involved), and then SSH connections can be used to >> dispatch the MPI daemons on the other nodes, from the initial one. > > Now that there's the Torque PAM module (pam_pbssimpleauth) that Garrick wrote > I'm tempted to set that up, but given our current system works I haven't > dared break it. :-) > > cheers! > Chris > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rgb at phy.duke.edu Sun May 6 08:46:46 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 6 May 2007 11:46:46 -0400 (EDT) Subject: [Beowulf] MPI application benchmarks In-Reply-To: <463DC6FC.4060202@moene.indiv.nluug.nl> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> Message-ID: On Sun, 6 May 2007, Toon Moene wrote: > Martin Siegert wrote: >> Hi, >> >> this is partially triggered by the mentioniong of the SPEC MPI2007 >> benchmark: what are people using as a benchmark suite for RFP purposes? > > Our own code. > > Anyone who does something different for serious RFP purposes is playing with > their lives (at least in our surroundings - civil servants are heavily > watched as far as fraud / or attempted fraud in these case go). Wow do YOU have enlightened guardians of the public trust. Here, one is lucky if one's grant or corporate review officers understand anything but "Megaflops". Oops, that shows my age. I meant "Gigaflops" or maybe even "Teraflops". If you ask them what a "Flop" is, sometimes they might even be able to answer! One dimension is all that these folks can usually handle, at least without an extensive education process. 
Computer science grant review and in SOME cases general scientific grant review excepted, of course -- in some branches of physics there are actually folks that have heard of clusters at this point who know better. But there are also a whole lot of people stuck back there at "Flops", or "MIPs" for whom aggregate capacity in one simple measure is understandable, and for whom the nonlinear task scaling of a cluster will forever be a closed book. rgb > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rgb at phy.duke.edu Sun May 6 08:53:54 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sun, 6 May 2007 11:53:54 -0400 (EDT) Subject: [Beowulf] programming questions here? In-Reply-To: <17979.22032.204709.28791@lala.site> References: <17979.22032.204709.28791@lala.site> Message-ID: On Fri, 4 May 2007, Matt wrote: > Hi, > > I'm looking for a mailing list, newsgroup, or whatever, where people > are interested in talking about practical programming issues on > parallel machines. Optimizing and maintaining code, software > engineering for high performance code, or debugging, are the types of > things I'm interested in. I'm not really interested in research so > much as practical information that can be put to use now. > > Are there many programmers lurking on this list? It looks mostly to be > OS issues (which makes sense, given the subject). There are many programmers on the list. Indeed, I'd guess that the majority of posters are coders as well as admins, cluster engineers, researchers. If you want to see the coders in action, just post a question like "Is C or Pascal a better language for writing parallel code." Just be sure to put on a biohazard suit first. 
Actually (more seriously) if you look back at the list archives you'll see lots of discussions on writing PVM and MPI code, a number of "friendly" religious wars on which language is the best, a fair number of articles on specific programming issues associated with a piece of code. All completely reasonable usage of the list, even if the issue isn't strictly about parallel code -- folks are happy to help out or talk about coding because they ARE coders and therefore like to show off their chops;-). There are also introductory resources on parallel coding methodology archived and delivered by places like the cluster monkey website saved from the short-lived Cluster World Magazine columns. Lots of articles on e.g. MPI, a few on PVM (mostly written by me:-). This kind of discussion doesn't usually 'dominate' the list traffic, but it is definitely an important component. rgb > > Comp.parallel originally used to be a place for what I'm looking for > but, I'm not sure how to say this, it's basically just a place to > announce conferences anymore. > > Thanks, > > Matt > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From landman at scalableinformatics.com Sun May 6 09:29:18 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Sun, 06 May 2007 12:29:18 -0400 Subject: [Beowulf] quickie OFED build question Message-ID: <463E025E.90409@scalableinformatics.com> Will ask this on the OFED mailing lists in a bit, but has anyone successfully built OFED 1.x against OpenSuSE 10.2 ? I have 1.0, 1.1, and 1.2-rc2 failing. Works on OpenSuSE 10.1. 
Nature of the problem appears to be some kernel changes between versions (2.6.16 vs 2.6.18) that pulled some specific things out. Also, I want to turn off the -m32 switch usage, badly. Any clues/patches out there? Google hasn't been very helpful here (must be searching for wrong things).

Thanks.

Joe

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

From siegert at sfu.ca Sun May 6 12:05:20 2007
From: siegert at sfu.ca (Martin Siegert)
Date: Sun, 6 May 2007 12:05:20 -0700
Subject: [Beowulf] MPI application benchmarks
In-Reply-To: <463DC6FC.4060202@moene.indiv.nluug.nl>
References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl>
Message-ID: <20070506190520.GB28930@stikine.ucs.sfu.ca>

On Sun, May 06, 2007 at 02:15:56PM +0200, Toon Moene wrote:
> Martin Siegert wrote:
> >Hi,
> >
> >this is partially triggered by the mentioning of the SPEC MPI2007
> >benchmark: what are people using as a benchmark suite for RFP purposes?
>
> Our own code.

Sigh. I thought I could avoid that response. Our own code (due to the no. of users who all believe that their code is the most important and therefore must be benchmarked) is so massive that any potential RFP respondent would have to work a year to run the code. Thus, we have to find a sensible cross section that is representative of the applications we care about. Since we will be purchasing several facilities with different performance characteristics (which are somewhat flexible depending on the price/performance ratio) we would like to set up a benchmark suite that covers the whole spectrum.

> Anyone who does something different for serious RFP purposes is playing
> with their lives (at least in our surroundings - civil servants are
> heavily watched as far as fraud / or attempted fraud in these cases go).
I believe we can handle that. I did not say that we wouldn't run, test, modify/add applications for our own purposes. That's why we require open source.

Since "own code" is the only answer that I appear to be getting, let me rephrase the question: which applications do you include as "your own code"? E.g., we will almost certainly include gromacs (which still leaves the question of the input parameters, etc.).

Cheers,
Martin

--
Martin Siegert
Head, HPC at SFU
WestGrid Site Lead
Academic Computing Services      phone: (604) 291-4691
Simon Fraser University          fax:   (604) 291-4242
Burnaby, British Columbia        email: siegert at sfu.ca
Canada V5A 1S6

From James.P.Lux at jpl.nasa.gov Sun May 6 16:33:44 2007
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Sun, 06 May 2007 16:33:44 -0700
Subject: [Beowulf] programming questions here?
In-Reply-To:
References: <17979.22032.204709.28791@lala.site>
Message-ID: <6.2.3.4.2.20070506162939.02f3d6b0@mail.jpl.nasa.gov>

At 08:53 AM 5/6/2007, Robert G. Brown wrote:
>There are many programmers on the list. Indeed, I'd guess that the
>majority of posters are coders as well as admins, cluster engineers,
>researchers. If you want to see the coders in action, just post a
>question like "Is C or Pascal a better language for writing parallel
>code." Just be sure to put on a biohazard suit first.

Actually, if you really want to see a lot of traffic, you need to post a question like:

"I know that SNOBOL is the best language for parallel programming, but because distro X has a lame implementation, am I really stuck using the brain-dead constructs of Ada, or would I be better using distro Y and APL, and rewriting the compiler for my needs."

{Language names changed to avoid stepping on too many landmines}

It's the combination of the absolute assertion of superiority for one choice and the denigration of multiple other choices that will really bring out the comments. As you say, fireproof suit required.

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

From felix.rauch.valenti at gmail.com Sun May 6 20:20:38 2007
From: felix.rauch.valenti at gmail.com (Felix Rauch Valenti)
Date: Mon, 7 May 2007 13:20:38 +1000
Subject: [Beowulf] fast file copying
In-Reply-To: <463AF372.1010107@cse.ucdavis.edu>
References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <463AF372.1010107@cse.ucdavis.edu>
Message-ID: <4eafc81b0705062020v199aa5abh2fe4517ed8afc123@mail.gmail.com>

On 04/05/07, Bill Broadley wrote:
> Geoff Galitz wrote:
> > During an HPC talk some years ago, I recall someone mentioned a tool
> > which can copy large datasets across a cluster using a ring topology.
> > Perhaps someone here knows of this tool?
>
> Not sure about a ring topology, seems kinda silly...

Why would that be silly? To clarify: The transmission through the ring happens in parallel, i.e., while a node n receives the data stream from node n-1, it writes the stream to disk and at the same time forwards it to node n+1. I have yet to see a tool that can achieve better data rates in practice, for reliable, high speed and large scale data distribution in clusters.

> > More to the point, we are pushing around datasets that are about
> > 1Gbyte. The datasets are pushed out to dozens of nodes all at once and
>
> How often? I just bit-torrented a 1GB file to 165 nodes in 3 minutes,
> 1.5 minutes was the lazy way I launched it (the last node didn't
> start until 1.5 minutes into the run). BTW, 140 or so of those nodes
> already had 1 job per CPU running.

1 GB file in 1.5 minutes translates to about 11 MB/s, which sounds a lot like Fast Ethernet (100 Mbit/s). By today's standards that's relatively slow and it's quite likely that the network will be the bottleneck for almost any tool.

> There are various ways to maximize I/O with bit-torrent.
Various > seeders allow uploading each block only once (usually called super > seeder mode). Assuming you have a few GB ram on the file server > you could even prefetch the file before torrenting (i.e. dd if=file_to_server > of=/dev/null) since the limit on bit-torrent bandwidth is often how > quickly you can seek. > > Additionally you can make the chunk size larger to reduce the number > of seeks. On the client side preallocation can greatly reduce > the number of seeks. More advantages of the ring topology: It uploads every block on every node exactly once, no prefetching and no seeks are required (if you replicate a whole partition or a single large file). If you are interested in more details about the technology, like models and performance measurements (somewhat old by now), check out the second paper in this list: http://www.cs.inf.ethz.ch/cops/patagonia/#relmat - Felix From hahn at mcmaster.ca Sun May 6 21:36:34 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 7 May 2007 00:36:34 -0400 (EDT) Subject: [Beowulf] SSH without login in nodes In-Reply-To: References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com> <200705051442.59044.csamuel@vpac.org> <200705050924.37869.kilian@stanford.edu> <200705061758.54756.csamuel@vpac.org> Message-ID: > Yes, this is the other solution. Do nothing fancy in script-land. Just right on! we have >1800 users from more institutions than I can count, and we just tell users to submit everything nontrivial through the queueing system. only rarely do we need to scold users for running nontrivial jobs on login nodes, and even more rarely do any of them mess with compute nodes directly. we make no effort to actually _prevent_ users from ssh'ing to compute nodes. IMO, part of this is making it easy to submit jobs. in our environment, just prefix your command by "sqsub" and it goes onto a compute node. (that is, extra job scripts are a mistake, and it's important to propogate cwd and env transparently to jobs.) 
if compilers took more than milliseconds to run, people could certainly define FC="sqsub f90" if they wanted, and it would work correctly. From hahn at mcmaster.ca Sun May 6 22:13:45 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 7 May 2007 01:13:45 -0400 (EDT) Subject: [Beowulf] MPI application benchmarks In-Reply-To: <20070506190520.GB28930@stikine.ucs.sfu.ca> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> Message-ID: > Sigh. I thought I could avoid that response. Our own code (due to the no. > of users who all believe that their code is the most important and > therefore must be benchmarked) is so massive that any potential RFP > respondent would have to work a year to run the code. Thus, we have to sure. the suggestion is only useful if the cluster is dedicated to a single purpose or two. for anything else, I really think that microbenchmarks are the only way to go. after all, your code probably doesn't do anything which is truely unique, but rather is some combination of a theoretical microbenchmark "basis set". no, I don't know how to establish the factor weights, or whether this approach really provides a good predictor. but isn't it the obvious way, even the only tractable way? >> Anyone who does something different for serious RFP purposes is playing >> with their lives (at least in our surroundings - civil servants are >> heavily watched as far as fraud / or attempted fraud in these case go). I don't really understand this statement. no one is really going to audit your decision and make you prove that you bought from the "correct" vendor - you simply need to have a plausible rationale for the decision. > E.g., we will almost certainly include gromacs (which still leaves the > question of the input parameters, etc.). that's what makes the "your own code" suggestion so uselessly narrow. 
I'd be surprised if gromacs couldn't be persuaded (through varied inputs and config) to prefer most any particular hardware: IB vs 10G, x86_64 vs ia64 vs power, even more-cheaper-smaller vs fewer-fatter nodes. this is, of course, complicated by the fact that some workloads use 5 MB/core, and others would like 6000x that much. the former are probably serial, and the latter are probably not large-tight-mpi. I know of no really good way to grok this in its fullness. From landman at scalableinformatics.com Sun May 6 22:43:06 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 07 May 2007 01:43:06 -0400 Subject: [Beowulf] quickie OFED build question In-Reply-To: <463E025E.90409@scalableinformatics.com> References: <463E025E.90409@scalableinformatics.com> Message-ID: <463EBC6A.3010701@scalableinformatics.com> Joe Landman wrote: > Will ask this on the OFED mailing lists in a bit, but has anyone > successfully built OFED 1.x against OpenSuSE 10.2 ? > > I have 1.0, 1.1, and 1.2-rc2 failing. Works on OpenSuSE 10.1. Nature > of the problem appears to be some kernel changes between versions > (2.6.16 vs 2.6.18) that pulled some specific things out. Also, I want > to turn off the -m32 switch usage, badly. Any clues/patches out there? > Google hasn't been very helpful here (must be searching for wrong things). So the answer is ... update the kernel. 2.6.18 in the OpenSuSE 10.2 flavor is not compatible (currently) with OFED-1.2 (nightly, past rc2). Looks like they are focussing on specific RH/SuSE builds. Haven't seen it on Debian/Ubuntu. Built it on OpenSuSe with updated 2.6.20.11 kernel. Thought you would like to know. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From rgb at phy.duke.edu Sun May 6 23:22:26 2007 From: rgb at phy.duke.edu (Robert G. 
Brown) Date: Mon, 7 May 2007 02:22:26 -0400 (EDT) Subject: [Beowulf] SSH without login in nodes In-Reply-To: References: <48f9d1380705021557i1196eb25w44cf4ef8ebca0572@mail.gmail.com> <200705051442.59044.csamuel@vpac.org> <200705050924.37869.kilian@stanford.edu> <200705061758.54756.csamuel@vpac.org> Message-ID: On Mon, 7 May 2007, Mark Hahn wrote: >> Yes, this is the other solution. Do nothing fancy in script-land. Just > > right on! we have >1800 users from more institutions than I can count, > and we just tell users to submit everything nontrivial through the queueing > system. only rarely do we need to scold users for running nontrivial jobs on > login nodes, and even more rarely do any of them mess with compute nodes > directly. we make no effort to actually _prevent_ users from ssh'ing to > compute nodes. > > IMO, part of this is making it easy to submit jobs. in our environment, > just prefix your command by "sqsub" and it goes onto a compute node. > (that is, extra job scripts are a mistake, and it's important to propogate > cwd and env transparently to jobs.) if compilers took more than milliseconds > to run, people could certainly define FC="sqsub f90" if they wanted, and it > would work correctly. Ya. In the olde days they would describe this as the difference between fascist administration style and -- not so fascist administration style. It is important to differentiate between security requirements and the human desire to control other humans. Even security is a cost-benefit trade-off, but it is one where being a bit more "stringent" is appropriate. Where it isn't security, you'll probably save energy effort time ulcers if you just back off on rigid control and try instead to use human measures. In fact, even to use a queuing system in the first place requires enough users and conflicting requirements to make it worth all the hassle. 
You can go a long ways on a small mixed-ownership cluster where people tend to run long jobs on their own nodes all the time and share other people's when they are idle with the shout-down-the hall "hey Joe -- you using those nodes?" scheduler. rgb > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From toon.knapen at fft.be Mon May 7 00:56:10 2007 From: toon.knapen at fft.be (Toon Knapen) Date: Mon, 07 May 2007 09:56:10 +0200 Subject: [Beowulf] MPI application benchmarks In-Reply-To: References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> Message-ID: <463EDB9A.6000009@fft.be> Mark Hahn wrote: >> Sigh. I thought I could avoid that response. Our own code (due to the no. >> of users who all believe that their code is the most important and >> therefore must be benchmarked) is so massive that any potential RFP >> respondent would have to work a year to run the code. Thus, we have to > > sure. the suggestion is only useful if the cluster is dedicated to a > single purpose or two. for anything else, I really think that > microbenchmarks are the only way to go. after all, your code probably > doesn't do anything which is truely unique, but rather is some > combination of a theoretical microbenchmark "basis set". no, I don't > know how to establish the factor weights, or whether this approach > really provides a good predictor. but isn't it the obvious way, > even the only tractable way? > Agreed. On one hand you need micro-benchmark. OTOH you need your users to specify what are the sensitive points of their application. 
First of all, I suppose their applications are parallel, but are they BW-bound or latency-bound? How much time do the applications spend on communication? Are the apps capable of running in mixed-mode (MPI combined with multithreading), ...

Why don't you make a list of multiple-choice questions in a style as described above and ask your users to fill that in. This solves also the 'weighting factor' because the users that respond to your question _care_ about the machine being suitable while the others care less.

t

From peter.st.john at gmail.com Mon May 7 07:23:35 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Mon, 7 May 2007 10:23:35 -0400
Subject: [Beowulf] new to clusters...
In-Reply-To: <213386ce0705052348q6c1bd4eatb78fb307c7c6ff38@mail.gmail.com>
References: <213386ce0705052348q6c1bd4eatb78fb307c7c6ff38@mail.gmail.com>
Message-ID:

Mark,

You might be interested in the discussion that followed Kyle Spaan's "A Start in Parallel Programming" from March 13. I believe we had some discussion of relatively simple parallelizable apps, as well as a language flame war pitting the Righteous against the Misguided :-) I believe RGB actually detonated a small fissionable device.

Peter

On 5/6/07, Mark Thompson wrote:
>
> I am new to clusters and I am trying to learn as much as possible. I
> currently run the new ubuntu feisty fawn and am learning as much as I can
> about networking and such. I have several older computers ranging from a
> pentium 3 500 mhz to a athlon 1.8 ghz and even have a dual athlon 1.6 ghz
> server. I am working on getting more second hand computers for purposes of
> learning clustering and would like to know others input on my idea as well
> as suggestions for doing this. Any input would be helpful. Also, I would
> like suggestions for what to do with it once I have it up and running. I
> wouldn't mind providing services for others....would give me good experience
> with administration and such. thanks for your time.
> > Mark > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From elken at pathscale.com Mon May 7 08:30:42 2007 From: elken at pathscale.com (Tom Elken) Date: Mon, 07 May 2007 08:30:42 -0700 Subject: [Beowulf] MPI application benchmarks Message-ID: <463F4622.2000803@pathscale.com> "Martin Siegert" wrote: > In that respect the SPEC MPI2007 benchmark appears to be ideal. Alas, > it does not appear to be open source (please correct me if I am wrong) SPEC MPI2007 will not be open source, but the price will not be very high either (and that price could be left up to the vendors to pay, bidders on your procurement). > - so far I have not even been able to figure out which applications are > being used in that benchmark suite. The names of the SPEC MPI2007 benchmarks have not been made public yet. SPEC confidentiality rules do not allow that until the suite is released. One reason for that is that it is still possible to drop a candidate code from the suite (though unlikely) and SPEC wants to spare the author of such a code possible embarrassment. The expected release date is late June this year. To give more of a hint about what MPI2007 will be like, it will consist of real applications that have been made more portable and had correctness tests defined. The types of applications of the CPU2006 floating point suite ( http://www.spec.org/cpu2006/CFP2006/ ) are similar to what will be in the MPI2007 suite, though only a few are in common between the two suites. I don't think I'm allowed to reveal the price of the MPI2007 suite yet, but it should be somewhat similar to the price structure for CPU2006 (hint, hint). > From: "Brian D. 
Ropers-Huilman" Subject: To: "Martin Siegert" Cc: beowulf at beowulf.org Message-ID: Content-Type: text/plain; charset=ISO-8859-1; format=flowed
On 5/4/07, Martin Siegert wrote:
>> > We will be purchasing a shared cluster for a wide community (currently
>> > more than 1000 users). Thus, the common response on this list to evaluate
>> > hardware - "use your own application as benchmark" - does not work:
>> > users change, users' applications change, etc., etc. Thus, I need a
>> > benchmark suite that tests a wide spectrum of properties.
>
> My answer is still to "use your own application(s)." Poll your users
> and find out what they have and what they are going to run. Find some
> who already have codes that scale well (>1000 cores) and ask them to
> participate. Many vendors will allow you to run your own codes on
> systems they have at their own sites before you decide to purchase.
> These vendor-hosted systems are typically only 256 cores or less, but
> it gives you some idea as to how your codes might run.
>
> I also suggest picking some representative synthetic benchmarks to
> test floating point and integer operations, memory bandwidth, MPI
> ping-pongs (the SPEC MPI2007, among others, would fit here),

SPEC MPI2007 will fit in the class of what I call standard benchmarks, but not "synthetic" benchmarks. SPEC starts with the real applications as you download them, and then tries to make as few changes as possible so that the codes can be built and the results can be validated on a wide range of compilers and platforms.

> the HPC
> Challenge codes, and the like. Many sites will then take all of these
> results (synthetic + their own applications) and aggregate the
> results, possibly with weighting factors, into a single number.

Agreed. This is a good strategy that is used quite often. Benchmarks like SPEC CPU2006 are often added to various synthetic benchmarks and user applications as benchmark requirements for an RFP.
We hope that the same will happen with SPEC MPI2007. Hopefully having a suite of applications available will make you feel like you don't have to develop quite as large a (portable) suite of your own applications as you would otherwise, or to deal with as many questions from vendors as they try to get your codes to run.

-Tom Elken
QLogic Corp. and the SPEC High Performance Group.

From wrankin at ee.duke.edu Mon May 7 14:07:05 2007
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Mon, 7 May 2007 17:07:05 -0400
Subject: [Beowulf] MPI application benchmarks
In-Reply-To: <463EDB9A.6000009@fft.be>
References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> <463EDB9A.6000009@fft.be>
Message-ID: <93630D73-4A52-40B6-B983-D45AA4A8BD94@ee.duke.edu>

Toon Knapen wrote:
> Mark Hahn wrote:
>> sure. the suggestion is only useful if the cluster is dedicated
>> to a single purpose or two. for anything else, I really think
>> that microbenchmarks are the only way to go.

I'm not sure that I agree with this - there are just so many different micro benchmarks that I would worry that relying upon them for anything other than basic system validation (which they are very good at) leaves the potential for some very big holes in your requirements. Especially in a general purpose system like the one proposed.

> Why don't you make a list of multiple-choice questions in a style
> as described above and ask your users to fill that in. This solves
> also the 'weighting factor' because the users that respond to your
> question _care_ about the machine being suitable while the others
> care less.

This is an excellent idea. Even with 1000s of users, you still need to understand the mix of application types you will see. Then you can make some informed judgments on the overall system architecture.

Some questions that you will need to consider:

- What is the "experience level" of your audience?

- Are they writing their own MPI or pthreads code? Or are they just using canned apps? What apps and how do you handle licenses?

- Do you have a significant population who are looking to do larger scale (100+ process) MPI jobs? Is it worth the expense to invest in high-speed interconnects, or is GigE sufficient?

- Pay attention to the storage system, both in scale and performance. It's often both a hidden bottleneck and a single point of failure.

- Do you have users who want to toss around Terabytes of data?

- How do you plan on backing up these Terabytes of data?

These are just a few off the top of my head. There are more. But I think that a quick, fairly simple survey will be invaluable towards planning this facility.

Good luck,

-bill

From rgb at phy.duke.edu Mon May 7 14:59:18 2007
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Mon, 7 May 2007 17:59:18 -0400 (EDT)
Subject: [Beowulf] MPI application benchmarks
In-Reply-To: <93630D73-4A52-40B6-B983-D45AA4A8BD94@ee.duke.edu>
References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> <463EDB9A.6000009@fft.be> <93630D73-4A52-40B6-B983-D45AA4A8BD94@ee.duke.edu>
Message-ID:

On Mon, 7 May 2007, Bill Rankin wrote:
>
> Toon Knapen wrote:
>> Mark Hahn wrote:
>>> sure. the suggestion is only useful if the cluster is dedicated to a
>>> single purpose or two. for anything else, I really think that
>>> microbenchmarks are the only way to go.
>
> I'm not sure that I agree with this - there are just so many different micro
> benchmarks that I would worry that relying upon them for anything other than
> basic system validation (which they are very good at) leaves the potential
> for some very big holes in your requirements. Especially in a general
> purpose system like the one proposed.

I think that there is something to be said for both.
The standard answer for how to prototype systems for use in a single or few purpose cluster is "with your applications" to be sure, but general purpose clusters are a different ball of wax because they inevitably involve cost-benefit compromises. Remember, one has to optimize design between relatively few, very fast, very large memory, very expensive network nodes (suitable for tightly coupled fine-grained parallel stuff) where one might well spend MORE on just network and memory than on "the computer" (everything else) and a much larger stack of utterly disposable boxes on pretty much any old network with a base 512 MB of memory (which is more than you need but it is difficult to get any more) to run very simple EP applications. [Leaving out clusters with complex or high speed storage requirements, which basically can double the high end cost again or close to it.]

When computing the economics of the compromises involved when you don't even know for sure what KIND of applications will be running, or are pretty sure they range from (say) 70% EP or very coarse grained to 30% fine grained, with only the most generic idea of the needs of the fine grained applications (to what extent are they latency bound? bw bound? memory bound? storage bound?) then it really, really helps to have a "rich" set of microbenchmarks -- at least at the level of lmbench if not beyond. Otherwise you might know how fast it (comparably) runs Joe Down the Hall's application, but without a lot of study you won't have any idea what that means.
Something that is to some extent true even of suites of macro benchmarks like SPEC -- unless you really take the time to work through the code and see what it looks like, even a "typical monte carlo" benchmark component might not scale at all like YOUR monte carlo benchmark because theirs might be in 2D with binary (Ising) spins and sized to fit into cache on a modern 64 bit CPU while yours might be in 4D with O(3) spins and a fair bit of trig per update, or theirs might be Metropolis and yours might be heat bath or cluster (sampling method) with very different algorithms. Just for a single example -- stream is very popular, but omits division and doesn't give you a good picture of memory access times in LESS than ideal circumstances where instead of streaming through vector like one bops all over. There is a pretty big difference there in both cases, and yes, real code sometimes requires actual division, real code (especially simulation code) sometimes cannot be made "cache local" and sometimes cannot be made memory local at all. Stream, or application benchmarks LIKE stream, that are linear algebra, multiply/add heavy may give you no clue as to how the system will respond if it is asked to divide numbers pulled from all over memory, maybe with a bit of trig or other transcendental calls mixed in. And did I mention the variations associated with compiler? Or sometimes with bios configuration, memory, operating system (the code almost certainly contains systems calls if only for memory management). There is nothing wrong with SPEC as a measure of general purpose performance, and things like HPCC have been engineered by smart people who know all of this. They are useful. Applications (where you know what they are) that will run on the system need no justification to use as a benchmark. 
However, if one needs to be able to ESTIMATE upper or lower performance bounds on code you haven't seen before, code that may not even exist yet, microbenchmarks are very, very useful. At least then one can say "Gee, I don't know specifically about your code, but small-message latency on this system is X, and here is a graph of latency/bandwidth as a function of packet size, this shows you how long it takes to memcpy a block of memory, here are some curves that show how it degrades for different sizes of memory (as they sweep across cache boundaries) and use non-favorable patterns, here is how fast it generates the primary transcendentals for this particular compiler -- if you know how much your code does these things, you can at least estimate your code's performance on this hardware with this OS and compiler and network." Perhaps fortunately (perhaps not) there is a lot less variation in system performance with system design than there once was. Everybody uses one of a few CPUs, one of a few chipsets, generic memory, standardized peripherals. There can be small variations from system to system, but in many cases one can get a pretty good idea of the nonlinear "performance fingerprint" of a given CPU/OS/compiler family (e.g. opteron/linux/gcc) all at once and have it not be crazy wrong or unintelligible as you vary similar systems from different manufacturers or vary clock speed within the family. There are enough exceptions that it isn't wise to TRUST this rule, but it is still likely correct within 10% or so. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From siegert at sfu.ca Mon May 7 15:05:54 2007 From: siegert at sfu.ca (Martin Siegert) Date: Mon, 7 May 2007 15:05:54 -0700 Subject: [Beowulf] MPI application benchmarks In-Reply-To: <463EDB9A.6000009@fft.be> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> <463EDB9A.6000009@fft.be> Message-ID: <20070507220554.GG30191@stikine.ucs.sfu.ca> On Mon, May 07, 2007 at 09:56:10AM +0200, Toon Knapen wrote: > Mark Hahn wrote: > >>Sigh. I thought I could avoid that response. Our own code (due to the no. > >>of users who all believe that their code is the most important and > >>therefore must be benchmarked) is so massive that any potential RFP > >>respondent would have to work a year to run the code. Thus, we have to > > > >sure. the suggestion is only useful if the cluster is dedicated to a > >single purpose or two. for anything else, I really think that > >microbenchmarks are the only way to go. after all, your code probably > >doesn't do anything which is truely unique, but rather is some > >combination of a theoretical microbenchmark "basis set". no, I don't > >know how to establish the factor weights, or whether this approach > >really provides a good predictor. but isn't it the obvious way, > >even the only tractable way? > > > > > Agreed. On one hand you need micro-benchmark. OTOH you need your users > to specify what are the sensitive points of their application. First of > all, I suppose their applications are parallel, but are they BW-bound or > latency-bound. How much time do the applications spend on communication? > Are the app's capable of running in mixed-mode (MPI combined with > multithreading), ... > > Why do'nt you make a list of multiple-choice questions in a style as > described above and ask your users to fill that in. 
This solves also the > 'weighting factor' because the users that respond to your question > _care_ about the machine being suitable while the others care less. Several reasons why this does not work quite the way we would like: - it is surprising (or not ...) how many users simply do not know how to characterize their application. The only way is to get a copy of those applications that chew up most of the walltime, compile them, try to understand what the application is doing, and then classify it yourself. Takes a lot of time ... unless it is a well known application like gromacs. - we start the work on the benchmark now. The RFP will be issued many months down the road. The equipment will be purchased many more months down the road, ... At the time when users get on the facilities the users have changed, their applications have changed, etc., etc. Thus, "our own applications" today are not (necessarily) relevant tomorrow when the equipment is purchased. Thus, setting up a benchmark suite that covers the whole spectrum of "interesting" applications and then using weight factors at the time when you make the decision is more practical and sensible than the "use your own applications" approach. (You do have to know at least approximately in which category the applications you care most about fall, but you do not have to assemble those applications into a benchmark suite).
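A minimal sketch of the weight-factor idea above, in Python. The category names, timings, weights, and the `weighted_score` helper are all made up for illustration; nothing here comes from an actual RFP, and a weighted geometric mean of speedups is only one of several reasonable ways to combine results.

```python
# Hypothetical illustration: benchmark results are collected once, and the
# weight factors are chosen later, at decision time. All numbers are invented.

def weighted_score(times, baseline, weights):
    """Weighted geometric mean of speedups relative to a baseline system.

    times, baseline: dict mapping category -> runtime in seconds (lower is better)
    weights: dict mapping category -> non-negative weight (need not sum to 1)
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = 1.0
    for category, weight in weights.items():
        speedup = baseline[category] / times[category]
        score *= speedup ** (weight / total)
    return score

# Benchmark suite results for a reference machine and two candidate systems.
baseline = {"latency_bound": 100.0, "bandwidth_bound": 200.0, "io_bound": 50.0}
system_a = {"latency_bound": 50.0, "bandwidth_bound": 200.0, "io_bound": 50.0}
system_b = {"latency_bound": 100.0, "bandwidth_bound": 100.0, "io_bound": 50.0}

# The same measurements, ranked under two different (assumed) workload mixes.
weights_today = {"latency_bound": 3, "bandwidth_bound": 1, "io_bound": 0}
weights_later = {"latency_bound": 1, "bandwidth_bound": 3, "io_bound": 0}

for label, weights in (("today", weights_today), ("later", weights_later)):
    a = weighted_score(system_a, baseline, weights)
    b = weighted_score(system_b, baseline, weights)
    print(label, "A:", round(a, 3), "B:", round(b, 3))
```

The point of the sketch is only that the suite is run once and the ranking can be recomputed when the workload mix changes, without rerunning anything.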
Cheers, Martin From toon.knapen at fft.be Mon May 7 22:37:01 2007 From: toon.knapen at fft.be (Toon Knapen) Date: Tue, 08 May 2007 07:37:01 +0200 Subject: [Beowulf] MPI application benchmarks In-Reply-To: <20070507220554.GG30191@stikine.ucs.sfu.ca> References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> <463EDB9A.6000009@fft.be> <20070507220554.GG30191@stikine.ucs.sfu.ca> Message-ID: <46400C7D.20304@fft.be> Martin Siegert wrote: >> Why don't you make a list of multiple-choice questions in a style as >> described above and ask your users to fill that in. This solves also the >> 'weighting factor' because the users that respond to your question >> _care_ about the machine being suitable while the others care less. > > Several reasons why this does not work quite the way we would like: > - it is surprising (or not ...) how many users simply do not know > how to characterize their application. The only way is to get a > copy of those applications that chew up most of the walltime, > compile them, try to understand what the application is doing, > and then classify it yourself. Takes a lot of time ... unless > it is a well known application like gromacs. Are your users using commercial apps or home-brewed apps? If your users use home-brewed stuff, they (at least the coder) should know how to characterize the code. If your users use commercial apps, ask the vendor of the app. They should be able to advise you on what the best hardware for their app would be (at least, that's what we do) > - we start the work on the benchmark now. The RFP will be issued > many months down the road. The equipment will be purchased many > more months down the road, ... At the time when users get on the > facilities the users have changed, their applications have changed, > etc., etc. Thus, "our own applications" today are not (necessarily) > relevant tomorrow when the equipment is purchased.
> Carefully engineered apps that try to get every bit of performance out of the system do not evolve (in terms of requirements on the system) that fast. So for the apps that do care about performance, the requirements do not change fast over time. From toon.knapen at fft.be Mon May 7 23:38:34 2007 From: toon.knapen at fft.be (Toon Knapen) Date: Tue, 08 May 2007 08:38:34 +0200 Subject: [Beowulf] MPI application benchmarks In-Reply-To: References: <20070504225349.GC17163@stikine.ucs.sfu.ca> <463DC6FC.4060202@moene.indiv.nluug.nl> <20070506190520.GB28930@stikine.ucs.sfu.ca> <463EDB9A.6000009@fft.be> <93630D73-4A52-40B6-B983-D45AA4A8BD94@ee.duke.edu> Message-ID: <46401AEA.8000004@fft.be> Robert G. Brown wrote: > Perhaps fortunately (perhaps not) there is a lot less variation in > system performance with system design than there once was. Everybody > uses one of a few CPUs, one of a few chipsets, generic memory, > standardized peripherals. There can be small variations from system to > system, but in many cases one can get a pretty good idea of the > nonlinear "performance fingerprint" of a given CPU/OS/compiler family > (e.g. opteron/linux/gcc) all at once and have it not be crazy wrong or > unintelligible as you vary similar systems from different manufacturers > or vary clock speed within the family. There are enough exceptions that > it isn't wise to TRUST this rule, but it is still likely correct within > 10% or so. I agree that this rule is true for almost all codes ... that fit perfectly in cache and that do not try to benefit from specific optimisations. HPC codes however are always pushing the limits and this means you will always stumble on some bottleneck somewhere. Once you have removed the bottleneck, you stumble on another. And every bottleneck masks all others until you remove it. E.g. it was already mentioned in this thread that one should not forget to pay attention to storage.
However, people often run parallel codes with each process performing heavy I/O without an adapted storage system. Or another example, GotoBLAS is well known to outperform netlib-blas. However, in an application calling many dgemm's on small matrices (up to 50x50), netlib-blas will _really_ (i.e. a factor 30) outperform GotoBLAS (because GotoBLAS 'loses' time aligning the matrices etc. which becomes significant for small matrices) toon From Michael.Fitzmaurice at gtsi.com Fri May 4 10:09:18 2007 From: Michael.Fitzmaurice at gtsi.com (Michael Fitzmaurice) Date: Fri, 4 May 2007 13:09:18 -0400 Subject: [Beowulf] bwbug: Next BWBUG meeting will be May 8th at Georgetown University Message-ID: The speaker will be Terry Hulett from NetEffect, who will talk about the very fast low latency HPC Interconnect that runs on 10 Gb TCP/IP. Terry Hulett Vice President of Architecture and Silicon Engineering Terry leads NetEffect's innovation efforts, leveraging more than 23 years of semiconductor design experience. Prior to joining Banderacom and NetEffect, he held key positions at Advanced Micro Devices (AMD) and Motorola Semiconductor, leading several microprocessor development teams. At AMD, Terry led the development of over 15 chips and built AMD's first high-volume card-based products. He joined Banderacom in 1999 where he successfully led the Company's InfiniBand chip development program delivering the first product in 9 months. Terry also previously served as vice president of research and development for Austin-based Si Solutions, Inc. Terry received his BSEE at Oklahoma State University. Mike Fitzmaurice -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- _______________________________________________ bwbug mailing list bwbug at bwbug.org http://www.pbm.com/mailman/listinfo/bwbug From mark.t.79 at gmail.com Sun May 6 17:28:36 2007 From: mark.t.79 at gmail.com (Mark Thompson) Date: Sun, 6 May 2007 20:28:36 -0400 Subject: [Beowulf] bsd implementation? Message-ID: <213386ce0705061728h67cf626r96bf913cd5b563ac@mail.gmail.com> I was wondering if linux was the only os that is practical to implement a beowulf cluster with...it seems to me that BSD would be good for this and I know dragonfly BSD is working toward providing this natively...my question is that what are the limitations presented when using BSD in clusters? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosing at peakfive.com Mon May 7 09:48:04 2007 From: rosing at peakfive.com (Matt) Date: Mon, 7 May 2007 10:48:04 -0600 Subject: [Beowulf] Re: programming questions here? In-Reply-To: <200705070758.l477w7MU013014@bluewest.scyld.com> References: <200705070758.l477w7MU013014@bluewest.scyld.com> Message-ID: <17983.22596.769231.783245@lala.site> > At 08:53 AM 5/6/2007, Robert G. Brown wrote: > >There are many programmers on the list. Indeed, I'd guess that the > >majority of posters are coders as well as admins, cluster engineers, > >researchers. If you want to see the coders in action, just post a > >question like "Is C or Pascal a better language for writing parallel > >code." Just be sure to put on a biohazard suit first. > > Actually, if you really want to see a lot of traffic, you need to > post a question like: > > "I know that SNOBOL is the best language for parallel programming, > but because distro X has a lame implementation, am I really stuck > using the brain-dead constructs of Ada, or would I be better using > distro Y and APL, and rewriting the compiler for my needs." 
> > {Language names changed to avoid stepping on too many landmines} > > It's the combination of the absolute assertion of superiority for one > choice and the denigration of multiple other choices that will really > bring out the comments. > > As you say, fireproof suit required. SNOBOL, huh? I don't know who's showing their age more, you for mentioning it or me for knowing what you're talking about :) I'm sure someone is still using it now but when I used it it was written on punch cards. But I don't want to start a flame war. I don't care about what programming languages people use. I can use whatever you'd like me to use. I'm really trying to figure out whether I should stay in the parallel processing field. I don't get paid to write papers, although I do on occasion. I get paid to do two rather synergistic activities. First, I help people maintain (debug, port, optimize, etc) parallel code, and second I build software tools to help this process. Actually, I don't get paid to build the tools but I use them. The main tool I built was a preprocessor that takes fortran plus user defined directives and parallelizes the code, or instruments the code, or helps debug the code, or whatever might benefit from rewriting the code. I've used this tool on several 100k+ LOC programs and it's a very powerful technique. After doing this for a number of years I've come to the conclusion that developing and maintaining parallel code is so expensive that few people can afford to maintain old code and add new features. Debugging, porting, and optimizing parallel code is a real time sink. Numerical code is even worse. For example, it took months of time to get one code from 1.5% of peak performance on one processor to 20% on 1800 processors. The code has evolved from early days on a CM-2 and it's difficult to understand. It can take weeks to find out why some variable starts losing 5 or 6 digits of accuracy when running over 1000 processors.
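One common way for accuracy to depend on processor count (an assumption here, not necessarily what happened in the code described above) is that a parallel reduction adds the same numbers in a different order when the data is split differently, and floating-point addition is not associative. A small Python sketch follows; the `chunked_sum` helper is hypothetical and only stands in for "each process sums its own block, then the partial sums are combined".

```python
import math

def chunked_sum(values, nchunks):
    """Sum values as nchunks contiguous partial sums, then add the partials.

    Mimics nchunks processes each summing a block before a final reduction.
    Illustrative only; this is not any MPI library's reduction algorithm.
    """
    n = len(values)
    partials = []
    for i in range(nchunks):
        lo = i * n // nchunks
        hi = (i + 1) * n // nchunks
        s = 0.0
        for v in values[lo:hi]:
            s += v
        partials.append(s)
    total = 0.0
    for p in partials:
        total += p
    return total

# Tiny made-up data set with large cancelling terms and small terms.
values = [1e16, 1.0, -1e16, 1.0]

print("1 chunk :", chunked_sum(values, 1))
print("2 chunks:", chunked_sum(values, 2))
print("fsum    :", math.fsum(values))  # correctly rounded reference sum
```

Comparing a suspicious variable against a higher-precision or compensated sum at a few different process counts is one cheap way to check whether reduction order is the culprit before digging further.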
One big question I have is what other people have these kinds of problems? Does anyone know how much money they're spending on maintaining old code as opposed to creating new? I'd like to extend the tools I've built and find customers that either want these tools or would pay me to help them, but I first want to find some potential customers to find out what they need to make sure I'm on the right track. Thanks, Matt From fly at anydata.co.uk Tue May 8 05:22:15 2007 From: fly at anydata.co.uk (Fred Youhanaie) Date: Tue, 08 May 2007 13:22:15 +0100 Subject: [Beowulf] bsd implementation? In-Reply-To: <213386ce0705061728h67cf626r96bf913cd5b563ac@mail.gmail.com> References: <213386ce0705061728h67cf626r96bf913cd5b563ac@mail.gmail.com> Message-ID: <46406B77.2020101@anydata.co.uk> Mark Thompson wrote: > I was wondering if linux was the only os that is practical to implement a > beowulf cluster with...it seems to me that BSD would be good for this and I > know dragonfly BSD is working toward providing this natively...my question > is that what are the limitations presented when using BSD in clusters? Mark, Here is one example of a BSD based cluster from 2003, I expect things have moved on since then. http://people.freebsd.org/~brooks/papers/bsdcon2003/fbsdcluster/ Cheers f. From larry.stewart at sicortex.com Tue May 8 06:11:01 2007 From: larry.stewart at sicortex.com (Larry Stewart) Date: Tue, 08 May 2007 09:11:01 -0400 Subject: [Beowulf] Re: programming questions here? In-Reply-To: <17983.22596.769231.783245@lala.site> References: <200705070758.l477w7MU013014@bluewest.scyld.com> <17983.22596.769231.783245@lala.site> Message-ID: <464076E5.1070106@sicortex.com> > > At 08:53 AM 5/6/2007, Robert G. 
Brown wrote: > > > Actually, if you really want to see a lot of traffic, you need to > > post a question like: > > > > "I know that SNOBOL is the best language for parallel programming, > > but because distro X has a lame implementation, am I really stuck > > using the brain-dead constructs of Ada, or would I be better using > > distro Y and APL, and rewriting the compiler for my needs." > > So in 1984 I was teaching a programming languages class, with assignments in Pascal, LISP, APL, and SNOBOL. The SNOBOL image (running on TOPS-20) had been assembled in 1971 (!). The SNOBOL assignment was to fit solution words into a crossword puzzle using pattern matching. One student had a solution that was actually fast as well as right. If you held up the listing at arm's length, you would swear it was a FORTRAN program. From which I conclude that you can write fast programs in any language, but they come out looking like FORTRAN. -Larry From rgb at phy.duke.edu Tue May 8 06:48:40 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 8 May 2007 09:48:40 -0400 (EDT) Subject: [Beowulf] Re: programming questions here? In-Reply-To: <464076E5.1070106@sicortex.com> References: <200705070758.l477w7MU013014@bluewest.scyld.com> <17983.22596.769231.783245@lala.site> <464076E5.1070106@sicortex.com> Message-ID: On Tue, 8 May 2007, Larry Stewart wrote: > One student had a solution that was actually fast as well as right. > If you held up the listing at arm's length, you would swear > it was a FORTRAN program. > > From which I conclude that you can write fast programs in > any language, but they come out looking like FORTRAN. Hmmm, a sample of one leading to a conclusion?
That would be "unless they are written in APL, in which case they come out looking like Martian, or are written in C, in which case they come out looking like they were written in the Language of God, or are written in assembler, in which case they come out looking like they were written in the lost proto-indo-european script, translated semiphonetically into english, and then somebody came along and wiped out 2/3 of the letters more or less at random". :-) rgb > > -Larry > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From jac67 at georgetown.edu Wed May 9 14:19:14 2007 From: jac67 at georgetown.edu (Jess Cannata) Date: Wed, 09 May 2007 17:19:14 -0400 Subject: [Beowulf] Off Topic: Job Opening (Cluster Admin) Message-ID: <46423AD2.6020401@georgetown.edu> I am posting this for a friend. The group runs a couple of Beowulf clusters and is located in Baltimore. For more information, contact Alex MacKerell. His contact information is at the bottom of the e-mail. Jess *** POSITION TYPE: Exempt Position, Department of Pharmaceutical Sciences / School of Pharmacy FUNCTIONAL TITLE: Systems Programmer DESCRIPTION: This position provides system administration for both the Department of Pharmaceutical Sciences (PSC) and Pharmaceutical Research Computing (PRC) of the Department of Pharmaceutical Health Services Research in the School of Pharmacy. This position will maintain all the computational resources, facilitate their replacement and help direct expansion of the scientific computing capabilities of both the PSC department (80%) and PRC (20%).
The research programs within PSC department involve an extensive amount of scientific computing. Accordingly, the PSC department has a variety of computing, storage, and backup systems. These include a 48 processor Linux cluster running 2.8 GHz Athlon processors, an Opteron cluster with 25 dual core 2.4 GHz Athlon Opteron nodes, a 32 processor Linux cluster running 2.4 GHz Athlon Opterons, 4 Macintosh and one Windows PC, 15 Linux-based workstations, multiple color HP laser printers, a 2.5 Terabyte RAID storage system including 4 dual processor 2.8 GHz Athlon nodes along with tape backup (Amanda) and an additional 2.5 Terabyte RAID storage system. These resources include workstations that connect to the departmental NMR and X-ray spectrometers. The research programs within PRC primarily involve data management and analytic computing. The PRC owned computer system consists of Solaris and Linux based servers, Network Area Storage devices, and peripheral tape (DLT) readers. Responsibilities will include ensuring the security of the local scientific computing environment, user account management, direction, maintenance and upgrades of system hardware and software, including end-user scientific application software, as well as the management of any backup systems. REQUIREMENTS: Requires at least 4 years of appropriate education, experience and/or training with networked centrally operated and supported computing systems. The ideal candidate will be very familiar with operating systems commonly used in scientific computing (Unix, Linux, Irix, Solaris), Amanda backup system and related technologies including multiprocessing, distributed computing and networking (e.g., NFS). The individual should also be very comfortable connecting these systems to each other as well as PCs based on the Windows and Macintosh OS X platforms. The position will require familiarity with contemporary TCP/IP networks and their technologies.
HIRING RANGE: $55,000 to $75,000 (Commensurate with education and experience) Contact Alex MacKerell for more information alex at outerbanks.umaryland.edu 410-706-7442 From angelv at iac.es Tue May 8 06:25:36 2007 From: angelv at iac.es (Angel de Vicente) Date: Tue, 08 May 2007 14:25:36 +0100 Subject: [Beowulf] SSH without login in nodes In-Reply-To: <200705041340.50461.kilian@stanford.edu> (Kilian CAVALOTTI's message of "Fri, 4 May 2007 13:40:50 -0700") References: <200705041340.50461.kilian@stanford.edu> Message-ID: Hi, If it is of any help, we use a similar setting to the one given below by Kilian, where our access file in the compute nodes only has root and myself. When a user submits something to the queuing system (Torque+Maui), the access.conf of the given nodes is modified with a prologue script, so that access is given to them in the allocated nodes, and when the job finishes their name is taken from access.conf in an epilogue script. Nothing fancy, but it works pretty well (you could easily figure how to abuse it, but people usually behave nicely, and this was needed mostly to prevent accidentally submitting jobs to other nodes, not to tackle abuse). At the same time, we have a script that runs once per day to check whether there are any jobs from users not allowed (according to the queueing system) to do so, and if found they are just mercilessly killed (on very rare occasions zombies are hanging around). To the original poster, if you want details about this setting, I can provide them, with a special bonus: in Spanish :-) Cheers, Ángel de Vicente Kilian CAVALOTTI writes: > Hi, > > On Friday 04 May 2007 01:06:51 pm Peter St. John wrote: >> There was a typographical error in the question. I had a brief exchange >> with señor Gomez and he confirmed this translation: >> >> I am configuring a cluster with ssh (but without passwords) and >> currently the users can log in to compute nodes.
>> I wish the clients to use the queue system (Torque, it works fine) >> without being able to access the compute nodes. >> In the past, we used rsh without allowing rlogin. > > What you can do is configure PAM on the nodes, to only allow login for a > specific set of users, if any. It should come with any modern distro. > > Be sure your /etc/pam.d/authconfig contains reference to pam_access, like: > account required /lib/security/$ISA/pam_access.so > > And configure /etc/security/access.conf to match your needs, like: > # Allow administrative login from everywhere > +:wheel staff:ALL > # Prevent user logins > -:users:ALL > > You can give a look at > http://www.informit.com/articles/article.asp?p=165226&seqNum=12&rl=1 for > more info. > > Cheers, > -- > Kilian > -- ---------------------------------- Instituto de Astrofísica de Canarias From fslack at gmail.com Tue May 8 14:26:45 2007 From: fslack at gmail.com (fernando carazo) Date: Tue, 8 May 2007 18:26:45 -0300 Subject: [Beowulf] DisklessWorkstations Message-ID: Hi people, I'm trying to boot from a 3 1/2 diskette; I'm using etherboot for it. The problem is: when I boot from the 3 1/2 diskette everything is OK and the network card is detected, until it tries to configure the IP, load the kernel and other things. The message that I receive is: ... ... Probing pci nic .... [rtl8139] - ioaddr 0XB800, irq 3, addr 00:E0:7D:B0:B0:3C 10Mbps half-duplex Searching for server (DHCP)..... No IP address . No IP address . No IP address . No IP address . No IP address .. .. When I boot the two machines from the hard disk I don't have any problem: the client's network card receives the IP from DHCP. But when I try to boot from the 3 1/2 diskette I have problems. I'm waiting for your answers, regards, Fernando Carazo -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dkondo at lri.fr Wed May 9 01:40:01 2007 From: dkondo at lri.fr (Derrick Kondo) Date: Wed, 9 May 2007 10:40:01 +0200 Subject: [Beowulf] last [CFP] EuroPVM/MPI'07 -- submission deadline: May 14th Message-ID: <60ec14620705090140p48fbc9dbme40f16a30682087b@mail.gmail.com> ************************************************************************ *** *** *** LAST CALL FOR PAPERS *** *** *** ************************************************************************ EuroPVM/MPI 2007 14th European PVM/MPI Users' Group Meeting Paris, France, September 30 - October 3, 2007 web: http://www.pvmmpi07.org e-mail: chairs at pvmmpi07.org submission deadline for full papers and poster abstracts: extended to May 14th, 2007 at 11:59 AM (noon) UTC submission site: http://pvmmpi07.lri.fr/submissions organized by Project Grand-Large (http://grand-large.lri.fr/index.php/Accueil) from INRIA Futurs (http://www-futurs.inria.fr) ------------------------------------------------------------------------------------------- BACKGROUND AND TOPICS PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) have evolved into the standard interfaces for high-performance parallel programming in the message-passing paradigm. EuroPVM/MPI is the most prominent meeting dedicated to the latest developments of PVM and MPI such as new support tools, implementation and applications using these interfaces. The EuroPVM/MPI meeting naturally encourages discussions of new message-passing and other parallel and distributed programming paradigms beyond MPI and PVM. The 14th European PVM/MPI Users' Group Meeting will be a forum for users and developers of PVM, MPI, and other message-passing programming environments. Through the presentation of contributed papers, vendor presentations, poster presentations and invited talks, attendees will have the opportunity to share ideas and experiences to contribute to the improvement and furthering of message-passing and related parallel programming paradigms.
Topics of interest for the meeting include, but are not limited to: * PVM and MPI implementation issues and improvements * Latest extensions to PVM and MPI * PVM and MPI for high-performance computing, clusters and grid environments * New message-passing and hybrid parallel programming paradigms * Interaction between message-passing software and hardware * Fault tolerance in message-passing programs * Performance evaluation of PVM and MPI applications * Tools and environments for PVM and MPI * Algorithms using the message-passing paradigm * Applications in science and engineering based on message-passing This year special emphasis will be put on large-scale issues, such as those related to hardware and interconnect technologies, or the potential or demonstrated shortcomings of PVM or MPI. As in the preceding years, the special session 'ParSim' will focus on numerical simulation for parallel engineering environments. EuroPVM/MPI 2007 will also hold the new 'Outstanding Papers' session introduced in 2006, where the best papers selected by the program committee will be presented. SUBMISSION INFORMATION Submission site: http://pvmmpi07.lri.fr/submissions Contributors are invited to submit a full paper as a PDF (or Postscript) document not exceeding 8 pages in English (2 pages for poster abstracts and Late and Breaking Results). The title page should contain an abstract of at most 100 words and five specific keywords. The paper needs to be formatted according to the Springer LNCS guidelines [2]. The usage of LaTeX for preparation of the contribution as well as the submission in camera ready format is strongly recommended. Style files can be found at the URL [2]. New work that is not yet mature for a full paper, short observations, and similar brief announcements are invited for the poster session. Contributions to the poster session should be submitted in the form of a two-page abstract. All these contributions will be fully peer reviewed by the program committee.
Submissions to the special session 'Current Trends in Numerical Simulation for Parallel Engineering Environments' (ParSim 2007) are handled and reviewed by the respective session chairs. For more information please refer to the ParSim website [1]. All accepted submissions are expected to be presented at the conference by one of the authors, which requires registration for the conference. IMPORTANT DATES Submission of full papers and poster abstracts May 14th, 2007 at 11:59 AM (noon) UTC Notification of authors June 19th, 2007 Camera-ready papers July 9th, 2007 Submission of Late and Breaking Results September 15th, 2007 Tutorials September 30th, 2007 Conference October 1st-3rd, 2007 For up-to-date information, visit the conference web site at http://www.pvmmpi07.org. PROCEEDINGS In addition, selected papers of the conference, including those from the 'Outstanding Papers' session, will be considered for publication in a special issue of Parallel Computing in an extended format. GENERAL CHAIR * Jack Dongarra (University of Tennessee) PROGRAM CHAIRS * Franck Cappello (INRIA Futurs) * Thomas Herault (Universite Paris Sud-XI / INRIA Futurs) CONFERENCE VENUE The conference will be held in the historical, cultural and economic center of Paris, the capital of France. The city, which is renowned for its neo-classical architecture, hosts many museums and galleries and has an active nightlife. The symbol of Paris is the 324 metre (1,063 ft) Eiffel Tower on the banks of the Seine. Dubbed "the City of Light" (la Ville Lumiere) since the 19th century, Paris is regarded by many as one of the most beautiful and romantic cities in the world. It is also the most visited city in the world with more than 30 million foreign visitors per year. Paris is easily reachable from any European capital and most of the large European, American and Asian cities. It is an ideal starting point for visiting European institutes and cities.
REFERENCES [1] ParSim 2007: http://wwwbode.in.tum.de/Par/arch/events/parsim07/ [2] Springer Guidelines: http://www.springer.de/comp/lncs/authors.html From reuti at staff.uni-marburg.de Thu May 10 08:33:17 2007 From: reuti at staff.uni-marburg.de (Reuti) Date: Thu, 10 May 2007 17:33:17 +0200 Subject: [Beowulf] SSH without login in nodes In-Reply-To: References: <200705041340.50461.kilian@stanford.edu> Message-ID: Am 08.05.2007 um 15:25 schrieb Angel de Vicente: > If it is of any help, we use a similar setting to the one given > below by Kilian, > where our access file in the compute nodes only has root and > myself. When a user > submits something to the queuing system (Torque+Maui), the > access.conf of the > given nodes is modified with a prologue script, so that access is > given to them > in the allocated nodes, and when the job finishes their name is > taken from > access.conf in an epilogue script. > > Nothing fancy, but it works pretty well (you could easily figure > how to abuse > it, but people usually behave nicely, and this was needed mostly to > prevent > accidentally submitting jobs to other nodes, not to tackle abuse). > At the same > time, we have a script that runs once per day to check whether > there are any > jobs from users not allowed (according to the queueing system) to > do so, and if > found they are just mercilessly killed (on very rare occasions > zombies are > hanging around). This is always a point where I wonder, why there is still no rsh replacement in Torque like it's available in SUN GridEngine with its qrsh command. Simply disable rsh and ssh in the complete cluster (or limit it to admin staff) and all startup of processes in the cluster is done by SGE's private daemons for each qrsh call by using a Tight Integration - so you can't access nodes which you are not supposed to use. In addition, this will also allow correct accounting for Linda/ PVM jobs which still seems not be possible with Torque. 
-- Reuti From rgb at phy.duke.edu Thu May 10 09:01:34 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 10 May 2007 12:01:34 -0400 (EDT) Subject: [Beowulf] Quick question... on Fortran Message-ID: I am (as you may well know) extremely fortran averse. However, a researcher in our department has recently asked what the current limits are on the size of an array in modern fortran(s) under linux. I suppose he'd like an answer for both 32 and 64 bit systems. From what I have been able to google out, it looks like the answer is 2^31 bytes in 32 bit systems and as big as the hardware permits less maybe a GB in 64 bit systems. Is this correct? Any fortran experts out there? I'd be happy for answers for more than one compiler, of course -- I'd guess that e.g. pathscale might have a greater capacity than e.g. gfortran or legacy g77... rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From laytonjb at charter.net Thu May 10 09:05:26 2007 From: laytonjb at charter.net (laytonjb at charter.net) Date: Thu, 10 May 2007 9:05:26 -0700 Subject: [Beowulf] Quick question... on Fortran Message-ID: <396669303.1178813126619.JavaMail.root@fepweb04> > I am (as you may well know) extremely fortran averse. and Hell freezes over... George Bush gets an IQ above 100... Hilary Clinton suddenly becomes nice... Pamela Anderson starts dating a non-rock star... Too many things happening too quickly. Ahh!!!!! From rosing at peakfive.com Thu May 10 09:59:19 2007 From: rosing at peakfive.com (Matt) Date: Thu, 10 May 2007 10:59:19 -0600 Subject: [Beowulf] re: Quick question... on Fortran In-Reply-To: <200705101607.l4AG6o1Z008302@bluewest.scyld.com> References: <200705101607.l4AG6o1Z008302@bluewest.scyld.com> Message-ID: <17987.20327.828615.862099@lala.site> >I am (as you may well know) extremely fortran averse.
However, a >researcher in our department has recently asked what the current limits >are on the size of an array in modern fortran(s) under linux. I suppose >he'd like an answer for both 32 and 64 bit systems. From what I have >been able to google out, it looks like the answer is 2^31 bytes in 32 >bit systems and as big as the hardware permits less maybe a GB in 64 >bit The limit is 2GB for Intel ia-32 compilers, fortran or not. You may need to compile with -static to get it. Since this is half the 4GB limit of a 32 bit system (maybe the last bit is lost because of signed arithmetic?), I'm guessing the ia-64 compilers can have much larger arrays. Fortran90 doesn't say anything about array limits. Matt From brian.dobbins at yale.edu Thu May 10 11:09:58 2007 From: brian.dobbins at yale.edu (Brian Dobbins) Date: Thu, 10 May 2007 18:09:58 -0000 Subject: [Beowulf] Quick question... on Fortran In-Reply-To: References: Message-ID: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> From one Fortran averse person to another: Using the PGI compilers (at least as of 6.0, but 7.0 is out now and does the same), you can allocate at -least- up to 32GB in an array with Fortran on 64-bit systems. I say at least because I don't currently have more than 32GB on any of my nodes. :) The way to do this is to use the '-mcmodel=medium' option and to promote integers to 8-byte values (for indexing the entries in this array) with '-i8'. (For example: 'pgf90 -mcmodel=medium -i8 test.f90 -o test.exe') Of course, if you want to try this yourself, Robert, I'm happy to let you log in and give it a shot for the sole pleasure of seeing you touch Fortran code. ;) From rgb at phy.duke.edu Thu May 10 13:37:58 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 10 May 2007 16:37:58 -0400 (EDT) Subject: [Beowulf] Quick question...
on Fortran In-Reply-To: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> References: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> Message-ID: On Sat, 14 Apr 2007, Brian Dobbins wrote: >> From one Fortran averse person to another: > > Using the PGI compilers (at least as of 6.0, but 7.0 is out now and does > the same), you can allocate at -least- up to 32GB in an array with Fortran > on 64-bit systems. I say at least because I don't currently have more > than 32GB on any of my nodes. :) > > The way to do this is to use the 'mcmodel=medium' option and to promote > integers to 8-byte values (for indexing the entries in this array) with > '-i8'. > > (For example: 'pgf90 -mcmodel=medium -i8 test.f90 -o test.exe') > > Of course, if you want to try this yourself, Robert, I'm happy to let you > log in and give it a shot for the sole pleasure of seeing you touch > Fortran code. ;) Hah. I still have several megabytes worth of fortran source code that I typed into a computer one byte at a time over a decade and a half. I also have a couple of books, e.g. Forsythe, Malcolm and Moler, with lovely numerical code in fortran written into them. I've even gone in and tediously, carefully, ported that code to C so I could still use it. Now I just use the GSL. I honestly wonder how much fortran legacy is maintained just because nobody wants to rewrite numerical code -- but either way the GSL is better than whatever it was that some grad student wrote, way back when, in nearly all cases. Just a glance at cernlib convinced me that whole chunks of cernlib could go away, for example, if anybody ever sat down and slogged through the port. Of course at this point I have WAY more C source than I ever did fortran. Maybe as much as 8-10 MB. That's quite a lot, one character at a time, even if it does include e.g. comments. 
However, I fully recognize a) that fortran can be a lot easier for compilers to optimize than C, because it is hard to optimize the arbitrary manipulations of pointers that C permits. If you use C and pointers, you code closer to the assembly language bone and can do amazing things, but you're also responsible for more of the work of optimization. Also b) legacy or not, alternatives or not, choosing to port or not port a large existing code base from fortran to C is not cheap. Been there, done that, one more than one occasion. It requires a major commitment of resources, time and energy. Once you're DONE there are benefits as well -- significant ones in my opinion. However, as time passes even those benefits are diminishing as fortran becomes more C-like. One of the interesting things about computer languages is that they really are dynamic, if somewhat slowly varying (decadal time scale, not annual timescale). The fortran you code in is not (probably not, anyway) the Fortran IV I learned back in the 70's and coded in extensively until the mid to latter 80's. The F77 Steffen is talking about expanding isn't the f90, f95, f2003 that gfortran is moving towards. The C I learned back in the mid-80's (DeSmet C on an IBM PC:-) isn't the Unix K&R C I learned a bit later, or the ANSI C I had to port to after that, or the C99 that gcc is sort-of compliant with. In fact, gcc is almost a variant all its own. Who knows, in ten years we may see a merge of fortran and c into a "supercompiler" that permits near transparent switching of syntax, or inlining of fortran in c the way assembler can be inlined now. It could happen. After all, there is no limit to how smart one can make a compiler, and the components are all really there already in g-compilers. That might permit a genetic mixing of the two and emergence of a new "Cortran" or "fortraC" compiler that adds a binary exponentiation operator to C, maybe some stronger typing, maybe a cappucino machine. 
Just kidding about the cappucino machine.

   rgb

>
>

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu

From bill at cse.ucdavis.edu Thu May 10 13:48:59 2007
From: bill at cse.ucdavis.edu (Bill Broadley)
Date: Thu, 10 May 2007 13:48:59 -0700
Subject: [Beowulf] re: Quick question... on Fortran
In-Reply-To: <17987.20327.828615.862099@lala.site>
References: <200705101607.l4AG6o1Z008302@bluewest.scyld.com> <17987.20327.828615.862099@lala.site>
Message-ID: <4643853B.60204@cse.ucdavis.edu>

> The limit is 2GB for Intel ia-32 compilers, fortran or not. You may
> need to compile -static to get it. Since this is half the 4GB limit of
> a 32 bit system (maybe the last bit is lost because of signed
> arithmetic?), I'm guessing the ia-64 compilers can have much larger
> arrays.

For g77 with a 32 bit binary I think you are right:
bigmem.f:119: error: size of variable 'a' is too large

pathf90 producing a 32 bit binary:
### Assertion failure at line 726 of ../../be/cg/x8664/exp_loadstore.cxx:
### Compiler Error in file bigmem.f during Code_Expansion phase:
### NYI: 64-bit offset under -m32

Gcc and a 32 bit binary seems fine with 3GB:
# file ./a.out
./a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), not stripped

Verified with top:
  PID USER PR NI  VIRT  RES SHR S %CPU %MEM   TIME+ COMMAND
16246 root 25  0 3073m 3.0g 288 R  100 76.7 0:07.09 a.out

with x86-64 binaries I've tried 6GB arrays, and could try 12 GB if there is
interest.

From rgb at phy.duke.edu Thu May 10 14:01:17 2007
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Thu, 10 May 2007 17:01:17 -0400 (EDT)
Subject: [Beowulf] Quick question... on Fortran
In-Reply-To: <46436520.5070100@moene.indiv.nluug.nl>
References: <46436520.5070100@moene.indiv.nluug.nl>
Message-ID:

On Thu, 10 May 2007, Toon Moene wrote:

> Robert G.
Brown wrote: > >> I am (as you may well know) extremely fortran averse. However, a >> researcher in our department has recently asked what the current limits >> are on the size of an array in modern fortran(s) under linux. I suppose >> he'd like an answer for both 32 and 64 bit systems. From what I have >> been able to google out, it looks like the answer is 2^31 bytes in 32 >> bit systems and as big as the hardware permits less maybe a GB in 64 bit >> systems. Is this correct? Any fortran experts out there? I'd be happy >> for answers for more than one compiler, of course -- I'd guess that e.g. >> pathscale might have a greater capacity than e.g. gfortran than g77 >> legacy... > > I think the easiest way is to use the Fortran program I once wrote to test > the limits on g77 (attached) and keep enlarging NSIZE until the program > complains. Yeah, I had that handy (thanks for the new copy though:-) but that requires having a really big memory machine to run it on, though. Which begs the question -- which ULTIMATELY is: Is there any point in Steffen (Bass) buying a Really Big Memory Machine (RBMM) to run his old fortran codes if there is some advantage to him in speed or otherwise from being able to run larger computations? The answer (from several people now) seems like it is "Yes, if you have a compiler that supports the medium memory model" e.g. PGI or pathscale, to name two that do. And a 64 bit system, of course. Medium memory seems to be 32 bit (2GB limit) on code, which (as was recently discussed on list at length) is enough to handle all but a set of near-zero measure of all code ever written and 64 bits of data, which is near enough to being infinite as to make no never mind. Except that it really isn't 64 bits -- IIRC it is either 40 or 48 at the moment -- but what the heck 40 bits is still a terabyte and hence infinite. 
I found this site: http://www.ualberta.ca/CNS/RESEARCH/LinuxClusters/64-bit.html which actually answers nearly all of the questions one might have. 64-bit gfortran supports -mcmodel=medium (and 32-bit returns an error if you try it). I suspect that all of the g90/g95 compliant compilers will support at least medium memory models, and only Intel's apparently supports large (which are the nearly-useless >2GB code programs etc). rgb > > Kind regards, > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From lindahl at pbm.com Thu May 10 14:00:57 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 10 May 2007 14:00:57 -0700 Subject: [Beowulf] Quick question... on Fortran In-Reply-To: References: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> Message-ID: <20070510210057.GB11720@bx9.net> On Thu, May 10, 2007 at 04:37:58PM -0400, Robert G. Brown wrote: > Who knows, in ten years we may see a merge of fortran and c into a > "supercompiler" that permits near transparent switching of syntax, or > inlining of fortran in c the way assembler can be inlined now. Existing compilers do inline Fortran into C, it's needed for some of the SPEC benchmarks which mix C with Fortran77 numerical routines. I don't see much point in having both languages in the same source file. 
-- greg From bill at cse.ucdavis.edu Thu May 10 14:51:20 2007 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Thu, 10 May 2007 14:51:20 -0700 Subject: [Beowulf] fast file copying In-Reply-To: <4eafc81b0705062020v199aa5abh2fe4517ed8afc123@mail.gmail.com> References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <463AF372.1010107@cse.ucdavis.edu> <4eafc81b0705062020v199aa5abh2fe4517ed8afc123@mail.gmail.com> Message-ID: <464393D8.5030008@cse.ucdavis.edu> Felix Rauch Valenti wrote: > On 04/05/07, Bill Broadley wrote: >> Geoff Galitz wrote: >> > During an HPC talk some years ago, I recall someone mentioned a tool >> > which can copy large datasets across a cluster using a ring topology. >> > Perhaps someone here knows of this tool? >> >> Not sure about a ring topology, seems kinda silly... > > Why would that be silly? The normal ring based disadvantages, reliability and performance. In the case where your head node and the client nodes have the same speed network, all clients are present, all clients are idle, all clients survive until the end of the transfer you can get great performance. It certainly seems like 90% of so of line speed is possible. Seems like any number of things could make the ring based approach a poor choice, where the worst case of the ring could dramatically slow things down. Things like: * Head node's network connection is 10 times faster * A single node dies during the transfer * A single node joins late * A single node is very busy (I/O, memory constrained, or CPU) A bit-torrent like approach would handle all of the above relatively gracefully. The nettee approach does have the advantage that all disk accesses are sequential. But with a large chunk size of say 64 MB (when transferring a few GB file) seems like seeks wouldn't be a major issue. I've seen 15 MB/sec per client with the default chunk size (fairly small), when I wrote the file to a better disk system I managed 30MB/sec. 
I've yet to try larger chunk sizes on normal compute node disk systems. I'll do some more testing. > More advantages of the ring topology: It uploads every block on every Sounds like bittorrent. > node exactly once, no prefetching and no seeks are required (if you > replicate a whole partition or a single large file). bittorrent does seek more, but it seems trivial to reduce the seeks so that it's not a performance impact... say 1 per second. > If you are interested in more details about the technology, like > models and performance measurements (somewhat old by now), check out > the second paper in this list: > > http://www.cs.inf.ethz.ch/cops/patagonia/#relmat Interesting paper, I'll try a run with GigE so I can compare fairly. From rgb at phy.duke.edu Thu May 10 14:57:55 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Thu, 10 May 2007 17:57:55 -0400 (EDT) Subject: [Beowulf] Quick question... on Fortran In-Reply-To: <20070510210057.GB11720@bx9.net> References: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> <20070510210057.GB11720@bx9.net> Message-ID: On Thu, 10 May 2007, Greg Lindahl wrote: > On Thu, May 10, 2007 at 04:37:58PM -0400, Robert G. Brown wrote: > >> Who knows, in ten years we may see a merge of fortran and c into a >> "supercompiler" that permits near transparent switching of syntax, or >> inlining of fortran in c the way assembler can be inlined now. > > Existing compilers do inline Fortran into C, it's needed for some of > the SPEC benchmarks which mix C with Fortran77 numerical routines. > > I don't see much point in having both languages in the same source > file. Me neither, actually. Although there might be some point in developing a new language that smoothly merges the desireable features of both. (He ducks and runs for cover, just KNOWING that 80 people are about to announce that C++, python, matlab, pascal, f95, c99, already do...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. 
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From elken at pathscale.com Thu May 10 15:10:20 2007 From: elken at pathscale.com (Tom Elken) Date: Thu, 10 May 2007 15:10:20 -0700 Subject: [Beowulf] Quick question... on Fortran Message-ID: <4643984C.2050606@pathscale.com> From: "Brian Dobbins" Subject: Re: [Beowulf] Quick question... on Fortran To: "Robert G. Brown" > Using the PGI compilers ... > The way to do this is to use the 'mcmodel=medium' option and to promote > integers to 8-byte values (for indexing the entries in this array) with > '-i8'. --------------- rgb wrote: > I'd be happy > for answers for more than one compiler, of course -- I'd guess that e.g. > pathscale might have a greater capacity than e.g. gfortran than g77 > legacy... PathScale also supports -mcmodel=medium for large arrays, and I think gcc originated that flag name, so I would guess that gfortran supports it as well. pathf95 also supports the -i8 flag. If you need a math library to support large arrays with i8-type indices, when you download the PathScale version of ACML from AMD, you can choose the "PathScale 64-bit Int64" version. I would guess that Intel's MKL also has a version that supports 64bit integer arrays. -Tom -- ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tom Elken Manager, Performance Engineering tom.elken at qlogic.com QLogic Corporation 650.934.8056 System Interconnect Group From lindahl at pbm.com Thu May 10 16:23:30 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 10 May 2007 16:23:30 -0700 Subject: [Beowulf] Quick question... on Fortran In-Reply-To: References: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80> <20070510210057.GB11720@bx9.net> Message-ID: <20070510232330.GA26186@bx9.net> On Thu, May 10, 2007 at 05:57:55PM -0400, Robert G. Brown wrote: > >I don't see much point in having both languages in the same source > >file. > > Me neither, actually. 
Although there might be some point in developing > a new language that smoothly merges the desireable features of both. Yes, but the most desirable feature is generally "the language I religiously prefer", so there's no way to smoothly merge anything. -- greg From bill at cse.ucdavis.edu Fri May 11 02:28:55 2007 From: bill at cse.ucdavis.edu (Bill Broadley) Date: Fri, 11 May 2007 02:28:55 -0700 Subject: [Beowulf] fast file copying In-Reply-To: <40A23C4C-5386-4EF2-8D3C-48686E6D3B11@galitz.org> References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <463AF372.1010107@cse.ucdavis.edu> <4eafc81b0705062020v199aa5abh2fe4517ed8afc123@mail.gmail.com> <464393D8.5030008@cse.ucdavis.edu> <40A23C4C-5386-4EF2-8D3C-48686E6D3B11@galitz.org> Message-ID: <46443757.3020105@cse.ucdavis.edu> Geoff Galitz wrote: > - enter deployment phase > - check for member nodes that are alive > - dynamically build the config file > - bring the ring up The torrent equiv is building the .torrent file (basically a list of checksums, and launching the tracker). > - start the transfer Launching the bt client. > We use pdsh to do as much of the configuration and command execution as > possible. This made dolly a better choice for us rather than nettee as > we can issue the exact same command to all nodes in parallel. Nettee > required more specific commands on each node. Sounds like bt would be most like dolly in that respect. Nodes could even join after the transfer starts. > In our testing environment, we're getting as much as 45MB/sec and as > little as 11MB/sec in our various scenarios (mismatched hardware, busy How many nodes in your 11-45MB/sec runs? How much data are you distributing to each node? My last test took 36 seconds for a 1GB file (28MB/sec) with 4MB chunks. Bonnie++ measure between 20 and 40MB/sec which is somewhat disappointing actually. Not sure why it's so slow sometimes, and seems to significantly vary node to node even if I mkfs right before the bonnie++. 
It's easiest to tell when a node is done from the tracker (since the client doesn't exit). Keep in mind that a client will report in when done and often again to report any additional uploading it does. So as far as counting when a client is done look for the first "left=0" in the log for a given IP address. My logs from the run show the launch: 11/May/2007:02:00:25 clients finishing at: 11/May/2007:02:01:00 11/May/2007:02:01:01 11/May/2007:02:00:55 11/May/2007:02:00:49 11/May/2007:02:00:51 11/May/2007:02:00:58 11/May/2007:02:00:55 11/May/2007:02:00:50 11/May/2007:02:00:56 11/May/2007:02:00:50 11/May/2007:02:00:56 11/May/2007:02:00:52 11/May/2007:02:00:53 11/May/2007:02:00:56 11/May/2007:02:00:55 11/May/2007:02:00:53 11/May/2007:02:00:54 11/May/2007:02:00:59 11/May/2007:02:00:54 11/May/2007:02:00:55 11/May/2007:02:00:57 11/May/2007:02:00:55 11/May/2007:02:00:55 11/May/2007:02:01:01 11/May/2007:02:00:52 > network, different types of data). We did achieve our primary goal in > reducing load on the master/server system. In our old setup, our load > would increase to 25+. With dolly, our load never exceeds 1.5. Cool, so you have a solution. > I plan on also making the same test with torrent. Great. From dnlombar at ichips.intel.com Fri May 11 06:51:37 2007 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Fri, 11 May 2007 06:51:37 -0700 Subject: [Beowulf] Quick question... on Fortran In-Reply-To: References: Message-ID: <20070511135137.GA4052@nlxdcldnl2.cl.intel.com> On Thu, May 10, 2007 at 12:01:34PM -0400, Robert G. Brown wrote: > I am (as you may well know) extremely fortran averse. However, a > researcher in our department has recently asked what the current limits > are on the size of an array in modern fortran(s) under linux. I suppose > he'd like an answer for both 32 and 64 bit systems. 
From what I have > been able to google out, it looks like the answer is 2^31 bytes in 32 > bit systems and as big as the hardware permits less maybe a GB in 64 bit > systems. Is this correct? Any fortran experts out there? I'd be happy > for answers for more than one compiler, of course -- I'd guess that e.g. > pathscale might have a greater capacity than e.g. gfortran than g77 > legacy... In *theory*, the limits are as you described. However, that theory doesn't acknowledge memory layouts, shared libraries, memory-mapped space, &etc. Clearly, these details are mostly (only) significant on a 32-bit system. The gross 32-bit layout, from bottom to top is: program & data, shared libs & mmap, heap, and stack; with the space between the code/data and shared libs controlled by brk(2). Getting to Fortran, the next interesting bit is where the array is located. If in a common, the array is going to be in the break-controlled region between the program and the shared libraries & mmap. I don't think this varies among compilers, but could be wrong here; it's true for gcc. The kernel location TASK_UNMAPPED_BASE figures in this, as that's the base for the shared libraries & mmap(2) space. On i386, this is defined as TASK_SIZE/3, where TASK_SIZE is the top of the process address space, at 3GB, i.e., TASK_UNMAPPED_BASE=1GB. See the source at include//processor.h for these values. Shared library positioning can be tweaked via prelink(8), so max common-resident array size may well depend on the specific system. Otherwise, the array's going to be in heap or stack, and that's somewhere between TASK_UNMAPPED_BASE and TASK_SIZE. I think I recall that malloc(3) will also return stuff below TASK_UNMAPPED_BASE if there's not enough space above. On a 64-bit system, the 32-bit limits apply to a 32-bit code, otherwise, well the limitation is not likely the system. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. 
From hahn at mcmaster.ca Fri May 11 07:50:30 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Fri, 11 May 2007 10:50:30 -0400 (EDT)
Subject: [Beowulf] Quick question... on Fortran
In-Reply-To: <20070511135137.GA4052@nlxdcldnl2.cl.intel.com>
References: <20070511135137.GA4052@nlxdcldnl2.cl.intel.com>
Message-ID:

> The gross 32-bit layout, from bottom to top is: program & data, shared libs &
> mmap, heap, and stack; with the space between the code/data and shared libs
> controlled by brk(2).

let me offer a brief program and a few more details (not in disagreement):

#include <stdio.h>   /* printf, fopen, fgets */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen */
#include <unistd.h>  /* write */

char static_variable;

int main() {
    char auto_variable;
    char *heap_variable = malloc(1);
    FILE *fp;
    char buf[1000];

    /* one address from each region: static data, stack, heap */
    printf("static variable at %p\n",&static_variable);
    printf("auto variable at %p\n",&auto_variable);
    printf("heap variable at %p\n",heap_variable);

    /* dump this process's own memory map */
    fp = fopen("/proc/self/maps","r");
    while (fgets(buf,sizeof(buf),fp)) {
        write(1,buf,strlen(buf));
    }
    return 0;
}

on older ia32 linux systems:

[hahn at old-cat hahn]$ ./showmemory
static variable at 0x80496cc
auto variable at 0xbfffd92f
heap variable at 0x80496d8
08048000-08049000 r-xp 00000000 09:00 65734 /home/hahn/showmemory
08049000-0804a000 rw-p 00000000 09:00 65734 /home/hahn/showmemory
0804a000-0804b000 rwxp 00000000 00:00 0
40000000-40012000 r-xp 00000000 08:01 208561 /lib/ld-2.2.93.so
40012000-40013000 rw-p 00012000 08:01 208561 /lib/ld-2.2.93.so
40013000-40015000 rw-p 00000000 00:00 0
40019000-4001a000 rw-p 00000000 00:00 0
42000000-42126000 r-xp 00000000 08:01 128278 /lib/i686/libc-2.2.93.so
42126000-4212b000 rw-p 00126000 08:01 128278 /lib/i686/libc-2.2.93.so
4212b000-4212f000 rw-p 00000000 00:00 0
bfffc000-c0000000 rwxp ffffd000 00:00 0

so, 128M of zero-page, text, static/bss, heap growing up towards mmap arena
starting at 1G, with stack starting at 3G growing down. so the largest single
brk-available space was ~896M. glibc malloc, however, will use mmap for large
allocations so can give you nearly 2G beneath the stack.

interestingly, modern ia32 (this is centos 4) look like this:

static variable at 0x80497b0
auto variable at 0xbff4348f
heap variable at 0x8ac0008
00287000-0029c000 r-xp 00000000 03:03 35888 /lib/ld-2.3.4.so
0029c000-0029d000 r-xp 00015000 03:03 35888 /lib/ld-2.3.4.so
0029d000-0029e000 rwxp 00016000 03:03 35888 /lib/ld-2.3.4.so
002a5000-003ca000 r-xp 00000000 03:03 35889 /lib/tls/libc-2.3.4.so
003ca000-003cc000 r-xp 00124000 03:03 35889 /lib/tls/libc-2.3.4.so
003cc000-003ce000 rwxp 00126000 03:03 35889 /lib/tls/libc-2.3.4.so
003ce000-003d0000 rwxp 003ce000 00:00 0
08048000-08049000 r-xp 00000000 03:03 4637246 /home/hahn/showmemory
08049000-0804a000 rw-p 00000000 03:03 4637246 /home/hahn/showmemory
08ac0000-08ae1000 rw-p 08ac0000 00:00 0
b7f78000-b7f79000 rw-p b7f78000 00:00 0
b7f89000-b7f8b000 rw-p b7f89000 00:00 0
bff42000-c0000000 rw-p bff42000 00:00 0
ffffe000-fffff000 ---p 00000000 00:00 0

putting those "standard" mmaps under 128M is a pretty nice tweak, since
without any heroics (static compilation, etc), an unbroken 2.8G are available.

on x86_64, I see:

static variable at 0x500bac
auto variable at 0x7fbfffe860
heap variable at 0x501010
00400000-00401000 r-xp 00000000 00:15 17813231 /home/hahn/showmemory
00500000-00501000 rw-p 00000000 00:15 17813231 /home/hahn/showmemory
00501000-00522000 rwxp 00501000 00:00 0
2a95556000-2a95559000 rw-p 2a95556000 00:00 0
2a95579000-2a9557a000 r-xp 00000000 08:05 682971 /hptc_cluster/sharcnet/opt/pathscale/pathscale-2.2.1/lib/2.2.1/libpscrt.so.1
2a9557a000-2a95679000 ---p 00001000 08:05 682971 /hptc_cluster/sharcnet/opt/pathscale/pathscale-2.2.1/lib/2.2.1/libpscrt.so.1
2a95679000-2a9567a000 rw-p 00000000 08:05 682971 /hptc_cluster/sharcnet/opt/pathscale/pathscale-2.2.1/lib/2.2.1/libpscrt.so.1
2a9567a000-2a9567c000 rw-p 2a9567a000 00:00 0
383e200000-383e215000 r-xp 00000000 08:02 4235194 /lib64/ld-2.3.4.so
383e314000-383e316000 rw-p 00014000 08:02 4235194 /lib64/ld-2.3.4.so
383e600000-383e72a000 r-xp 00000000 08:02 4235196 /lib64/tls/libc-2.3.4.so
383e72a000-383e82a000 ---p 0012a000 08:02 4235196 /lib64/tls/libc-2.3.4.so
383e82a000-383e82d000 r--p 0012a000 08:02 4235196 /lib64/tls/libc-2.3.4.so
383e82d000-383e830000 rw-p 0012d000 08:02 4235196 /lib64/tls/libc-2.3.4.so
383e830000-383e834000 rw-p 383e830000 00:00 0
7fbfffd000-7fbffff000 rwxp 7fbfffd000 00:00 0
7fbffff000-7fc0000000 rw-p 7fbffff000 00:00 0
ffffffffff600000-ffffffffffe00000 ---p 00000000 00:00 0

which is similar in order to old ia32. note that a 32b program on a x86_64
kernel _will_ actually see a full 4G address space, so ~3.8G between mmap
arena and stack.

regards, mark.

From mwheeler at startext.co.uk Fri May 11 10:16:01 2007
From: mwheeler at startext.co.uk (Martin Wheeler)
Date: Fri, 11 May 2007 18:16:01 +0100 (BST)
Subject: [Beowulf] Quick question...
on Fortran
In-Reply-To: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80>
References: <1066.192.168.123.22.1176577271.squirrel@192.168.4.80>
Message-ID:

On Sat, 14 Apr 2007, Brian Dobbins wrote:
        ^^^^^^^^^^^
>
>From one Fortran averse person to another:

Brian, your date setting appears to be a month behind everyone else.

Cheers,
-- 
Martin Wheeler - 00 44 1458 83-1103
Glastonbury - BA6 9PH - England
(First introduced to FORTRAN by BFEC, Imerintsiatosika, 1967.
Have always associated it with the IBM 1130 ever since :)

From galtons at aecl.ca Thu May 10 08:45:37 2007
From: galtons at aecl.ca (Galton, Simon)
Date: Thu, 10 May 2007 11:45:37 -0400
Subject: [Beowulf] SSH without login in nodes
Message-ID:

Here's a very simple suggestion. This disallows interactive logins to a
non-login node. It does not stop something like "ssh nodename
/home/username/longrunningjob", but it adequately prevents accidental
logins; and it's easy to maintain.

As coded it allows the root user and users in the "clustadm" group to access
the nodes interactively but kicks out any users who "accidentally" attempt to
interactively login to a node (after sending instructions on where they
should login).

Put the following text in a file called "/etc/profile.d/nologin.sh"

#
# Prevents interactive logins on cluster nodes
#
# Allow root and members of the "clustadm" group
# set the "master" variable to the name of the login node
#
master="headnode"
if [ "$LOGNAME" = "root" ]
then
    :
else
    groups=`groups | grep clustadm`
    if [ "$groups" = "" ]
    then
        echo "Please log into the master node, $master, for access to the cluster."
        echo "Logging you out now."
        echo
        exit
    fi
fi

CONFIDENTIAL AND PRIVILEGED INFORMATION NOTICE

This e-mail, and any attachments, may contain information that is
confidential, subject to copyright, or exempt from disclosure. Any
unauthorized review, disclosure, retransmission, dissemination or other use
of or reliance on this information may be unlawful and is strictly
prohibited.

From ireynolds at symplicity.com Thu May 10 08:49:36 2007
From: ireynolds at symplicity.com (Ian Reynolds)
Date: Thu, 10 May 2007 11:49:36 -0400
Subject: [Beowulf] IBRIX Experiences
Message-ID: <1178812176.22285.5.camel@centaur>

Hey all -- we're considering IBRIX for a parallel storage cluster solution
with an EMC Clarion CX3-20 at the center, as well as a handful of storage
servers -- total of roughly 40 client servers, mix of 32 and 64 bit OSs.

Can anyone offer their experiences with IBRIX, good or bad? We have worked
with gpfs extensively, so any comparisons would also be helpful.

Thanks,
Ian Reynolds

From geoff at galitz.org Thu May 10 15:06:31 2007
From: geoff at galitz.org (Geoff Galitz)
Date: Thu, 10 May 2007 15:06:31 -0700
Subject: [Beowulf] fast file copying
In-Reply-To: <464393D8.5030008@cse.ucdavis.edu>
References: <48867455-3FE7-469D-99B3-6E5E9B54D507@galitz.org> <463AF372.1010107@cse.ucdavis.edu> <4eafc81b0705062020v199aa5abh2fe4517ed8afc123@mail.gmail.com> <464393D8.5030008@cse.ucdavis.edu>
Message-ID: <40A23C4C-5386-4EF2-8D3C-48686E6D3B11@galitz.org>

Thanks to all for responding... here is a follow up:

We push our datasets out as part of a service deployment routine which
includes a bunch of "other stuff" in addition to just getting the data to
the nodes. I went ahead and modified our service deployment program to use
dolly.
Here is what we do:

- enter deployment phase
- check for member nodes that are alive
- dynamically build the config file
- bring the ring up
- start the transfer
- finish
- tear everything down
- enter next phase

With this system, we can support a dynamic environment where nodes go on and
offline at (our) will.

We use pdsh to do as much of the configuration and command execution as
possible. This made dolly a better choice for us rather than nettee as we
can issue the exact same command to all nodes in parallel. Nettee required
more specific commands on each node.

In our testing environment, we're getting as much as 45MB/sec and as little
as 11MB/sec in our various scenarios (mismatched hardware, busy network,
different types of data). We did achieve our primary goal in reducing load
on the master/server system. In our old setup, our load would increase to
25+. With dolly, our load never exceeds 1.5.

I plan on also making the same test with torrent.

-geoff

>
> The normal ring based disadvantages, reliability and performance.
>

snip

>
> Seems like any number of things could make the ring based approach
> a poor choice, where the worst case of the ring could dramatically
> slow things down. Things like:
> * Head node's network connection is 10 times faster
> * A single node dies during the transfer
> * A single node joins late
> * A single node is very busy (I/O, memory constrained, or CPU)
>

more snipped

From kingming0811 at yahoo.com Fri May 11 11:32:30 2007
From: kingming0811 at yahoo.com (M C)
Date: Fri, 11 May 2007 11:32:30 -0700 (PDT)
Subject: [Beowulf] a question about running HPL
Message-ID: <216973.11400.qm@web39715.mail.mud.yahoo.com>

I am a newbie in distributed computing... I have a problem of running HPL on
a single machine. Hope I can get help here. Thanks.

After I install HPL on the machine, I try to run it in the bin dir of HPL by
"mpirun -np 1 xhpl". But it reports "cannot find mpirun command".
Actually I have installed MPICH successfully, I do not know what's wrong with
it. Because I want to run HPL in one node, do I have to change something?

Thanks for patient help.

Ming

From csamuel at vpac.org Sat May 12 18:21:25 2007
From: csamuel at vpac.org (Chris Samuel)
Date: Sun, 13 May 2007 11:21:25 +1000
Subject: [Beowulf] a question about running HPL
In-Reply-To: <216973.11400.qm@web39715.mail.mud.yahoo.com>
References: <216973.11400.qm@web39715.mail.mud.yahoo.com>
Message-ID: <200705131121.25306.csamuel@vpac.org>

On Sat, 12 May 2007, M C wrote:

> After I install HPL on the machine, I try to run it in the bin dir of HPL
> by "mpirun -np 1 xhpl". But it reports "cannot find mpirun command".

Looks like it can't find your MPI version's "mpirun" command.

Which MPICH did you use?

-- 
Christopher Samuel - (03)9925 4751 - VPAC Deputy Systems Manager
Victorian Partnership for Advanced Computing http://www.vpac.org/
Bldg 91, 110 Victoria Street, Carlton South, VIC 3053, Australia

From kilian at stanford.edu Sat May 12 18:50:55 2007
From: kilian at stanford.edu (Kilian CAVALOTTI)
Date: Sat, 12 May 2007 18:50:55 -0700
Subject: [Beowulf] a question about running HPL
In-Reply-To: <216973.11400.qm@web39715.mail.mud.yahoo.com>
References: <216973.11400.qm@web39715.mail.mud.yahoo.com>
Message-ID: <200705121850.55176.kilian@stanford.edu>

On Friday 11 May 2007 11:32:30 M C wrote:
> I install HPL on the machine, I try to run it in the bin dir of HPL by
> "mpirun -np 1 xhpl". But it reports "cannot find mpirun command".
> Actually I have installed MPICH successfully, I do not know what's
> wrong with it. Because I want to run HPL in one node, do I have to
> change something?
> Thanks for patient help.

Be sure that your MPICH bin/ directory is in your $PATH.

Cheers,
--
Kilian

From kingming0811 at yahoo.com Sun May 13 08:07:30 2007
From: kingming0811 at yahoo.com (M C)
Date: Sun, 13 May 2007 08:07:30 -0700 (PDT)
Subject: [Beowulf] how to run HPL on a single machine
Message-ID: <110782.75545.qm@web39704.mail.mud.yahoo.com>

I tried to run HPL on a single machine, but it always fails:

rtes1:/.../hpl/bin/RTES/ mpirun -np 4 xhpl
p0_30424: p4_error: Path to program is invalid while starting /.../hpl/bin/RTES/xhpl with rsh on rtes1: -1
p4_error: latest msg from perror: No such file or directory
p0_30235: (45.056821) net_send: could not write to fd=4, errno = 32
p0_30424: (9.047947) net_send: could not write to fd=4, errno = 32
p0_30424: (60.049669) net_send: could not write to fd=4, errno = 32

If I run with a single process instead:

rtes1:/.../hpl/bin/RTES/ mpirun -np 1 xhpl
HPL ERROR from process # 0, on line 419 of function HPL_pdinfo:
>>> Need at least 4 processes for these tests <<<
HPL ERROR from process # 0, on line 621 of function HPL_pdinfo:
>>> Illegal input in file HPL.dat. Exiting ... <<<

I am not sure what is wrong. Actually, what I need is for Linpack to provide some measurable workload in our experiments, so I am not sure whether I need HPL or Linpack 1000. Can anyone point me to some docs about MPI and HPL?

Thanks.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jean-marc.petit at insa-lyon.fr Mon May 14 10:03:51 2007
From: jean-marc.petit at insa-lyon.fr (Jean-Marc Petit)
Date: Mon, 14 May 2007 19:03:51 +0200
Subject: [Beowulf] GADA'07 CFP
Message-ID: <46489677.20600@insa-lyon.fr>

----------------------------------------------------------
-- Apologies if you receive this message more than once --
----------------------------------------------------------

GADA 2007 - Call For Papers
===========================

November 25 - November 30, 2007
Vilamoura, Algarve, Portugal

International Symposium on Grid computing, high-performAnce and Distributed Applications (GADA'07)
http://www.cs.rmit.edu.au/fedconf/index.html?page=gada2007cfp

IMPORTANT DATES:
- Abstract Submission Deadline June 14, 2007
- Paper Submission Deadline June 21, 2007
- Acceptance Notification August 22, 2007
- Camera Ready Due September 10, 2007
- Registration Due September 10, 2007
- OTM Conferences November 25 - 30, 2007

General Co-Chairs (fedconf at cs.rmit.edu.au)
- Robert Meersman, VU Brussels, Belgium
- Zahir Tari, RMIT University, Australia

GADA PC Co-Chairs (gada2007 at cs.rmit.edu.au)
- Pilar Herrero, Universidad Politécnica de Madrid, Spain
- Daniel Katz, Louisiana State University and Jet Propulsion Laboratory, USA
- María S. Pérez, Universidad Politécnica de Madrid, Spain
- Domenico Talia, Università della Calabria, Italy

Publication Chair (kwonlai at cs.rmit.edu.au)
- Kwong Yuen Lai, RMIT University, Australia

Publicity Chair
Jean-Marc Petit, INSA Lyon, France -- Jean-Marc Petit http://liris.insa-lyon.fr/~jmpetit From rbbrigh at sandia.gov Mon May 14 22:11:08 2007 From: rbbrigh at sandia.gov (Ron Brightwell) Date: Mon, 14 May 2007 23:11:08 -0600 Subject: [Beowulf] CFP: IEEE Cluster'07 Message-ID: <20070515051108.GA10258@ratbert.sandia.gov> ************** Paper submission deadline extended to May 18, 2007 ************* ******************************************************************************* Call for Papers 2007 IEEE International Conference on Cluster Computing (Cluster2007) 17 - 21 September 2007 Austin, Texas, USA http://www.cluster2007.org/ ******************************************************************************* In less than a decade, cluster computing has become the mainstream technology for High Performance Computing and Information Technology. It has gained this prominence by providing reliable, robust and cost-effective platforms for solving many complex computational problems, accessing and visualizing data, and providing information services. Cluster 2007 is hosted by Texas Advanced Computing Center (TACC) in the culturally rich, high-tech city of Austin, Texas. Here you will experience an open forum with fellow cluster researchers, system designers and installers, and users for presenting and discussing new directions, opportunities and ideas that will shape Cluster Computing. Cluster 2007 welcomes paper and poster submissions on innovative work from researchers in academia, industry and government describing original research work in cluster computing. The ability to aggregate the computing power of thousands of processors is a significant milestone in the scalability of commodity systems. Nevertheless, the ability to use both small and large systems efficiently is an ongoing effort in the areas of Networking, Management, Interconnects, and Application Optimization. 
A continued vigilance and assessment of R&D efforts is important to insure that Cluster Computing will harness the new technological advances in hardware and software to solve the challenges of our age, and the next generation. Topics of interest are (but not limited to): - Cluster Software and Middleware - Software Environments and Tools - Single -System Image Services - Parallel File Systems and I/O Libraries - Standard Software for Clusters - Cluster Networking - High-Speed Interconnects - High Performance Message Passing Libraries - Lightweight Communication Protocols - Applications - Application Methods and Algorithms - Adaptation to Multi-Core - Data Distribution, Load Balancing & Scaling - MPI/OpenMP Hybrid Computing - Visualization - Performance Analysis and Evaluation - Benchmarking & Profiling Tools - Performance Prediction & Modeling - Cluster Management - Security and Reliability - High Availability Solutions - Resource and Job Management - Administration and Maintenance Tools Paper Submission: Paper Format: Since the camera-ready version of accepted papers must be compliant with the IEEE Xplore format for publication, submitted papers must conform to the following Xplore layout, page limit, and font size. This will insure a size consistency and a uniform layout for the reviewers. (With minimal changes, accepted document can be styled for publication according to Xplore requirements explained in the Xplore formatting guide, which is also in Xplore format). - PDF files only. - Maximum 10 pages for Technical Papers, maximum 6 pages for Posters. - Single-spaced - 2-column numbered pages in IEEE Xplore format - (8.5x11-inch paper, margins in inches-- top: 0.75, bottom: 1.0, sides:0.625, and between columns:0.25, main text: 10pt ). - Format instructions are available for: LaTeX, Word document, PDF files. - Margin and placement guides are available in: Word, PDF and postscript files. 
- Concerning the final camera-ready version: Maximum of 2 extra pages at $100/page. Camera-ready means PDF file must comply with IEEE Xplore formatting and style for publication. - A conversion tool kit for converting from Word, LaTeX, and PostScript and checking compliance will be available by April 11. See the Final Submission section then. - Electronic Submission: Only web-based submission is accepted. The URL will be announced two weeks before the submission deadline, on the Cluster2007 web page. In addition to the normal technical paper sessions, we plan to organize vendor sessions and industrial exhibitions. Companies interested in participating in the vendor sessions or presenting their exhibits at the meeting or both, should contact the Exhibits Chair member, Ivan R. Judson (judson at mcs.anl.gov) by July 13, 2007. Important Dates: Technical paper submissions: 18 May 2007 Last minute paper abstracts: 18 May 2007 Workshop/tutorial proposals: 18 May 2007 Poster submissions: 8 Jun 2007 Panel proposals: 8 Jun 2007 Workshop/tutorial notification: 8 Jun 2007 Technical paper notification: 29 Jun 2007 Poster notification: 13 Jul 2007 Exhibit proposals: 13 Jul 2007 Last minute papers: 13 Jul 2007 Organization: General Chair Karl W. Schulz, University of Texas, USA Program Chair Kent Milfeld, University of Texas, USA Program Vice Chairs Toni Cortes, Barcelona Supercomputing Center, Spain Barney Maccabe, University of New Mexico, USA Mitsuhisa Sato, University of Tsukuba, Japan Steering Committee Liaison Daniel S. 
Katz, Louisiana State University, USA
Tutorial Chair
  Ira Pramanick, SUN Microsystems, USA
Workshop Chair
  Dan Stanzione, Arizona State University, USA
Poster Chair
  Henry Tufo, National Center for Atmospheric Research, USA
Publicity Chair
  Ron Brightwell, Sandia National Labs, USA
Publication Chair
  Marcin Paprzycki, Warsaw School of Social Psychology, Poland
Exhibit/Sponsors Chair
  Ivan Judson, Argonne National Lab, USA
Finance Chair
  Janet McCord, University of Texas, USA
Local Arrangements Chair
  Faith Singer-Villalobos, University of Texas, USA

From wavelet at iutlecreusot.u-bourgogne.fr Tue May 15 02:08:09 2007
From: wavelet at iutlecreusot.u-bourgogne.fr (Wavelet colloque)
Date: Tue, 15 May 2007 11:08:09 +0200
Subject: [Beowulf] Conf. Wavelet Applications in Industrial Processing V: general deadline prolongation for Optics East 07
Message-ID: <7CD44D67-B061-4879-A4ED-C43BCA7110B4@iutlecreusot.u-bourgogne.fr>

>>> General deadline prolongation for Optics East 07 <<<
Abstract Due Date : 18 May

Wavelet Applications in Industrial Processing V (SA109)

Part of SPIE's International Symposium on Optics East 2007
9-12 September 2007 - Seaport World Trade Center - Boston, MA, USA

--- Abstract Due Date : 18 May 2007 ---
--- Manuscript Due Date: 13 August 2007 ---

Web site
http://www.spie.org/app/program/index.cfm?fuseaction=conferencedetail&export_id=x12510&ID=x6084&redir=x6084.xml&conference_id=767974&event_id=765022&programtrack_id=769810

ABSTRACT TEXT Approximately 500 words.

Conference Chairs: Frédéric Truchetet, Univ. de Bourgogne (France); Olivier Laligant, Univ. de Bourgogne (France)

Program Committee: Patrice Abry, École Normale Supérieure de Lyon (France); Radu V. Balan, Siemens Corporate Research; Atilla M. Baskurt, Univ. Claude Bernard Lyon 1 (France); Amel Benazza-Benyahia, Ecole Supérieure des Communications de Tunis (Tunisia); Albert Bijaoui, Observatoire de la Côte d'Azur (France); Seiji Hata, Kagawa Univ. (Japan); Henk J. A. M. Heijmans, Ctr.
for Mathematics and Computer Science (Netherlands); William S. Hortos, Associates in Communication Engineering Research and Technology; Jacques Lewalle, Syracuse Univ.; Wilfried R. Philips, Univ. Gent (Belgium); Alexandra Pizurica, Univ. Gent (Belgium); Guoping Qiu, The Univ. of Nottingham (United Kingdom); Hamed Sari-Sarraf, Texas Tech Univ.; Peter Schelkens, Vrije Univ. Brussel (Belgium); Paul Scheunders, Univ. Antwerpen (Belgium); Kenneth W. Tobin, Jr., Oak Ridge National Lab.; Günther K. G. Wernicke, Humboldt-Univ. zu Berlin (Germany); Gerald Zauner, Fachhochschule Wels (Austria)

The wavelet transform, multiresolution analysis, and other space-frequency or space-scale approaches are now considered standard tools by researchers in image and signal processing. Promising practical results in machine vision and sensors for industrial applications and non destructive testing have been obtained, and a lot of ideas can be applied to industrial imaging projects.

This conference is intended to bring together practitioners, researchers, and technologists in machine vision, sensors, non destructive testing, signal and image processing to share recent developments in wavelet and multiresolution approaches. Papers emphasizing fundamental methods that are widely applicable to industrial inspection and other industrial applications are especially welcome.

Papers are solicited but not limited to the following areas:

o New trends in wavelet and multiresolution approach, frame and overcomplete representations, Gabor transform, space-scale and space-frequency analysis, multiwavelets, directional wavelets, lifting scheme for:
- sensors
- signal and image denoising, enhancement, segmentation, image deblurring
- texture analysis
- pattern recognition
- shape recognition
- 3D surface analysis, characterization, compression
- acoustical signal processing
- stochastic signal analysis
- seismic data analysis
- real-time implementation
- image compression
- hardware, wavelet chips.
o Applications:
- machine vision
- aspect inspection
- character recognition
- speech enhancement
- robot vision
- image databases
- image indexing or retrieval
- data hiding
- image watermarking
- non destructive evaluation
- metrology
- real-time inspection.

o Applications in microelectronics manufacturing, web and paper products, glass, plastic, steel, inspection, power production, chemical process, food and agriculture, pharmaceuticals, petroleum industry.

All submissions will be peer reviewed. Please note that abstracts must be at least 500 words in length in order to receive full consideration.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From angle1321 at hotmail.com Tue May 15 03:59:50 2007
From: angle1321 at hotmail.com (angle angle)
Date: Tue, 15 May 2007 17:59:50 +0700
Subject: [Beowulf] Parallel programming with MPI and OpenMP
Message-ID:

Hi everyone,

I am beginning to learn about parallel programming. I would like to do a parallel programming project that compares the efficiency of shared memory and distributed memory, so I must set up the same environment for the 2 implementations (the shared memory and the distributed memory implementation). The problem is that I don't know what tools I can use for this project. Furthermore, the tools should be freeware.

Thanks so much,
diakala

From ruhollah.mb at gmail.com Wed May 16 20:23:34 2007
From: ruhollah.mb at gmail.com (Ruhollah Moussavi Baygi )
Date: Thu, 17 May 2007 06:53:34 +0330
Subject: [Beowulf] DL_POLY3 compilation in parallel mode
Message-ID: <1bef2ce30705162023u293f20b6mc4dfb3ed3e822a93@mail.gmail.com>

Dear all at Beowulf,

Has anyone any experience in compilation of DL_POLY3 package in parallel mode?
I have downloaded DL_POLY3.07 from CCLRC Daresbury Laboratory, but I did not manage to compile and link it in parallel mode; however, I have no problem with compiling and building the executable in serial mode.

My cluster specs are: AMD Athlon64 X2, Intel 64-bit Fortran compiler 9.1, MPICH2-1.

Thanks for any suggestions.

--
Best,
Ruhollah Moussavi Baygi

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From angle1321 at hotmail.com Thu May 17 00:35:16 2007
From: angle1321 at hotmail.com (angle angle)
Date: Thu, 17 May 2007 14:35:16 +0700
Subject: [Beowulf] the comparison between OpenMP and MPI
Message-ID:

Hi everyone,

I would like to compare the efficiency of parallel programming with MPI and with OpenMP, which run on different architectures (shared memory and distributed memory), but I don't know what tools I can use to implement the MPI and OpenMP versions. Furthermore, the tools must be the same and be freeware, so please point me to tools that I can use for the implementations.

From i.kozin at dl.ac.uk Fri May 18 02:12:17 2007
From: i.kozin at dl.ac.uk (Kozin, I (Igor))
Date: Fri, 18 May 2007 10:12:17 +0100
Subject: [Beowulf] DL_POLY3 compilation in parallel mode
In-Reply-To: <1bef2ce30705162023u293f20b6mc4dfb3ed3e822a93@mail.gmail.com>
Message-ID:

Dear Ruhollah,

you may find it useful to browse and/or register and post your question on the DLPOLY forum (part of our DisCo web site)
http://www.cse.scitech.ac.uk/disco/forums/ubbthreads.php

No, there is no problem compiling DLPOLY (all of them) as far as I know. If you prefer, post your question to me directly off the list.

Best,
Igor

I.
Kozin (i.kozin at dl.ac.uk) STFC Daresbury Laboratory, WA4 4AD, UK skype: in_kozin tel: +44 (0) 1925 603308 http://www.cse.scitech.ac.uk/disco -----Original Message----- From: beowulf-bounces at beowulf.org [mailto:beowulf-bounces at beowulf.org]On Behalf Of Ruhollah Moussavi Baygi Sent: 17 May 2007 04:24 To: beowulf at beowulf.org Subject: [Beowulf] DL_POLY3 compilation in parallel mode Dear all at Beowulf, Has anyone any experience in compilation of DL_POLY3 package in parallel mode? I have downloaded DL_POLY3.07 from CCLRC Daresbury Laboratory, but I did not manage to compile and link it in parallel mode; however, I have no problem with compilation and making executable in serial mode. My cluster specs are: AMD athlon64 X2, compiler intel64bit Fortran compiler 9.1, MPICH2-1. Thanks for any coming suggestion -- Best, Ruhollah Moussavi Baygi -------------- next part -------------- An HTML attachment was scrubbed... URL: From diep at xs4all.nl Fri May 18 06:41:20 2007 From: diep at xs4all.nl (Vincent Diepeveen) Date: Fri, 18 May 2007 15:41:20 +0200 Subject: [Beowulf] Network considerations for new generation cheap beowulf cluster Message-ID: <001b01c79952$3a266030$0900a8c0@objection> hi All of you, Now that developments go fast in CPU land, by 22 july or so, intel drops price of its quad core to $266 more or less. Hopefully AMD's quadcore chip releases soon too for a decent price. As intels memory subsystem is real weak, not to mention the extra price that AMD and intel ask for dual socket/quad socket capable chips, the optimal node is a single socket node. 4 cores is already a lot anyway for 1 highend network card. That means in short that you can produce for quite little money, far under $500, a node with 4 cores, or considering the far higher taxrates in Europe, far under 500 euro in Europe. Basically what a node needs is a mainboard, a bit of RAM, and a cpu with cooler. That keeps a node tiny and it's easier coolable. 
With some wood then you can build a great case that holds many nodes. Booting of course diskless over the gigabit network. Of course interesting to know secondly is whether putting in ECC-reg ram is interesting, considering its scandalously high price always.

What are opinions here?

Of course now the question is how to get a reasonable low latency highend network with a reasonable bandwidth (latency bigger priority than bandwidth of course) and of course being capable of reading in memory without writing. Of course the switch/routing prices + cable prices need to be included in those price considerations.

Perhaps some bit older generation card gets sold very cheap now. What are the options the coming years there, any manufacturer keeping up with the dropped price of a single quad core node?

Gigabit ethernet is not an option of course, that just works for embarrassingly parallel software, it's usually interrupting bigtime the cpu and has an ugly one-way pingpong latency, especially when there are several threads simultaneously shipping messages.

What are the options for the network in the future?

Vincent

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jac67 at georgetown.edu Fri May 18 13:44:29 2007
From: jac67 at georgetown.edu (Jess Cannata)
Date: Fri, 18 May 2007 16:44:29 -0400
Subject: [Beowulf] Network considerations for new generation cheap beowulf cluster
In-Reply-To: <001b01c79952$3a266030$0900a8c0@objection>
References: <001b01c79952$3a266030$0900a8c0@objection>
Message-ID: <464E102D.3010305@georgetown.edu>

While I can't foresee the future, I do think that we are going to see a lot more low latency 10 Gb/s cards that use standard 10 Gb switches and cables such as Myricom's 10 Gb Myrinet/Ethernet card and NetEffect's 10 Gb Ethernet card.
http://www.myricom.com/Myri-10G/product_list.html http://www.neteffect.com/ne020-features.html Jess Vincent Diepeveen wrote: > hi All of you, > > Now that developments go fast in CPU land, by 22 july or so, intel > drops price of its quad core to $266 more or less. > Hopefully AMD's quadcore chip releases soon too for a decent price. > > As intels memory subsystem is real weak, not to mention the extra > price that AMD and intel ask for dual socket/quad socket capable > chips, the optimal node is a single socket node. > > 4 cores is already a lot anyway for 1 highend network card. > > That means in short that you can produce for quite little money, far > under $500, a node with 4 cores, > or considering the far higher taxrates in Europe, far under 500 euro > in Europe. > > Basically what a node needs is a mainboard, a bit of RAM, and a cpu > with cooler. That keeps a node tiny and it's easier coolable. With > some wood then you can build a great case that holds many nodes. > Booting of course diskless over the gigabit network. Of course > interesting to know secondly is whether putting in ECC-reg ram is > interesting, considering its scandaleous high price always. > > What are opinions here? > > Of course now the question is how to get a reasonable low latency > highend network with a reasonable bandwidth (latency bigger priority > than bandwidth of course) and of course being capable of reading in > memory without writing. Of course the switch/routing prices + cable > prices need to be included in those price considerations. > > Perhaps some bit older generation card gets sold very cheap now. What > are the options the coming years there, any manufacturer keeping up > with the dropped price of a single quad core node? 
> > Gigabit ethernet is not an option of course, that just works for > embarrassingly parallel software, it's usually interrupting bigtime > the cpu and has an ugly one-way pingpong latency, especially when > there is several threads simultaneously shipping messages. > > What are the options for the network in the future? > > Vincent > > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From richa at sgi.com Fri May 18 13:35:15 2007 From: richa at sgi.com (Rich Altmaier) Date: Fri, 18 May 2007 13:35:15 -0700 Subject: [Beowulf] Re: the comparison between OpenMP and MPI Message-ID: <464E0E03.802@sgi.com> Hi, I strongly suggest you slightly violate your desire for freeware tools. After all, getting good data and efficient use of your time can be more valuable than finding the cheapest tools. Our hands-on experience with many codes suggests the Intel tool set can handle far more complex codes than open source compilers. When you need 100k lines of Fortran to compile correctly, you won't find an open source answer, in my opinion. http://www.intel.com/cd/software/products/asmo-na/eng/index.htm Take a look at the compiler, Vtune, libraries, thread analysis tools, and cluster tools. Intel's delivery of software developer tools here is very strong. The compiler supports OpenMP. For the MPI library, probably you should go with MVAPICH, http://mvapich.cse.ohio-state.edu/ Presently I see mvapich as strong for bandwidth and latency, for a large number of nodes. In your comparison, try this: once you have an optimal MPI code, convert it back to OpenMP and see how the balance between compute and communication can act in your favor. 
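As a rough illustration of the OpenMP side of such a comparison, here is a minimal sketch in C. It assumes a simple sum-of-squares kernel; the function name sum_squares is made up for this example and does not come from any particular code. In an MPI version each rank would typically sum its own slice of the array and combine the partial sums with MPI_Allreduce; in shared memory a single reduction clause plays that role.

```c
#include <assert.h>

/* Hypothetical kernel: sum of squares of x[0..n-1].
 * The reduction clause gives every thread a private partial sum and
 * combines them at the end of the loop. If OpenMP is not enabled at
 * compile time the pragma is ignored and the loop simply runs serially. */
double sum_squares(const double *x, long n)
{
    double total = 0.0;
    long i;

    assert(n >= 0);

    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
        total += x[i] * x[i];

    return total;
}
```

With gcc the OpenMP path is enabled by -fopenmp; other compilers have their own switch. Timing this kernel against the MPI equivalent on the same node is one simple way to see where the communication cost goes.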
Just FYI, Rich Altmaier, SGI From diep at xs4all.nl Sun May 20 11:02:19 2007 From: diep at xs4all.nl (Vincent Diepeveen) Date: Sun, 20 May 2007 20:02:19 +0200 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> Message-ID: <002601c79b09$05e45710$0900a8c0@objection> hi, Thanks for your reaction. Ethernet is of course too slow in latency. the cheapest cable i see is 1 meter and $70 Cheapest card i see is $715 So the node price starts at $765, which is already way way more than the total price of 1 node. Now we didn't discuss the switches yet. Switches and routing of a network is important. The problem of myrinet nowadays is already that it is way too expensive when compared to the node price. I also tend to remember a few years ago that a myrinet card was like far under $500. Now cheapest card of myri i see is $715, and i didn't see the huge price of switches yet that will add up to node price. More interesting than paying a $1000 a node for 10 gigabit MPI, is having some older card say 3 gbit/s, which uses MPI and is DMA low latency with a bit older switch for say $300 a node. Then you've got a good low latency network for a small price, yet still making price of a node more expensive, from $450 to $750. Vincent ----- Original Message ----- From: "Jess Cannata" Cc: Sent: Friday, May 18, 2007 10:44 PM Subject: Re: [Beowulf] Network considerations for new generation cheap beowulfcluster > While I can't foresee the future, I do think that we are going to a lot > more low latency 10 Gb/s cards that use standard 10 Gb switches and cables > such as Myricom's 10 Gb Myrinet/Ethernet card and NetEffect's 10 Gb > Ethernet card. 
> > http://www.myricom.com/Myri-10G/product_list.html > http://www.neteffect.com/ne020-features.html > > Jess > > Vincent Diepeveen wrote: >> hi All of you, >> Now that developments go fast in CPU land, by 22 july or so, intel drops >> price of its quad core to $266 more or less. >> Hopefully AMD's quadcore chip releases soon too for a decent price. >> As intels memory subsystem is real weak, not to mention the extra price >> that AMD and intel ask for dual socket/quad socket capable chips, the >> optimal node is a single socket node. >> 4 cores is already a lot anyway for 1 highend network card. >> That means in short that you can produce for quite little money, far >> under $500, a node with 4 cores, >> or considering the far higher taxrates in Europe, far under 500 euro in >> Europe. >> Basically what a node needs is a mainboard, a bit of RAM, and a cpu with >> cooler. That keeps a node tiny and it's easier coolable. With some wood >> then you can build a great case that holds many nodes. Booting of course >> diskless over the gigabit network. Of course interesting to know secondly >> is whether putting in ECC-reg ram is interesting, considering its >> scandaleous high price always. >> What are opinions here? >> Of course now the question is how to get a reasonable low latency >> highend network with a reasonable bandwidth (latency bigger priority than >> bandwidth of course) and of course being capable of reading in memory >> without writing. Of course the switch/routing prices + cable prices need >> to be included in those price considerations. >> Perhaps some bit older generation card gets sold very cheap now. What >> are the options the coming years there, any manufacturer keeping up with >> the dropped price of a single quad core node? 
>> Gigabit ethernet is not an option of course, that just works for >> embarrassingly parallel software, it's usually interrupting bigtime the >> cpu and has an ugly one-way pingpong latency, especially when there is >> several threads simultaneously shipping messages. >> What are the options for the network in the future? >> Vincent >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From jac67 at georgetown.edu Sun May 20 13:36:42 2007 From: jac67 at georgetown.edu (Jess Cannata) Date: Sun, 20 May 2007 16:36:42 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <002601c79b09$05e45710$0900a8c0@objection> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> Message-ID: <4650B15A.90206@georgetown.edu> I agree that all of the options (Infiniband, Myrinet, and 10 Gb Ethernet) are too expensive. I have been looking into the low latency 10 Gb Ethernet cards from NetEffect, which use the iWARP specifications to provide low latency. I haven't done any testing, yet, but the numbers that they are releasing show them competitive with Infiniband/Myrinet as the number of processes increase. Plus, I expect 10 Gb switches to rapidly drop in price. I believe that the only economical solution in the short term (3-5 year range) will be Ethernet based since "everyone" knows Ethernet. It is only by selling substantial volume that the prices drop to inexpensive. 
I foresee motherboard manufacturers placing 10 Gb Ethernet adapters on-board server motherboards soon enough; I see no reason why they can't be the low latency varieties. I hope to start testing some of the NetEffect 10 Gb cards soon and will try and post some numbers. Here is a link to some of the numbers that NetEffect is publishing: http://www.hpcwire.com/hpc/716435.html Jess Vincent Diepeveen wrote: > hi, > > Thanks for your reaction. > > Ethernet is of course too slow in latency. > > the cheapest cable i see is 1 meter and $70 > > > > Cheapest card i see is $715 > > So the node price starts at $765, which is already way way more than > the total price of 1 node. > Now we didn't discuss the switches yet. Switches and routing of a > network is important. > > The problem of myrinet nowadays is already that it is way too > expensive when compared to the node price. > > I also tend to remember a few years ago that a myrinet card was like > far under $500. > Now cheapest card of myri i see is $715, and i didn't see the huge > price of switches > yet that will add up to node price. > > More interesting than paying a $1000 a node for 10 gigabit MPI, is > having some older card say 3 gbit/s, > which uses MPI and is DMA low latency with a bit older switch for say > $300 a node. > > Then you've got a good low latency network for a small price, yet > still making price of a node more expensive, > from $450 to $750. > > Vincent > > ----- Original Message ----- From: "Jess Cannata" > Cc: > Sent: Friday, May 18, 2007 10:44 PM > Subject: Re: [Beowulf] Network considerations for new generation cheap > beowulfcluster > > >> While I can't foresee the future, I do think that we are going to a >> lot more low latency 10 Gb/s cards that use standard 10 Gb switches >> and cables such as Myricom's 10 Gb Myrinet/Ethernet card and >> NetEffect's 10 Gb Ethernet card. 
>> >> http://www.myricom.com/Myri-10G/product_list.html >> http://www.neteffect.com/ne020-features.html >> >> Jess >> >> Vincent Diepeveen wrote: >>> hi All of you, >>> Now that developments go fast in CPU land, by 22 july or so, intel >>> drops price of its quad core to $266 more or less. >>> Hopefully AMD's quadcore chip releases soon too for a decent price. >>> As intels memory subsystem is real weak, not to mention the extra >>> price that AMD and intel ask for dual socket/quad socket capable >>> chips, the optimal node is a single socket node. >>> 4 cores is already a lot anyway for 1 highend network card. >>> That means in short that you can produce for quite little money, >>> far under $500, a node with 4 cores, >>> or considering the far higher taxrates in Europe, far under 500 euro >>> in Europe. >>> Basically what a node needs is a mainboard, a bit of RAM, and a cpu >>> with cooler. That keeps a node tiny and it's easier coolable. With >>> some wood then you can build a great case that holds many nodes. >>> Booting of course diskless over the gigabit network. Of course >>> interesting to know secondly is whether putting in ECC-reg ram is >>> interesting, considering its scandaleous high price always. >>> What are opinions here? >>> Of course now the question is how to get a reasonable low latency >>> highend network with a reasonable bandwidth (latency bigger priority >>> than bandwidth of course) and of course being capable of reading in >>> memory without writing. Of course the switch/routing prices + cable >>> prices need to be included in those price considerations. >>> Perhaps some bit older generation card gets sold very cheap now. >>> What are the options the coming years there, any manufacturer >>> keeping up with the dropped price of a single quad core node? 
>>> Gigabit ethernet is not an option of course, that just works for >>> embarrassingly parallel software, it's usually interrupting bigtime >>> the cpu and has an ugly one-way pingpong latency, especially when >>> there is several threads simultaneously shipping messages. >>> What are the options for the network in the future? >>> Vincent >>> ------------------------------------------------------------------------ >>> >>> >>> _______________________________________________ >>> Beowulf mailing list, Beowulf at beowulf.org >>> To change your subscription (digest mode or unsubscribe) visit >>> http://www.beowulf.org/mailman/listinfo/beowulf >>> >> _______________________________________________ >> Beowulf mailing list, Beowulf at beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> From hahn at mcmaster.ca Sun May 20 14:10:57 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Sun, 20 May 2007 17:10:57 -0400 (EDT) Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4650B15A.90206@georgetown.edu> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> Message-ID: > I agree that all of the options (Infiniband, Myrinet, and 10 Gb Ethernet) are > too expensive. I'm curious what kinds of costs you're seeing (per-port) for each of these. > I have been looking into the low latency 10 Gb Ethernet cards > from NetEffect, which use the iWARP specifications to provide low latency. I why do you think iWARP is necessary to provide low latency? > haven't done any testing, yet, but the numbers that they are releasing show > them competitive with Infiniband/Myrinet as the number of processes increase. 
do you mean as you increase number of processes on a single node (that is, sharing a single interconnect port), or number of processes in the whole job or cluster? > Plus, I expect 10 Gb switches to rapidly drop in price. I believe that the I hope for that as well, but am not sanguine. expensive optic transceivers preclude commoditization of small (~20 pt) switches, and I've heard people say bad things about the practicality of mass-produced/cheap 10G-baseT. (mainly complaining about complexity and power requirements.) > and post some numbers. Here is a link to some of the numbers that NetEffect > is publishing: > > http://www.hpcwire.com/hpc/716435.html no usable latency numbers there. if you squint, it looks like they're claiming latency of around 7 us, which is _not_ competitive with even myri 2G (nor recent IB nor myri 10G.) >> the cheapest cable i see is 1 meter and $70 nothing wrong with $70 cables - you need to quote the whole per-port price, including nic, cable and switch port. it looks to me as if Myri 10G is around $1500/port; I've never had a good read on IB prices (deconvolved from vendor/discount pricing issues.) >> Cheapest card i see is $715 nothing wrong with $715, even if the all-in per-port price is $1500 - it just means you won't be using $1000 desktop-spec nodes. that's OK, since if you're worried about ~3 us latency and 1GB bandwidth, you should also be using multiple cores, ECC memory, and probably a few GB/node, and therefore can easily amortize $715/node. >> So the node price starts at $765, which is already way way more than the >> total price of 1 node. only if you're looking at extremely low-end nodes. for such nodes, the only viable option is zero-cost Gb nics, of course, and mass-market switches (ie, not high-end chassis switches, etc). 5 years ago, the low-end approach was 100bT; now it's 1000bT. the prime target for that approach (serial or EP) has simply gotten broader; I don't see this as anything to complain about.
for "real" parallel, you have to pay for the network you need. there as well, you now get more for your money, no complaints. complaining that you can't get 1 us, 1GBps interconnect for $50/port is just silliness. From jac67 at georgetown.edu Mon May 21 07:11:36 2007 From: jac67 at georgetown.edu (Jess Cannata) Date: Mon, 21 May 2007 10:11:36 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> Message-ID: <4651A898.8070900@georgetown.edu> Mark Hahn wrote: >> I agree that all of the options (Infiniband, Myrinet, and 10 Gb >> Ethernet) are too expensive. > > I'm curious what kinds of costs you're seeing (per-port) for each of > these. > By too expensive, I mean much more expensive than Gig-E which is "free" on the NIC side and quite cheap on the switch side. >> I have been looking into the low latency 10 Gb Ethernet cards from >> NetEffect, which use the iWARP specifications to provide low latency. I > > why do you think iWARP is necessary to provide low latency? I'll admit that I don't have a great understanding how the NetEffect cards work. I do know that they are using the iWARP protocol (Remote Direct Memory Access, etc.) to reduce latency, but that isn't the only thing they are using. > >> haven't done any testing, yet, but the numbers that they are >> releasing show them competitive with Infiniband/Myrinet as the number >> of processes increase. > > do you mean as you increase number of processes on a single node (that > is, > sharing a single interconnect port), or number of processes in the > whole job or cluster? What I should have said is that the NetEffect card is competitive as the number of network connections to a process on a single node increases. Unfortunately, the HPCWire article did not include these numbers. 
I saw them in a presentation given by NetEffect. > >> Plus, I expect 10 Gb switches to rapidly drop in price. I believe >> that the > > I hope for that as well, but am not sanguine. expensive optic > tranceivers preclude commoditization of small (~20 pt) switches, and > I've heard people say bad things about the practicality of > mass-produced/cheap 10G-baseT. > (mainly complaining about complexity and power requirements.) I heard the same things for Gig-E. I'm confident a solution will be found through better manufacturing, better design, or new technology. > > >> and post some numbers. Here is a link to some of the numbers that >> NetEffect is publishing: >> >> http://www.hpcwire.com/hpc/716435.html > > no usable latency numbers there. if you squint, it looks like they're > claiming latency of around 7 us, which is _not_ competitive with even > myri 2G (nor recent IB nor myri 10G.) The numbers that I saw are not on HPCWire. I didn't realize that when I sent the link. I recommend that people check out the NetEffect cards and similar interconnects (low latency 10 Gb Ethernet with iWARP) to see if the claims that they make are valid. I'm not sure that they are, but I am interested to find out. AMD's developer site has a new cluster with both Infiniband and NetEffect's low latency 10 Gb cards installed so you will be able to do direct comparisons between low latency 10 Gb Ethernet and Infiniband. It is called "Smith." You can find it at https://devcenter.amd.com/about/systems.php. I haven't tested it yet, but I plan to. I'd be interested in hearing about others' experiences. > >>> the cheapest cable i see is 1 meter and $70 > > nothing wrong with $70 cables - you need to quote the whole per-port > price, > including nic, cable and switch port. it looks to me as if Myri 10G > is around $1500/port; I've never had a good read on IB prices > (deconvolved from vendor/discount pricing issues.)
> >>> Cheapest card i see is $715 > > nothing wrong with $715, even if the all-in per-port price is $1500 - > it just means you won't be using $1000 desktop-spec nodes. that's OK, > since if you're worried about ~3 us latency and 1GB bandwidth, you > should also be using multiple cores, ECC memory, and probably a few > GB/node, > and therefore can easily amortize $715/node. > >>> So the node price starts at $765, which is already way way more than >>> the total price of 1 node. > > only if you're looking at extremely low-end nodes. for such nodes, > the only viable option is zero-cost Gb nics, of course, and > mass-market switches (ie, not high-end chassis switches, etc). > I thought--though I may be mistaken--that this was the point of the original post. What is/will be the new low-cost network solution? It doesn't seem to be Infiniband or Myri-10G since their price doesn't seem to be dropping much. > 5 years ago, the low-end approach was 100bT; now its 1000bT. the > prime target for that approach (serial or EP) has simply gotten broader; > I don't see this as anything to complain about. for "real" parallel, > you have to pay for the network you need. there as well, you now get > more for your money, no complaints. complaining that you can't get 1 > us, 1GBps interconnect for $50/port is just silliness. > I disagree on this last point. Why can't low latency interconnects become the standard? It is not just HPC applications that are demanding low latency networks. Jess From peter.st.john at gmail.com Mon May 21 08:24:03 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Mon, 21 May 2007 11:24:03 -0400 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: <464E0E03.802@sgi.com> References: <464E0E03.802@sgi.com> Message-ID: Rich, Two things; the small and obvious point, "cheapest" isn't the only motivation for open source, but you know that. What surprised me was "...can handle more complex codes" and "...to compile correctly". 
By compiling correctly, you mean, achieving the desired performance characteristics for the target executable? In my experience compilers are reliably logically correct. I once tracked a bug to the symbolic debugger :-), but never to the compiler itself (although I've always been able to use mature compilers). Compiler writers pretty much define "language law". (And I'm sure the ones at Intel are just as proud as the ones at IBM and CMU.) As for complexity, I've written things that exceeded the available stack depth, but really I don't understand a program being too complex for a compiler. Too long, sure. Everything has resource limitations. But not too complex. So I'd be very amused to see some examples, maybe of local complexity, I wouldn't be able to read the 100k lines of fortran myself :-) Peter On 5/18/07, Rich Altmaier wrote: > > Hi, I strongly suggest you slightly violate your desire for freeware > tools. > After all, getting good data and efficient use of your time can be > more valuable than finding the cheapest tools. > Our hands-on experience with many codes suggests > the Intel tool set can handle far more complex codes than > open source compilers. > When you need 100k lines of Fortran to compile correctly, > you won't find an open source answer, in my opinion. > > http://www.intel.com/cd/software/products/asmo-na/eng/index.htm > Take a look at the compiler, Vtune, libraries, thread analysis tools, > and cluster tools. Intel's delivery of software developer tools > here is very strong. The compiler supports OpenMP. > For the MPI library, probably you should go with MVAPICH, > http://mvapich.cse.ohio-state.edu/ > Presently I see mvapich as strong for bandwidth and latency, for > a large number of nodes. > > In your comparison, try this: once you have an optimal MPI code, > convert it back to OpenMP and see how the balance between compute > and communication can act in your favor. 
> > Just FYI, > Rich Altmaier, SGI From florent.calvayrac at univ-lemans.fr Mon May 21 09:19:40 2007 From: florent.calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Mon, 21 May 2007 18:19:40 +0200 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: References: <464E0E03.802@sgi.com> Message-ID: <4651C69C.3060504@univ-lemans.fr> Peter St. John wrote: > Rich, > Two things; the small and obvious point, "cheapest" isn't the only > motivation for open source, but you know that. > What surprised me was "...can handle more complex codes" and "...to > compile correctly". By compiling correctly, you mean, achieving the > desired performance characteristics for the target executable? In my > experience compilers are reliably logically correct. I once tracked a > bug to the symbolic debugger :-), but never to the compiler itself > (although I've always been able to use mature compilers). Compiler > writers pretty much define "language law". (And I'm sure the ones at > Intel are just as proud as the ones at IBM and CMU.) > As for complexity, I've written things that exceeded the available > stack depth, but really I don't understand a program being too complex > for a compiler. Too long, sure. Everything has resource limitations. > But not too complex. So I'd be very amused to see some examples, maybe > of local complexity, I wouldn't be able to read the 100k lines of > fortran myself :-) > Peter > I once discovered a bug in the Cray Fortran compiler for T3E. Without optimization the code was running fine but at O2/O3 results were wrong.
It turned out that some code lines I had autogenerated from Maple were implying a large number of variables, exceeding the number of registers available and wrapping on the first register. Without optimization no registers were used so the code was giving correct results and at O1 code was optimized on a line per line basis and not procedure-wide. So even commercial code can be wrong... -- Florent Calvayrac | Directeur du SC Informatique Ressources Num. de l'Universite du Maine Lab. de Physique de l'Etat Condense UMR-CNRS 6087 Inst. de Rech. en Ingenierie Molec. et Matx Fonctionnels FR CNRS 2575 From landman at scalableinformatics.com Mon May 21 09:47:31 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Mon, 21 May 2007 12:47:31 -0400 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: <4651C69C.3060504@univ-lemans.fr> References: <464E0E03.802@sgi.com> <4651C69C.3060504@univ-lemans.fr> Message-ID: <4651CD23.2040303@scalableinformatics.com> Florent Calvayrac wrote: > I once discovered a bug in the Cray Fortran compiler for T3E. > Without optimization the code was running fine but at O2/O3 > results were wrong. It turned out that some code lines I had autogenerated > from Maple were implying a large number of variables, exceeding > the number of registers available and wrapping on the first register. > Without optimization no registers were used so the code was giving > correct results and at O1 code was optimized on a line per line basis > and not procedure-wide. > > So even commercial code can be wrong... FWIW: recently (within the last month) we worked on building a 64 bit version of a code for a customer. F90. Intel happily compiled it (latest and greatest 9.1.x). PathScale compiler complained about problems. PathScale was in fact, correct. That Intel compiled and "ran" this code w/o nary a peep worries me. 
The issue was a somewhat esoteric type safety issue (never assume logical is an integer and vice versa, especially when moving from 32 bit to 64 bit code, and copying data ...). Regardless, the PathScale compiler was correct for raising the issue, and the Intel compiler was not for failing to raise it. Didn't try it with g95/gfortran. Another problem we have found with the intel compilers is the way they set particular code paths, specifically with regards to testing for capability. Rather than testing for sse2/3/4 instruction results, they test processor strings/ids. Which means their generated code often fails to run, or runs with slow code paths selected on Opterons. Which means we tend to wave people off these compilers unless their code is only ever going to run on an Intel platform. That and the compiler doesn't yet seem to do much with NUMA as the Intel platform doesn't need to worry about it. Joe > > > -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman at scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From tjrc at sanger.ac.uk Mon May 21 10:12:11 2007 From: tjrc at sanger.ac.uk (Tim Cutts) Date: Mon, 21 May 2007 18:12:11 +0100 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: <4651C69C.3060504@univ-lemans.fr> References: <464E0E03.802@sgi.com> <4651C69C.3060504@univ-lemans.fr> Message-ID: <3A29FE3F-3DFA-4716-8756-5F958CD6CB6B@sanger.ac.uk> On 21 May 2007, at 5:19 pm, Florent Calvayrac wrote: > I once discovered a bug in the Cray Fortran compiler for T3E. > Without optimization the code was running fine but at O2/O3 > results were wrong. It turned out that some code lines I had > autogenerated > from Maple were implying a large number of variables, exceeding > the number of registers available and wrapping on the first register. 
> Without optimization no registers were used so the code was giving > correct results and at O1 code was optimized on a line per line basis > and not procedure-wide. > > So even commercial code can be wrong... Some code written here exposed an optimisation bug in Intel's C compiler about a year ago. When I looked at the code, I wasn't surprised. At first glance, my initial reaction was the same as the compiler's; "Oh, that's dead code, we can eliminate that". Wrong! It was possible for the code to have an effect, but it was extremely indirect, and I didn't blame the compiler for screwing up! The function in question was FSM_compile() in the following file: http://cvs.sanger.ac.uk/cgi-bin/viewcvs.cgi/exonerate/src/struct/fsm.c?rev=1.10&view=markup You will see the last couple of lines in the function look as though they have no effect, at first glance, and the Intel compiler thought so too. They've fixed the bug, since. At the time the workaround was either to turn off optimisation altogether for this file, or to mark the variable 'out' as volatile. Tim PS. I hasten to add this is not my code. But it is run on our cluster a *lot*. :-) From hahn at mcmaster.ca Mon May 21 10:46:34 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Mon, 21 May 2007 13:46:34 -0400 (EDT) Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4651A898.8070900@georgetown.edu> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> Message-ID: >> 5 years ago, the low-end approach was 100bT; now its 1000bT. the prime >> target for that approach (serial or EP) has simply gotten broader; >> I don't see this as anything to complain about. for "real" parallel, >> you have to pay for the network you need. there as well, you now get more >> for your money, no complaints.
complaining that you can't get 1 us, 1GBps >> interconnect for $50/port is just silliness. >> > I disagree on this last point. Why can't low latency interconnects become the > standard? because most of computing is not latency-sensitive. even in HPC, most cycles are consumed by throughput and loose-parallel apps (where Gb works fine.) > It is not just HPC applications that are demanding low latency > networks. what applications are you thinking of? the only one I can think of would be lock/metadata traffic for very large parallel clustered DB/filesystem, and even there the argument is weak. From peter.st.john at gmail.com Mon May 21 12:42:24 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Mon, 21 May 2007 15:42:24 -0400 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: <4651EC63.3050604@moene.indiv.nluug.nl> References: <464E0E03.802@sgi.com> <4651C69C.3060504@univ-lemans.fr> <4651EC63.3050604@moene.indiv.nluug.nl> Message-ID: These points are well-taken but I'd like to defend "naive compiler faith" a little bit. The a-priori job of the compiler, translating n-GL to (n-1)-GL, can be reliably correct. A sufficiently expressive programming language will have to have some possible expressions which are not, but for most uses the equivalent of Euclidean geometry is just as reliable. But there are (at least) two ways that actual compiler (implementations) disappoint us: 1. Warnings. That used to be a separate job, as when C Beautify was an app we ran before CC. There is no limit to what we could like a computer to warn us about, that can't be in the scope of provable correctness. When "cc foo.c -o foo" returns "You don't have time for this, you're late for lunch!" it will be really cool :-) 2. Optimizations.
A literal optimization does something like maximize the value of success in proportion to the cost of failure; so literally the optimal solution to a problem may have bugs, because the cost of sporadic bugs is smaller than the cost of avoiding them. An example would be the test "if fopen()..." which may just waste a microsecond every call, in the context that fopen() always succeeds unless the hard drive fails, when everything would fail and there would be no value catching it at that point in your app. A more theoretical example is probabilistic primality; you can spend very little time getting a number that is almost certainly prime, and use it in cryptography with very high (but not certain) safety. Or, you can spend years of CPU time proving the number is prime first. The optimal solution for most applications is to use the "buggy" probabilistic prime. Surely some compiler suites provide cosier warnings (for your needs) and/or better optimizations, and more reliable optimizations, but it's just the strictly-compiling job that I (still) think of as highly reliable. But the point is well-taken, I've been lucky to only need very few explicit registers deep in nested loops, so I've never had to worry much about how the compiler copes when availability is exceeded (although it's imaginable to me that wrapping around could be desired behaviour but that would have to be documented). You guys bang your machines pretty hard, I'm humbled. Peter On 5/21/07, Toon Moene wrote: > > Florent Calvayrac wrote: > > > Without optimization no registers were used so the code was giving > > correct results and at O1 code was optimized on a line per line basis > > and not procedure-wide. > > > > So even commercial code can be wrong... > > As a former boss of mine put it, some 25 years ago: "You think the > compiler is perfect ? Hah, it's just another large program with its > own bugs !"
> > -- > Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290 > Saturnushof 14, 3738 XG Maartensdijk, The Netherlands > At home: http://moene.indiv.nluug.nl/~toon/ > Who's working on GNU Fortran: > http://gcc.gnu.org/ml/gcc/2007-01/msg00059.html From jac67 at georgetown.edu Mon May 21 12:44:22 2007 From: jac67 at georgetown.edu (Jess Cannata) Date: Mon, 21 May 2007 15:44:22 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> Message-ID: <4651F696.6060803@georgetown.edu> Mark Hahn wrote: >>> 5 years ago, the low-end approach was 100bT; now its 1000bT. the >>> prime target for that approach (serial or EP) has simply gotten >>> broader; >>> I don't see this as anything to complain about. for "real" parallel, >>> you have to pay for the network you need. there as well, you now >>> get more for your money, no complaints. complaining that you can't >>> get 1 us, 1GBps interconnect for $50/port is just silliness. >>> >> I disagree on this last point. Why can't low latency interconnects >> become the standard? > > because most of computing is not latency-sensitive. even in HPC, most > cycles > are consumed by throughput and loose-parallel apps (where Gb works fine.) > >> It is not just HPC applications that are demanding low latency networks. > > what applications are you thinking of? the only I can think of would > be lock/metadata traffic for very large parallel clustered DB/filesystem, > and even there the argument is weak. > I'm thinking of applications like on-line video games (first-person shooters, real-time strategy, massively multi-player role-playing, etc.
benefit from lower latencies) and automatic electronic stock trading to name a few. From jac67 at georgetown.edu Mon May 21 13:47:51 2007 From: jac67 at georgetown.edu (Jess Cannata) Date: Mon, 21 May 2007 16:47:51 -0400 Subject: [Beowulf] Off Topic: High Performance Computing System Administration Position at Georgetown University Message-ID: <46520577.5070209@georgetown.edu> High Performance Computing System Administration Position at Georgetown University To apply, go to the following web site: http://www10.georgetown.edu/hr/employment_services/joblist/jobs.html -------------------------- Job No: 2007-0443S* *Job Title: Systems Administrator* *Department: UIS - AITS Director* *Grade/Level: ( 10) * *Date Posted: May 11, 2007* This position is responsible for the operation of high performance computational resources and large storage facilities, including computer hardware, operating systems and systems software (systems operation) in the Advanced Research Computing Computational Core Facility. The position will monitor and ensure security, availability, performance, disaster recoverability (business continuity), software installation, configuration, maintenance, on going improvement and overall technical support. Also assist in the resolution of systems and applications software and hardware related problems. All work will be under the supervision of the existing systems administrator. The Computational Core Facility supports HPC needs of research scientists and divisions in Main Campus and Medical Center, including Protein Information Resources (PIR), Lombardi Comprehensive Cancer Center, Chemistry, Biology, Physics, PIR, et al. 
Minimum 2 Years Professional Linux Server Administration Experience Operating System Skills: Kernel Recompiling/tuning Unix Filesystem Administration Samba Fileserver Administration Custom Application Compiling Minimum 2 Years Server Hardware Troubleshooting Minimum 2 Years Networking Experience Scripting for basic System Administration Tasks Tape Library Administration Experience Implementation of disaster recovery plan Database Administration (MySQl, Oracle) Administering Fiber Channel/iSCSI Storage Architectures RHEL experience, a plus, all other *nix experiences Application Skills: Open Source HPC (MPI) experience Grid Computing Toolkits Experience supporting scientific applications on Linux Experience with Job Schedulers a plus Georgetown University offers salaries competitive with other higher education institutions in addition to excellent benefits which include: tuition coverage for employees and children, outstanding professional development opportunities, excellent dental and healthcare for employees and their families and retirement savings plans through a number of the country's best retirement investment firms. Salary commensurate with experience. From hahn at mcmaster.ca Mon May 21 21:38:13 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 22 May 2007 00:38:13 -0400 (EDT) Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4651F696.6060803@georgetown.edu> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> Message-ID: >>> It is not just HPC applications that are demanding low latency networks. >> >> what applications are you thinking of? the only I can think of would be >> lock/metadata traffic for very large parallel clustered DB/filesystem, >> and even there the argument is weak. 
>> > I'm thinking of applications like on-line video games (first-person shooters, > real-time strategy, massively multi-player role-playing, etc. benefit from > lower latencies) and automatic electronic stock trading to name a few. humans are deadly slow, and tend to be separated by geographic latency. the first hop on my cable connection is 7 ms (perhaps not excellent), and all latency aspects of human perception are in that range. I have a hard time seeing how games would notice the difference between 60 and 30 us (readily achievable by Gb), let alone even detect 1-4 us latency of current HPC interconnect... unless you're suggesting that games use collectives akin to MPI broadcast/reduce/etc? I'm guessing that online auto-trading has "think time" that dominates 1-100 us network latency. again, geography and network topology would probably make more difference than shaving 30 us down to 10... regards, mark hahn. From eugen at leitl.org Mon May 21 23:52:06 2007 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 22 May 2007 08:52:06 +0200 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> Message-ID: <20070522065206.GG17691@leitl.org> On Tue, May 22, 2007 at 12:38:13AM -0400, Mark Hahn wrote: > unless you're suggesting that games use collectives akin to MPI > broadcast/reduce/etc? I could see how a low-latency interconnect would help with a Second Life-like virtual reality, where most things happen server-side. On the other hand, given that low-end servers come with four GBit Ethernet NICs, just distributing the game world across a 2d torus would do as well, or better. Has anyone ever heard of a cluster wired in a diamond lattice (made from tetrahedrons) topology?
-- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From Florent.Calvayrac at univ-lemans.fr Tue May 22 00:09:47 2007 From: Florent.Calvayrac at univ-lemans.fr (Florent Calvayrac) Date: Tue, 22 May 2007 09:09:47 +0200 Subject: [Beowulf] fwd : Casablanca Summer school on computational physics :first announcement Message-ID: <4652973B.7030507@univ-lemans.fr> Dear Professors, Dear colleagues and friends, I am pleased to inform you that the first announcement of the first PNMP2007 school, which will be organised by the Department of Physics at University Hassan II Ain Chock Casablanca in July 2007 in Casablanca, Morocco, is now available on the school website http://tnmp2007.ifrance.com/. The second announcement will concern the abstract submission deadline. Dear colleagues, please help spread information about this school. School e-mail: tnmp2007 at ifrance.com Title of school: "les techniques numériques et modélisation en physique" With my best regards. D.Saifaoui Laboratoire de Physique Théorique & Appliquée BP 5366 Maarif Casablanca Maroc d.saifaoui at fsac.ac.ma Tel :022230684 Fax 022230674 http://tnmp2007.ifrance.com/ tnmp2007 at ifrance.com -- Florent Calvayrac | Directeur du SC Informatique Ressources Num. de l'Universite du Maine Lab. de Physique de l'Etat Condense UMR-CNRS 6087 Inst. de Rech. en Ingenierie Molec.
et Matx Fonctionnels FR CNRS 2575 From patrick at myri.com Tue May 22 00:53:26 2007 From: patrick at myri.com (Patrick Geoffray) Date: Tue, 22 May 2007 03:53:26 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4651A898.8070900@georgetown.edu> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> Message-ID: <4652A176.6010101@myri.com> Hi Jess, Jess Cannata wrote: > By too expensive, I mean much more expensive than Gig-E which is "free" > on the NIC side and quite cheap on the switch side. Everything is very expensive when compared to GigE :-) > What I should have said is that the NetEffect card is competitive as the > number of network connections to a process on a single node increases. Do you mean TCP connections or iWARP connections ? The number of TCP connections only matters to TOEs (TCP Offload Engine) that have to keep a state in the NIC for each active connection. For stateless offload implementations, the number of TCP connections has no impact. Similarly, connection scaling problem only apply to iWARP and other connection-heavy protocols. >> no usable latency numbers there. if you squint, it looks like they're >> claiming latency of around 7 us, which is _not_ competitive with even >> myri 2G (nor recent IB nor myri 10G.) > The numbers that I saw are not on HPCWire. I didn't realize that when I > sent the link. I recommend that people check out the NetEffect cards and > similar interconnects (low latency 10 Gb Ethernet with iWARP) to see if 7-8 us is roughly the UDP or TCP latency on any decent 10GE NICs when you set interrupt coalescing to 0. For comparison, MX over Ethernet (MXoE) with Myri-10G gets 2.3 us MPI latency with a good 10 GigE switch (Fulcrum or latest Fujitsu), today. iWARP is a bad idea poorly implemented. 
It makes no sense to have TCP in the middle for cluster computing. It requires a TOE and the can of worms that comes with it. Furthermore, iWARP has serious Send/Recv semantic restrictions that requires explicit per-connection credit-based flow control. Just what the doctor prescribed... > original post. What is/will be the new low-cost network solution? It > doesn't seem to be Infiniband or Myri-10G since their price doesn't seem > to be dropping much. There is definitively already a price war in the 10GigE market. There are far more players than in the IB business. The volume has not ramped up yet and prices are relatively stable. However, low-margin shops such as Broadcom are joining the party, so I don't expect the market to stand still for very long. In the end, price pressure is good for customers, but not that good for vendors. I wholeheartedly agree with you about 10 Gigabit Ethernet having the most potential for lower cost in the medium term. Patrick From larry.stewart at sicortex.com Tue May 22 06:02:38 2007 From: larry.stewart at sicortex.com (Larry Stewart) Date: Tue, 22 May 2007 09:02:38 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <20070522065206.GG17691@leitl.org> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> Message-ID: <4652E9EE.4010906@sicortex.com> Eugen Leitl wrote: >On Tue, May 22, 2007 at 12:38:13AM -0400, Mark Hahn wrote: > > > >>unless you're suggesting that games use collectives akin to MPI >>broadcast/reduce/etc? >> >> > >I could see how a low-latency interconnect would help with a >Second Life-like virtual reality, where most things happen >server-side. 
On the other hand, given that low-end servers >come with 4 GBit Ethernet NICs, just distributing the game >world across a 2d torus would do as well, or better. > >Has anyone ever heard of a cluster wired in a diamond >lattice (made from tetrahedrons) topology? > > > I haven't heard of anyone using a diamond lattice to wire a cluster. We're using a Kautz graph, which already makes my brain hurt. What would the advantages of a diamond lattice be? In terms of bisection and diameter? Ease of wiring? -Larry -------------- next part -------------- An HTML attachment was scrubbed... URL: From eugen at leitl.org Tue May 22 06:22:56 2007 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 22 May 2007 15:22:56 +0200 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4652E9EE.4010906@sicortex.com> References: <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> Message-ID: <20070522132256.GO17691@leitl.org> On Tue, May 22, 2007 at 09:02:38AM -0400, Larry Stewart wrote: > I haven't heard of anyone using a diamond lattice to wire a cluster. > We're > using a Kautz graph, which already makes my brain hurt. > What would the advantages of a diamond lattice be? In terms of 4 is the smallest number of edges for a node to span a 3-d mesh, without using a switch. It is suitable for problems with a purely local communication pattern only. 6 edges/node would be better, since resulting in a cubic primitive lattice of nodes (3d torus, or similiar), but I haven't seen a low end box with more than four GBit NICs. Speaking of which, why is nobody integrating small (8-port) GBit switches on motherboards themselves? I've only seen this in firewalls so far. 
The disadvantage of a diamond lattice is that decomposing your physical system is more complicated (I don't even know which polyhedra and common faces one would wind up with). It is probably not worth the hassle. > bisection and diameter? > Ease of wiring? -- Eugen* Leitl leitl http://leitl.org ______________________________________________________________ ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org 8B29F6BE: 099D 78BA 2FD3 B014 B08A 7779 75B0 2443 8B29 F6BE From rgb at phy.duke.edu Tue May 22 06:37:32 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 22 May 2007 09:37:32 -0400 (EDT) Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4652E9EE.4010906@sicortex.com> References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> Message-ID: On Tue, 22 May 2007, Larry Stewart wrote: > What would the advantages of a diamond lattice be? In terms of bisection and > diameter? > Ease of wiring? Four ports per system, probably, in a 3d lattice. 3d is good because the volume (number of hosts) scales like the maximum number of hops between hosts cubed. With two ports you can make a ring but max_hops increases linearly with number of hosts. With three ports you can make a triangular lattice or a tree structure depending on whether you want to optimize topdown or peer-to-peer, IIRC. With four you can make a 2d square lattice where the volume scales like the square of max_hops OR you can make a tetrahedron and get a volume that scales like the cube of max_hops. Probably with toroidal closure, although, as you say, it makes one's head hurt to visualize it and building the routing table would be "interesting". 
In fact, you'd probably have to write software to build the per-node routing table. I built a small demo cluster with a tetrahedral lattice of interconnects like this ONCE for Linux Expo about seven or eight years ago. Once (as Voltaire famously replied to the Marquis de Sade) is philosophy... rgb > > -Larry > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From larry.stewart at sicortex.com Tue May 22 07:13:12 2007 From: larry.stewart at sicortex.com (Larry Stewart) Date: Tue, 22 May 2007 10:13:12 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <464E102D.3010305@georgetown.edu> <002601c79b09$05e45710$0900a8c0@objection> <4650B15A.90206@georgetown.edu> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> Message-ID: <4652FA78.7020600@sicortex.com> Robert G. Brown wrote: > On Tue, 22 May 2007, Larry Stewart wrote: > >> What would the advantages of a diamond lattice be? In terms of >> bisection and diameter? >> Ease of wiring? > > > Four ports per system, probably, in a 3d lattice. 3d is good because > the volume (number of hosts) scales like the maximum number of hops > between hosts cubed. Ah. I see that. I looked at the diamond lattice picture in http://phycomp.technion.ac.il/~nika/diamond_structure.html and it did make my head twinge. With Kautz or deBruijn graphs, you get an exponential number of nodes, for node degree k >= 2, and diameter (hopcount D) you get O(k**D) nodes. However, you don't get any obvious mapping of 2D or 3D problems to the graph. You could do this with two NICs per node, if you can send the transmit data and the receive data to different places. 
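
A minimal sketch of the node-count claim above, assuming the standard string-shift construction of a Kautz graph (the names kautz_graph and max_hops are illustrative and do not come from this thread or from any library). For out-degree k and diameter D it should give k**D + k**(D-1) nodes, which is the O(k**D) growth described above:

```python
from collections import deque
from itertools import product


def kautz_graph(degree, diameter):
    """Build a Kautz digraph as a dict mapping each node to its successors.

    Assumed construction: nodes are tuples of length `diameter` over
    `degree + 1` symbols with no two adjacent symbols equal. Each node links
    to the tuples obtained by dropping its first symbol and appending a new
    last symbol that differs from the current last one.
    """
    symbols = range(degree + 1)
    nodes = [w for w in product(symbols, repeat=diameter)
             if all(a != b for a, b in zip(w, w[1:]))]
    return {w: [w[1:] + (s,) for s in symbols if s != w[-1]] for w in nodes}


def max_hops(edges):
    """Longest shortest path over all ordered node pairs, by breadth-first search."""
    worst = 0
    for start in edges:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst
```

For example, kautz_graph(2, 3) has 12 nodes, each with 2 outgoing links, and max_hops reports 3 for it. Note that the links are directed, which is why the transmit and receive sides of a NIC would have to go to different places.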
Of course even on BlueGene/L they use simulated annealing to map the problem to the machine, because the obvious mapping is often not the best one. See "Optimizing Task Layout on the BlueGene/L Supercomputer" in IBM JSRD March 2005.

-Larry

From peter.st.john at gmail.com Tue May 22 09:19:54 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Tue, 22 May 2007 12:19:54 -0400
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To: <4652FA78.7020600@sicortex.com>
References: <001b01c79952$3a266030$0900a8c0@objection> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com>
Message-ID:

A hypercube (http://en.wikipedia.org/wiki/Hypercube) also gets you exponential growth: the max hops is the dimension (3 for a 3-dimensional cube) and the number of nodes is 2 raised to the dimension (8 vertices on a cube). To do a tesseract (4-cube), which looks like two cubes nested, you'd need 4 ports per node, 16 nodes, 32 cables, max hop 4. I've poked around and don't see a great 4 ports per node solution; I like the suggestion of putting a router on a motherboard.

But you've made me curious about these Kautz and de Bruijn graphs; I'll go look, thanks.

Peter

From lindahl at pbm.com Tue May 22 13:50:41 2007
From: lindahl at pbm.com (Greg Lindahl)
Date: Tue, 22 May 2007 13:50:41 -0700
Subject: [Beowulf] Re: the comparison between OpenMP and MPI
In-Reply-To: <3A29FE3F-3DFA-4716-8756-5F958CD6CB6B@sanger.ac.uk>
References: <464E0E03.802@sgi.com> <4651C69C.3060504@univ-lemans.fr> <3A29FE3F-3DFA-4716-8756-5F958CD6CB6B@sanger.ac.uk>
Message-ID: <20070522205041.GB7730@bx9.net>

On Mon, May 21, 2007 at 06:12:11PM +0100, Tim Cutts wrote:

> When I looked at the code, I wasn't
> surprised. At first glance, my initial reaction was the same as the
> compiler's; "Oh, that's dead code, we can eliminate that". Wrong!
> It was possible for the code to have an effect, but it was extremely
> indirect, and I didn't blame the compiler for screwing up!

I don't see why you weren't surprised. If "out" was never assigned, it would be dead code, but "out" is assigned. And though "in" and "out" are aliasing each other, they are of the same type.
-- greg

From gebhardt at hrz.uni-marburg.de Wed May 23 02:13:59 2007
From: gebhardt at hrz.uni-marburg.de (Gebhardt Thomas)
Date: Wed, 23 May 2007 11:13:59 +0200
Subject: [Beowulf] SATA(?) errors locks up node
Message-ID: <200705231114.00231.gebhardt@hrz.uni-marburg.de>

Hi,

we are running a cluster of 57 dual Opteron nodes. Once or twice a week one of these nodes gets into an error state and can't connect to the I/O subsystem anymore. I need to reboot that node. As far as I can see, the problem occurs randomly on any of our nodes, i.e., the MTBF of a single node is about 6-12 months.

I still don't know whether this is a problem of the linux kernel sata driver, a hardware problem, a flaw of the disk firmware or something else. I'm looking for a possibility to track down the problem without substantially interfering with the jobs on the cluster.

This is our environment:

  TYAN S3992 motherboard with Serverworks HT1000+2000 chipset
  2 DualCore Opteron 2216 HE 2.4GHz, 16GByte Mem
  Maxtor 250GByte SATA disk, WDC WD2500YS-01SHB0, firmware rev. 20.06C03
  Debian sarge amd64 (custom kernel)

I tried several linux kernel versions (e.g. 2.6.18.1, currently: 2.6.20.3 from kernel.org), which seem to make no difference.

I also tried to reduce SATA bandwidth down to 150MB/s with a jumper at the disk. This does not help either.

NCQ is disabled:

# cat /sys/block/sda/device/queue_depth
1

Any ideas?

Thanks, Thomas

+++++++++++++++++++

Here is a typical console error log. As far as I can see, this means that the communication between the kernel and the disk suddenly gets interrupted.
May 17 04:39:51 ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x40000000 action 0x2 frozen
May 17 04:39:51 ata1.00: cmd ca/00:50:9a:32:7b/00:00:00:00:00/e0 tag 0 cdb 0x0 data 40960 out
May 17 04:39:51 res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
May 17 04:39:58 ata1: port is slow to respond, please be patient (Status 0xd0)
May 17 04:40:21 ata1: port failed to respond (30 secs, Status 0xd0)
May 17 04:40:21 ata1: soft resetting port
May 17 04:40:28 ata1: port is slow to respond, please be patient (Status 0xd0)
May 17 04:40:51 ata1: port failed to respond (30 secs, Status 0xd0)
May 17 04:40:51 ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:40:51 ATA: abnormal status 0xD0 on port 0xFFFFC2000000401C
May 17 04:41:21 ata1.00: qc timeout (cmd 0xec)
May 17 04:41:22 ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 17 04:41:22 ata1.00: revalidation failed (errno=-5)
May 17 04:41:22 ata1: failed to recover some devices, retrying in 5 secs
May 17 04:41:26 ata1: hard resetting port
May 17 04:41:34 ata1: port is slow to respond, please be patient (Status 0xd0)
May 17 04:41:57 ata1: port failed to respond (30 secs, Status 0xd0)
May 17 04:41:57 ata1: COMRESET failed (device not ready)
May 17 04:41:57 ata1: hardreset failed, retrying in 5 secs
May 17 04:42:02 ata1: hard resetting port
May 17 04:42:09 ata1: port is slow to respond, please be patient (Status 0xd0)
May 17 04:42:32 ata1: port failed to respond (30 secs, Status 0xd0)
May 17 04:42:32 ata1: COMRESET failed (device not ready)
May 17 04:42:32 ata1: hardreset failed, retrying in 5 secs
May 17 04:42:37 ata1: hard resetting port
May 17 04:42:45 ata1: port is slow to respond, please be patient (Status 0xd0)
May 17 04:43:08 ata1: port failed to respond (30 secs, Status 0xd0)
May 17 04:43:08 ata1: COMRESET failed (device not ready)
May 17 04:43:08 ata1: reset failed, giving up
May 17 04:43:08 ata1.00: disabled
May 17 04:43:08 ata1: EH complete
May 17 04:43:08 sd 0:0:0:0: SCSI error: return code = 0x00040000
May 17 04:43:08 end_request: I/O error, dev sda, sector 8073882
May 17 04:43:08 Buffer I/O error on device sda2, logical block 9189
May 17 04:43:08 lost page write due to I/O error on sda2
May 17 04:43:08 sd 0:0:0:0: SCSI error: return code = 0x00040000
May 17 04:43:08 end_request: I/O error, dev sda, sector 16099660
May 17 04:43:08 Buffer I/O error on device sda3, logical block 12365
May 17 04:43:08 lost page write due to I/O error on sda3
May 17 04:43:08 sd 0:0:0:0: SCSI error: return code = 0x00040000
May 17 04:43:08 end_request: I/O error, dev sda, sector 73606884
May 17 04:43:08 Buffer I/O error on device sda3, logical block 7200768
May 17 04:43:08 lost page write due to I/O error on sda3
....

From hahn at mcmaster.ca Wed May 23 08:37:12 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Wed, 23 May 2007 11:37:12 -0400 (EDT)
Subject: [Beowulf] SATA(?) errors locks up node
In-Reply-To: <200705231114.00231.gebhardt@hrz.uni-marburg.de>
References: <200705231114.00231.gebhardt@hrz.uni-marburg.de>
Message-ID:

> I still don't know whether this is a problem of the linux kernel sata driver,
> a hardware problem, a flaw of the disk firmware or something else. I'm

the logs show that a command times out, and defies recovery. I don't think
your chipset is the most common - is the SATA controller integrated, or
something like a Promise chip?

do you have any guess about whether your disks are getting enough power?
it seems to be a fairly common occurrence for people to report this kind of
"stops working" bug to the list (linux-ide at vger.kernel.org), only later to
discover that the problem was a marginal power supply.

> looking for a possibility to track down the problem without substantially
> interfering with the jobs on the cluster.

the sata developers hang out on linux-ide, and seem very responsive.
quite a lot of work has been done on exception handling, but as always,
it's the most common controllers which are best tested/supported.

> I tried several linux kernel versions (eg. 2.6.18.1, currently: 2.6.20.3
> from kernel.org) which seems to make no difference.

well, by kernel standards, 2.6.20.3 is fairly old; there have certainly been
plenty of SATA updates this year.

> I also tried to reduce SATA bandwidth down to 150MB/s with a jumper at
> the disk. This does not help either.

it wouldn't, unless you had a noise problem with the cable.

> NCQ is disabled:
> # cat /sys/block/sda/device/queue_depth
> 1

such features wouldn't cause the fairly low-level hang in your logs -
to me it looks like power, given that it appears to affect even the phy-level
disk interface. it wouldn't hurt to see what smart says about it (health,
metrics and even a self-test.) you might also try stressing the disk with
IO to see whether you can repeatably trigger the problem.

regards, mark hahn.

From peter.st.john at gmail.com Wed May 23 09:08:38 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Wed, 23 May 2007 12:08:38 -0400
Subject: [Beowulf] SATA(?) errors locks up node
In-Reply-To:
References: <200705231114.00231.gebhardt@hrz.uni-marburg.de>
Message-ID:

Well just generally I was thinking about a block design: spending some money for extra 1) cooling, 2) shielding, and 3) power, for overlapping sections of the cluster, and seeing if the incidence rate of failures correlates with anything. You can imagine stacking your nodes in a (3-dimensional) cube; the top X percent get extra shielding, the front X get cooling, and the right X get power. If X is 50% you are spending a lot of time and money on the experiment but would get a statistically meaningful result (which might be no correlation at all) in a few weeks; if X is tiny you would have to wait long enough for a random failure to occur in the upgraded volumes, so you'd invest less but have a longer wait.

If this has been an issue for a long time and the expected working lifetime of the cluster is long into the future, it could be worth doing something like this for X fairly small. A side-benefit would be data for a broader cost-benefit analysis of plausible upgrades, if you can measure other performance characteristics besides the failures.

Peter

From James.P.Lux at jpl.nasa.gov Wed May 23 10:04:47 2007
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 23 May 2007 10:04:47 -0700
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To:
References: <001b01c79952$3a266030$0900a8c0@objection> <4651A898.8070900@georgetown.edu> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com>
Message-ID: <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>

At 09:19 AM 5/22/2007, Peter St. John wrote:
>A hypercube
>(http://en.wikipedia.org/wiki/Hypercube)
>also gets you exponential space; the max hops is the dimension (3
>for a 3-dimensional cube) and the number of nodes is exp(base 2) of
>the dimension (8 vertices on a cube). To do a tesseract (4-cube),
>which looks like two cubes nested, you'd need 4 ports per node, 16
>nodes, 32 cables, max hop 4.
I've poked around and don't see a great
>4 ports per node solution; I like the suggestion of putting a router
>on a motherboard.

Mind you, this is what Intel started with on their iPSC/1 and iPSC/2 computers. The early ones had multiple NICs in the nodes, then, later, they had an 8 port (I think) router in each node.

It's not clear that this saves anything over a simpler architecture (e.g. external switch with lots of ports in a crossbar) unless you can do circuit switched routing (so you don't have a one packet delay in the switch) AND your algorithm can take advantage of it. I spent quite some time in the late 80s trying to figure out clever ways to take advantage of a hypercube topology for a modeling application. I'm sure there are algorithms which are a natural fit, but the ones I was using weren't.

James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

From peter.st.john at gmail.com Wed May 23 10:51:09 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Wed, 23 May 2007 13:51:09 -0400
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To: <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>
References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>
Message-ID:

Well actually, I don't want to figure out how to take advantage of a network topology, I want to figure out a clever way for my optimization software to figure out how to take advantage of the topology. That is, I want my AI to solve the problem for me; my design for AI is for the AI to figure out how to design itself. So I look for problems that:

1. don't have obvious solutions from existing qualitative theory (e.g. genetic algorithms themselves; the best parameters, such as mutation rates, and the selection of parameters, are still debated and the subject of experiments);
2. can be interpreted by an algorithm (expressible in a finite representation, e.g. a network topology is a binary array; the question "what is the meaning of life?" is not so expressible);
3. are such that the solution of the problem itself helps the method of solution; e.g., the optimization of a network topology, or of the message passing system using a fixed topology, would itself improve the performance of the optimizing software running on the system; that is, the AI optimizes itself (indirectly).

So part of the reason I want a beowulf is that my AI can optimize its platform in the course of optimizing itself; besides, it is a horrible RAM hog and CPU hog and trivially parallelizable (gen algs). So I'm interested in **any** topology that offers choices to running processes (should I call this distant idle node or that nearby busy node?) so it has something to optimize. So that's why hypercubes attract me. Besides it sounds all abstract mathy, even though it really isn't :-)

Peter

From peter.st.john at gmail.com Wed May 23 10:52:08 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Wed, 23 May 2007 13:52:08 -0400
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To: <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>
References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>
Message-ID:

But oh and Jim if you recall any papers about this I could read that would be "Jim" Dandy.

Peter

From lmarkov at umcs.maine.edu Mon May 21 10:59:57 2007
From: lmarkov at umcs.maine.edu (Linda Markowsky)
Date: Mon, 21 May 2007 13:59:57 -0400 (EDT)
Subject: [Beowulf] Survey on Supercomputer Cluster Security
Message-ID:

To Cluster System Administrators:

Our University has done some classified DoD work on various Beowulf clusters. As a result, we have gotten interested in the questions of securing supercomputer clusters. In particular, we are interested in better understanding the nature of the threats against supercomputer clusters, and the extent to which security measures are implemented. It would help us greatly if you would answer a few questions on this subject.
Feel free to not answer any question that you do not wish to answer. Just select the no answer selection. A complete list of questions and possible answers are listed below in text form. If there are several system administrators for your cluster(s), please ensure that your group submits only one survey per cluster. You can answer the questions interactively at our website (our preferred method), by e-mail, by fax, and by regular mail. To answer on the web, please go to http://www.cs.umaine.edu/~markov/clustersurvey/survey.html and login with login ClusterSurvey password S3cur3Qu3st The login is just to keep random visitors to the website from filling out the questionnaire. The web questionnaire will only be available until June 1, 2007. There are two options when if you choose to by e-mail, fax, or regular mail. First, you can download a PDF version of the questionnaire from a link on the webpage referenced above. This is an interactive PDF file that permits you to answer the questions in the form providing you are using a new enough version of Adobe Acrobat Reader (Version 8 recommended). You can either print out the results and fax them or mail them, or you can e-mail the file or just the answers by hitting the e-mail button in the form. Alternatively, you can answer the questions on the form below and either e-mail it back or print out the results and fax or e-mail them back. If you wish to fax your answers, please fax them to 207-866-3050, which is a secure fax. We will collect whatever data we receive and organize the results. These results will be available on the web using the URL above starting July 15, 2007 in case you are interested. All data will be aggregated and in no way will we identify any respondents -- my goal is to have some general numbers and percentages that can help us better understand who is trying to crack into supercomputers and why. We apologize for sending you this request, but would appreciate any help that you might be able to give. 
We will only send out one request for information. If you know of other people who would be interested in the results or would be interested in providing data, please feel free to send them a copy of this letter.

Sincerely yours,

George Markowsky, Professor
Department of Computer Science
5752 Neville Hall
University of Maine
Orono, ME 04469-5752
markov@maine.edu
http://www.cs.umaine.edu/~markov
+1-207-581-3940 phone
+1-207-581-4977 fax

QUESTIONNAIRE

1. How frequently are your supercomputer clusters attacked relative to any desktops that might be in your laboratories?
   More Frequently / About the Same Frequency / Less Frequently / No Answer

2. How sophisticated are the attacks against your clusters compared to the attacks against any desktops that might be in your laboratories?
   More Sophisticated / About the Same Level of Sophistication / Less Sophisticated / No Answer

3. Are there any IP addresses that regularly try to break into your cluster?
   Yes / No / Not Sure / No Answer

4. Has anyone ever tried a man-in-the-middle type of attack against any of your clusters?
   Yes / No / Not Sure / No Answer

5. Have you ever been attacked from foreign IP addresses?
   Yes / No / Not Sure / No Answer

6. Have your clusters ever been attacked by foreign interests?
   Yes / No / Not Sure / No Answer

7. Has anyone ever tried a physical approach to either disrupt a computation or to steal data?
   Yes / No / Not Sure / No Answer

8. Has anyone ever tried to bribe or otherwise co-opt one of the cluster staff into helping with compromising the security?
   Yes / No / Not Sure / No Answer

9. How many times has security been breached on one of your supercomputer clusters over the past three years that resulted in either downtime or lost data?
   11 or more / 6-10 / 2-5 / 1 / 0 / Not Sure / No Answer

10. Does your center have a person whose primary responsibility is cluster security?
   Yes / No / Not Sure / No Answer

11. Do you run an intrusion detection system on your clusters?
   Yes, on all / No, not on any / Mixed, on some and not on others / Not Sure / No Answer

12.
How often do you check for rootkits? Multiple Times a Day Daily Weekly Monthly Annually Not at all Not Sure No Answer 13. How often do you run backups on your clusters? Multiple Times a Day Daily Weekly Monthly Annually Not at all Not Sure No Answer From toon at moene.indiv.nluug.nl Mon May 21 12:00:51 2007 From: toon at moene.indiv.nluug.nl (Toon Moene) Date: Mon, 21 May 2007 21:00:51 +0200 Subject: [Beowulf] Re: the comparison between OpenMP and MPI In-Reply-To: <4651C69C.3060504@univ-lemans.fr> References: <464E0E03.802@sgi.com> <4651C69C.3060504@univ-lemans.fr> Message-ID: <4651EC63.3050604@moene.indiv.nluug.nl> Florent Calvayrac wrote: > Without optimization no registers were used so the code was giving > correct results and at O1 code was optimized on a line per line basis > and not procedure-wide. > > So even commercial code can be wrong... As a former boss of mine put it, some 25 years ago: "You think the compiler is perfect ? Hah, it's just another large program with its own bugs !" -- Toon Moene - e-mail: toon at moene.indiv.nluug.nl - phone: +31 346 214290 Saturnushof 14, 3738 XG Maartensdijk, The Netherlands At home: http://moene.indiv.nluug.nl/~toon/ Who's working on GNU Fortran: http://gcc.gnu.org/ml/gcc/2007-01/msg00059.html From ruhollah.mb at gmail.com Tue May 22 06:21:36 2007 From: ruhollah.mb at gmail.com (Ruhollah Moussavi Baygi ) Date: Tue, 22 May 2007 16:51:36 +0330 Subject: [Beowulf] LAM, MPICH, OpenMP? Message-ID: <1bef2ce30705220621v1e134b86s620fe4d46e2b179f@mail.gmail.com> Dear all at Beowulf, As a conceptual question in the realm of parallel programming, what is the difference between LAM/MPI, MPICH, and OpenMP? Could one say that all of these are something like 'different compilers' for 'one programming language'? My second doubt is about their ease of use as well as their efficiency AND their compatibility with different distros. 
Thanks for any coming guidance and explanations -- Best, Ruhollah Moussavi Baygi -------------- next part -------------- An HTML attachment was scrubbed... URL: From lucas.schnorr at imag.fr Tue May 22 07:17:28 2007 From: lucas.schnorr at imag.fr (Lucas Schnorr) Date: Tue, 22 May 2007 16:17:28 +0200 Subject: [Beowulf] CFP SBAC-PAD 2007: Extended Deadline In-Reply-To: References: Message-ID: Our apologies if you receive multiple copies of this message. Please distribute this CFP to those who might be interested. FYI: SBAC-PAD is currently classified by CAPES (the postgraduate educational agency of the Brazilian government) as a 'Qualis International A' Conference. =========================================================== CALL FOR PAPERS SBAC-PAD 2007 19th International Symposium on Computer Architecture and High Performance Computing Gramado, Brazil October 24-27, 2007 http://www.sbc.org.br/sbac/2007/ Co-sponsors: Brazilian Computer Society IEEE Computer Society (pending) IFIP WG 10.3 ** Papers must be registered together with an abstract ** ** by June 04, 2007 ** ** Paper submission deadline extended to June 11, 2007 ** =========================================================== SBAC-PAD is an annual international conference series, the first of which was held 20 years ago, in 1987. Each conference has traditionally presented new developments and high performance applications, as well as the latest trends in computer architecture and parallel and distributed technologies. This year, the symposium returns to Gramado, Rio Grande do Sul, Brazil. The conference will be held at the Hotel "Serra Azul," located in the heart of downtown Gramado and within walking distance of the main tourist attractions. SBAC_PAD 2007 will be sponsored by the Brazilian Computer Society, the IEEE Computer Society (approval pending) through the Technical Committees on Computer Architecture (TCCA) and Scalable Computing (TCSC), and IFIP. 
Authors are invited to submit manuscripts that present original unpublished research in all areas of computer architecture and high performance computing. Work focusing on applications or emerging technologies is especially welcome. Topics of interest include, but are not limited to:

- Application-specific Architectures
- Benchmarking, Performance Measurements and Analysis
- Cache and Memory Architectures
- Embedded Systems
- Fault-Tolerant Architectures and Systems
- Grid, Cluster, and Peer-to-Peer Computing
- High Performance Applications and Parallel and Distributed Algorithms
- Interconnection Networks, Routing, and Communication
- Languages, Compilers and Tools for Parallel and Distributed Programming
- Load Balancing and Scheduling
- Microarchitecture
- Operating Systems and Virtualization
- Parallel and Distributed Architectures
- Pervasive and Heterogeneous Computing
- Real World Applications and Case Studies
- Reconfigurable Systems

SBAC-PAD invites manuscripts of original research, written in English and not exceeding 8 pages in double-column format following the IEEE Conference style, to be submitted in PDF format. The proceedings will be published by the IEEE Computer Society Press.

A blind submission and review process will be used; therefore, papers should be submitted without the authors' names on the manuscript. Moreover, authors should take every reasonable precaution to hide their identity. Citations to the authors' own prior work can be included, but references should be in the third person. Please check the SBAC-PAD 2007 web site for further paper submission and format information: http://www.sbc.org.br/sbac/2007/

At least one of the authors of each accepted paper must register and present the paper at the conference. Simultaneous submission to other journals or conferences with published proceedings is not allowed.
An award will be given to the best paper at the symposium, and NEC do Brasil will present the NEC AWARD, with a value of R$6000, to the authors of the best paper in the general area of high performance applications. The authors of selected papers will also be invited to submit extended versions of their work for possible publication in a special issue of the International Journal of Parallel Programming.

Important Dates
---------------
- Paper registration and abstract submission deadline: June 04, 2007
- Paper upload deadline (firm): June 11, 2007
- Author Notification: July 16, 2007
- Conference: October 24-27, 2007

From James.P.Lux at jpl.nasa.gov Wed May 23 12:08:51 2007
From: James.P.Lux at jpl.nasa.gov (Jim Lux)
Date: Wed, 23 May 2007 12:08:51 -0700
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To:
References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov>
Message-ID: <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov>

At 10:52 AM 5/23/2007, Peter St. John wrote:
>But oh and Jim if you recall any papers about this I could read that
>would be "Jim" Dandy.

I was working off memory, and the iPSC/1 and iPSC/2 manuals I have in my office as a historical artifact.

I seem to recall that if you google hypercube and intel, you'll turn up some of the papers that were written early on. The guys who started with the hypercube interconnect were at CalTech, as I recall, and spun off to form a supercomputer company embodying that, which Intel also adopted.

>Peter
>
>
>On 5/23/07, Jim Lux
><James.P.Lux at jpl.nasa.gov> wrote:
>At 09:19 AM 5/22/2007, Peter St.
John wrote: >>A hypercube ( http://en.wikipedia.org/wiki/Hypercube) also gets you >>exponential space; the max hops is the dimension (3 for a >>3-dimensional cube) and the number of nodes is exp(base 2) of the >>dimension (8 vertices on a cube). To do a tesseract (4-cube), which >>looks like two cubes nested, you'd need 4 ports per node, 16 nodes, >>32 cables, max hop 4. I've poked around and don't see a great 4 >>ports per node solution; I like the suggestion of putting a router >>on a motherboard. > >Mind you, this is what Intel started with on their iPSC/1 and iPSC/2 >computers. The early ones had multiple NICs in the nodes, then, >later, they had a 8 port (I think) router in each node. > >It's not clear that this saves anything over a simpler architecture >(e.g. external switch with lots of ports in a crossbar) unless you >can do circuit switched routing (so you don't have a one packet >delay in the switch) AND your algorithm can take advantage of it. I >spent quite some time in the late 80s trying to figure out clever >ways to take advantage of a hypercube topology for a modeling >application.. I'm sure there are algorithms which are a natural >fit, but the ones I was using weren't. > > >James Lux, P.E. >Spacecraft Radio Frequency Subsystems Group >Flight Communications Systems Section >Jet Propulsion Laboratory, Mail Stop 161-213 >4800 Oak Grove Drive >Pasadena CA 91109 >tel: (818)354-2075 >fax: (818)393-6875 > James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgb at phy.duke.edu Wed May 23 12:51:29 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Wed, 23 May 2007 15:51:29 -0400 (EDT) Subject: [Beowulf] LAM, MPICH, OpenMP? 
In-Reply-To: <1bef2ce30705220621v1e134b86s620fe4d46e2b179f@mail.gmail.com>
References: <1bef2ce30705220621v1e134b86s620fe4d46e2b179f@mail.gmail.com>
Message-ID:

On Tue, 22 May 2007, Ruhollah Moussavi Baygi wrote:

> Dear all at Beowulf,
>
> As a conceptual question in the realm of parallel programming, what is the
> difference between LAM/MPI, MPICH, and OpenMP? Could one say that all of
> these are something like 'different compilers' for 'one programming
> language'?

No. They are more like "different libraries" that share (more or less) "one API". In fact, that is indeed what they are.

Remember, things like sin(x) aren't part of the compiler per se, although they may appear to be. They are library functions. I might implement the actual computation of sin(x) one way, somebody else might implement it another way -- all a user worries about is that when he or she calls sin(x) for any float or double x, the result is a float or double that is equal in value to the (discretized) value of the sine of the (discretized) argument, ideally to machine precision.

So the functions and operations defined in all of these MPIs are implemented in different ways, but they (supposedly) share a common library reference, and code written for one "should" (within some allowed leeway associated usually with interacting with the higher level implementation framework) have the same result -- move data from one node to another node, for example -- in a reproducible and well-defined way. One may do the movement more EFFICIENTLY than another, but the final result should be the same.

Note that this is distinct from an ABI -- if they shared an ABI many parallel programmers would be very happy because a binary compiled with one of these (shared) libraries could be moved to a system with a different (shared) library and it would still execute.
The API is "almost" an ABI as it is, but it doesn't specify enough about how the data is stored and moved, global variables and so on can be different, so code compiled with one library in general is NOT so portable. > My second doubt is about their ease of use as well as their efficiency AND > their compatibility with different distros. I'll let others with experience address this, because one part of this question is theoretical and one is practical. On the theoretical side, they are all open source, all packaged, and all available or easy to build on pretty much any linux distro. Many distros currently have anywhere from one to all three of them ready to install with an apt-get or yum install straight from the primary distro support repos and their mirrors, or installable/maintainable from auxiliary repos that one can easily add post install. Regarding efficiency, they are all different. There are different design trade-offs. They are probably different per architecture as well. They have different things supported in terms of peripherals, because another problem with the lack of an ABI is that "cluster devices" e.g. advanced networking support tend to work only for one of them at a time. And they are likely to have only limited binary compatibility across different distros depending on their library support requirements, same as all the OTHER binaries or libraries in linux because LINUX still lacks a coherent binary compatibility layer. HOWEVER, it is very probable that source developed for and with LAM/MPI under one distro can be cleanly recompiled and will just work for LAM/MPI for any other distro that has roughly the same versioning, with some code being less demanding and more portable than other code as one might expect. 
The binary probably won't move because of the lack of ABI, but the code will almost certainly move to the same MPI on different distros without porting, and will move to different MPIs with only fairly minimal tweaking in a lot of cases (again depending on how much one uses calls that aren't in the API, if any). Is that clear enough? The PRACTICAL side of things is that everything I just said can still be true or false because of bugs, per specific release per specific distributions per specific MPI program. YMMV. Caveat Emptor. Don't say I didn't warn you. Others may have anecdotes of just where the sleeping dog bit them. However, there are plenty of anecdotes where it didn't, so don't be discouraged. rgb > > Thanks for any coming guidance and explanations > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From peter.st.john at gmail.com Wed May 23 13:32:30 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Wed, 23 May 2007 16:32:30 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> References: <001b01c79952$3a266030$0900a8c0@objection> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> Message-ID: Mostly I was thinking of TMC (famous for the animation in Jurassic Park), 1982-1994, from MIT, mostly acquired by Sun; so something from CalTech maybe predating that, or competing with it, would be very interesting. I'll look, thanks. My only artifacts are DOS 3.2 and SVr4 manuals :-) Peter On 5/23/07, Jim Lux wrote: > > At 10:52 AM 5/23/2007, Peter St. John wrote: > > But oh and Jim if you recall any papers about this I could read that would > be "Jim" Dandy. 
> > > I was working off memory, and the iPSC/1 and iPSC/2 manuals I have in my > office as a historical artifact. > > I seem to recall that if you google hypercube and intel, you'll turn up > some of the papers that were written early on. The guys who started with > the hypercube interconnect were at CalTech, as I recall, and spun off to > form a supercomputer company embodying that, which Intel also adopted. > > Peter > > > On 5/23/07, *Jim Lux* wrote: > At 09:19 AM 5/22/2007, Peter St. John wrote: > > A hypercube ( > http://en.wikipedia.org/wiki/Hypercube) also gets you exponential space; > the max hops is the dimension (3 for a 3-dimensional cube) and the number of > nodes is exp(base 2) of the dimension (8 vertices on a cube). To do a > tesseract (4-cube), which looks like two cubes nested, you'd need 4 ports > per node, 16 nodes, 32 cables, max hop 4. I've poked around and don't see a > great 4 ports per node solution; I like the suggestion of putting a router > on a motherboard. > > > Mind you, this is what Intel started with on their iPSC/1 and iPSC/2 > computers. The early ones had multiple NICs in the nodes, then, later, they > had a 8 port (I think) router in each node. > > It's not clear that this saves anything over a simpler architecture (e.g. > external switch with lots of ports in a crossbar) unless you can do circuit > switched routing (so you don't have a one packet delay in the switch) AND > your algorithm can take advantage of it. I spent quite some time in the late > 80s trying to figure out clever ways to take advantage of a hypercube > topology for a modeling application.. I'm sure there are algorithms which > are a natural fit, but the ones I was using weren't. > > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > James Lux, P.E. 
> Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry.stewart at sicortex.com Wed May 23 14:20:02 2007 From: larry.stewart at sicortex.com (Larry Stewart) Date: Wed, 23 May 2007 17:20:02 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> Message-ID: <4654B002.7040207@sicortex.com> Peter St. John wrote: > Mostly I was thinking of TMC (famous for the animation in Jurassic > Park), 1982-1994, from MIT, mostly acquired by Sun; so something from > CalTech maybe predating that, or competing with it, would be very > interesting. I'll look, thanks. > My only artifacts are DOS 3.2 and SVr4 manuals :-) > Peter > > > On 5/23/07, *Jim Lux* > wrote: > > At 10:52 AM 5/23/2007, Peter St. John wrote: > >> But oh and Jim if you recall any papers about this I could read >> that would be "Jim" Dandy. > > > I was working off memory, and the iPSC/1 and iPSC/2 manuals I have > in my office as a historical artifact. > > I seem to recall that if you google hypercube and intel, you'll > turn up some of the papers that were written early on. The guys > who started with the hypercube interconnect were at CalTech, as I > recall, and spun off to form a supercomputer company embodying > that, which Intel also adopted. > > The problem with hypercubes is that the number of NICs per node grows with the machine size. Want to double your machine size? Then add a NIC to every node you already have. 
The number of NICs, or links, or cables, grows as O(N log N) for N nodes. You get a machine diameter which is log N, which is nice, but there are ways to do that with a fixed number of NICs per node such as fat trees or the Kautz/de Bruijn family.

TMC had, among other luminaries, Richard Feynman to help work out the routing software.

-Larry

From elken at pathscale.com Wed May 23 15:10:04 2007
From: elken at pathscale.com (Tom Elken)
Date: Wed, 23 May 2007 15:10:04 -0700
Subject: [Beowulf] LAM, MPICH, OpenMP?
Message-ID: <4654BBBC.7010408@pathscale.com>

> From: "Robert G. Brown"
> Subject: Re: [Beowulf] LAM, MPICH, OpenMP?
> To: Ruhollah Moussavi Baygi
> Cc: beowulf at beowulf.org
>
> On Tue, 22 May 2007, Ruhollah Moussavi Baygi wrote:
>
>> Dear all at Beowulf,
>>
>> As a conceptual question in the realm of parallel programming, what is the
>> difference between LAM/MPI, MPICH, and OpenMP? Could one say that all of
>> these are something like 'different compilers' for 'one programming
>> language'?
>
> No. They are more like "different libraries" that share (more or less)
> "one API". In fact, that is indeed what they are.

I think RGB was answering as though you typed "Open MPI" rather than "OpenMP." If you did mean to type "Open MPI," then ignore the following. His answer was very appropriate for several MPIs.

OpenMP is an API that is implemented through directives (a special kind of statement) that have to be interpreted by the compiler to generate multi-threaded code. An OpenMP compiler comes with an OpenMP runtime library to manage the threading at runtime.

Typically MPI is the API for running parallel applications across a cluster and OpenMP for running parallel applications within a node (an SMP multiprocessor machine).
You may want to get a book that covers both topics like "Parallel Programming in C with MPI and OpenMP" by Michael Quinn to get at the conceptual differences and similarities between the two. Or google searching can provide a lot of free references to both. -Tom -- ~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Tom Elken Manager, Performance Engineering tom.elken at qlogic.com QLogic Corporation Host Solutions Group From James.P.Lux at jpl.nasa.gov Wed May 23 15:59:48 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Wed, 23 May 2007 15:59:48 -0700 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4654B002.7040207@sicortex.com> References: <001b01c79952$3a266030$0900a8c0@objection> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <4654B002.7040207@sicortex.com> Message-ID: <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov> At 02:20 PM 5/23/2007, Larry Stewart wrote: >Peter St. John wrote: >>Mostly I was thinking of TMC (famous for the animation in Jurassic >>Park), 1982-1994, from MIT, mostly acquired by Sun; so something >>from CalTech maybe predating that, or competing with it, would be >>very interesting. I'll look, thanks. >>My only artifacts are DOS 3.2 and SVr4 manuals :-) >>Peter >> >> >The problem with hypercubes is that the number of NICs per node >grows with the machine size. Want to double your machine size? >Then add a NIC to every node you already have. The number of nics, >or links, or cables, grows O(NlogN) for N nodes. You get a machine >diameter which is log N, which is nice, but there are ways to do >that with a fixed number of NICs per node such as fat trees or the >Kautz/deBruijn family. Which is why Intel went to the "8 port switch in each node" sort of architecture with the iPSC/2. 
The iPSC/1 got icky with having to keep adding ethernet cards into each node. The iPSC/2 went to a packaging with each node on a single card, with two flavors of nodes, depending on whether it had a numeric/vector coprocessor or not. You could put 128 nodes in a single rack, as I recall. (iPSC => intel Personal Super Computer) >TMC had, among other luminaries, Richard Feynman to help work out >the routing software. > >-Larry James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From gebhardt at hrz.uni-marburg.de Thu May 24 01:19:03 2007 From: gebhardt at hrz.uni-marburg.de (Gebhardt Thomas) Date: Thu, 24 May 2007 10:19:03 +0200 Subject: [Beowulf] SATA(?) error locks up node In-Reply-To: References: <200705231114.00231.gebhardt@hrz.uni-marburg.de> Message-ID: <200705241019.04234.gebhardt@hrz.uni-marburg.de> Dear Mr. Hahn, > the logs show that a command times out, and defies recovery. I don't think > your chipset is the most common - is the SATA controller integrated, or > something like a Promise chip? The HT1000 is an integrated controller for USB, IDE and SATA. As far as I understand, it is the same chip as the Broadcom BCM5785. > do you have any guess about whether your disks are getting enough power? > it seems to be a fairly common occurrance for people to report this kind of > "stops working" bug to the list (linux-ide at vger.kernel.org), only later to > discover that the problem was a marginal power supply. 24 of the 57 nodes have an additional infiniband HA. If power were marginal I would expect that this subset of nodes had a higher error rate than the other nodes. But there seems to be no difference that is statistically significant. > > I also tried to reduce SATA bandwidth down to 150MB/s with a jumper at > > the disk. This does not help either. 
>
> it wouldn't, unless you had a noise problem with the cable.

That was advice from our hardware vendor.

Eoin McHugh gave me a hint that our disks might have a firmware bug and that there is an update available. (For whatever reason I had associated our disks with Maxtor, so I hadn't found any firmware update on their website. But of course the disks are from Western Digital.) This is the most promising lead I'm following now.

Thanks for your advice!

SY, Th. Gebhardt

From deadline at eadline.org Thu May 24 07:43:50 2007
From: deadline at eadline.org (Douglas Eadline)
Date: Thu, 24 May 2007 10:43:50 -0400 (EDT)
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To: <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov>
References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov>
Message-ID: <46342.192.168.1.1.1180017830.squirrel@mail.eadline.org>

> At 10:52 AM 5/23/2007, Peter St. John wrote:
>>But oh and Jim if you recall any papers about this I could read that
>>would be "Jim" Dandy.
>
> I was working off memory, and the iPSC/1 and iPSC/2 manuals I have in
> my office as a historical artifact.
>
> I seem to recall that if you google hypercube and intel, you'll turn
> up some of the papers that were written early on. The guys who
> started with the hypercube interconnect were at CalTech, as I recall,
> and spun off to form a supercomputer company embodying that, which
> Intel also adopted.

nCUBE was a commercial hypercube machine. The first version used software routing, so if a CPU crashed, so did the whole machine. Version 2 had HW routing and, after a hefty investment by Larry Ellison, it was destined to become a parallel database machine, only it took a long time to get Oracle working.
As a result nCUBE sold into the HPC market against TMC for a while. I used to have an 8 processor board that plugged into a Sun 386i. The last I read they were making video on-demand servers. -- Doug > >>Peter >> >> >>On 5/23/07, Jim Lux >><James.P.Lux at jpl.nasa.gov> wrote: >>At 09:19 AM 5/22/2007, Peter St. John wrote: >>>A hypercube ( http://en.wikipedia.org/wiki/Hypercube) also gets you >>>exponential space; the max hops is the dimension (3 for a >>>3-dimensional cube) and the number of nodes is exp(base 2) of the >>>dimension (8 vertices on a cube). To do a tesseract (4-cube), which >>>looks like two cubes nested, you'd need 4 ports per node, 16 nodes, >>>32 cables, max hop 4. I've poked around and don't see a great 4 >>>ports per node solution; I like the suggestion of putting a router >>>on a motherboard. >> >>Mind you, this is what Intel started with on their iPSC/1 and iPSC/2 >>computers. The early ones had multiple NICs in the nodes, then, >>later, they had a 8 port (I think) router in each node. >> >>It's not clear that this saves anything over a simpler architecture >>(e.g. external switch with lots of ports in a crossbar) unless you >>can do circuit switched routing (so you don't have a one packet >>delay in the switch) AND your algorithm can take advantage of it. I >>spent quite some time in the late 80s trying to figure out clever >>ways to take advantage of a hypercube topology for a modeling >>application.. I'm sure there are algorithms which are a natural >>fit, but the ones I was using weren't. >> >> >>James Lux, P.E. >>Spacecraft Radio Frequency Subsystems Group >>Flight Communications Systems Section >>Jet Propulsion Laboratory, Mail Stop 161-213 >>4800 Oak Grove Drive >>Pasadena CA 91109 >>tel: (818)354-2075 >>fax: (818)393-6875 >> > > James Lux, P.E. 
> Spacecraft Radio Frequency Subsystems Group
> Flight Communications Systems Section
> Jet Propulsion Laboratory, Mail Stop 161-213
> 4800 Oak Grove Drive
> Pasadena CA 91109
> tel: (818)354-2075
> fax: (818)393-6875
>

--
Doug

From peter.st.john at gmail.com Thu May 24 08:16:25 2007
From: peter.st.john at gmail.com (Peter St. John)
Date: Thu, 24 May 2007 11:16:25 -0400
Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster
In-Reply-To: <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov>
References: <001b01c79952$3a266030$0900a8c0@objection> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <4654B002.7040207@sicortex.com> <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov>
Message-ID:

This led me to SUNMOS (OS for parallel processing, Sandia's alternative for the aforementioned Intel PSC descendants) http://en.wikipedia.org/wiki/SUNMOS. Sandia's page looks like real progress for inertial containment sustainable thermonuclear fusion, which really would be super duper cool, fuel your zeppelin with seawater, but I can't find latter-day references to SUNMOS. Anybody know what became of it?

Thanks,
Peter

On 5/23/07, Jim Lux wrote:
>
> At 02:20 PM 5/23/2007, Larry Stewart wrote:
> >Peter St. John wrote:
> >>Mostly I was thinking of TMC (famous for the animation in Jurassic
> >>Park), 1982-1994, from MIT, mostly acquired by Sun; so something
> >>from CalTech maybe predating that, or competing with it, would be
> >>very interesting. I'll look, thanks.
> >>My only artifacts are DOS 3.2 and SVr4 manuals :-) > >>Peter > >> > >> > >The problem with hypercubes is that the number of NICs per node > >grows with the machine size. Want to double your machine size? > >Then add a NIC to every node you already have. The number of nics, > >or links, or cables, grows O(NlogN) for N nodes. You get a machine > >diameter which is log N, which is nice, but there are ways to do > >that with a fixed number of NICs per node such as fat trees or the > >Kautz/deBruijn family. > > > Which is why Intel went to the "8 port switch in each node" sort of > architecture with the iPSC/2. The iPSC/1 got icky with having to > keep adding ethernet cards into each node. The iPSC/2 went to a > packaging with each node on a single card, with two flavors of nodes, > depending on whether it had a numeric/vector coprocessor or not. You > could put 128 nodes in a single rack, as I recall. > > (iPSC => intel Personal Super Computer) > > > >TMC had, among other luminaries, Richard Feynman to help work out > >the routing software. > > > >-Larry > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From larry.stewart at sicortex.com Thu May 24 09:05:04 2007 From: larry.stewart at sicortex.com (Larry Stewart) Date: Thu, 24 May 2007 12:05:04 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <001b01c79952$3a266030$0900a8c0@objection> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <4654B002.7040207@sicortex.com> <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov> Message-ID: <4655B7B0.8020901@sicortex.com> Peter St. 
John wrote: > This led me to SUNMOS (OS for parallel processing, Sandia's > alternative for the aforementioned intel > PSC descendants) http://en.wikipedia.org/wiki/SUNMOS. Sandia's page > looks like real progress for inertial containment sustainable > thermonuclear fusion, which really wouuld be super duper cool, fuel > your zeppelin with seawater, but I can't find latter-day references to > SUNMOS. Anybody know what became of it I don't know what happened to SUNMOS, but the intellectual descendents of it are the microkernels like L4 and particularly the FASTOS project, see http://www.cs.unm.edu/~fastos/ There's a group who think that full function OSs like Linux are unnecessary or wasteful for clusters. The arguments include virtual memory being unneeded or slow and OS activity ("OS noise") limiting scaling of applications. As a consequence, you see things like BlueGene/L with a microkernel on the compute nodes and Linux on the I/O nodes. I don't necessarily believe the arguments, but I like tinkering with a new tricked out OS as well as the next guy. Just for fun, I tracked down who said "When you hear 'virtual', you should think 'slow'." -- it was Dave Clark of MIT. -Larry From peter.st.john at gmail.com Thu May 24 09:24:58 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Thu, 24 May 2007 12:24:58 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <4655B7B0.8020901@sicortex.com> References: <001b01c79952$3a266030$0900a8c0@objection> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <4654B002.7040207@sicortex.com> <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov> <4655B7B0.8020901@sicortex.com> Message-ID: Well I remember when people were talking about distributed computing and they said, "well, this would be trivial if you could just boot unix on all the nodes, haha". 
So I'm still drawn to the idea of something small, like embedded ROMDOS, on a compute node. But RAM is free and IBM put linux on a wristwatch (honkin' big power supply, by the standards of wrist watches, but it ran X :-) Peter On 5/24/07, Larry Stewart wrote: > > Peter St. John wrote: > > > This led me to SUNMOS (OS for parallel processing, Sandia's > > alternative for the aforementioned intel > > PSC descendants) http://en.wikipedia.org/wiki/SUNMOS. Sandia's page > > looks like real progress for inertial containment sustainable > > thermonuclear fusion, which really would be super duper cool, fuel > > your zeppelin with seawater, but I can't find latter-day references to > > SUNMOS. Anybody know what became of it > > I don't know what happened to SUNMOS, but the intellectual descendants > of it are the microkernels like L4 > and particularly the FASTOS project, see http://www.cs.unm.edu/~fastos/ > > There's a group who think that full function OSs like Linux are > unnecessary or wasteful for clusters. The arguments > include virtual memory being unneeded or slow and OS activity ("OS > noise") limiting scaling of applications. > > As a consequence, you see things like BlueGene/L with a microkernel on > the compute nodes and Linux on the I/O nodes. > > I don't necessarily believe the arguments, but I like tinkering with a > new tricked out OS as well > as the next guy. > > Just for fun, I tracked down who said "When you hear 'virtual', you > should think 'slow'." -- it was Dave Clark of MIT. > > -Larry > > From 3lucid at gmail.com Thu May 24 11:06:26 2007 From: 3lucid at gmail.com (Kyle Spaans) Date: Thu, 24 May 2007 14:06:26 -0400 Subject: [Beowulf] Parallel Programming with MPI, anyone read the book? Message-ID: <5a1205b30705241106x76fb7360n71a6b9ac701b1d4b@mail.gmail.com> I've found this Morgan Kaufmann Publishers, Inc. book by Peter S.
Pacheco in my University's Computational Mathematics Club. It's (c) 1997, but I figure it should still be mostly relevant. Has anyone else read this, and/or have some comments on its validity/applicability? So far, the only thing I've been disappointed with is this sole reference to Beowulfery that I could find in Chapter 2, under the headings Distributed Memory MIMD, and "Bus-based Networks" where they say: "The last, and probably the simplest, network is a bus. A cluster of workstations on an ethernet provides a popular example. Of course, busses tend to be fairly slow, and even worse, busses, especially ethernets, soon become saturated if there are more than a few nodes or more than absolutely minimal communication. Thus, although they are very useful for program development, currently available bus-based systems don't show much promise for very large-scale applications." This paragraph sounds a little dated to me. Isn't Ethernet more prevalent than that? From lindahl at pbm.com Thu May 24 11:27:32 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 24 May 2007 11:27:32 -0700 Subject: [Beowulf] Parallel Programming with MPI, anyone read the book? In-Reply-To: <5a1205b30705241106x76fb7360n71a6b9ac701b1d4b@mail.gmail.com> References: <5a1205b30705241106x76fb7360n71a6b9ac701b1d4b@mail.gmail.com> Message-ID: <20070524182732.GA9748@bx9.net> On Thu, May 24, 2007 at 02:06:26PM -0400, Kyle Spaans wrote: > "The last, and probably the simplest, network is a bus. A cluster of > workstations on an ethernet provides a popular example. This is a blast from the past: in the good old days, ethernet networks were bridged/hubbed together instead of putting each host on its own switch port. In 1997 this was already often not true. Today, nobody would build a cluster without a switch.
-- greg From tmalas at ee.bilkent.edu.tr Thu May 24 08:05:03 2007 From: tmalas at ee.bilkent.edu.tr (Tahir Malas) Date: Thu, 24 May 2007 18:05:03 +0300 Subject: [Beowulf] Sharing an array in an MPI program? Message-ID: <005d01c79e14$e83cc620$d80cb38b@bs> Hi all, We have an 8-node cluster of SMP nodes, each with dual quad-core processors. The network is InfiniBand. Each process in our parallel FORTRAN 90 program holds an identical array that is used in all parts of the program. However, as the problem size grows, this memory cost has become a bottleneck for us. If all 8 processes in the same node could just read from the same memory instead of each holding its own copy of the array, we would save a significant amount of memory. This would be natural within a node if we were to use OpenMP, but I wonder whether this is somehow possible with only MPI? Distributing this array among the processes is too expensive for us. We also know that moving to hybrid programming (MPI + OpenMP) is an option, but we are looking for simpler options for the time being. Thanks, Tahir Malas Bilkent University Electrical and Electronics Engineering Department Phone: +90 312 290 1385 From rbbrigh at sandia.gov Thu May 24 09:14:46 2007 From: rbbrigh at sandia.gov (Ron Brightwell) Date: Thu, 24 May 2007 10:14:46 -0600 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: References: <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <4654B002.7040207@sicortex.com> <6.2.3.4.2.20070523155721.0326fe20@mail.jpl.nasa.gov> Message-ID: <20070524161446.GB15756@ratbert.sandia.gov> > This led me to SUNMOS (OS for parallel processing, Sandia's > alternative for the aforementioned intel PSC descendants) > http://en.wikipedia.org/wiki/SUNMOS.
Sandia's page looks like real > progress for inertial containment sustainable thermonuclear fusion, > which really wouuld be super duper cool, fuel your zeppelin with > seawater, but I can't find latter-day references to SUNMOS. Anybody > know what became of it? It evolved into the Puma/Cougar lightweight kernel on the ASCI/Red machine and then into the Catamount lightweight kernel that currently runs on the Cray XT3/4 machine. -Ron From lindahl at pbm.com Thu May 24 14:02:29 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 24 May 2007 14:02:29 -0700 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <005d01c79e14$e83cc620$d80cb38b@bs> References: <005d01c79e14$e83cc620$d80cb38b@bs> Message-ID: <20070524210229.GA13326@bx9.net> On Thu, May 24, 2007 at 06:05:03PM +0300, Tahir Malas wrote: > Each process in our parallel FORTRAN > 90 program holds an identical array that is used in all parts of the > program. However, when the size of the problem gets larger and larger, this > memory cost has started to become a memory bottleneck for us. This is actually a fairly frequent question. Some people use hybrid MPI+OpenMP in this situation. However, another way to attack it is to create a shared memory segment and put this array into it. Or, alternately, you can mmap() a file into all the processes with this data. -- greg From peter.st.john at gmail.com Thu May 24 15:05:43 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Thu, 24 May 2007 18:05:43 -0400 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <20070524210229.GA13326@bx9.net> References: <005d01c79e14$e83cc620$d80cb38b@bs> <20070524210229.GA13326@bx9.net> Message-ID: Greg, would it be feasible to compile the array into a DLL? Peter On 5/24/07, Greg Lindahl wrote: > > On Thu, May 24, 2007 at 06:05:03PM +0300, Tahir Malas wrote: > > > Each process in our parallel FORTRAN > > 90 program holds an identical array that is used in all parts of the > > program. 
However, when the size of the problem gets larger and larger, > this > > memory cost has started to become a memory bottleneck for us. > > This is actually a fairly frequent question. > > Some people use hybrid MPI+OpenMP in this situation. However, another > way to attack it is to create a shared memory segment and put this > array into it. Or, alternately, you can mmap() a file into all the > processes with this data. > > -- greg > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lindahl at pbm.com Thu May 24 16:16:22 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Thu, 24 May 2007 16:16:22 -0700 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: References: <005d01c79e14$e83cc620$d80cb38b@bs> <20070524210229.GA13326@bx9.net> Message-ID: <20070524231152.GA7420@bx9.net> On Thu, May 24, 2007 at 06:05:43PM -0400, Peter St. John wrote: > would it be feasible to compile the array into a DLL? In Linux, DLLs don't really do that. I think you'd have to coerce it into being read-only data, and that's simply not simple. -- greg From peter.st.john at gmail.com Fri May 25 07:35:10 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Fri, 25 May 2007 10:35:10 -0400 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <20070524231152.GA7420@bx9.net> References: <005d01c79e14$e83cc620$d80cb38b@bs> <20070524210229.GA13326@bx9.net> <20070524231152.GA7420@bx9.net> Message-ID: yeah I'm not picturing doing that without introducing the overhead of a wrapper lookup function separate from your application. Peter On 5/24/07, Greg Lindahl wrote: > > On Thu, May 24, 2007 at 06:05:43PM -0400, Peter St. John wrote: > > > would it be feasible to compile the array into a DLL? 
> > In Linux, DLLs don't really do that. I think you'd have to coerce it > into being read-only data, and that's simply not simple. > > -- greg > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter at usfca.edu Thu May 24 16:27:24 2007 From: peter at usfca.edu (Peter Pacheco) Date: Thu, 24 May 2007 16:27:24 -0700 Subject: [Beowulf] Re: Parallel Programming with MPI, anyone read the book? In-Reply-To: <200705241900.l4OJ05YX012140@bluewest.scyld.com> References: <200705241900.l4OJ05YX012140@bluewest.scyld.com> Message-ID: <22F3AE8E-2598-47F0-A8B5-5072E0794335@usfca.edu> Hi Kyle, I seem to recall reading a book of that title. If my memory serves, it was an excellent book. ;) Alas, pretty much anything written about machines of 10 years ago is going to be very out of date: when I wrote the book, the teraflops barrier had yet to be broken. On the other hand I think the material on MPI is still useful. Best wishes, Peter Pacheco > Date: Thu, 24 May 2007 14:06:26 -0400 > From: "Kyle Spaans" <3lucid at gmail.com> > Subject: [Beowulf] Parallel Programming with MPI, anyone read the > book? > To: "Beowulf List" > > I've found this Mogran Kaufmann Publishers, Inc. book by Peter S. > Pacheco in my University's Computational Mathematics Club. It's (c) > 1997, but I figure it should still be mostly relevant. > Has anyone else read this, and/or have some comments on it's > validity/applicability? > > So far, the only thing I've been disappointed with is this sole > reference to Beowulfery that I could find in Chapter 2, under the > headings Distributed Memory MIMD, and "Bus-based Networks" where they > say: > > "The last, and probably the simplest, network is a bus. A cluster of > workstations on an ethernet provides a popular example. Of course, > busses tend to be fairly slow, and even worse, busses, especially > ethernets, soon become saturated if there are more than a few nodes or > more than absolutely minimal communication. 
Thus, although they are > very useful for program development, currently available bus-based > systems don't show much promise for very large-scale applications." > > > This paragraph sounds a little dated to me. Isn't Ethernet more > prevalent than that? From cluster.security at gmail.com Thu May 24 22:17:00 2007 From: cluster.security at gmail.com (Cluster Security) Date: Fri, 25 May 2007 01:17:00 -0400 Subject: [Beowulf] Survey on Supercomputer Cluster Security Message-ID: <6f974da50705242217q3d4b01f1sb73c7c2bf020c1d8@mail.gmail.com> To Cluster System Administrators: Our University has done some classified DoD work on various Beowulf clusters. As a result, we have become interested in the question of securing supercomputer clusters. In particular, we are especially interested in better understanding the nature of the threats against supercomputer clusters, and the extent to which security measures are implemented. It would help us greatly if you would answer a few questions on this subject. Feel free to not answer any question that you do not wish to answer. Just select the "No Answer" option. A complete list of questions and possible answers is given below in text form. If there are several system administrators for your cluster(s), please ensure that your group submits only one survey per cluster. You can answer the questions interactively at our website (our preferred method), by e-mail, by fax, or by regular mail. To answer on the web, please go to http://www.cs.umaine.edu/~markov/clustersurvey/survey.html and login with login ClusterSurvey password S3cur3Qu3st The login is just to keep random visitors to the website from filling out the questionnaire. The web questionnaire will only be available until June 1, 2007. There are two options if you choose to answer by e-mail, fax, or regular mail. First, you can download a PDF version of the questionnaire from a link on the webpage referenced above.
This is an interactive PDF file that permits you to answer the questions in the form providing you are using a new enough version of Adobe Acrobat Reader (Version 8 recommended). You can either print out the results and fax them or mail them, or you can e-mail the file or just the answers by hitting the e-mail button in the form. Alternatively, you can answer the questions on the form below and either e-mail it back or print out the results and fax or e-mail them back. If you wish to fax your answers, please fax them to 207-866-3050, which is a secure fax. We will collect whatever data we receive and organize the results. These results will be available on the web using the URL above starting July 15, 2007 in case you are interested. All data will be aggregated and in no way will we identify any respondents -- my goal is to have some general numbers and percentages that can help us better understand who is trying to crack into supercomputers and why. If you know of other people who would be interested in the results or would be interested in providing data, please feel free to send them a copy of this letter. Sincerely yours, George Markowsky, Professor Department of Computer Science 5752 Neville Hall University of Maine Orono, ME 04469-5752 QUESTIONNAIRE 1. How frequently are your supercomputer clusters attacked relative to any desktops that might be in your laboratories? More Frequently About the Same Frequency Less Frequently No Answer 2. How sophisticated are the attacks against your clusters compared to the attacks against any desktops that might be in your laboratories? More Sophisticated About the Same Level of Sophistication Less Sophisticated No Answer 3. Are there any IP addresses that regularly try to break into your cluster? Yes No Not Sure No Answer 4. Has anyone ever tried a man-in-the-middle type of attack against any of your clusters? Yes No Not Sure No Answer 5. Have you ever been attacked from foreign IP addresses? Yes No Not Sure No Answer 6. 
Have your clusters ever been attacked by foreign interests? Yes No Not Sure No Answer 7. Has anyone ever tried a physical approach to either disrupt a computation or to steal data? Yes No Not Sure No Answer 8. Has anyone ever tried to bribe or otherwise co-opt one of the cluster staff into helping with compromising the security? Yes No Not Sure No Answer 9. How many times has security been breached on one of your supercomputer clusters over the past three years that resulted in either downtime or lost data? 11 or more 6-10 2-5 1 0 Not Sure No Answer 10. Does your center have a person whose primary responsibility is cluster security? Yes No Not Sure No Answer 11. Do you run an intrusion detection system on your clusters? Yes, on all No, not on any Mixed, on some and not on others Not Sure No Answer 12. How often do you check for rootkits? Multiple Times a Day Daily Weekly Monthly Annually Not at all Not Sure No Answer 13. How often do you run backups on your clusters? Multiple Times a Day Daily Weekly Monthly Annually Not at all Not Sure No Answer -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.perea at gmail.com Fri May 25 00:05:14 2007 From: jaime.perea at gmail.com (Jaime Perea) Date: Fri, 25 May 2007 09:05:14 +0200 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <20070524231152.GA7420@bx9.net> References: <005d01c79e14$e83cc620$d80cb38b@bs> <20070524231152.GA7420@bx9.net> Message-ID: <200705250905.14512.jaime.perea@gmail.com> One alternative that I like and it integrates well with mpi is the global arrays toolkit http://www.emsl.pnl.gov/docs/global/ -- Jaime D. Perea Duarte. Linux registered user #10472 Dep. Astrofisica Extragalactica. Instituto de Astrofisica de Andalucia (CSIC) Apdo. 3004, 18080 Granada, Spain. 
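Greg Lindahl's suggestion earlier in this thread (put the read-only array in a file and mmap() it into every process on the node) can be sketched roughly as follows. This is only an illustration in Python, not the original poster's Fortran 90 code: the file name table.bin and the helpers write_table, map_table and read_entry are hypothetical, and a real MPI code would presumably do the equivalent with mmap() from C or through Fortran/C interoperability. The point it shows is that every process maps the same file read-only, so the kernel can back all of the mappings with a single copy of the pages instead of one private copy per process.

```python
import mmap
import os
import struct
import tempfile


def write_table(path, values):
    """Write the shared table once, as raw little-endian doubles."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}d", *values))


def map_table(path):
    """Map the whole file read-only; return (mapping, number of doubles)."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file. The mapping stays valid after
        # the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return mm, len(mm) // 8


def read_entry(mm, i):
    """Read element i of the table directly from the mapping."""
    return struct.unpack_from("<d", mm, 8 * i)[0]


if __name__ == "__main__":
    # Hypothetical file name; one process (say, the first rank on each
    # node) would write it, and every rank on that node would map it.
    path = os.path.join(tempfile.mkdtemp(), "table.bin")
    write_table(path, [0.5, 1.5, 2.5, 3.5])

    # Two independent mappings stand in for two processes on one node.
    rank_a, n = map_table(path)
    rank_b, _ = map_table(path)
    print(n, read_entry(rank_a, 2), read_entry(rank_b, 2))
    rank_a.close()
    rank_b.close()
```

The mapping is read-only on purpose: the array in the original question is identical in every process and only read, so no synchronisation between processes is needed in this sketch.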
From victux at gmail.com Fri May 25 08:04:44 2007 From: victux at gmail.com (Victor Gomez) Date: Fri, 25 May 2007 10:04:44 -0500 Subject: [Beowulf] Special queue PBS Message-ID: <48f9d1380705250804x7ef148f4nebd7672c7c3a446d@mail.gmail.com> Hi, I have a little problem... My cluster (12 nodes) has four special nodes. How do you create a queue that permits jobs to run only on the special nodes? I use Torque/PBS. Thanks! From kyron at neuralbs.com Sun May 27 22:09:24 2007 From: kyron at neuralbs.com (Eric Thibodeau) Date: Mon, 28 May 2007 01:09:24 -0400 Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <005d01c79e14$e83cc620$d80cb38b@bs> References: <005d01c79e14$e83cc620$d80cb38b@bs> Message-ID: <200705280109.24237.kyron@neuralbs.com> You might want to take a look at the Open MPI implementation. They have all sorts of neat tricks for detecting local vs. remote memory access, and there might be native MPI ways of optimising your memory usage... I am not talking from experience but stating that I wouldn't be surprised if they had already implemented this really "nice to have" feature in their latest MPI release. On Thursday 24 May 2007 11:05, Tahir Malas wrote: > Hi all, > We have an 8-node cluster of SMP nodes, which have dual-quad core > processors. The network is Infiniband. Each process in our parallel FORTRAN > 90 program holds an identical array that is used in all parts of the > program. However, when the size of the problem gets larger and larger, this > memory cost has started to become a memory bottleneck for us. > If all 8 processes in the same node could just read from the same memory > instead of holding their arrays, we would have significant memory gain. This > would be natural in a node if we were to use OpenMP, but I wonder whether this > is somehow possible with only MPI? Distributing this array among the > processes is too expensive for us.
We also know that passing to hybrid > programming (MPI + OpenMP) is a choice, but we look for simpler choices for > the time being. > Thanks, > Tahir Malas > Bilkent University > Electrical and Electronics Engineering Department > Phone: +90 312 290 1385 > > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > > -- Eric Thibodeau Neural Bucket Solutions Inc. T. (514) 736-1436 C. (514) 710-0517 From Bogdan.Costescu at iwr.uni-heidelberg.de Tue May 29 05:37:48 2007 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Tue, 29 May 2007 14:37:48 +0200 (CEST) Subject: [Beowulf] Sharing an array in an MPI program? In-Reply-To: <200705250905.14512.jaime.perea@gmail.com> References: <005d01c79e14$e83cc620$d80cb38b@bs> <20070524231152.GA7420@bx9.net> <200705250905.14512.jaime.perea@gmail.com> Message-ID: On Fri, 25 May 2007, Jaime Perea wrote: > One alternative that I like and it integrates well with mpi > is the global arrays toolkit > > http://www.emsl.pnl.gov/docs/global/ I disagree with the "integrates well with mpi" part of the statement. Global Arrays works on top of a communication layer called ARMCI which only uses an MPI implementation for setting up the job and tearing it apart (only calling MPI_Init and MPI_Finalize from what I remember), the communication itself is done directly via lower level protocols (TCP, GM, etc.) - I know because some years ago I wanted to use Global Arrays on a SCore cluster and discovered that I have to port ARMCI to PM (the low level communication protocol of SCore)... This sometimes creates problems due to limitations imposed by the low level protocols (for example, the MPI implementation would open a GM port while ARMCI would need to open a second one, so the per-node limit of GM ports would be reached much faster). 
[ I don't want the above statement to sound negative towards Global Arrays or ARMCI; my intention was only to bring a missing fact into the discussion. ] -- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu at IWR.Uni-Heidelberg.De From laytonjb at charter.net Tue May 29 08:49:27 2007 From: laytonjb at charter.net (laytonjb at charter.net) Date: Tue, 29 May 2007 8:49:27 -0700 Subject: [Beowulf] HDTV video file sizes Message-ID: <1088659434.1180453767761.JavaMail.root@fepweb03> Good morning, I was doing some thinking over the weekend (while cooking ribs on the grill :) ). Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 for now and perhaps 1080. I'm looking for the file size if you store the whole thing in a single file. Thanks! Jeff "I won't give up my sauce recipe" Layton From rgb at phy.duke.edu Tue May 29 09:28:03 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 29 May 2007 12:28:03 -0400 (EDT) Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1088659434.1180453767761.JavaMail.root@fepweb03> References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: On Tue, 29 May 2007, laytonjb at charter.net wrote: > Good morning, > > I was doing some thinking over the weekend (while cooking ribs on the grill :) ). > Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 for > now and perhaps 1080. I'm looking for the file size if you store the whole thing > in a single file. Well, I didn't have any idea ten seconds ago, but now I know that one hour should be roughly 3 GB. (So a movie should be 5-6 GB.) > > Thanks! > > Jeff "I won't give up my sauce recipe" Layton > You're welcome! Rob "I'll trade you a lesson in Googlifying for your sauce recipe" Brown -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept.
of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From hahn at mcmaster.ca Tue May 29 09:57:10 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 29 May 2007 12:57:10 -0400 (EDT) Subject: [Beowulf] HDTV video file sizes In-Reply-To: References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: >> Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 for >> now and perhaps 1080. I'm looking for the file size if you store the whole >> thing >> in a single file. > > Well, I didn't have any idea ten seconds ago, but now I know that one > hour should be roughly 3 GB. (So a movie should be 5-6 GB.) hmm, that's normal DVD, isn't it? the newfangled flavors (BD, etc) seem to be 5-10 times higher capacity. compressed data rates appear to be 20-50 Mbps (lower than 20 probably doesn't count as HD). funny how all the HD stuff seems very fuzzy ;) From atchley at myri.com Tue May 29 10:07:19 2007 From: atchley at myri.com (Scott Atchley) Date: Tue, 29 May 2007 13:07:19 -0400 Subject: [Beowulf] HDTV video file sizes In-Reply-To: References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: <469B15F1-7FBD-4F88-9882-2DA7EFF10ACC@myri.com> On May 29, 2007, at 12:28 PM, Robert G. Brown wrote: > On Tue, 29 May 2007, laytonjb at charter.net wrote: > >> Good morning, >> >> I was doing some thinking over the weekend (while cooking ribs on >> the grill :) ). >> Does anyone know how much data 1 hr. of HDTV produces? Let's try >> 720 for >> now and perhaps 1080. I'm looking for the file size if you store >> the whole thing >> in a single file. > > Well, I didn't have any idea ten seconds ago, but now I know that one > hour should be roughly 3 GB. (So a movie should be 5-6 GB.) ATSC allows rates up to 20 Mb/s, so 1 hour would roughly be 9 GB (base 10). Raw data from a mini-DV camera uses 25 Mb/s, but it is not compressed.
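The figures quoted in this thread are bit rates, so the file size follows from simple arithmetic: size = rate x duration / 8. A small sketch of that calculation, assuming a constant bit rate and decimal units (1 GB = 10^9 bytes); the function name stream_size_gb is made up for illustration, and the rates are just the 20, 25 and 50 Mb/s values mentioned above.

```python
def stream_size_gb(mbit_per_s, seconds=3600):
    """Size in decimal gigabytes of a constant-bit-rate stream."""
    bits = mbit_per_s * 1_000_000 * seconds
    return bits / 8 / 1_000_000_000


if __name__ == "__main__":
    for rate in (20, 25, 50):
        print(f"{rate} Mb/s for 1 hour -> {stream_size_gb(rate):.2f} GB")
    # 20 Mb/s for 1 hour -> 9.00 GB
    # 25 Mb/s for 1 hour -> 11.25 GB
    # 50 Mb/s for 1 hour -> 22.50 GB
```

Real recordings use variable bit rates and container overhead, so these numbers are only upper-bound style estimates for a stream that runs at the quoted rate the whole time.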
Scott From juliarachelhoward at googlemail.com Fri May 25 12:14:35 2007 From: juliarachelhoward at googlemail.com (julia howard) Date: Fri, 25 May 2007 20:14:35 +0100 Subject: [Beowulf] cold cathode fluorescent backlighting Message-ID: <6394934c0705251214r4ec157f6o882f391ec8c0e7d9@mail.gmail.com> Not having an Electronics background, my questions may seem naive. However, as the following issues give me concern, I should very much appreciate it if they could be sorted out with some reliable knowledge. Firstly, do Liquid Crystal Display TVs or computer monitors emit any ionizing radiation? If the LCD screen becomes damaged through the inadvertent use of the wrong type of cleaner or by using any abrasive cloth, could it expose one to increased ionizing radiation? Regarding the cold cathode fluorescent backlights of monitors, I read in the Wikipedia encyclopedia under Cold Cathode that some CCFLs use a source of beta radiation to start the ionization process. If this is the case, then could LCD televisions expose us to beta or gamma radiation? I should like to replace my CRT TV with an LCD TV, but the thought of a radioactive material being present causes me much anxiety. Looking forward to your informed response, Julia Howard email: juliarachel_howard at yahoo.co.uk From lucas.schnorr at imag.fr Fri May 25 13:52:23 2007 From: lucas.schnorr at imag.fr (Lucas Schnorr) Date: Fri, 25 May 2007 22:52:23 +0200 Subject: [Beowulf] SBAC-PAD 2007: Call For Workshop Proposal (Deadline Extended) In-Reply-To: References: Message-ID: Our apologies if you receive multiple copies of this message. Please distribute this Call For Workshop to those who might be interested. FYI: SBAC-PAD is currently classified by CAPES (the postgraduate educational agency of the Brazilian government) as a 'Qualis International A' Conference.
========================================================= | SBAC-PAD 2007 | | 19th International Symposium on Computer Architecture | | and High Performance Computing | | | | CALL FOR WORKSHOP PROPOSALS | | (Extended Deadline: June 13, 2007) | | | | Gramado, Brazil - October 24-27, 2007 | | http://www.sbc.org.br/sbac/2007/ | ========================================================= General Information =================== Proposals for workshops are invited for affiliation with SBAC-PAD 2007. The purpose of these workshops is to create a forum for discussing themes that not only are associated with the conference topics, but also promote the integration with other research areas. The workshops may also serve as a platform for presenting novel ideas in a less formal and possibly more focused way than the symposium itself. As such, they also offer a good opportunity for young researchers to present their work and to obtain feedback from an interested community. The format of each workshop is to be determined by the organizers, but it is expected that they contain significant time for general discussion. Workshops may be held in Portuguese or English and may have a duration of half day or full day. The number of attendees per workshop is subject to the availability of meeting rooms. All workshop attendees must be registered to SBAC-PAD. The SBAC-PAD organizers will be responsible for the logistics of the workshops, including projection and computational resources. The meeting rooms capacity range from 60 to 200 persons. The workshop organizers will be responsible for publishing the electronic proceedings in a workshop Web page that contains all accepted paper in PDF format. The SBAC-PAD organization will also publish the electronic proceedings in CD. Any additional costs, such as travel and lodging for invited speakers are workshop organizers responsibility. The workshops will be held in Gramado, RS, Brazil. 
Proposal Submission =================== Researchers and practitioners who are interested in submitting proposals must send mail to the address . Please look forward for a confirmation message, which assures you that your submission has been received. The submission message should have the following format: * Message subject: "Workshop Proposal to SBAC-PAD" * Message Body: link to the proposal Web page. * Message signature: proponent There is not a format for the Web page, but we recommend that it contains: a) Proposed workshop denomination. b) Description of workshop focus, justifying its relevance. c) Workshop history, if it is not the first edition. d) Topics of interest. e) Workshop format, including duration (half/full day). f) Expected audience, including number of attendees. g) Submission process. h) Program committee. i) Equipment necessary. j) Contact information. k) Link to coodinator CV. Important Dates =============== Submission deadline (EXTENDED): June 13, 2007 Selection notification: June 27, 2007 Organizing committee ==================== Adenauer Yamin (UCPEL, Brazil) Gerson Cavalheiro (UFPEL, Brazil) Nicolas Maillard (UFRGS, Brazil) Philippe Navaux (UFRGS, Brazil) Vinod Rebello (UFF, Brazil) Wagner Meira Jr. (UFMG, Brazil) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rokrau at yahoo.com Fri May 25 17:19:35 2007 From: rokrau at yahoo.com (Roland Krause) Date: Fri, 25 May 2007 17:19:35 -0700 (PDT) Subject: [Beowulf] Re: Parallel Programming with MPI, anyone read the book? In-Reply-To: <22F3AE8E-2598-47F0-A8B5-5072E0794335@usfca.edu> Message-ID: <855203.95861.qm@web81101.mail.mud.yahoo.com> So Peter, this summer, you, your laptop, on the beach in Santa Cruz and a new edition should be out in no time ? Greetings :-) Roland --- Peter Pacheco wrote: > Hi Kyle, > > I seem to recall reading a book of that title. If my memory serves, > > it was an excellent book. 
;) Alas, pretty much anything written
> about machines of 10 years ago is going to be very out of date:
> when I wrote the book, the teraflops barrier had yet to be broken.
>
> On the other hand I think the material on MPI is still useful.
>
> Best wishes,
> Peter Pacheco
>

From marc at klingon.uab.cat Fri May 25 18:51:50 2007
From: marc at klingon.uab.cat (Marc Noguera Julian)
Date: Sat, 26 May 2007 03:51:50 +0200
Subject: [Beowulf] Special queue PBS
In-Reply-To: <200705251900.l4PJ0AbO024133@bluewest.scyld.com>
References: <200705251900.l4PJ0AbO024133@bluewest.scyld.com>
Message-ID: <20070526014118.M79593@klingon.uab.cat>

Hi Victor,

You can use acl_hosts. In qmgr:

    set queue your_queue acl_hosts="special.node01,special.node02,special.node03"

while setting acl_host_enable to false. This will enable queue-to-node mapping; see:
http://www.clusterresources.com/products/mwm/docs/12.1nodelocation.shtml#open

Another option you have, which I prefer, is to set a special property on the nodes that are special. Again, using qmgr:

    set node special.node01 properties = special
    . . .
    set node special.nodeNN properties = special

Once that is done, do, in qmgr:

    set queue special.queue resources_default.neednodes = special

where special.queue is the queue you want to be used. Note that this will only work in later versions of Torque/OpenPBS (2.1.6 and later, but I am not sure about this).

Hope this helps. BTW, are you subscribed to the torque/PBS userlist? I guess it would be useful for you.

Marc

On Fri, 25 May 2007 12:00:34 -0700, beowulf-request wrote
> From: "Victor Gomez"
> Subject: [Beowulf] Special queue PBS
> To: beowulf at beowulf.org
> Message-ID:
> <48f9d1380705250804x7ef148f4nebd7672c7c3a446d at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> Hi,
>
> I have a little problem...
>
> My cluster (12 nodes) have four special nodes,
>
> How do you create a queue thats permits run only in the special nodes???
>
> I use Torque/ PBS
>
> Thanks!!!
------------------------------------------------------
Marc Noguera i Julian, PhD
Tècnic de suport a la recerca
Despatx C7-149. Edifici Cn. Campus UAB.
Bellaterra 08193. Barcelona
email: marc_at_klingon.uab.es
web: http://klingon.uab.es/marc
Tlf/Phone: 00 34 935812173
-------------------------------------------------------

From ruhollah.mb at gmail.com Sun May 27 01:47:54 2007
From: ruhollah.mb at gmail.com (Ruhollah Moussavi Baygi )
Date: Sun, 27 May 2007 12:17:54 +0330
Subject: [Beowulf] ssh connection problem
Message-ID: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com>

Hi everybody at Beowulf,

I have a serious problem with ssh connections to our cluster. Any hint, help, or suggestion that can help me solve it is highly appreciated.

Most of the time, when users want to connect and run their programs from their own PCs, the ssh connection fails, especially while transferring files from/to the head node. Our users' PCs mainly run Windows XP, so they use packages like SSH Secure Shell for connection and file transfer, or PuTTY for connection and WinSCP for file transfer.

The error message is as follows:

'Disconnecting: Corrupted MAC on input'

or

'Disconnecting: bad packet length...', followed by a long integer.

This problem has practically made our cluster unusable. So, I would be thankful for any coming advice.

--
Best,
Ruhollah Moussavi Baygi
From camilo.hernandez at gmail.com Mon May 28 09:43:19 2007
From: camilo.hernandez at gmail.com (Juan Camilo Hernandez)
Date: Mon, 28 May 2007 11:43:19 -0500
Subject: [Beowulf] Beowulf on Demand
Message-ID: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com>

Hi list

we are starting a project about an assembly of a Beowulf cluster between several research groups at the University where I am studying,

We have had some problems, but mainly for the quantity of money that every group have to contribute for administrative and maintenance cost.

The idea that we have is that every group contribute with a quantity of money depending on their use of the flops in the cluster, something like "computacion bajo demanda"

I would like to ask you: Do you think this is the best option to measure cluster use per user, so that we can proceed to charge every user?

Could you let me know which software I can use to carry out this kind of control?

Do you have any suggestions or recommendations?

Thank you very much.

Best regards.

--
Juan Camilo Hernandez D.
Investigador Asistente
GIGAX - http://www.gigax.org

From supercomputer at gmail.com Mon May 28 11:28:28 2007
From: supercomputer at gmail.com (Chris Vaughan)
Date: Mon, 28 May 2007 19:28:28 +0100
Subject: [Beowulf] Special queue PBS
In-Reply-To: <48f9d1380705250804x7ef148f4nebd7672c7c3a446d@mail.gmail.com>
References: <48f9d1380705250804x7ef148f4nebd7672c7c3a446d@mail.gmail.com>
Message-ID: <216ee070705281128s2de400e3n7531946fe94b6c3c@mail.gmail.com>

Hi Victor,

Try Moab or Maui
http://www.clusterresources.com/products/maui/docs/a.fparameters.shtml

classcfg and node features are what you should be looking at.

If you need some help configuring it you can email me off the list.

Regards,

On 5/25/07, Victor Gomez wrote:
> Hi,
>
> I have a little problem...
>
> My cluster (12 nodes) have four special nodes,
>
> How do you create a queue thats permits run only in the special nodes???
>
>
> I use Torque/ PBS
>
>
> Thanks!!!
> > > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -- ------------------------------ Christopher Vaughan From fahadsaeed11 at hotmail.com Mon May 28 17:46:38 2007 From: fahadsaeed11 at hotmail.com (fahad saeed) Date: Tue, 29 May 2007 00:46:38 +0000 Subject: [Beowulf] mpirun on node problem Message-ID: An HTML attachment was scrubbed... URL: From ruhollah.mb at gmail.com Mon May 28 23:20:15 2007 From: ruhollah.mb at gmail.com (Ruhollah Moussavi Baygi ) Date: Tue, 29 May 2007 09:50:15 +0330 Subject: [Beowulf] 2nd request for ssh connection problem!! Message-ID: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> -------------------Dear all at Beowulf, As I mentioned in my previous email, I have a problem with ssh connection to our Beowulf cluster. I did put an email in Beowulf mailing list group, but with no answer. It is hard to believe that nobody knows anything about this problem, so, I sent this email for the second time. Following please find the content of my previous email. I WOULD BE GRATEFUL FOR ANY COMING ANSWER! Best, Ruhollah Moussavi Baygi------------------- Hi everybody at Beowulf, I have a serious problem with ssh connection to our cluster. Every hint/help/suggestion, which can help me to solve it, is highly appreciated. Most of the time, when users want to connect and run their programs from their own PCs, the ssh connection failed, especially during transfer files from/to head-node. Our user's PCs are mainly WindowsXP, so they use packages like SSH Secure Shell for connection and file transfer, or Putty for connection and WinSCP for file transfer. The error massage is as follows: ' Disconnecting: Corrupted MAC on input' or ' Disconnecting: bad packet length...', followed by a long integer. This problem has practically made our cluster unusable. 
So, I would be thankful for any coming advice. -- Best, Ruhollah Moussavi Baygi -- Best, Ruhollah Moussavi Baygi -------------- next part -------------- An HTML attachment was scrubbed... URL: From joelja at bogus.com Tue May 29 09:46:16 2007 From: joelja at bogus.com (Joel Jaeggli) Date: Tue, 29 May 2007 09:46:16 -0700 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1088659434.1180453767761.JavaMail.root@fepweb03> References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: <465C58D8.9000009@bogus.com> laytonjb at charter.net wrote: > Good morning, > > I was doing some thinking over the weekend (while cooking ribs on the grill :) ). > Does anyone know who much data 1 hr. of HDTV produces? Let's try 720 for > now and perhaps 1080. I'm looking for the file size if you store the whole thing > in a single file. At cable tv rates (~19Mb/s) about 8.5GB an hour, as broadcast 720p signals are progressive and 1080i signals are interlaced... The single link-dvi-interface between the decoder and the display on the other hand is around 3.7Gb/s . > Thanks! > > Jeff "I won't give up my sauce recipe" Layton > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > From James.P.Lux at jpl.nasa.gov Tue May 29 10:14:27 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 29 May 2007 10:14:27 -0700 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1088659434.1180453767761.JavaMail.root@fepweb03> References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: <6.2.3.4.2.20070529095304.0322c328@mail.jpl.nasa.gov> At 08:49 AM 5/29/2007, laytonjb at charter.net wrote: >Good morning, > >I was doing some thinking over the weekend (while cooking ribs on >the grill :) ). >Does anyone know who much data 1 hr. of HDTV produces? Let's try 720 for >now and perhaps 1080. 
I'm looking for the file size if you store the
>whole thing in a single file.

Are you asking about "as generated in the studio" or "as recorded" or "as broadcast"?

the raw data rate is >1 Gbps (143.18 Mb/s for NTSC sampled at 14.318 Ms/s up to 1.485 Gbps for SMPTE 292M sampled at 74.25 Ms/s)

There are several compression/redundancy removal steps in the chain, and different HD broadcast media (over the air in US (ATSC), over the air in Europe (DVB-T), direct broadcast satellite (DVB-S, and others), cable) use different bit rates, and different compression schemes. And, of course, the DVD formats (including the new BluRay and HD-DVD) have their own encodings as well.

In the US, HD is broadcast over the air in a 6 MHz wide channel at between 19-20 Mbps (3 bits/symbol). However, that 20 Mbps stream can be divvied up in lots of ways: 1 really HD channel, 5 SD channels, 2 SD channels plus a medium rate HD channel.

Wikipedia has a lot of info on this..

The appearance of the decoded output depends a LOT on how good the encoding was. You can cheap out and just do simple frame encoding, with no frame-to-frame encoding, in which case you get high resolution with lots of artifacts. Or, you can spend a lot more effort on the encoding, and make use of the frame to frame redundancy, and get a lot fewer artifacts. The telling difference is if you have something like a panning shot over a complex, but fixed, background (e.g. a forest in the distance). A good encoder will be able to make use of the fact that big swaths of the image are actually the same from frame to frame, just displaced. A cheap encoder will not.

Cable TV and direct broadcast satellite use somewhat different data rates (since they have different heritage), and different encodings, sometimes.

Compressed digital video that is intended for further editing is also compressed differently, because the "broadcast" compressions tend to have unsuitable artifacts in the editing process.
Squeezing a raw data rate of >1 Gbps down into 20 Mbps or so always entails some compromises, and the broadcast compressions are designed to allow inexpensive decoders (and expensive encoders..you'll be making millions of decoders and dozens of encoders) and for artifacts that are visually unobjectionable to an end user. As you can imagine, there is much opportuntity for transcoding artifacts. These days, H.264/AVC is probably the leading candidate for compression So.. for over the air HD broadcasts, 20 Mbps should do you, which is well within the range of a variety of hard disks. Converting to GB/hr, I get 8-9 GB/hr James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From eugen at leitl.org Tue May 29 10:28:01 2007 From: eugen at leitl.org (Eugen Leitl) Date: Tue, 29 May 2007 19:28:01 +0200 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1088659434.1180453767761.JavaMail.root@fepweb03> References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: <20070529172801.GW17691@leitl.org> On Tue, May 29, 2007 at 08:49:27AM -0700, laytonjb at charter.net wrote: > Does anyone know who much data 1 hr. of HDTV produces? Let's try 720 for http://www.dv.com/columns/columns_item.php?articleId=55301757 says Originally, 1080 was designed as an interlaced format, and its highest frame rate at present is 60i. As shot by the Sony CineAlta, 1080p is recorded and displayed using interlaced hardware that uses segmented-frame techniques. To record slower rates, the tape transport is slowed down; 1080p24 is recorded as if it were 1080i48, with the pleasant side effect that you can record about 20 percent longer on the same size tape. The data rate for uncompressed 4:2:2 1080i60 or 1080p30 is about 1.2 Gbps. The highest frame rate for 720, which is always progressive, is 60 fps: 720p60. 
Sixty full frames a second provides smooth motion imaging for sports and allows slow-mo with no vertical bobble from deinterlacing. Slower frame rates are possible too, like 720p30 and 720p24, (although Panasonic's variable-frame-rate Varicam simply repeats frames as needed to convert whatever frame rate it's shooting into a steady 60p for recording on tape or spitting out as HD-SDI). The data rate for uncompressed 4:2:2 720p60 is also about 1.2 Gbps.

http://en.wikipedia.org/wiki/1080p

Production standards

A new high-definition progressive scan format for picture creation is currently being developed to operate at 1080p at 50 or 60 frames per second. This format will require a whole new range of studio equipment including cameras, storage, edit and contribution links as it has doubled the data rate of current 50 or 60 field interlace 1920 × 1080 from 1.485 Gb/s to nominally 3 Gb/s. It is unable to be broadcast in a compressed transmission to current MPEG-2 based HD receivers. This format will improve final pictures because of the benefits of "oversampling" and removal of interlace artifacts.

> now and perhaps 1080. I'm looking for the file size if you store the whole thing
> in a single file.

You were thinking about using a cluster for rendering or transcoding, right?

From jamesjamiejones at aol.com Tue May 29 18:25:08 2007
From: jamesjamiejones at aol.com (matt jones)
Date: Wed, 30 May 2007 02:25:08 +0100
Subject: [Beowulf] HDTV video file sizes
In-Reply-To:
References: <1088659434.1180453767761.JavaMail.root@fepweb03>
Message-ID: <465CD274.6090103@aol.com>

>Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 for now and perhaps 1080. I'm looking for the file size if you store the whole thing in a single file.

>Well, I didn't have any idea ten seconds ago, but now I know that one hour should be roughly 3 GB. (So a movie should be 5-6 GB.)

>hmm, that's normal DVD, isn't it? the newfangled flavors (BD, etc) seem to be 5-10 higher capacity.
>compressed data rates appear to be 20-50 Mbps (lower than 20 probably doesn't count as HD.)

>funny how all the HD stuff seems very fuzzy ;)

3GB for 1 hour seems reasonable, a movie in avi is only 700MB, and that's at PAL quality or higher. a DVD is roughly 5GB for argument's sake, and that includes the .vob video files, audio files and any extras (which tend to be at a lower quality anyway.) so the size of the movie is say closer to 3.5/4GB than 5GB. the 'dvd' movie is not at PAL res, but something like 4 times the quality of PAL (3/4 way there to the lower end HD). the mid and high HD, i would expect to take between 5-7GB for an hour. thus just fitting a 'HD film' on a dual layer DVD. blu-ray being the choice medium for 'HD films' in the near future.

there is also quite a bit of confusion over what "HD" means. often frame rates, and colour depth are different on different 'HD' objects. so it's quite easy to fit many hours of HD film on a DVD at 5 fps.

bit off topic...

it's funny how VGA is directly* compatible with SCART, also how DVI is directly compatible with HDMI... interesting how in both cases the computer connector came first and yields better quality. just a case of changing connectors (shape and pin layout).

*directly meaning no or little analogue electronics used.

personally...

HD is a marketing CON to get naive people to buy 'HD' products when they would be better buying a computer monitor with a higher resolution, colour depth, and refresh rate. although a 42" 'HD' widescreen would look good on my comp.

--
matt

From laytonjb at charter.net Tue May 29 10:39:34 2007
From: laytonjb at charter.net (laytonjb at charter.net)
Date: Tue, 29 May 2007 10:39:34 -0700
Subject: [Beowulf] HDTV video file sizes
Message-ID: <1565837372.1180460374910.JavaMail.root@fepweb03>

Uncle, Uncle!!!

Actually that was a good answer. I see that I need to learn more :)

So about 8-9 GB/hour....

What I have in mind is a large number of hours of HDTV being recorded to storage.
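The 8-9 GB/hour figure follows directly from the roughly 19-20 Mbps broadcast rate quoted earlier in this thread. A minimal Python sketch of the conversion, assuming a constant bit rate and decimal units (1 GB = 10^9 bytes, 1 TB = 1000 GB); the function names are made up for illustration, and the 4,000-hour archive is the example size discussed in the thread:

```python
def gb_per_hour(mbps):
    """Convert a constant stream rate in megabits/s to gigabytes per hour."""
    bits_per_hour = mbps * 1e6 * 3600   # bits transferred in one hour
    return bits_per_hour / 8 / 1e9      # bits -> bytes -> decimal GB


def archive_tb(hours, mbps):
    """Storage in decimal terabytes for `hours` of video at `mbps`."""
    return hours * gb_per_hour(mbps) / 1000


if __name__ == "__main__":
    for rate in (19.0, 20.0):
        print(f"{rate:.0f} Mbps -> {gb_per_hour(rate):.2f} GB/hour")
    print(f"4000 hours at 20 Mbps -> {archive_tb(4000, 20.0):.0f} TB")
```

At 19 and 20 Mbps this gives 8.55 and 9.00 GB/hour, and 4,000 hours at 20 Mbps comes to 36,000 GB, i.e. 36 TB. Real recordings will vary with the actual stream rate and any container overhead, which this sketch ignores.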
I'm guessing at the total number of hours, but I think the general number is over 4,000 hours (about 36,000 GB or 36 TB). Actually it's not that much data is it? Just a few hard drives and you've got it.

Thanks!

Jeff

> At 08:49 AM 5/29/2007, laytonjb at charter.net wrote:
> >Good morning,
> >
> >I was doing some thinking over the weekend (while cooking ribs on
> >the grill :) ).
> >Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 for
> >now and perhaps 1080. I'm looking for the file size if you store the
> >whole thing
> >in a single file.
>
>
> Are you asking about "as generated in the studio" or "as recorded" or
> "as broadcast"?
>
> the raw data rate is >1 Gbps (143.18 Mb/s for NTSC sampled at 14.318
> Ms/s up to 1.485 Gbps for SMPTE 292M sampled at 74.25 Ms/s)
>
>
>
> There are several compression/redundancy removal steps in the chain,
> and different HD broadcast media (over the air in US (ATSC), over the
> air in Europe (DVB-T), direct broadcast satellite (DVB-S, and others),
> cable) use different bit rates, and different compression
> schemes. And, of course, the DVD formats (including the new BluRay and
> HD-DVD) have their own encodings as well.
>
> In the US, HD is broadcast over the air in a 6 MHz wide channel at
> between 19-20 Mbps (3 bits/symbol). However, that 20 Mbps stream can
> be divvied up in lots of ways: 1 really HD channel, 5 SD channels, 2
> SD channels plus a medium rate HD channel.
>
> Wikipedia has a lot of info on this..
>
> The appearance of the decoded output depends a LOT on how good the
> encoding was. You can cheap out and just do simple frame encoding,
> with no frame-to-frame encoding, in which case you get high
> resolution with lots of artifacts. Or, you can spend a lot more
> effort on the encoding, and make use of the frame to frame
> redundancy, and get a lot fewer artifacts. The telling difference is
> if you have something like a panning shot over a complex, but fixed,
> background (e.g. a forest in the distance).
A good encoder will be > able to make use of the fact that big swaths of the image are > actually the same from frame to frame, just displaced. A cheap > encoder will not. > > Cable TV and direct broadcast satellite use somewhat different data > rates (since they have different heritage), and different encodings, sometimes. > > Compressed digital video that is intended for further editing is also > compressed differently, because the "broadcast" compressions tend to > have unsuitable artifacts in the editing process. Squeezing a raw > data rate of >1 Gbps down into 20 Mbps or so always entails some > compromises, and the broadcast compressions are designed to allow > inexpensive decoders (and expensive encoders..you'll be making > millions of decoders and dozens of encoders) and for artifacts that > are visually unobjectionable to an end user. > > As you can imagine, there is much opportuntity for transcoding artifacts. > > These days, H.264/AVC is probably the leading candidate for compression > > > So.. for over the air HD broadcasts, 20 Mbps should do you, which is > well within the range of a variety of hard disks. Converting to > GB/hr, I get 8-9 GB/hr > > > James Lux, P.E. > Spacecraft Radio Frequency Subsystems Group > Flight Communications Systems Section > Jet Propulsion Laboratory, Mail Stop 161-213 > 4800 Oak Grove Drive > Pasadena CA 91109 > tel: (818)354-2075 > fax: (818)393-6875 > > From peter.st.john at gmail.com Tue May 29 12:06:56 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue, 29 May 2007 15:06:56 -0400 Subject: [Beowulf] 2nd request for ssh connection problem!! In-Reply-To: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> References: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> Message-ID: Ruhollah, I'm not an electronics guy either, but for starters (while you wait for a better answer): Two situations: 1. The user SSH comes into the Head through a single NIC which does little else. 
1.1 First, swap out that NIC. 1.2 Swap which slot the NIC is in 1.3 Swap the cable 1.4 Check the environment for recent changes that might affect temperature or RF interference. 2. Not 1. above. I'd want more information about your layout, but an example would be a recent change at the desktops that you didn't know about (such as installation of a Service Pack); you might check with the desktop admins. Good luck. Peter On 5/29/07, Ruhollah Moussavi Baygi wrote: > > -------------------Dear all at Beowulf, > > As I mentioned in my previous email, I have a problem with ssh connection > to our Beowulf cluster. I did put an email in Beowulf mailing list group, > but with no answer. It is hard to believe that nobody knows anything about > this problem, so, I sent this email for the second time. Following please > find the content of my previous email. I WOULD BE GRATEFUL FOR ANY COMING > ANSWER! > > Best, > Ruhollah Moussavi Baygi------------------- > > Hi everybody at Beowulf, > I have a serious problem with ssh connection to our cluster. Every > hint/help/suggestion, which can help me to solve it, is highly appreciated. > Most of the time, when users want to connect and run their programs from > their own PCs, the ssh connection failed, especially during transfer files > from/to head-node. Our user's PCs are mainly WindowsXP, so they use packages > like SSH Secure Shell for connection and file transfer, or Putty for > connection and WinSCP for file transfer. > > The error massage is as follows: > ' Disconnecting: Corrupted MAC on input' > or > ' Disconnecting: bad packet length...', followed by a long integer. > > This problem has practically made our cluster unusable. So, I would be > thankful for any coming advice. 
> -- > Best, > Ruhollah Moussavi Baygi > > > -- > Best, > Ruhollah Moussavi Baygi > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From lindahl at pbm.com Tue May 29 12:16:18 2007 From: lindahl at pbm.com (Greg Lindahl) Date: Tue, 29 May 2007 12:16:18 -0700 Subject: [Beowulf] ssh connection problem In-Reply-To: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com> References: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com> Message-ID: <20070529191618.GA10221@bx9.net> On Sun, May 27, 2007 at 12:17:54PM +0330, Ruhollah Moussavi Baygi wrote: > 'Disconnecting: Corrupted MAC on input' > 'Disconnecting: bad packet Well, as the links you posted said, you need to: 1) Make sure everyone has the latest version of ssh installed, and 2) Figure out if there is a bad ethernet NIC somewhere ssh works over tcp, so, if there is corrupted data being sent, it is probably because the NIC on your front-end is bad. This is a pretty rare problem, normally a bad component (e.g. a bad cable) causes the IP checksum to be bad, and the packets are discarded. You can look on the front-end at the output of "netstat -i" and "netstat -s" to see if lots of bad checksums are being caused, too. -- greg From peter.st.john at gmail.com Tue May 29 12:19:48 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue, 29 May 2007 15:19:48 -0400 Subject: [Beowulf] Beowulf on Demand In-Reply-To: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com> References: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com> Message-ID: Hola Juan, I am sure you will get interesting and helpful answers soon from the experts, but I want to clarify (hacerse mas claro) some language. 
"...the quantity of money that every group have to contribute..." I think you mean, your concern is the quanity of money that each group should contribute; your concern is charging accounts equitably. Your sentence could also be read, "the quantity of money that each group **must** contribute" as if your concern were minimizing cost. You intend "contrubir igualmente" and not "contrubir a lo menos", correct? I'm sorry my Spanish is not idiomatic. Also, by "computacion bajo demanda" I think you mean "minimum compuation demand" but not "maximum compuation allocated"; I think I would charge something for the maximum requested (if job uses more CPU then it is stopped) as well as actual CPU used, but I am no expert. I hope I have made the language a little clearer, instead of worse :-) Por favor, pregunteme cerca de la lengua si puedo ayudar, pero hay varios aqui' quienes escribir los ambos mejor que yo. Claro :-) Buena suerte, Peter On 5/28/07, Juan Camilo Hernandez wrote: > > Hi list > > we are starting a project about an assembly of a Beowilf cluster > between several research groups at the University where I am studying, > > We have had some prblems, but mainly for the quantity of money that > every group have to contribute for administrative and manteniance > cost. > > The idea that we have is that every gruop contribute with a quantity > of money depending of their use of the flops in the cluster, someting > as "computacion bajo demanda" > > I would like to ask you: Do you think is this one, the best option to > mesure the cluster user use to be able to proced to charge to every > user? > > Could you let me knoe which software can I use to be able to carry on > this kinf of control? > > Do you have any suggest or recommendation? > > Thak you very much. > > > Best regards. > > -- > Juan Camilo Hernandez D. 
> Investigador Asistente > GIGAX - http://www.gigax.org > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgb at phy.duke.edu Tue May 29 13:27:37 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 29 May 2007 16:27:37 -0400 (EDT) Subject: [Beowulf] ssh connection problem In-Reply-To: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com> References: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com> Message-ID: On Sun, 27 May 2007, Ruhollah Moussavi Baygi wrote: > Hi everybody at Beowulf, > > I have a serious problem with ssh connection to our cluster. Every > hint/help/suggestion, which can help me to solve it, is highly appreciated. > > Most of the time, when users want to connect and run their programs from > their own PCs, the ssh connection failed, especially during transfer files > from/to head-node. Our user's PCs are mainly WindowsXP, so they use packages > like SSH Secure Shell for connection and file transfer, or Putty for > connection and WinSCP for file transfer. > > > The error massage is as follows: > > 'Disconnecting: Corrupted MAC on input' This sounds to me like hardware problems. What does your physical network look like? Is it built with the right cables, within spec, with decent switches? Do you see other evidence of network packet corruption? > > > or > > 'Disconnecting: bad packet Yes, sounds like bad hardware. Perhaps your cables aren't cat 5? Perhaps your electrical power has noise? Perhaps your switch(es) are broken or have been taken over by trolls? This sounds like you're failing packet checksum tests or experiencing pretty serious TCP collision problems. What do the network statistics look like on the interfaces in question? 
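One way to answer the question about interface statistics on a Linux head node is to read the per-interface counters the kernel exposes in /proc/net/dev, which cover much the same ground as the "netstat -i" output suggested elsewhere in this thread. A minimal Python sketch, assuming the usual layout of two header lines followed by one "iface: 16 counters" line per interface; the function names and the sample text are illustrative only and were not taken from the cluster being discussed:

```python
from pathlib import Path

# Column order used by Linux in /proc/net/dev (receive half, then transmit half).
RX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "frame", "compressed", "multicast")
TX_FIELDS = ("bytes", "packets", "errs", "drop", "fifo", "colls", "carrier", "compressed")

# Hypothetical sample, not measured on any real cluster.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:   50000     400   12    0    0     7          0         0    42000     390    0    0    0     3       0          0
"""


def parse_net_dev(text):
    """Return {interface: {"rx": {...}, "tx": {...}}} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines()[2:]:          # skip the two header lines
        name, sep, rest = line.partition(":")
        values = rest.split()
        if not sep or len(values) < 16:
            continue                            # not a counter line
        numbers = [int(v) for v in values[:16]]
        stats[name.strip()] = {
            "rx": dict(zip(RX_FIELDS, numbers[:8])),
            "tx": dict(zip(TX_FIELDS, numbers[8:])),
        }
    return stats


def interfaces_with_errors(stats):
    """List interfaces whose error-type counters are nonzero."""
    flagged = []
    for name, s in stats.items():
        counters = (s["rx"]["errs"], s["rx"]["frame"],
                    s["tx"]["errs"], s["tx"]["colls"], s["tx"]["carrier"])
        if any(counters):
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    proc = Path("/proc/net/dev")
    text = proc.read_text() if proc.exists() else SAMPLE
    print(interfaces_with_errors(parse_net_dev(text)))
```

Nonzero error, frame, or collision counters are only a hint that a NIC, cable, or switch port deserves a closer look; clean counters do not rule out a hardware fault.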
rgb > length...', > followed by a long integer. > > > This problem has practically made our cluster unusable. So, I would be > thankful for any coming advice. > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From James.P.Lux at jpl.nasa.gov Tue May 29 13:41:42 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 29 May 2007 13:41:42 -0700 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <465CD274.6090103@aol.com> References: <1088659434.1180453767761.JavaMail.root@fepweb03> <465CD274.6090103@aol.com> Message-ID: <6.2.3.4.2.20070529133715.0322c990@mail.jpl.nasa.gov> At 06:25 PM 5/29/2007, matt jones wrote: > >Does anyone know who much data 1 hr. of HDTV produces? Let's try 720 >for now and perhaps 1080. I'm looking for the file size if you store the >whole thing in a single file. > >personally... > >HD is a marketing CON to get nieve people to buy 'HD' products when they >would be better buying a computer monitor with a higher resolution, >colour depth, and refresh rate. although a 42" 'HD' widescreen would >look good on my comp. somewhat ot, although people do build video walls with Beowulf clusters... TV/Video displays and computer monitors actually have different specifications and ways things are measured. For instance, the typical computer CRT monitor is specified for 3 year life to 50% brightness. The typical television CRT is specified for 10-15 year life. TVs are MUCH brighter than typical monitors (at the tradeoff in CRT days of a bigger spot, so the resolution is poorer) Yes, they're both display technologies, but they sell into very different markets with very different expectations of life, user fiddling, viewing angles, background light, etc. The challenges in HD video are more than just the display.. it's things like bandwidth efficiency, digital rights management, etc. James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group
Flight Communications Systems Section
Jet Propulsion Laboratory, Mail Stop 161-213
4800 Oak Grove Drive
Pasadena CA 91109
tel: (818)354-2075
fax: (818)393-6875

From rgb at phy.duke.edu Tue May 29 13:48:26 2007
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue, 29 May 2007 16:48:26 -0400 (EDT)
Subject: [Beowulf] Beowulf on Demand
In-Reply-To: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com>
References: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com>
Message-ID:

On Mon, 28 May 2007, Juan Camilo Hernandez wrote:

> The idea that we have is that every group contributes a quantity
> of money depending on their use of the flops in the cluster, something
> like "computacion bajo demanda" (computing on demand)
>
> I would like to ask you: Do you think this is the best option for
> measuring each user's cluster use, so that we can proceed to charge every
> user?

This is a very tricky model to manage, because cluster utilization isn't just "FLOPs". A user's programs can block access to the full value of the resource by consuming network, memory, disk channels, and other bottleneck/limiting resources as well as just "CPU cycles".

> Could you let me know which software I can use to carry out
> this kind of control?

There are several approaches you can take. One is to turn on process accounting on the nodes. This is done as root by e.g.

   accton /var/log/acct

You will need to rotate and groom /var/log/acct if you do this -- it grows rapidly on a busy system, although "rapidly" was really written with respect to disk scales that are much smaller than today's. Once you've run this, then per node a command like:

   sa -m /var/log/acct

will yield a table like:

rgb at lilith|B:904#sa -m /var/log/acct
                173       7.49re       0.00cp      876k
root            156       7.49re       0.00cp      801k
rgb              16       0.00re       0.00cp     1540k
smmsp             1       0.00re       0.00cp     1887k

The first row (no user name) is the total. The first numeric field is the number of commands run; "re" is elapsed (real) time and "cp" is CPU time (user plus system), both in minutes; the last field, "k", is CPU-time-averaged core (memory) usage in 1k units. For charging by CPU use, "cp" is probably the one you want.
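As a minimal sketch of turning that sa -m table into something usable for charging, the snippet below parses the four rows shown above and computes each user's share of CPU minutes. It reads the columns the way the sa(8) man page describes them (command count, "re" real minutes, "cp" CPU minutes, "k" average core usage); the function names and the proportional-share rule are assumptions for illustration, not an established billing scheme.

```python
# Sketch: parse `sa -m` output and compute each user's share of CPU minutes.
# SAMPLE is the table from the message above; the row without a name is the total.
SAMPLE = """\
                173       7.49re       0.00cp      876k
root            156       7.49re       0.00cp      801k
rgb              16       0.00re       0.00cp     1540k
smmsp             1       0.00re       0.00cp     1887k
"""


def parse_sa_m(text):
    """Return {user: {"commands", "real_min", "cpu_min", "avg_core_k"}}."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4:            # totals row has no user name
            user = "(total)"
        elif len(fields) == 5:
            user = fields.pop(0)
        else:
            continue                    # skip blank or unexpected lines
        commands, real, cpu, core = fields
        rows[user] = {
            "commands": int(commands),
            "real_min": float(real[:-2]),    # strip the "re" suffix
            "cpu_min": float(cpu[:-2]),      # strip the "cp" suffix
            "avg_core_k": int(core[:-1]),    # strip the "k" suffix
        }
    return rows


def cpu_share(rows):
    """Fraction of the summed per-user CPU minutes attributable to each user."""
    users = {u: r for u, r in rows.items() if u != "(total)"}
    total = sum(r["cpu_min"] for r in users.values())
    if total == 0:
        return {u: 0.0 for u in users}
    return {u: r["cpu_min"] / total for u, r in users.items()}


if __name__ == "__main__":
    table = parse_sa_m(SAMPLE)
    for user, share in cpu_share(table).items():
        print(user, table[user]["cpu_min"], share)
```

Summing such per-node tables across the whole cluster would give a crude CPU-only bill; as noted earlier in this message, that still ignores network, memory, and disk contention.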
You can see that from when I turned on accounting I did a tiny amount of work -- less than some automated background tasks -- in terms of total CPU, even though my "time" is much higher. You'll have to really study to learn what these fields are and whatever other fields are available. There are (or used to be) tools out there that can take the output of sa and turn it into "cooked" reports. There are also typically accounting tools built into some of the batch job schedulers, I believe, but you haven't indicated whether or not you're using one (if you were, I would have expected that you'd already found them). You can also use a tool like xmlsysd and write your own small application to take snapshots of system utilization and average them out somehow. Obviously RTM is a good idea, and GIYF, and so on, but you CAN set up microscopic accounting if you like. There are probably toolsets out there that will do all of this for you, if you look for them. rgb > > Do you have any suggest or recommendation? > > Thak you very much. > > > Best regards. > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rgb at phy.duke.edu Tue May 29 13:57:35 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 29 May 2007 16:57:35 -0400 (EDT) Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1565837372.1180460374910.JavaMail.root@fepweb03> References: <1565837372.1180460374910.JavaMail.root@fepweb03> Message-ID: On Tue, 29 May 2007, laytonjb at charter.net wrote: > Uncle, Uncle!!! > > Actually that was a good answer. I see that I need to learn more :) > > So about 8-9 GB/hour.... > > What I have in mind is a large number of hours of HDTV being recorded to > storage. I'm guessing that total number of hours, but I think the general > number is over 4,000 hours (about 36,000 GB or 3.6 TB). 
Actually it's not You're as good at arithmetic as I am, Jeff... That would be 36 TB. Not QUITE so easy or cheap to get. BTW, I pulled my 3 GB answer from sites that indicated how video will be delivered over the web, where bw is a scarce resource. So one can assume that it is highly compressed and similar in scheme to standard movie-length DVD recording (HD or not). You got more/better answers, of course, but you could PROBABLY drop that to 12 TB or so if you use a good recording/compression scheme. Depending, of course, on what the data looks like and what your image tolerances are. rgb BTW, if I sent you my google string, you'd have to send me the recipe for your ribs, right? Although all the spoilsports on the list have fixed it up so that there's no point anymore... > that much data is it? Just a few hard drives and you've got it. > > Thanks! > > Jeff > >> At 08:49 AM 5/29/2007, laytonjb at charter.net wrote: >>> Good morning, >>> >>> I was doing some thinking over the weekend (while cooking ribs on >>> the grill :) ). >>> Does anyone know who much data 1 hr. of HDTV produces? Let's try 720 for >>> now and perhaps 1080. I'm looking for the file size if you store the >>> whole thing >>> in a single file. >> >> >> Are you asking about "as generated in the studio" or "as recorded" or >> "as broadcast" >> >> the raw data rate is >1 Gbps (142.18 Mb/s for NSTC sampled at 14.318 >> Ms/s up to 1.486 Gbps for SMTPE 292M sampled at 74.25 Ms/s) >> >> >> >> There's several compression/redundancy removal steps in the chain, >> and different HD broadcast media (over the air in US (ATSC), over the >> air in Europe (DVB-T), direct broadcast satellite (DVB-S, and others) >> , cable) use different bit rates, and different compression >> schemes. And, of course, the DVD (including the new BluRay and >> HD-DVD) have their own encodings as well. >> >> In the US, HD is broadcast over the air in a 6MHz wide channel at >> between 19-20 Mbps (3 bits/symbol). 
However, that 20 Mbps stream can >> be divvied up in lots of ways: 1 really HD channel, 5 SD channels, 2 >> SD channels plus a medium rate HD channel. >> >> Wikipedia has a lot of info on this.. >> >> The appearance of the decoded output depends a LOT on how good the >> encoding was. You can cheap out and just do simple frame encoding, >> with no frame-to-frame encoding, in which case you get high >> resolution with lots of artifacts. Or, you can spend a lot more >> effort on the encoding, and make use of the frame to frame >> redundancy, and get a lot less artifacts. The telling difference is >> if you have something like a panning shot over a complex, but fixed, >> background (e.g. a forest in the distance). A good encoder will be >> able to make use of the fact that big swaths of the image are >> actually the same from frame to frame, just displaced. A cheap >> encoder will not. >> >> Cable TV and direct broadcast satellite use somewhat different data >> rates (since they have different heritage), and different encodings, sometimes. >> >> Compressed digital video that is intended for further editing is also >> compressed differently, because the "broadcast" compressions tend to >> have unsuitable artifacts in the editing process. Squeezing a raw >> data rate of >1 Gbps down into 20 Mbps or so always entails some >> compromises, and the broadcast compressions are designed to allow >> inexpensive decoders (and expensive encoders..you'll be making >> millions of decoders and dozens of encoders) and for artifacts that >> are visually unobjectionable to an end user. >> >> As you can imagine, there is much opportuntity for transcoding artifacts. >> >> These days, H.264/AVC is probably the leading candidate for compression >> >> >> So.. for over the air HD broadcasts, 20 Mbps should do you, which is >> well within the range of a variety of hard disks. Converting to >> GB/hr, I get 8-9 GB/hr >> >> >> James Lux, P.E. 
>> Spacecraft Radio Frequency Subsystems Group >> Flight Communications Systems Section >> Jet Propulsion Laboratory, Mail Stop 161-213 >> 4800 Oak Grove Drive >> Pasadena CA 91109 >> tel: (818)354-2075 >> fax: (818)393-6875 >> >> > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From hahn at mcmaster.ca Tue May 29 14:27:58 2007 From: hahn at mcmaster.ca (Mark Hahn) Date: Tue, 29 May 2007 17:27:58 -0400 (EDT) Subject: [Beowulf] ssh connection problem In-Reply-To: References: <1bef2ce30705270147k430800b5x303e56410aba640b@mail.gmail.com> Message-ID: >> 'Disconnecting: Corrupted MAC on input' > > This sounds to me like hardware problems. What does your physical actually, I suspect the MAC here is Msg Auth Code, not Media Access Control. I think I've heard of this sort of ssh problem occurring when your network path is unclean - like a router trying to be too smart, modifying tcp header fields, etc. From James.P.Lux at jpl.nasa.gov Tue May 29 14:50:30 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 29 May 2007 14:50:30 -0700 Subject: fast pipes to the house Re: [Beowulf] HDTV video file sizes In-Reply-To: <1084075179.1180472834288.JavaMail.root@fepweb03> References: <1084075179.1180472834288.JavaMail.root@fepweb03> Message-ID: <6.2.3.4.2.20070529143834.031f6c98@mail.jpl.nasa.gov> At 02:07 PM 5/29/2007, laytonjb at charter.net wrote: >- > >I really, really wish they would actually do GigE to each house instead of the >infernal DSL or Cable modem. GigE has some distance issues too. I have Verizon FiOS - fiber to the house, and it carries all the phone, potentially the TV, AND (potentially) 50-100 Mbps data. 
I actually have about 15 Mbps data service. This is the big "triple play" thing. Phone, TV, data all from one provider. As it happens, I get my TV over coax from another provider, but my neighbors have the triple play. It costs about $900 per house to provision in my area, which is pretty new, dense, has decent underground conduits, etc. > I met a youg man who used to be a VP at >AT&T and he did a study about bringing GigE to each house. It turned out to >be cheap to do and AT&T had a good start at the infrastructure to handle >that kind of bandwidth. >At this rate we'll never see full streaming to the desktop. My only hope at >this point is either Google will figure it out or perhaps Steve Jobs >will force the >communications knuckleheads to actually do the right thing (not that I'm >a Jobs fan by any means). Or perhaps Jim Lux will develop a new high-speed >downlink from satellites to get streaming video (maybe some kind of P2P >cluster in space set up where you get feeds from multiple satellites). What >the hell, I can dot my roof with small dishes if it means I can watch Grey's >Anatomy when I want to. There's an interesting aspect to this stuff.. do you provide a big pipe to the house that carries all channels and do the selection in a distributed fashion OR do you provide a skinny pipe to the house, carrying only the content they've selected at a given time. Very different network traffic models, particularly at higher levels of aggregation. Sure, it's no problem for them to put a fiber with >10 Gbps bandwidth from my house to somewhere down the street, but if they have to aggregate 1000 houses, that's 10 Tbps, which is moderately impressive, and if they aggregate all the 50,000 houses fed by the current central office, that's 500 Tb/s.... Just the packet memory alone if you do store and forward routing would be crippling... Clearly, something other than "give everyone a DS3 to the backbone" is appropriate. 
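The aggregation arithmetic above can be checked with a few lines. This is only a back-of-the-envelope sketch: the 10 Gbps per house and the house counts come from the message, while the 20 Mbps "skinny pipe" figure borrows the over-the-air HD rate quoted earlier in the thread and assumes every house is watching exactly one stream at once.

```python
# Sketch: aggregate bandwidth for "big pipe to every house" vs. "only the selected stream".
# All rates are kept in whole Mbps so the arithmetic stays exact.

def aggregate_mbps(houses, per_house_mbps):
    """Total bandwidth needed if every house uses its full rate at the same time."""
    return houses * per_house_mbps


def pretty(mbps):
    """Format a rate given in Mbps using the largest sensible decimal unit."""
    for unit, scale in (("Tbps", 1_000_000), ("Gbps", 1_000), ("Mbps", 1)):
        if mbps >= scale:
            return f"{mbps / scale:g} {unit}"
    return f"{mbps:g} Mbps"


if __name__ == "__main__":
    big_pipe = 10_000      # 10 Gbps per house, as in the message
    one_stream = 20        # assumed: one 20 Mbps HD stream per house
    for houses in (1_000, 50_000):
        print(houses, "houses:",
              pretty(aggregate_mbps(houses, big_pipe)), "vs",
              pretty(aggregate_mbps(houses, one_stream)))
```

Under those assumptions the two models differ by a factor of 500 at every level of aggregation, which is why the choice of traffic model matters so much at the central office.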
And that's where the thinking (and market forces) come to bear. After all, these days, if you're forking out the bucks for building the infrastructure, one wants to figure out how to "monetize" that transaction (e.g. watching Grays when you want to). And once the transaction is monetized, there's a whole raft of folks who ask "where's my share?" rapidly devolving to "how come his share is bigger than mine?" followed by "I'm taking my ball and going home". Viz, the interaction between MS and Disney: MS: We have this fabulous distribution platform, you should be happy to pay us to distribute your content. Disney: We have this fabulous content, you should be happy to pay us for it so you can distribute it. So how's Verizon going to amortize the $900 installation cost of my FiOS? James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From rgb at phy.duke.edu Tue May 29 15:00:03 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 29 May 2007 18:00:03 -0400 (EDT) Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1084075179.1180472834288.JavaMail.root@fepweb03> References: <1084075179.1180472834288.JavaMail.root@fepweb03> Message-ID: On Tue, 29 May 2007, laytonjb at charter.net wrote: > I really, really wish they would actually do GigE to each house instead of the > infernal DSL or Cable modem. I met a youg man who used to be a VP at > AT&T and he did a study about bringing GigE to each house. It turned out to > be cheap to do and AT&T had a good start at the infrastructure to handle > that kind of bandwidth. So he wrote up the report, talked it over with some > people and of course it was killed. I'm beginning to think that I should > wear a vertically gray striped shirt with a number of the back as part of the > "Bandwidth Gulag" that the communications have put us in. 
We suffer and > they laugh. Don't be so pessimistic. It will happen, because all the major players want to be first and biggest to score the "triple play" -- delivery of phone, internet and media to the household. Because data is delivered in bursts -- it isn't enough to deliver a movie at playing speed (on umpty channels they need to be able to deliver a MOVIE that you can watch or not watch until later in a few seconds -- they'll get to gigabits to the household in -- at a guess, three to five more years (anybody with better numbers?). At least in some cities, towns, venues. They're running fiber down the country road that leads past my neighborhood as I type this -- I've been watching them bury fiber throughout Durham for the last three or four years. The phone company isn't even bothering to upgrade its DSL equipment (which is totally obsolete) because I'm pretty sure that they're never going to replace it. It won't happen all at once because it is expensive, it may make a stop or two at 1.5 Mbps or 45 Mbps along the way, but it won't be that long now. As you say, once the fiber is there (even if it isn't all the way to "the household") there isn't much point in being stingy. > At this rate we'll never see full streaming to the desktop. My only hope at > this point is either Google will figure it out or perhaps Steve Jobs will force the > communications knuckleheads to actually do the right thing (not that I'm > a Jobs fan by any means). Or perhaps Jim Lux will develop a new high-speed > downlink from satellites to get streaming video (maybe some kind of P2P > cluster in space set up where you get feeds from multiple satellites). What > the hell, I can dot my roof with small dishes if it means I can watch Grey's > Anatomy when I want to. No, I think that they'll have fiber running to households in decent numbers starting in maybe 2010. And whoever gets there first "wins". 
rgb > > Jeff >> >> rgb >> >> BTW, if I sent you my google string, you'd have to send me the recipe >> for your ribs, right? Although all the spoilsports on the list have >> fixed it up so that there's no point anymore... > > Well, I can perhaps send you the recipe, but we have to be careful so that > the Kanasas Cit Barbecue Masters don't come looking for me. I see those > guys that lokk the "Da Bears" fan club from the old SNL routines tracking > me down. > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From rgb at phy.duke.edu Tue May 29 15:11:43 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue, 29 May 2007 18:11:43 -0400 (EDT) Subject: fast pipes to the house Re: [Beowulf] HDTV video file sizes In-Reply-To: <6.2.3.4.2.20070529143834.031f6c98@mail.jpl.nasa.gov> References: <1084075179.1180472834288.JavaMail.root@fepweb03> <6.2.3.4.2.20070529143834.031f6c98@mail.jpl.nasa.gov> Message-ID: On Tue, 29 May 2007, Jim Lux wrote: > So how's Verizon going to amortize the $900 installation cost of my FiOS? As an investment. So that they "win". Because if they don't, others will and they'll lose. And yeah, ultimately YOU will pay for it, but maybe not all at once and up front. Speaking of your other issues -- one possible solution is media replication and media servers. OK, so to be able to deliver (say) the 2000 top DVD titles on demand requires 10-20 TB of storage. That's a trivial investment, really. Disk (even RAID disk) is perhaps $0.25/GB. Storing a movie costs at MOST $5-10 -- small compared to the capital investment required to sell them or rent them on physical media. And disk is ever cheaper, servers ever faster. 
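For what it's worth, the storage estimate above works out as follows. This is a quick sketch only: the 5-10 GB per title is simply what 10-20 TB divided by 2000 titles implies, the $0.25/GB is the figure quoted above, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Sketch: disk needed, and its cost, for a library of movie titles.

def library_cost(titles, gb_per_title, dollars_per_gb):
    """Return (total_gb, total_dollars) for storing `titles` movies."""
    total_gb = titles * gb_per_title
    return total_gb, total_gb * dollars_per_gb


if __name__ == "__main__":
    for gb_per_title in (5, 10):
        total_gb, dollars = library_cost(2000, gb_per_title, 0.25)
        per_title = gb_per_title * 0.25
        print(f"{gb_per_title} GB/title: {total_gb / 1000:g} TB, "
              f"${dollars:,.0f} total, ${per_title:.2f} per title")
```

At these prices the disk for a single title costs a dollar or two, comfortably inside the "at MOST $5-10" bound given above.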
So if one locates "stores" of basically all the titles one might wish to deliver that auto-replicate on demand while solving an "interesting" problem in provisioning and optimization throughout communities (with a suitable tree structure or network) one can avoid a lot of resource contention on the aggregate backbone. There are already companies trying to move into this space using the limited bw already available -- however, centralized distribution models probably will not scale. All we REALLY need is for somebody to be working on the ware needed to heat up a direct neural interface to the information. I'm tired of typing. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu From patrick at myri.com Tue May 29 16:02:12 2007 From: patrick at myri.com (Patrick Geoffray) Date: Tue, 29 May 2007 19:02:12 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> Message-ID: <465CB0F4.8070805@myri.com> Hi Jim, Jim Lux wrote: > At 10:52 AM 5/23/2007, Peter St. John wrote: >> But oh and Jim if you recall any papers about this I could read that >> would be "Jim" Dandy. > I seem to recall that if you google hypercube and intel, you'll turn up > some of the papers that were written early on. The guys who started > with the hypercube interconnect were at CalTech, as I recall, and spun > off to form a supercomputer company embodying that, which Intel also > adopted. 
There is a nice historical section about it on Netlib: http://www.netlib.org/utk/lsi/pcwLSI/text/node13.html Patrick From James.P.Lux at jpl.nasa.gov Tue May 29 16:56:01 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 29 May 2007 16:56:01 -0700 Subject: fast pipes to the house Re: [Beowulf] HDTV video file sizes In-Reply-To: References: <1084075179.1180472834288.JavaMail.root@fepweb03> <6.2.3.4.2.20070529143834.031f6c98@mail.jpl.nasa.gov> Message-ID: <6.2.3.4.2.20070529165004.02a3f5c0@mail.jpl.nasa.gov> At 03:11 PM 5/29/2007, Robert G. Brown wrote: >On Tue, 29 May 2007, Jim Lux wrote: > >>So how's Verizon going to amortize the $900 installation cost of my FiOS? > >As an investment. So that they "win". Because if they don't, others >will and they'll lose. And yeah, ultimately YOU will pay for it, but >maybe not all at once and up front. > >Speaking of your other issues -- one possible solution is media >replication and media servers. OK, so to be able to deliver (say) the >2000 top DVD titles on demand requires 10-20 TB of storage. That's a >trivial investment, really. Disk (even RAID disk) is perhaps $0.25/GB. >Storing a movie costs at MOST $5-10 -- small compared to the capital >investment required to sell them or rent them on physical media. And >disk is ever cheaper, servers ever faster. I suspect the problems are not technological, but business. Say you put a movie cache in a vault at the end of every block (or with whatever density is needed). Who owns the vault? Who owns the content stored in the vault? Does Verizon charge Sony Entertainment to store Sony's movies for future distribution? Who gets to bill the customer? Is Sony reimbursed when the customer actually watches it, when Verizon issues the invoice, when Verizon gets paid, or speculatively, up front. 
(The cable TV model has cable companies paying, essentially in advance, for the channels they carry, with a lot of bundling and package deals from vendor to cable co, who then repackages to the consumer.. e.g. We'll sell you ESPN, but you also have to carry Midnight Preschool channel and the Paint Drying channel, so you burn 3 channels worth of bandwidth to carry the most customers want to pay for.) Who's responsible for authenticating the users of the content in that vault? >So if one locates "stores" of basically all the titles one might wish to >deliver that auto-replicate on demand while solving an "interesting" >problem in provisioning and optimization throughout communities (with a >suitable tree structure or network) one can avoid a lot of resource >contention on the aggregate backbone. There are already companies >trying to move into this space using the limited bw already available -- >however, centralized distribution models probably will not scale. Distributed stores do work (e.g. Google) but have challenging non-technical aspects as described above. James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From James.P.Lux at jpl.nasa.gov Tue May 29 16:57:32 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue, 29 May 2007 16:57:32 -0700 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <465CB0F4.8070805@myri.com> References: <001b01c79952$3a266030$0900a8c0@objection> <4651F696.6060803@georgetown.edu> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <465CB0F4.8070805@myri.com> Message-ID: <6.2.3.4.2.20070529165713.031deba8@mail.jpl.nasa.gov> At 04:02 PM 5/29/2007, Patrick Geoffray wrote: >Hi Jim, > >Jim Lux wrote: >>At 10:52 AM 5/23/2007, Peter St. John wrote: >>>But oh and Jim if you recall any papers about this I could read >>>that would be "Jim" Dandy. > >>I seem to recall that if you google hypercube and intel, you'll >>turn up some of the papers that were written early on. The guys >>who started with the hypercube interconnect were at CalTech, as I >>recall, and spun off to form a supercomputer company embodying >>that, which Intel also adopted. > >There is a nice historical section about it on Netlib: >http://www.netlib.org/utk/lsi/pcwLSI/text/node13.html There you go.. I knew I had read it recently somewhere.. 
Jim

From reuti at staff.uni-marburg.de Wed May 30 04:31:19 2007
From: reuti at staff.uni-marburg.de (Reuti)
Date: Wed, 30 May 2007 13:31:19 +0200
Subject: [Beowulf] Beowulf on Demand
In-Reply-To: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com>
References: <4d2c60b30705280943wf36acdg9c9c1d006e7f4426@mail.gmail.com>
Message-ID: <292878AB-33E8-4BBD-8A6C-E9572A163ADC@staff.uni-marburg.de>

Hi,

On 28.05.2007 at 18:43, Juan Camilo Hernandez wrote:

> Hi list
>
> we are starting a project to assemble a Beowulf cluster shared
> between several research groups at the University where I am studying,
>
> We have had some problems, mainly over the quantity of money that
> every group has to contribute for administrative and maintenance
> costs.
>
> The idea that we have is that every group contributes a quantity
> of money depending on their use of the flops in the cluster, something
> like "computacion bajo demanda" (computing on demand)
>
> I would like to ask you: Do you think this is the best option for
> measuring each user's cluster use, so that we can proceed to charge every
> user?
>
> Could you let me know which software I can use to carry out
> this kind of control?
>
> Do you have any suggestions or recommendations?

do you intend to use a queuing system for batch operation, e.g. SUN GridEngine?

http://gridengine.sunsource.net/

This way you could balance on the one hand the amount of computing time each group/user will get at a certain point of time in the cluster, and OTOH also get some accounting data - either by the granted nodes/slots in walltime or the real consumed CPU time by user/group/project. The latter is useful in case you oversubscribe the nodes, as some parallel programs are not doing parallel work all the time, and this way the idle time on such nodes can still be used for some kind of "serial background job".

-- Reuti

From peter.st.john at gmail.com Wed May 30 07:15:38 2007
From: peter.st.john at gmail.com (Peter St.
John) Date: Wed, 30 May 2007 10:15:38 -0400 Subject: [Beowulf] Network considerations for new generation cheap beowulfcluster In-Reply-To: <465CB0F4.8070805@myri.com> References: <001b01c79952$3a266030$0900a8c0@objection> <20070522065206.GG17691@leitl.org> <4652E9EE.4010906@sicortex.com> <4652FA78.7020600@sicortex.com> <6.2.3.4.2.20070523100102.031d2ad8@mail.jpl.nasa.gov> <6.2.3.4.2.20070523120701.031c96a8@mail.jpl.nasa.gov> <465CB0F4.8070805@myri.com> Message-ID: Very interesting, thanks! The CalTech project started in 81; TMC was founded in 82 (based on MIT work that I think was more theoretical, than constructing an actual machine). So the cometitive nature of the chronology adds drama to the story :-) but it looks like the Cal project is more in the lines of Beowulfry, more of a forebear, and there's the connection to JPL too. Very cool. Peter On 5/29/07, Patrick Geoffray wrote: > > Hi Jim, > > Jim Lux wrote: > > At 10:52 AM 5/23/2007, Peter St. John wrote: > >> But oh and Jim if you recall any papers about this I could read that > >> would be "Jim" Dandy. > > > I seem to recall that if you google hypercube and intel, you'll turn up > > some of the papers that were written early on. The guys who started > > with the hypercube interconnect were at CalTech, as I recall, and spun > > off to form a supercomputer company embodying that, which Intel also > > adopted. > > There is a nice historical section about it on Netlib: > http://www.netlib.org/utk/lsi/pcwLSI/text/node13.html > > Patrick > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From john.hearns at streamline-computing.com Thu May 31 02:58:27 2007 From: john.hearns at streamline-computing.com (John Hearns) Date: Thu, 31 May 2007 10:58:27 +0100 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <1565837372.1180460374910.JavaMail.root@fepweb03> References: <1565837372.1180460374910.JavaMail.root@fepweb03> Message-ID: <465E9C43.20605@streamline-computing.com> laytonjb at charter.net wrote: > Uncle, Uncle!!! > > Actually that was a good answer. I see that I need to learn more :) > > So about 8-9 GB/hour.... > > What I have in mind is a large number of hours of HDTV being recorded to > storage. I'm guessing that total number of hours, but I think the general > number is over 4,000 hours (about 36,000 GB or 3.6 TB). Actually it's not > that much data is it? Just a few hard drives and you've got it. Actually, this is very on topic for the Beowulf list. Think parallel filesystems - such as Panasas or Lustre. The company I used to work for (Framestore-CFC) are big Lustre users, from what I can gather. -- John Hearns Senior HPC Engineer Streamline Computing, The Innovation Centre, Warwick Technology Park, Gallows Hill, Warwick CV34 6UW Office: 01926 623130 Mobile: 07841 231235 From jim.windle at gmail.com Tue May 29 10:17:00 2007 From: jim.windle at gmail.com (Jim Windle) Date: Tue, 29 May 2007 13:17:00 -0400 Subject: [Beowulf] HDTV video file sizes In-Reply-To: References: <1088659434.1180453767761.JavaMail.root@fepweb03> Message-ID: <1932e3120705291017t4f11eed9gcc36cd120697e216@mail.gmail.com> On 5/29/07, Mark Hahn wrote: > > > Well, I didn't have any idea ten seconds ago, but now I know that one > > hour should be roughly 3 GB. (So a movie should be 5-6 GB.) > > hmm, that's normal DVD, isn't it? the newfangled flavors (BD, etc) > seem to be 5-10 higher capacity. So if Netflix isn't lying when they say they have shipped over a billion movies that means they have moved roughly 5 exabytes of data via the US mail. 
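The exabyte figure is easy to sanity-check. The sketch below uses decimal units and the roughly 5 GB per disc from the quoted estimate; the helper that converts the total into an average data rate takes the time span as a parameter, because the thread does not say over how many years the discs were shipped.

```python
# Sketch: how much data a billion mailed DVDs represents (decimal units).
GB = 10**9
EB = 10**18


def shipped_bytes(discs, gb_per_disc):
    """Total bytes moved by mailing `discs` discs of `gb_per_disc` GB each."""
    return discs * gb_per_disc * GB


def average_gbps(total_bytes, years):
    """Average rate in Gbps if the bytes were spread evenly over `years` (an assumed span)."""
    seconds = years * 365 * 24 * 3600
    return total_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    total = shipped_bytes(10**9, 5)
    print(total / EB, "EB")
    # The span below is hypothetical; plug in whatever period you believe applies.
    print(average_gbps(total, years=8), "Gbps averaged over 8 years")
```

So "roughly 5 exabytes" is right for a billion discs at about 5 GB each.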
I wonder how that compares to the amount moved over the internet during the same time period?

> compressed data rates appear to be 20-50 Mbps (lower than 20
> probably doesn't count as HD.)
>
> funny how all the HD stuff seems very fuzzy ;)

From danderson at lnxi.com Tue May 29 12:10:06 2007
From: danderson at lnxi.com (David B. Anderson)
Date: Tue, 29 May 2007 13:10:06 -0600
Subject: [Beowulf] 2nd request for ssh connection problem!!
In-Reply-To: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com>
References: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com>
Message-ID: <465C7A8E.6050103@lnxi.com>

Try turning off the Ethernet offload features of your NICs. I've seen offload be broken in such a way that it works with most stuff, but SSH will complain for security reasons. Start with

   ethtool -K eth0 tso off

see http://www.die.net/doc/linux/man/man8/ethtool.8.html

Ruhollah Moussavi Baygi wrote:
> -------------------Dear all at Beowulf,
>
> As I mentioned in my previous email, I have a problem with ssh
> connection to our Beowulf cluster. I did put an email in Beowulf
> mailing list group, but with no answer. It is hard to believe that
> nobody knows anything about this problem, so, I sent this email for
> the second time. Following please find the content of my previous
> email. I WOULD BE GRATEFUL FOR ANY COMING ANSWER!
>
> Best,
> Ruhollah Moussavi Baygi-------------------
>
> Hi everybody at Beowulf,
> I have a serious problem with ssh connection to our cluster. Every
> hint/help/suggestion, which can help me to solve it, is highly
> appreciated.
> Most of the time, when users want to connect and run their programs > from their own PCs, the ssh connection failed, especially during > transfer files from/to head-node. Our user's PCs are mainly WindowsXP, > so they use packages like SSH Secure Shell for connection and file > transfer, or Putty for connection and WinSCP for file transfer. > > The error massage is as follows: > ' Disconnecting: Corrupted MAC on input' > or > ' Disconnecting: bad packet length...', followed by a long integer. > > This problem has practically made our cluster unusable. So, I would be > thankful for any coming advice. > -- > Best, > Ruhollah Moussavi Baygi > > > -- > Best, > Ruhollah Moussavi Baygi > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -- David B. Anderson Linux Networx Sr. Software Engineer Email: danderson at lnxi.com Phone: (801) 649-1311 From geoff at galitz.org Tue May 29 12:12:13 2007 From: geoff at galitz.org (Geoff Galitz) Date: Tue, 29 May 2007 12:12:13 -0700 Subject: [Beowulf] 2nd request for ssh connection problem!! In-Reply-To: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> References: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> Message-ID: <06AE5D19-7609-48CB-BA72-54B4D66FA665@galitz.org> Most likely your mail was held up by the moderators as both messages just came through. I'd look at your physical network or switch first. Have you run network stress tests? Doing a ping flood test between the PC's and the head node is a good first step. -geoff On May 28, 2007, at 11:20 PM, Ruhollah Moussavi Baygi wrote: > -------------------Dear all at Beowulf, > > As I mentioned in my previous email, I have a problem with ssh > connection to our Beowulf cluster. 
I did put an email in Beowulf > mailing list group, but with no answer. It is hard to believe that > nobody knows anything about this problem, so, I sent this email for > the second time. Following please find the content of my previous > email. I WOULD BE GRATEFUL FOR ANY COMING ANSWER! > > Best, > Ruhollah Moussavi Baygi------------------- > > Hi everybody at Beowulf, > I have a serious problem with ssh connection to our cluster. Every > hint/help/suggestion, which can help me to solve it, is highly > appreciated. > Most of the time, when users want to connect and run their programs > from their own PCs, the ssh connection fails, especially during > file transfers from/to the head node. Our users' PCs are mainly > Windows XP, so they use packages like SSH Secure Shell for > connection and file transfer, or PuTTY for connection and WinSCP > for file transfer. > > The error message is as follows: > ' Disconnecting: Corrupted MAC on input' > or > ' Disconnecting: bad packet length...', followed by a long integer. > > This problem has practically made our cluster unusable. So, I would > be thankful for any coming advice. > -- > Best, > Ruhollah Moussavi Baygi > > > -- > Best, > Ruhollah Moussavi Baygi From mkozikowski at LIO.AACISD.com Tue May 29 12:29:03 2007 From: mkozikowski at LIO.AACISD.com (Mark Kozikowski) Date: Tue, 29 May 2007 15:29:03 -0400 Subject: [Beowulf] HDTV video file sizes In-Reply-To: <465CD274.6090103@aol.com> References: <1088659434.1180453767761.JavaMail.root@fepweb03> <465CD274.6090103@aol.com> Message-ID: <465C7EFF.4030907@lio.aacisd.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 I have to say that, as Matt wrote, HD is a marketing con. But I believe that this is more an excuse to introduce new copy protection.
The existing DVD has been broken - as seen by the industry. The new format gives virtually nothing as far as quality, but presents a completely new platform for protective development. A con or a scam, either way, people are flocking to the technology. They are seeing the new clothes on the emperor. Mark matt jones wrote: > >Does anyone know how much data 1 hr. of HDTV produces? Let's try 720 > for now and perhaps 1080. I'm looking for the file size if you store the > whole thing in a single file. > > >Well, I didn't have any idea ten seconds ago, but now I know that one > hour should be roughly 3 GB. (So a movie should be 5-6 GB.) > > >hmm, that's normal DVD, isn't it? the newfangled flavors (BD, etc) > seem to be 5-10 times higher capacity. > > >compressed data rates appear to be 20-50 Mbps (lower than 20 probably > doesn't count as HD.) > > >funny how all the HD stuff seems very fuzzy ;) > > 3GB for 1 hour seems reasonable, a movie in avi is only 700MB, and > that's at PAL quality or higher. a DVD is roughly 5GB for argument's sake, > and that includes the .vob video files, audio files and any extras > (which tend to be at a lower quality anyway.) so the size of the movie > is say closer to 3.5/4GB than 5GB. the 'dvd' movie is not at PAL res, > but something like 4 times the quality of PAL (3/4 way there to the > lower end HD). > > the mid and high HD, i would expect to take between 5-7GB for an hour. > thus just fitting an 'HD film' on a dual layer DVD. blu-ray being the > choice medium for 'HD films' in the near future. > > there is also quite a bit of confusion over what "HD" means. often frame > rates, and colour depth are different on different 'HD' objects. so it's > quite easy to fit many hours of HD film on a DVD at 5 fps. > > bit off topic... > > it's funny how VGA is directly* compatible with SCART, also how DVI is > directly compatible with HDMI... interesting how in both cases the > computer connector came first and yields better quality.
just a case of > changing connectors (shape and pin layout). > > *directly meaning no or little analogue electronics used. > > personally... > > HD is a marketing CON to get naive people to buy 'HD' products when they > would be better buying a computer monitor with a higher resolution, > colour depth, and refresh rate. although a 42" 'HD' widescreen would > look good on my comp. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFGXH7/iCbOcYAMPlYRCgMaAJwO3+e8XfjrpeYzfj1ZUepzUwoovACeNBq2 u5tPs+1sZW3o1ASIthBgwQc= =B+Gf -----END PGP SIGNATURE----- From d.katz at ieee.org Wed May 30 03:28:41 2007 From: d.katz at ieee.org (Daniel S. Katz) Date: Wed, 30 May 2007 11:28:41 +0100 Subject: [Beowulf] e-Science 2007 1st call for Posters and Research Demos Message-ID: <465D51D9.5080002@ieee.org> Call For Posters and Research Demos The 3rd IEEE International Conference on e-Science and Grid Computing (Bangalore, India, 10-13 December 2007, http://www.escience2007.org/) intends to organize one session for the presentation of posters and a limited number of research demonstrations, with each poster or demo focusing on an e-Science project that can be better communicated as a poster or demo than as a paper. We invite submissions of both posters and research demonstrations. In either case, an extended abstract should be submitted for a limited peer review. Accepted abstracts will be posted on the e-Science 2007 web site. There will be one poster session, where poster authors will be requested to remain near their posters for discussions with attendees. There will also be one demo session, held concurrently with the poster session, where the demos will be presented.
Topics of interest concerning e-Science and Grid computing include, but are not limited to, the following: * Enabling Technologies: Internet and Web Services * Collaborative Science Models and Techniques * Service-Oriented Grid Architectures * Problem Solving Environments * Application Development Environments * Programming Paradigms and Models * Resource Management and Scheduling * Grid Economy and Business Models * Autonomic, Real-Time and Self-Organising Grids * Virtual Instruments and Data Access Management * Sensor Networks and Environmental Observatories in e-Science * Security Challenges * e-Science & Grid applications in Physics, Biology, Astronomy, Chemistry, Finance, Engineering, and the Humanities * Web 2.0 Technology and Services for e-Science ********** Submission ********** Please send proposals to present posters or research demonstrations to: Daniel S. Katz e-mail : d.katz at ieee.org Proposals should contain: 1. An abstract of the work to be presented, as a text file of fewer than 1000 words. The file should be named XXXXXXXX.txt, where XXXXXXXX is a set of 8 characters that contains some portion of the lead author's name and is distinct from any other submission by the same lead author. 2. Up to two figures may also be supplied if they are referenced in the abstract text. Figures should be named XXXXXXXX_1.jpg or XXXXXXXX_2.png, etc., where XXXXXXXX is the same as in the text file, 1 or 2 indicates the figure number, and the extension indicates the figure type. jpg, gif, and png are acceptable figure types. *************** Important Dates *************** Poster and Demo proposals due : September 23, 2007 Poster and Demo acceptance : October 14, 2007 Final versions of abstracts due : October 28, 2007 (final versions may require limited corrections/changes based on reviewer comments, and will have additional formatting requirements for the eScience 2007 website.) -- Daniel S.
Katz http://www.cct.lsu.edu/~dsk/ Louisiana State University (225) 578-2750 (voice) (225) 578-5362 (fax) d.katz at ieee.org From TPierce at rohmhaas.com Wed May 30 05:55:14 2007 From: TPierce at rohmhaas.com (Thomas H Dr Pierce) Date: Wed, 30 May 2007 08:55:14 -0400 Subject: [Beowulf] cold cathode fluorescent backlighting In-Reply-To: <6394934c0705251214r4ec157f6o882f391ec8c0e7d9@mail.gmail.com> Message-ID: Dear Julia, your question: "Firstly, do Liquid Crystal Display TV or computer monitors emit any ionizing radiation?" CRT computer monitors emit a small amount of radiation. LCD displays do not emit radiation (beyond light). from http://en.wikipedia.org/wiki/Cathode_ray_tube Ionizing radiation: CRTs emit a small amount of X-ray band radiation as a result of the electron beam's bombardment of the shadow mask/aperture grille and phosphors. Almost all of this radiation is blocked by the thick leaded glass in the screen, so the amount of radiation escaping the front of the monitor is widely considered harmless. The Food and Drug Administration regulations in 21 C.F.R. 1020.10 are used to strictly limit, for instance, television receivers to 0.5 milliroentgens per hour (mR/h) (0.13 µC/(kg·h) or 36 pA/kg) at a distance of 5 cm from any external surface; most CRT emissions fall well below this limit [1]. Your question: "If the LCD screen becomes damaged through the inadvertent use of the wrong type of cleaner or by using any abrasive cloth could it expose one to increased ionizing radiation?" No. In an LCD display, damaging the screen only results in poor viewing of the liquid crystals. You can only make the display murky or dim by using an abrasive on the screen and over-cleaning it... http://en.wikipedia.org/wiki/Lcd_display There is a small fluorescent light in a display but it is very safe. All fluorescent lights need a mechanism to ionize the gas to make the fluorescent tube light up. The usual methods involve electrons (beta radiation) and those are low energy.
http://en.wikipedia.org/wiki/Fluorescent_lighting There are occasional lighting methods that can have a health issue. This was more common in the past with radium paint, and less so today with tritium lighting. http://en.wikipedia.org/wiki/Self-powered_lighting ------ Sincerely, Tom Pierce "julia howard" Sent by: beowulf-bounces at beowulf.org 05/25/2007 03:14 PM To beowulf at beowulf.org cc Subject [Beowulf] cold cathode fluorescent backlighting Not having an electronics background, my questions may seem naive. However, as the following issues give me concern, I should very much appreciate it if they could be sorted out with some reliable knowledge. Firstly, do Liquid Crystal Display TV or computer monitors emit any ionizing radiation? If the LCD screen becomes damaged through the inadvertent use of the wrong type of cleaner or by using any abrasive cloth, could it expose one to increased ionizing radiation? Regarding the cold cathode fluorescent backlights of monitors, I read in the Wikipedia encyclopedia under Cold Cathode that some CCFLs use a source of beta radiation to start the ionization process. If this is the case, then could LCD televisions expose us to beta or gamma radiation? I should like to replace my CRT TV with an LCD TV, but the thought of a radioactive material being present causes me much anxiety. Looking forward to your informed response, Julia Howard email: juliarachel_howard at yahoo.co.uk From merc4krugger at gmail.com Wed May 30 07:19:33 2007 From: merc4krugger at gmail.com (Krugger) Date: Wed, 30 May 2007 15:19:33 +0100 Subject: [Beowulf] 2nd request for ssh connection problem!!
In-Reply-To: References: <1bef2ce30705282320m42c4805ka9f6341675fb2924@mail.gmail.com> Message-ID: I have encountered this problem before and it was because of bad RAM. Packets are getting corrupted somewhere, either in the network or due to local hardware problems. From fahadsaeed11 at hotmail.com Tue May 29 17:14:39 2007 From: fahadsaeed11 at hotmail.com (fahad saeed) Date: Wed, 30 May 2007 00:14:39 +0000 Subject: [Beowulf] tftp permission denied Message-ID: An HTML attachment was scrubbed... URL: