[Beowulf] hpl size problems
Robert G. Brown rgb at phy.duke.edu
Tue Sep 27 07:09:34 PDT 2005
Greg M. Kurtzer writes:

> If someone else also has thoughts as to what would have caused the
> speedup, I would be very interested.
...
>> > hours) running on Centos-3.5 and saw a pretty amazing speedup of the
>> > scientific code (*over* 30% faster runtimes) then with the previous
>> > RedHat/Rocks build. Warewulf also makes the cluster rather trivial to
>>
>> such a speedup is indeed impressive; what changed?
>
> Actually, we used the same kernel (recompiled from RHEL), and exactly the
> same compilers, mpi and IB (literally the same RPMS). The only thing
> that changed was the cluster management paradigm. The tests were done
> back to back with no hardware changes.

This is more than a bit scary, if you use the same kernels etc.

MOST tasks one might run are fairly clearly bounded at the level of one resource or another. Even on a "fat" system (a desktop workstation with X11 actually running, for example) that isn't being worked on at the console, the load average in the absence of a background task or screensaver is typically 0.00-0.01. That is, a modern system consumes less than 1% of its capacity handling ALL the demands of a fat configuration.

For this reason this conversation about how much faster things run on minimally configured nodes interests me. For CPU bound tasks I just don't believe it -- if you get 30% speedup running identical, long running binaries on identical kernels and libraries, with the only difference being what tasks or daemons you are running in the background, something else is seriously wrong, as the system is 99% idle except for the task you are running.

For complex network-parallel tasks with barriers and everything I CAN believe it, but think that it is very important to analyze the task to understand WHY such a large speedup occurs. Is the system e.g. paging to disk a lot? Is the task crossing some sort of superlinear speedup threshold?

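The arithmetic behind the "I just don't believe it" for CPU bound tasks can be sketched in a few lines. This is only an illustration under a simplifying assumption that is not in the original post: a purely CPU bound task on a single CPU that simply loses a fixed fraction of cycles to background daemons. The function names are hypothetical.

```python
def runtime_with_background(cpu_seconds, background_fraction):
    """Wall-clock time for a CPU-bound task that needs `cpu_seconds` of CPU
    when background work steals `background_fraction` of the cycles.
    Assumption: one CPU, no other bottleneck (memory, disk, network)."""
    if not 0.0 <= background_fraction < 1.0:
        raise ValueError("background_fraction must be in [0, 1)")
    return cpu_seconds / (1.0 - background_fraction)


def runtime_reduction(background_fraction):
    """Fraction of the 'fat' runtime saved by removing all background load."""
    fat = runtime_with_background(1.0, background_fraction)
    thin = runtime_with_background(1.0, 0.0)
    return (fat - thin) / fat


if __name__ == "__main__":
    # Compare an idle-looking node, a busy one, and what 30% would require.
    for b in (0.01, 0.05, 0.30):
        print(f"background load {b:.0%}: runtime shrinks by "
              f"{runtime_reduction(b):.1%}")
```

Under this model the saving equals the background load itself, so a node whose load average sits near 0.01 has roughly 1% to give back, and a 30% saving would need daemons eating about 30% of the CPU.
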
Note that it pretty much cannot be a CPU-based issue -- the speedup has to come from interference and binding in some non-CPU resource. From Don's remarks the other day I'd guess that it is POSSIBLE that memory organization could be at fault, although I don't see that much state variability in most of the tasks I've ever benchmarked to suggest that 30% variation is reasonable given state differences (e.g. between a boot into single user mode with NOTHING to speak of running but the kernel and only the base libraries in cache, and init 5 with the kitchen sink running). 5% is more like it, or sometimes even nothing above noise, as Linux is usually pretty efficient, although I'm sure that <sigh> YMMV and there are tasks that do worse.

So I'd guess offhand that the network or some network daemon is the culprit, assuming that your task doesn't whack on disk (local or remote, another likely culprit) too much. I'd also guess that the problem is one of misconfiguration (probably in OSCAR :-( ) and not even a fundamental bug in the subsystem in question.

But of course you SHOULD be able to find out. Linux, while running, is pretty much an open book. top, ps auxrww, vmstat, netstat, free, the entire contents of /proc and /sys, xmlsysd/wulfstat -- you can watch fairly precisely what all the relevant processes are doing on a running system and can see a whole lot of what's going on for a full cluster at a time.

If the task is running at 99% of the CPU on one system and not paging, and at 69% of the CPU on the other system and paging like hell, well, that's a pretty big difference. Counting the number of CPU cycles (per second) devoted to the task is one big piece of data -- then all you have to do is look for whatever it is that is blocking the task or fairly systematically destroying its memory access pattern (if it is e.g. a memory bound task) or network access pattern (if it is a network bound task) or causing it to swap/page or...

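One way to collect the "CPU share plus paging" numbers on both cluster builds is a small sampler over /proc. What follows is a minimal sketch, not a replacement for top or vmstat. It assumes a Linux kernel that exposes /proc/vmstat with the pgpgin, pgpgout, pswpin and pswpout counters, and that the PID of the benchmark process is known; the function names and the 5 second interval are illustrative choices, not anything from the original thread.

```python
#!/usr/bin/env python3
"""Sample one task's CPU share and the node's paging rates from /proc."""
import os
import sys
import time

PAGING_KEYS = ("pgpgin", "pgpgout", "pswpin", "pswpout")


def parse_cpu_ticks(stat_text):
    """Return utime + stime (clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so only split what follows the last ')'.
    fields = stat_text.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])


def parse_vmstat(vmstat_text, keys=PAGING_KEYS):
    """Return the selected counters from the text of /proc/vmstat as a dict."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in keys:
            counters[parts[0]] = int(parts[1])
    return counters


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """Percentage of one CPU used by the task over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def _read(path):
    with open(path) as handle:
        return handle.read()


def sample(pid, seconds=5.0):
    """Return (task CPU %, {counter: events per second}) for one interval."""
    hz = os.sysconf("SC_CLK_TCK")
    ticks0 = parse_cpu_ticks(_read(f"/proc/{pid}/stat"))
    vm0 = parse_vmstat(_read("/proc/vmstat"))
    start = time.monotonic()
    time.sleep(seconds)
    elapsed = time.monotonic() - start
    ticks1 = parse_cpu_ticks(_read(f"/proc/{pid}/stat"))
    vm1 = parse_vmstat(_read("/proc/vmstat"))
    rates = {k: (vm1[k] - vm0[k]) / elapsed for k in vm0 if k in vm1}
    return cpu_percent(ticks0, ticks1, elapsed, hz), rates


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    cpu, rates = sample(int(sys.argv[1]))
    print(f"task CPU: {cpu:.1f}%")
    for name, rate in sorted(rates.items()):
        print(f"{name}: {rate:.1f}/s")
```

Run with the PID of one benchmark rank on a node from each build and compare. A task near 100% with paging rates near zero on one build, against a visibly lower CPU share with nonzero pswpin/pswpout on the other, would point at memory pressure rather than the CPU. For a multithreaded process the percentage can exceed 100, since the tick counts are summed over threads.
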
Seems worthwhile -- learning the answer to this might show you how to further optimize the task on ALL hardware/software/distro configurations.

   rgb
