[Beowulf] Moores Law is dying
Joe Landman landman at scalableinformatics.com
Tue Apr 14 14:55:09 PDT 2009
Jon Forrest wrote:
> Joe Landman wrote:
>
>> ... so I see you have never used an interprocedural analysis (-ipa)
>> switch :)
>>
>> Allows you to do things like, I dunno, inline one whole routine inside
>> another ...
>
> I've never used this but from your description I don't
> see how it leads to larger text sizes at runtime. After all, if you have
> routine A which is 10 bytes, and routine B which is 20 bytes,
> it would seem that they collectively take 30 bytes no matter
> if they stand alone or one inside the other. I might not
> be understanding this right, though.

More like N*20 bytes ... use the routine more than once, and each of the
N call sites gets its own inlined copy :)

>
>> Usually leads to much larger program text sizes.
>>
>> This said, I have seen very large programs from RISC days hitting well
>> more than 1 GB of text.  I haven't played with any recently though.
>
> Let's say this is about right. Do you see such programs getting
> even larger in the future?

Sadly, yes.

>>> Why is sharing expensive in performance? It might take a little
>>> overhead to set up and manage, but why is having multiple virtual
>>> addresses map to the same physical memory expensive?
>>
>> Contention.  Memory hot spots.  Been there, done that.  We are about
>> to do this all over again (collectively).
>
> Naively I would think that text memory hot spots would be a good
> thing, because then all the benefits of caching would kick in.
> There would be no cache coherence overhead since text is read-only.
> Why is this a bad thing?

Ohhhh....

You *really* don't want your system brought to its knees over false
sharing.  It's a great way to turn a large expensive machine into a very
slow large expensive machine.  Listen to Greg Lindahl, and he'll likely
point to this as one of the great fallacies of 'why shared memory is
better' than distributed memory :)  (not shoving words into his mouth,
so if he has changed his mind or thinks differently ... that's ok)

Imagine you are a processor, and you have written to a location in RAM.
So now your cache line is dirty, and waiting in queue to be flushed out.
In your parallel program, along comes someone else who really, really
wants to read that cache line.  Ok, so this forces you to a) flush it
now, b) mark that line as clean.  Then the next CPU gets that cache
line, does its write, and whammo, some other CPU wants to do the same
thing to it as you did.

Sadly enough this is a common programming error in shared memory
programming.  Think of it like you have a bunch of loops operating in
parallel, all trying to update the same counter, at once.  In parallel.
Each update has to wait until it can grab the cache line, and then it
proceeds.  The more updaters you have, the more contention for that
resource you have.  Your performance scales as 1/N rather than
(constant)*N.

Now do this with a page at a time, say a buffer.  Like, I dunno, an
Infiniband MPI buffer, or a 10 GbE MPI buffer.  Throw more CPUs behind
this buffer, and force them to get in line to shoot data over to their
counterparts.  The IB or 10 GbE resource becomes contended for, and as
you increase Ncpu, the contention and performance loss gets worse and
worse (this is basically what Doug Eadline is worried about).

There are ways you can work around some of this stuff.  Share nothing is
one way, though this is hard to do at an OS level where you share IO
devices etc.  Allocate some private memory queues, a scheduler, and
other bits (you have to do this with CUDA systems and most accelerators
to get reasonable performance).

I know you might postulate that 32-bit text is effectively the CS
equivalent of "c" in physics ... you may approach it asymptotically, but
never actually get there ... but unlike in physics, there isn't really
an underlying reason why you might not get there.
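
The shared-counter scenario above, and the "private memory" workaround,
can be sketched roughly as follows.  This is only an illustration, not
code from the thread: the names (count_shared, count_private,
padded_slot, run_workers) are hypothetical, and the 64-byte cache line
is an assumption that holds on many but not all machines.  The first
worker has every thread update one counter; the second gives each thread
its own slot, padded so that no two slots can land in the same line.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 64
#define CACHE_LINE  64  /* assumed line size in bytes; typical, not universal */

/* Contended case: one counter that every thread updates. */
static atomic_long shared_counter;

/* Private case: one slot per thread, padded out to a full line so that
 * consecutive slot values sit CACHE_LINE bytes apart. */
struct padded_slot {
    volatile long value;
    char pad[CACHE_LINE - sizeof(long)];
};
static_assert(sizeof(struct padded_slot) == CACHE_LINE,
              "slot must fill exactly one assumed cache line");

static struct padded_slot slots[MAX_THREADS];

struct job { int id; long iters; };

static void *bump_shared(void *arg)
{
    const struct job *j = arg;
    for (long i = 0; i < j->iters; i++)
        atomic_fetch_add(&shared_counter, 1);  /* all threads hit one line */
    return NULL;
}

static void *bump_private(void *arg)
{
    const struct job *j = arg;
    for (long i = 0; i < j->iters; i++)
        slots[j->id].value++;                  /* only this thread touches it */
    return NULL;
}

/* Start nthreads workers, wait for them, and return the combined count.
 * Returns -1 if nthreads is out of range. */
static long run_workers(void *(*worker)(void *), int nthreads, long iters)
{
    pthread_t tid[MAX_THREADS];
    struct job jobs[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    atomic_store(&shared_counter, 0);
    for (int t = 0; t < nthreads; t++)
        slots[t].value = 0;

    for (int t = 0; t < nthreads; t++) {
        jobs[t].id = t;
        jobs[t].iters = iters;
        if (pthread_create(&tid[t], NULL, worker, &jobs[t]) != 0)
            break;                             /* join only what started */
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    /* Only one of the two sources is non-zero for a given worker. */
    long total = atomic_load(&shared_counter);
    for (int t = 0; t < nthreads; t++)
        total += slots[t].value;
    return total;
}

long count_shared(int nthreads, long iters)
{
    return run_workers(bump_shared, nthreads, iters);
}

long count_private(int nthreads, long iters)
{
    return run_workers(bump_private, nthreads, iters);
}
```

Both functions return the same total (nthreads * iters); the difference
is in how the wall-clock time behaves as nthreads grows.  To see it on
your own hardware, call each from a small main() with the same arguments,
build with something like "cc -O2 -pthread", and time the two at
increasing thread counts.  Dropping the pad member turns the second case
into the false-sharing one, since neighbouring slots then share a line.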

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615
