[Beowulf] Mersenne primes?
Many of your questions may have already been answered in earlier discussions or in the FAQ. The search results page will indicate current discussions as well as past list serves, articles, and papers.
Vincent Diepeveen diep at xs4all.nlSun Jun 15 04:05:56 PDT 2008
- Previous message: [Beowulf] Powersave on Beowulf nodes
- Next message: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
hi Mark, Your question is not so much about lucaslehmer, but rather about FFT and DWT. No doubt some attempts have been made to parallellize FFT already to multiply big numbers. DWT, as used by GIMPS, is also taking implicit modulo, basically speeding up a factor 2 nearly. That's what you lose when using some generic approach to multiply. Additionally the DWT from GIMPS has been written in SSE2 assembler, so it only works at x86 hardware that has some sort of SSE2 support. This is very bloody fast code optimized for intel P4 at the time. You won't be able to approach that speed writing C code. There is some code that parallellizes all this, yet it would prove more numbers to be composite when running it embarrassingly parallel. Not because it can't be done, but because that Woltman assembler is so bloody fast. In itself such FFT's can be done a lot faster in integers, just on paper. However all the cpu's seem to be faster for floating point. AMD is by far fastest right now for integer multiplication (oldie K8 beating core2 amazingly). See GMP mailing list for explanation. Intel seems to be faster for running the Woltman code. See benchmarks at mersenneforum.org Those x86/x64 cpu's are rather slow for multiplication. To be honest, finding a mersenne is not so interesting now, already too many join GIMPS. A single test single core is like a month with odds like 1 in a 100k it is a prime. Personally i'm trying to find some numbers p = (2^n + 1) / 3 busy around 1 million bits now. Note those won't win you the $100k, for 2 reasons. First of all they can't get proven prime easily, as they are "industry grade primes", secondly you need that army of 30 + Tflop or so that GIMPS is using. For future it would make sense IMHO to parallellize DWT within 1 socket. That allows better cache reusage. Bandwidth from RAM to the caches already is a problem. Let's not start discussing bandwidth that the highend network cards can deliver :) There is some C codes there that do this (note i have not managed to download it) and there is rumours Woltman also parallellized his own code one day, but most of those codes are indeed only well usable within 1 socket, already losing quite some throughput there. If your intention is to just do a single FFT as fast as possible, i'm sure there must be something out there. Would be fun to optimize code like that, yet unpaid without cluster access it is hard to get it done :) Right now the quickest way to find a real big 100% prime is to join primegrids search for 321 (p = k * 2^n - 1 where k = 3). They recently found another one. It also can use the same transform. For k's above 31, the assembler code runs a lot slower, so you lose big throughput already. Because you've got to test that many numbers to find a single prime, nearly all the dudes in that prime world, when at home, are busy with throughput anyway and not so much highend clusters. Highend clusters in that domain get used for other purposes, those busy with that will shut up and not post here data that is interesting to read. For me it is easy to shout something there as i'm currently unemployed looking for a job (a job abroad outside Netherlands would be cool too, feel free to ask my CV/resume) and in meantime am busy with an innocent computer chess engine. Where GUI's sell real well in computer games, engines are not commercial at all, they just search for the holy grail massively parallel, thereby combining hundreds of algorithms/enhancements. Vincent On May 1, 2008, at 10:10 PM, Mark Reynolds wrote: > I'm fairly new to Beowulfery but am reading TFM from > http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php (great book by > the way). Does anyone know of any programs to find Mersenne primes > that are suited to parallelised environments such as a cluster made up > of nodes with multi-core processors? I've found mprimes from > http://www.mersenne.org/ but the FAQ states: > Although, a program could be written for dual-CPU systems (it would be > quite time-consuming), the machine will still get more throughput by > working on separate exponents. > Has anybody tried this and can Lucas-Lehmer testing be effectively > parallalised? Currently, you have to run an instance for each core for > it to get any benefit. Also I'd like to modify it so it could, for > example, search between certain numerical ranges independent of the > work units sent out through the website or to take advantage of 64-bit > hardware. Can someone more knowledgeable about this sort of thing tell > me whether this is practical or if it has been done already? > > Thanks. > > -- > "To win one hundred victories in one hundred battles is not the > highest skill. To subdue the enemy without fighting is the highest > skill."— Sun-Tzu > > _______________________________________________ > Beowulf mailing list, Beowulf at beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf >
- Previous message: [Beowulf] Powersave on Beowulf nodes
- Next message: [Beowulf] Nvidia, cuda, tesla and... where's my double floating point?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Beowulf mailing list
