[Beowulf] Open source prime number application
Peter St. John
peter.st.john at gmail.com
Thu Aug 16 13:10:47 PDT 2007
I thought about this some. I don't have a cluster, but I can program in C;
you have a cluster, but don't want to program too much. It's possible we can
help each other.
A toy application I thought of is finding numbers that can written as a sum
of two cubes in two different ways (there's a famous story about Ramanujan
recognizing the four-digit number of a taxicab as "the smallest number than
can be expressed as a sum of two cubes in two distinct ways").
So I had this plan. I went to wiki and got the source of an MPI "hello
world" application, http://en.wikipedia.org/wiki/Message_Passing_Interface
Then I wrote a C program to take a range of numbers, say 1000 to 1999, and
find the numbers in that interval that have the taxicab quality (it turns
out that's what such numbers are called now!).
Since I don't have the MPI library or a cluster I can't test the integrated
thing, so at the moment all I have is the wiki example (which just fills a
buffer with "I am node #<whatever>" for later reporting), which presumably
is an OK example of MPI, and my own code (which just does arithmetic) which
I tested myself. So if you wanted, you could try and integrate these, and
see if you can find bigger reults by compiling for 64 bit arithmetic.
My program wastes a huge amount of redundant calculation; it's only virtue
is to split the memory required among nodes. So it computes x^3 + y^3 for
silly huge ranges of x and y, at each node, but only stores results for
comparisons in limited ranges which can be split up among nodes.
I was disappointed at first because I only got 1729 (which was Ramanujan's
number, the smallest), and 4,104:
Deubg: i = 0
01729 = 12^3 + 1^3,
01729 = 10^3 + 9^3
04104 = 16^3 + 2^3,
04104 = 15^3 + 9^3
Deubg: i = 1000
DEBUG: i = 1588 is too big.
Findrama made 2 hits
= 16 + 2
= 15 + 9
but I checked, putting those two numbers, 1729 and 4104, into the Online
Encyclopedia of Integer Sequences
and that got me the "Taxi Cab Sequence" and the next value is 13,832. My
program is naive, using an exhaustive and redundant search, with only 32
bit, so I get stuck when 1588 is too big becaues it's cube is bigger than
about 4 billion, the extent of a 32 bit int. Also I didn't take the time
even to accommodate the signum bit.
I suggest you look at the wiki link. If you want to hack that code (which
could be confusing because the address of something in this node's memory
may not make sense when passed to some other node, but the MPI library makes
some accommodation) let me know and I'll send you my C program, to call from
inside that Wiki program.
You might start by just running the wiki example, and see if it works and
what you have to do to make it work. If that seems like fun then we can try
doing math with it.
I think I'll integrate them myself, but I wouldn't be able to test it
directly. If you think this toy project is the kind of thing you could use
let me know. Maybe some day you will run my genetic algorithm app, which is
not a toy, for me, and I won't need a cluster :-)
Well, here. It's not that big a program so I'll just paste it in. Serious
software engineers may avert their eyes.
// Toy parallelizable math application
// Pete St.John 8/2007
#define BIGGEST 4000000000
// sloppy way to deal with 32-bit limitation
// a pair of integers; the sum of their cubes matters to us.
struct pair first;
struct pair second;
// any pair you want to keep and report, e.g. the largest summand found in
int findrama(int myid, long z1, long z2, struct winner *winnerp);
// findrama finds numbers which are sums of cubes in two distince ways,
// e.g. 1729 = 12^3 + 1^3 but also = 10^3 + 9^3
// "myid" is the hypothetical node number of the MPI instance that would
// z1 and z2 are the range of numbers to search, e.g. 1729 might be found
// between 1000 and 19999;
// winner is the "best" result found, which in this example is the largest.
int main(int argc, char **argv)
// I'm not doing much in this main. We want to use the wiki MPI example
// main to call "findrama" with a range of numbers, perhaps based on myid;
// node 1 does z = 1 to 1999, node 2 does 2000 to 2999, etc, based on the
// of nodes you have and the largest arithmetic you can do.
struct winner mywinner;
myid = 0;
hits = findrama(myid, 1, 10000, &mywinner);
printf("Findrama made %d hits\n", hits);
if(hits < 0)
printf("Error, probably malloc error.\n");
printf("Winner %d:\n", mywinner.win);
printf("= %d, %d\n", mywinner.first.x, mywinner.first.y);
printf("= %d, %d\n", mywinner.second.x, mywinner.second.y);
int findrama(int myid, long z1, long z2, struct winner *winnerp)
struct pair *p;
long i, j, k;
long bigz, z;
long xi, xj;
//struct pair newpair;
struct winner mywinner;
bigz = 0;
hits = 0;
range = (z2 - z1);
p = malloc( range * sizeof(struct pair));
// malloc should return a pointer to a range of allocated memory
// large enough for "range" pairs, and pair takes up space for two long
fprintf(stderr, "Error: unable to malloc at myid = %d\n", myid);
// in the zth offset we will store the pair x, y st x^3 + y^3 = z^3
// for z in the range from z1 <= z < z2
// zero out
for(i=0; i < range; i++)
p[i].x = 0;
p[i].y = 0;
//z2cube = z2 * z2 * z2;
for(i = 0; i < z2; i++)
xi = i*i*i;
if(xi > BIGGEST)
printf("DEBUG: i = %d is too big.\n", i);
if(i%1000 == 0)
printf("Deubg: i = %d\n", i);
for(j = 0; j <= i; j++)
xj = j*j*j;
xsum = xi + xj;
if(xsum > z2)
// we've exceeded our range
for(k = 0; k< range; k++)
z = z1 + k;
if(z == xsum)
if(p[z].x == 0 && p[z].y == 0)
// found this sum of cubes for the first time
p[z].x = i;
p[z].y = j;
// we have discovered a number that
// can be written as a sum of two cubes
// in two distince ways.
printf("DEBUG: HIT:\n %05d = %d^3 + %d^3,\n %05d = %d^3 + %d^3\n",
z, i, j, z, p[z].x, p[z].y
mywinner.win = z;
mywinner.first.x = i;
mywinner.first.y = j;
mywinner.second.x = p[z].x;
mywinner.second.y = p[z].y;
winnerp->win = mywinner.win;
winnerp->first.x = mywinner.first.x;
winnerp->first.y = mywinner.first.y;
winnerp->second.x = mywinner.second.x;
winnerp->second.y = mywinner.second.y;
On 8/16/07, Robert G. Brown <rgb at phy.duke.edu> wrote:
> On Tue, 14 Aug 2007, Tim Simon wrote:
> > I would like to know if there is any open source software that will
> > run on a beowulf, which will do something like find prime numbers, or
> > something simillar. I cant afford a commercial program.
> > I am guessing that there is not "one size fits all" type thing - what
> > fits your beowulf wont fit mine. Is there anything that can be easily
> > made to fit? I have searched all over the net, and all i can find are
> > educational or high research type applications. I just would like
> > something reasonably simple, (I thought prime numbers but hey,
> > anything is good).
> Writing your own is good. Hunting for primes is actually something that
> will parallelize decently simply because the Sieve of Erasthones for a
> proposed number N only requires that you try all the primes found up to
> N/2. Therefore a collection of nodes can contain and share all the
> primes found up to N-1, and distribute the testing of N, N+2, N+4,
> N+6... N+M (obviously skipping all even numbers) as long as N+M < 2N.
> For a small number M of nodes, if you precompute and load all the primes
> less than M -- trivially done -- then you're in business.
> You will have a few longer term problems. A single processor can sieve
> all the unsigned int primes from 0-(2^32-1) out fairly quickly anyway.
> To go beyond this, you'll need to use symbolic division instead of
> binary computer arithmetic, I think. You'll also run into
> storage/memory problems as you start to get a significant number of
> primes in your running table.
> But it is a fun problem, for sure. Go for it.
> Beyond that, there are a number of programs people use to play with or
> demonstrate a beowulf's speedup. The "funnest" is any of the
> rubberbandable Mandelbrot set exploration tools parallelized on top of
> MPI or PVM -- makes nifty graphics. It isn't quite as cool as it used
> to be because Moore's Law has overwhelmed the computational difficulty
> of the task. When I started in this business, it might have taken
> minutes to compute a rubberbanding down into the MS, depending on where
> one was (and how deep one had to go on all those pixels to terminate).
> Parallelize that on sixty or so nodes and it drops DRAMATICALLY to
> Now CPUs are so fast that a SINGLE CPU can rubberband down in a second
> or two, and networks haven't kept pace so that computing the pixels in a
> single thread might actually be faster than splitting them. So speedup
> is a bit less dramatic, but still very cool.
> > I am running Red Hat FC4.
> Hmmm, might at least TRY 6 or 7.
> > Tim
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> Robert G. Brown http://www.phy.duke.edu/~rgb/
> Duke University Dept. of Physics, Box 90305
> Durham, N.C. 27708-0305
> Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Beowulf