Replacing a quad-processor SMP multi-node DEC Alpha cluster with a Linux dual P4 cluster?

We have an old cluster setup that has 3 Alpha 4100 nodes (each node
has 4 x 466 MHz processors) connected with Memory Channel (first
version), with 1 GB RAM per node. The cluster is used to run internal
code which is mostly CFD (fine-grain synchronous) problems. The code
is parallelized and currently uses DEC's MPI implementation.

We now need to replicate this system at a remote site. With an eye
on keeping the cost down, the idea is to go with a bunch of
dual-processor P4 (2 GHz Xeon?) systems with 2 GB RAM each and Myrinet.

We expect to scale up to at least 8 of these dual nodes.
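
For reference, the aggregate figures implied by the two configurations
above can be tallied with a short sketch. The class and field names are
made up for illustration, and 8 nodes is only the minimum target
mentioned here, not a final design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical container for the node counts quoted in this post."""
    name: str
    nodes: int
    cpus_per_node: int
    ram_gb_per_node: float

    @property
    def total_cpus(self) -> int:
        return self.nodes * self.cpus_per_node

    @property
    def total_ram_gb(self) -> float:
        return self.nodes * self.ram_gb_per_node

    @property
    def ram_gb_per_cpu(self) -> float:
        return self.ram_gb_per_node / self.cpus_per_node


# Figures taken from the description above.
alpha = ClusterConfig("Alpha 4100", nodes=3, cpus_per_node=4, ram_gb_per_node=1)
xeon = ClusterConfig("dual Xeon", nodes=8, cpus_per_node=2, ram_gb_per_node=2)

for cfg in (alpha, xeon):
    print(f"{cfg.name}: {cfg.total_cpus} CPUs, "
          f"{cfg.total_ram_gb:g} GB RAM total, "
          f"{cfg.ram_gb_per_cpu:g} GB per CPU")
```

That is 12 processors and 3 GB in total today, against 16 processors
and 16 GB in the proposed system.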

I need to look into the performance of various aspects of the
proposed system as we have no experience in this type of setup.

Disclaimer: I don't necessarily know what I'm talking about - I'm
the hardware/admin guy; the parallel guys do all the coding! Sorry.

I'd appreciate any answers anyone could offer on:

1. In terms of floating point performance, looking at CFP2000
results, the Xeon should offer much better FP performance than the
older Alphas we have. I could only find results for a 4100 5/533
(which is the closest to our current setup) and these were much
lower than the results from a Dell Precision Workstation 530 with a
2.0 GHz processor.

So I assume this won't be an issue - we'll get fast processors. Is
there a motherboard that really stands out for offering the best
support for these processors - or should we even be looking at AMD MP
systems now? I'm not sure I have the time to get test systems in and
test anything out.

2. Quad systems seem to be far more expensive than duals, and I could
only find quad systems running at 900 MHz per processor instead of
the 2 GHz in the duals - so I assume the quads are out on cost and
processor speed.

3. One of my concerns was the use of MPI across 8 x dual Xeon nodes
versus 3 x quad Alpha nodes. I'm assuming that MPI (MPICH) will look
after everything necessary for us in terms of communication between
processors within a node and communication across nodes - but are
memory speed, throughput, etc. a limiting factor on this type of PC
architecture? Will we hit latency issues within a node that we're
not currently hitting?

What sort of memory is recommended? DDR/SDRAM/other?

Also, although I have ruled out the quads above - would they offer
better memory performance than the duals, on a par with the quad
Alpha nodes? (I appreciate it's not a like-for-like comparison.)

4. I think an entry-level Myrinet switch will enable me to connect 8
nodes - at a cost of approx. 2400 USD for the switch and 1700 USD per
Myrinet card per node? It should also offer better performance than
our Memory Channel - so I'm assuming that the choice of Myrinet is OK.
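
To make the interconnect budget explicit, here is a minimal sketch of
the arithmetic. It assumes one switch plus one card per node at the
approximate prices quoted above, ignores cables and anything else, and
does not model needing a larger switch beyond 8 nodes. The function
name is made up for illustration.

```python
def interconnect_cost_usd(nodes: int,
                          switch_usd: float = 2400.0,
                          card_usd: float = 1700.0) -> float:
    """One switch plus one card per node, using the approximate
    prices quoted in this post (assumptions, not vendor quotes)."""
    if nodes < 1:
        raise ValueError("need at least one node")
    return switch_usd + nodes * card_usd


total = interconnect_cost_usd(8)
print(f"total: {total:.0f} USD, per node: {total / 8:.0f} USD")
```

For 8 nodes that comes to 16000 USD, or 2000 USD per node, on top of
the cost of the machines themselves.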

5. In terms of cache - we believe that the large cache on the
Alphas helps our performance quite significantly - as far as I can
determine the cache on the Xeons is still 256/512 KB? Presumably this
won't make that much of a difference as we're scaling out across 8
nodes instead of 3?
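
A rough sanity check on that last assumption, as a sketch only: if the
total problem size stays fixed and is split evenly, what matters is
the processor count (3 x 4 = 12 today, 8 x 2 = 16 proposed) rather
than the node count. The problem size below is a made-up round number,
not a measurement of our code.

```python
def share_per_cpu(problem_mb: float, cpus: int) -> float:
    """Even split of a fixed-size problem across `cpus` processors."""
    if cpus < 1:
        raise ValueError("need at least one processor")
    return problem_mb / cpus


# Hypothetical total problem size, chosen only to make the numbers round.
PROBLEM_MB = 1200.0

old_share = share_per_cpu(PROBLEM_MB, cpus=3 * 4)   # 3 quad Alpha nodes
new_share = share_per_cpu(PROBLEM_MB, cpus=8 * 2)   # 8 dual Xeon nodes

print(f"old: {old_share:g} MB per CPU, new: {new_share:g} MB per CPU")
print(f"new share is {new_share / old_share:.0%} of the old one")
```

Under those assumptions each processor's share only shrinks to 75% of
what it is now, so whether the smaller cache hurts depends on how the
real per-processor working set compares with the cache sizes.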

Many thanks in advance,

