[Beowulf] single machine with 500 GB of RAM
j.sassmannshausen at ucl.ac.uk
Wed Jan 9 08:29:58 PST 2013
Many thanks for the quick reply and all the suggestions.
The code we want to use is this one:
Feel free to download and dig into the code. I am no expert in Fortran, so I
won't be able to help you much if you have specific questions about the code :-(
However, my understanding is that it will only run on one core/thread.
As for the budget: That is where it is getting a bit tricky. The ceiling is
10k GBP. I know that machines with less memory, say 256 GB, are cheaper, so
one solution would be to get two of the beasts so we can do two calculations at
the same time. If there are enough slots free, we could upgrade to 500 GB once
we get another pot of money.
I guess I would go for DDR3, simply as it is faster. Waiting 2 weeks for a
calculation is no fun, so if we can save a bit of time here (faster RAM) we
actually gain quite a bit.
I am not convinced by the AMD Bulldozer, to be honest. From what I understand,
the Sandy Bridge has the faster memory access (higher bandwidth). Is that
correct, or am I missing something here?
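If we get the chance to try candidate machines, a crude check of memory
bandwidth could look like the sketch below. It is only an illustration in
Python (the actual code is Fortran), the function name and buffer sizes are
made up, and a single-threaded buffer copy is no substitute for a proper
benchmark such as STREAM; it should merely show large differences between
two boxes.

```python
import time


def copy_bandwidth_gb_s(size_mb=256, repeats=5):
    """Return the best observed copy bandwidth in GB/s (rough estimate)."""
    n = size_mb * 1024 * 1024
    src = bytearray(n)
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # bulk copy, carried out in C inside the interpreter
        best = min(best, time.perf_counter() - start)
    # Each copy reads n bytes and writes n bytes.
    return 2 * n / best / 1e9


if __name__ == "__main__":
    print(f"approx. copy bandwidth: {copy_bandwidth_gb_s():.1f} GB/s")
```

The buffer should be much larger than the CPU caches, otherwise the number
says more about the cache than about the RAM.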
I gather that the idea of just using one CPU is not a good one. So we need to
have a dual CPU machine, which is fine with me.
I am wondering about the vSMP / ScaleMP suggestion from Joe. If I am using an
InfiniBand network here, would I be able to spread the 'bottlenecks' a bit
better? What I am getting at is this: when I tested the InfiniBand on the new
cluster we got, I noticed that a parallel job spread across nodes is marginally
faster than the same number of cores on a single node. At the time I put that
down to slightly faster memory access, as there was no bottleneck to the RAM.
I am not familiar with vSMP (i.e. I never used it), but is it possible to
aggregate RAM from a number of nodes (say 40) and use it as a large virtual
SMP? So one node would be slaving away with the calculations and the other
nodes are only doing memory IO. Is that possible with vSMP?
In a related context, how about NUMAScale?
The idea of the aggregated SSDs is nice as well. I know some storage vendors
are using a mixture of RAM and SSD for their metadata (fast access) and that
seems to work quite well. So would that be a large swap file / partition, or is
there another way to use disc space as RAM? I need to read the NVMalloc paper,
I suppose. Is that actually used in practice, or is it just a good idea with a
working example?
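As far as I understand, one generic alternative to a swap partition is a
memory-mapped file: the program addresses a file on the SSD like an array and
the kernel pages it in and out. The sketch below is only my illustration of
that idea in Python; the function name and the file path are made up, and it
makes no claim about how NVMalloc itself works.

```python
import mmap
import struct


def demo_file_backed_array(path, n_doubles=1_000_000):
    """Store one double at the end of a file-backed array and read it back."""
    size = n_doubles * 8  # 8 bytes per double
    with open(path, "wb") as f:
        f.truncate(size)  # create a file of the required size
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as buf:
            offset = 8 * (n_doubles - 1)
            struct.pack_into("d", buf, offset, 3.14)  # write into the mapping
            (value,) = struct.unpack_from("d", buf, offset)
            return value


if __name__ == "__main__":
    # Hypothetical path; it would point at the SSD in practice.
    print(demo_file_backed_array("scratch_array.bin"))
```

Whether this is fast enough depends on the access pattern; random access
over a few hundred GB will be dominated by the SSD latency.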
I don't think there is much disc IO here. There is most certainly no
network-bound traffic as it is a single thread. A fast CPU would be an advantage
as well; however, I have the gut feeling the trade-off would be the memory
access speed.
I have tried to answer the questions raised. Let me know whether there are
still some unclear points.
Thanks for all your help and suggestions so far. I will need to digest that.
All the best from a sunny London
University College London
Department of Chemistry
email: j.sassmannshausen at ucl.ac.uk
Please avoid sending me Word or PowerPoint attachments.