[Beowulf] single machine with 500 GB of RAM

Wed Jan 9 09:00:57 PST 2013

As it's a single thread, I doubt that faster memory is going to help you much. It's going to suck whatever you do.
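
A rough way to put numbers on that is Amdahl's law: only the part of the run time that actually waits on memory gets shorter when the RAM gets faster. The sketch below is just that formula. The 14 days is the "2 weeks" from the quoted message; the memory-bound fraction and the RAM speed-up are made-up placeholders, not measurements of DGrid, and the function names are hypothetical.

```python
def speedup_from_faster_memory(memory_bound_fraction, memory_speedup):
    """Amdahl's law: overall speed-up when only the memory-bound part gets faster.

    memory_bound_fraction: share of the run time spent waiting on memory (0..1).
    memory_speedup: factor by which memory access gets faster (> 0).
    Both values are assumptions to be replaced by profiling data.
    """
    if not 0.0 <= memory_bound_fraction <= 1.0:
        raise ValueError("memory_bound_fraction must be between 0 and 1")
    if memory_speedup <= 0.0:
        raise ValueError("memory_speedup must be positive")
    unaffected = 1.0 - memory_bound_fraction
    return 1.0 / (unaffected + memory_bound_fraction / memory_speedup)


def days_after_upgrade(baseline_days, memory_bound_fraction, memory_speedup):
    """Estimated run time in days once the memory is faster."""
    return baseline_days / speedup_from_faster_memory(
        memory_bound_fraction, memory_speedup
    )


if __name__ == "__main__":
    # Placeholder inputs: half the time memory-bound, RAM 20% faster.
    days = days_after_upgrade(14.0, 0.5, 1.2)
    print(f"estimated run time: {days:.1f} days")
```

Profiling the real job to get the memory-bound fraction would be the first step; without it the estimate is only as good as the guess.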

On 9 Jan 2013 at 17:29, Jörg Saßmannshausen <j.sassmannshausen at ucl.ac.uk> wrote:

> Dear all,
> 
> many thanks for the quick reply and all the suggestions.
> 
> The code we want to use is that one here:
> 
> http://www.cpfs.mpg.de/~kohout/dgrid.html
> 
> Feel free to download and dig into the code. I am no expert in Fortran, so I
> won't be able to help you much if you have specific questions about the code :-(
> However, my understanding is that it will only run on one core/thread.
> 
> As for the budget: that is where it is getting a bit tricky. The ceiling is
> 10k GBP. I know that machines with less memory, say 256 GB, are cheaper, so
> one solution would be to get two of the beasts so we can do two calculations at
> the same time. If there are enough slots free, we could upgrade to 500 GB once
> we get another pot of money.
> 
> I guess I would go for DDR3, simply as it is faster. Waiting 2 weeks for a 
> calculation is no fun, so if we can save a bit of time here (faster RAM) we 
> gain actually quite a bit here. 
> 
> I am not convinced by the AMD Bulldozer, to be honest. From what I understand,
> Sandy Bridge has the faster memory access (higher bandwidth). Is that
> correct, or am I missing something here?
> 
> I gather that the idea of just using one CPU is not a good one. So we need to 
> have a dual CPU machine, which is fine with me. 
> 
> I am wondering about the vSMP / ScaleMP suggestion from Joe. If I am using an
> InfiniBand network here, would I be able to spread the 'bottlenecks' a bit
> better? What I mean is: when I tested InfiniBand on the new cluster we got, I
> noticed that a job run in parallel across nodes is marginally faster than the
> same number of cores on a single node. At the time I put that down to
> slightly faster memory access, as there was no bottleneck to the RAM.
> I am not familiar with vSMP (i.e. I have never used it), but is it possible to
> aggregate RAM from a number of nodes (say 40) and use it as a large virtual
> SMP? So one node would be slaving away with the calculations and the other
> nodes would only be doing memory IO. Is that possible with vSMP?
> In a related context, how about NUMAScale?
> 
> The idea of the aggregated SSDs is nice as well. I know some storage vendors
> are using a mixture of RAM and SSD for their meta-data (fast access) and that
> seems to work quite well. So that would be a large swap file / partition, or is
> there another way to use disc space as RAM? I need to read the NVMalloc paper,
> I suppose. Is that actually used in practice, or is it just a good idea for
> which we have a working example?
> 
> I don't think there is much disc IO here. There is most certainly no
> network-bound traffic as it is a single thread. A fast CPU would be an
> advantage as well; however, I have got the feeling the trade-off would be the
> memory access speed (bandwidth).
> 
> I have tried to answer the questions raised. Let me know whether there are 
> still some unclear points. 
> 
> Thanks for all your help and suggestions so far. I will need to digest that.
> 
> All the best from a sunny London
> 
> Jörg
> 
> -- 
> *************************************************************
> Jörg Saßmannshausen
> University College London
> Department of Chemistry
> Gordon Street
> London
> WC1H 0AJ 
> 
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
> 
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> 
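
On the quoted question of whether there is another way than swap to use disc space as RAM: memory-mapping a file is one standard alternative, where the OS pages the file's contents in and out on demand and the program addresses it like a buffer. The sketch below is a minimal illustration using Python's standard `mmap` module. The file name, sizes and function names are hypothetical, and this is not claimed to be how NVMalloc or DGrid work.

```python
import mmap
import os
import struct
import tempfile


def make_backing_file(path, n_doubles):
    """Create a file large enough to hold n_doubles 8-byte floats."""
    with open(path, "wb") as f:
        f.truncate(n_doubles * 8)


def sum_mapped_doubles(path, n_doubles):
    """Fill a memory-mapped file with 0.0, 1.0, ... and sum the values back."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file; the OS decides which pages stay in RAM.
        with mmap.mmap(f.fileno(), 0) as buf:
            for i in range(n_doubles):
                struct.pack_into("<d", buf, i * 8, float(i))
            total = 0.0
            for i in range(n_doubles):
                (value,) = struct.unpack_from("<d", buf, i * 8)
                total += value
            return total


if __name__ == "__main__":
    # Hypothetical tiny example; a real backing file would sit on the SSDs.
    with tempfile.TemporaryDirectory() as tmp:
        backing = os.path.join(tmp, "backing.bin")
        make_backing_file(backing, 1000)
        print(sum_mapped_doubles(backing, 1000))
```

Whether this beats a plain swap partition depends on the access pattern; random access over a working set much larger than RAM will be slow either way.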