[Beowulf] First cluster in 20 years - questions about today

Renfro, Michael Renfro at tntech.edu
Sun Feb 2 10:23:31 PST 2020


I don’t see anything wrong in Jonathan’s advice. I’d also add that for computational chemistry, depending on model size and CPU, a gigabit network could be a bottleneck if you use a multi-node solver. I’ve seen bottlenecking on small models even with FDR InfiniBand. GPUs could also be useful for compatible solvers.
[Attached plots: ibverbs-speedup.png, 3M.png]

(From https://its.tntech.edu/display/MON/HPC+Sample+Job%3A+NAMD)
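
A rough way to see why the interconnect matters is a back-of-the-envelope model, sketched here in Python. Everything in it is an assumption for illustration: the function names, the per-step compute time and message size, and the ballpark link figures (about 125 MB/s for gigabit Ethernet, roughly 6.8 GB/s for 4x FDR InfiniBand, with assumed latencies). It is not how NAMD or any particular solver handles communication, and it does not reproduce the NAMD measurements linked above.

```python
def step_time(nodes, compute_s, bytes_per_step, bandwidth_bytes_per_s, latency_s):
    """Estimated wall time for one solver step on `nodes` nodes.

    Assumes compute divides evenly across nodes and that every multi-node
    step pays one latency hit plus the time to move `bytes_per_step`.
    """
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    compute = compute_s / nodes
    if nodes == 1:
        return compute  # no network traffic on a single node
    return compute + latency_s + bytes_per_step / bandwidth_bytes_per_s


def speedup(nodes, compute_s, bytes_per_step, bandwidth_bytes_per_s, latency_s):
    """Speedup of an n-node run relative to a single node under this model."""
    single = step_time(1, compute_s, bytes_per_step, bandwidth_bytes_per_s, latency_s)
    multi = step_time(nodes, compute_s, bytes_per_step, bandwidth_bytes_per_s, latency_s)
    return single / multi


# Assumed ballpark link figures, not measurements.
GIGABIT = {"bandwidth_bytes_per_s": 125e6, "latency_s": 50e-6}
FDR_IB = {"bandwidth_bytes_per_s": 6.8e9, "latency_s": 1e-6}

# Hypothetical small model: 10 ms of compute and 1 MB exchanged per step.
small = {"compute_s": 0.010, "bytes_per_step": 1e6}

if __name__ == "__main__":
    for name, link in (("gigabit", GIGABIT), ("FDR IB", FDR_IB)):
        for n in (2, 4, 8):
            print(f"{name:8s} {n} nodes: speedup {speedup(n, **small, **link):.2f}")
```

With these made-up numbers the gigabit case gains little or nothing from extra nodes, because the fixed per-step transfer time is comparable to the compute time being divided. Real solvers overlap communication with compute and exchange different amounts of data per node, so treat this only as a way to reason about orders of magnitude.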

So heterogeneity may not be a problem if you stick with high-core-count OpenMP jobs that each run on a single node. A scheduler like Slurm could be useful if you want to stack up a bunch of jobs in advance. Cluster management software might not be strictly required for a few-node single-user cluster, but something like OpenHPC isn’t too hard to get running.
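
As a sketch of what stacking up jobs might look like, the Python below writes a minimal Slurm batch script for a single-node OpenMP run. The `#SBATCH` options used (`--job-name`, `--nodes`, `--ntasks`, `--cpus-per-task`) are standard sbatch options; the helper name `build_batch_script`, the job names, the core count, and the `./my_solver` command are hypothetical placeholders.

```python
import shlex


def build_batch_script(job_name, cpus, command):
    """Return the text of a Slurm batch script for one single-node OpenMP job.

    `command` is a list of arguments, e.g. ["./my_solver", "input_01.in"].
    """
    if cpus < 1:
        raise ValueError("cpus must be >= 1")
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        "#SBATCH --nodes=1",
        "#SBATCH --ntasks=1",
        f"#SBATCH --cpus-per-task={cpus}",
        "",
        "# Use as many OpenMP threads as Slurm allocated to this task.",
        "export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK",
        "",
        shlex.join(command),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical queue of three inputs; write one script per job.
    for i in range(1, 4):
        text = build_batch_script(f"surf{i:02d}", 16, ["./my_solver", f"input_{i:02d}.in"])
        with open(f"job_{i:02d}.sh", "w") as fh:
            fh.write(text)
```

Each script could then be submitted with `sbatch job_01.sh` and so on; Slurm queues them and starts each one when a node with enough free cores is available.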

If you were going to keep the old (2-core?) nodes around, I’d probably turn them into a storage cluster (Ceph, Gluster, etc). No idea if their power draw is too high to be worthwhile, though.

--
Mike Renfro, PhD  / HPC Systems Administrator, Information Technology Services
931 372-3601      / Tennessee Tech University

On Feb 1, 2020, at 9:21 PM, Mark Kosmowski <mark.kosmowski at gmail.com> wrote:

I've been out of computation for about 20 years, since my master's degree.  I'm getting into the game again as a private individual.  When I was active, Opteron had just launched - I was an early adopter of amd64 because I needed the RAM (maybe more accurately, I needed to thoroughly thrash my swap drives).  I never needed any cluster management software with my 3-node, dual-socket, single-core little baby Beowulf.  My planned domain is computational chemistry, and I'm hoping to get to a point where I can do ab initio catalyst surface reaction modeling of small molecules (not biomolecules).

I'm planning to add a few nodes, and the cluster will end up being fairly heterogeneous.  My initial plan is to add two or three multi-socket, multi-core nodes as well as a 48-port gigabit switch.  How should I assess whether to have one big heterogeneous cluster vs. two smaller quasi-homogeneous clusters?

Will it be worthwhile to learn a cluster management package?  If so, suggestions?

Should I consider Solaris or illumos?  I do plan on using ZFS, especially for the data node, but I want as much redundancy as I can get, since I'm going to be using used hardware.  Will the fancy Solaris cluster tools be useful?

Also, once I get running, while I'm getting current with theory and software, may I inquire here about taking on a small, low-priority academic project to make sure the cluster side is working well?

Thank you all for still being here!