[Beowulf] What are RDMA, OFED, verbs, PSM, etc.?

Christopher Samuel samuel at unimelb.edu.au
Wed Sep 20 19:09:43 PDT 2017


On 21/09/17 01:03, Faraz Hussain wrote:

> Thanks, Peter, for the high-level overview! A few follow-up questions. What
> if I am using a non-InfiniBand cluster, e.g. something with 10GigE? Or
> something even slower: at home I have a Raspberry Pi cluster with 100 Mbps
> Ethernet. Are OFED/PSM/verbs all irrelevant there?

Pretty much, yes, unless you've got fancy Ethernet adapters and switches
that can do RoCE (RDMA over Converged Ethernet).

> If so, what would their equivalents be? I assume RDMA is still applicable
> since I can run Open MPI on these clusters.

No, being able to run Open MPI doesn't imply RDMA. On those clusters Open
MPI will be using TCP/IP for communications, so you'll pay the extra
latency overhead of going through the kernel's TCP/IP stack.

> Another question: who is typically responsible for tuning OFED/PSM/verbs
> etc. on an InfiniBand cluster? Is it generally the vendor who builds the
> cluster or the sysadmin?

It depends on the site and the install, I suspect. We do all the OS
installs on our systems, so we (the sysadmin team) get to deal with that.

> My role has always been more user-facing
> application support, but I am wondering how much time I should invest in
> learning the inner workings of OFED/PSM/verbs etc.

If you have the gear that can use it, then it is worth understanding the
basics; even just doing some performance comparisons can be educational.
But of course you have to have the gear that can do this in the first place!
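As a rough illustration of the kind of performance comparison mentioned
above, here is a minimal sketch, assuming Python 3 and only its standard
library, of a TCP ping-pong latency test: one side sends a small message,
the other echoes it back, and the round trip is timed. The function names
(`tcp_pingpong`, `echo_server`, `recv_exact`) and default parameters are
hypothetical. For self-containment both ends run on one host over
loopback, so it only measures the kernel TCP/IP path; to compare
interconnects you would run the echo side on a second node, and for real
numbers an established suite such as the OSU micro-benchmarks
(`osu_latency`) run over MPI is a better choice.

```python
import socket
import statistics
import threading
import time


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def echo_server(listener: socket.socket) -> None:
    """Accept one connection and echo everything back until the peer closes."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small replies are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(65536)
            if not data:  # empty read: the client has closed its end
                break
            conn.sendall(data)


def tcp_pingpong(host: str = "127.0.0.1", msg_size: int = 8,
                 iterations: int = 1000, warmup: int = 100) -> float:
    """Return the median round-trip time in microseconds (hypothetical helper)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=echo_server, args=(listener,), daemon=True)
    server.start()

    payload = b"x" * msg_size
    rtts_us = []
    with socket.create_connection((host, port)) as client:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for i in range(warmup + iterations):
            start = time.perf_counter()
            client.sendall(payload)
            recv_exact(client, msg_size)
            elapsed = time.perf_counter() - start
            if i >= warmup:  # discard warm-up iterations
                rtts_us.append(elapsed * 1e6)

    server.join(timeout=5)
    listener.close()
    return statistics.median(rtts_us)


if __name__ == "__main__":
    print(f"median TCP round trip: {tcp_pingpong():.1f} us")
```

Ping-pong benchmarks conventionally report half the round-trip time as the
one-way latency, so halve the returned value before comparing it with
published figures.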

Best of luck,
Chris
-- 
 Christopher Samuel        Senior Systems Administrator
 Melbourne Bioinformatics - The University of Melbourne
 Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
