[Beowulf] TOE on Linux?

Prentice Bisbal prentice at ias.edu
Mon May 19 12:29:07 PDT 2008


Prentice Bisbal wrote:
> Does anyone know of any network cards/drivers that support TOE (TCP
> Offload Engine) for Linux? A hardware vendor just told me that Linux
> does not support the TOE features of *any* network card.
> 
> Given Linux's strong presence in HPC and the value of having TOE in a
> cluster, I find that hard to believe.
> 
> 

Thanks to everyone who replied to my initial inquiry above. It would
take me all day to reply to all the feedback I got individually, so let
me do a group reply to all of you, and explain my situation:

We are in the process of purchasing a new computing cluster with dual
quad-core processors for 8 cores/node. We will be using InfiniBand for
message passing, and then using 10 Gb Ethernet to access the NFS file
system. This will be my first experience with IB and using multiple
networking technologies in parallel like this.

I was reading a white paper on InfiniBand vs. 10 GbE (I should have
known better, because as we all know, "white paper" = "covert
advertising propaganda disguised as legitimate, unbiased discussion").
The white paper in question was this one:

http://www.linux-mag.com/id/4921

While there was no bias for one networking technology over the other (of
course not - Cisco sells both!), RDMA and TOE kept coming up repeatedly
as desirable features to have. (Yes, I know that RDMA and TOE are
completely different things. I'm just saying both were mentioned
repeatedly as performance enhancements.) This quote in particular
caught my attention:

"When used in combination with InfiniBand, or Ethernet NICs that have
RDMA and TCP offload engines (TOE), nearly all transport protocol
processing and data movement can be offloaded from the central CPU to
the interface hardware, thereby realizing significant performance gains"

And that's what made me think I needed TOE in my new cluster. Of course,
this paper was written entirely from a networking point of view, not
from the host, O/S or sysadmin points of view. All those points of view
seem to say "TOE on Linux = bad", based on my reading this past week.

I am now convinced that TOE is unnecessary and probably a PITA to
administer.

My next questions:

1. Is having 10 GbE and InfiniBand in the same cluster overkill, or at
least unorthodox?  This cluster will be used by a variety of users
running a variety of different codes, so the response "depends on your
app" is meaningless. We're trying to build the best "one-size-fits-all"
cluster to accommodate a very wide variety of applications.

2. I've read some about RDMA. Is it difficult to set up? What do I need
to use it? Certain MPI implementations?  Certain kernel modules? Certain
NICs/NIC drivers? A URL to a how-to would be sufficient. I'm sure I
could find one on my own, but I'm interested in the discussion here.

--
Prentice



