[Beowulf] building Infiniband 4x cluster questions
prentice at ias.edu
Fri Nov 11 06:18:37 PST 2011
On 11/11/2011 02:10 AM, Jon Tegner wrote:
>>>> Drivers for cards now. Are those all open source, or does it require
>>>> payment? Is the source released of all those cards drivers, and do
>>>> they integrate into linux?
>>> You should get everything you need from the Linux kernel and / or OFED.
>> You can also find the drivers on the vendors' sites. Not sure about the rest, but for the Mellanox case it is open source and free - both for Linux and Windows.
> I'm using Qlogic drivers, it works well, but has the drawback that I'm
> limited to the kernel required for those drivers (which since I'm using
> CentOS means that I can only use CentOS-5.5).
> Would there be any disadvantages involved in instead using the stuff from
> the kernel/OFED directly?
The Mellanox OFED drivers come as an .iso file. Inside that ISO image is
a script that will rebuild all the mellanox packages for newer/different
kernels. It's a couple of extra steps (mount iso image, extract script,
run it, yada, yada, yada), but it works very well.
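To make those steps concrete, here is a rough sketch that only prints the commands rather than running them. Everything specific in it is an assumption on my part: the ISO file name, the mount point, the work directory, and above all the name and location of the rebuild script, which you should look up in the ISO itself. No script options are assumed either; pass whatever the vendor's documentation tells you to.

```python
import shlex


def plan_rebuild(iso_path, mount_point, script_rel_path, work_dir, script_args=()):
    """Return the steps as argument lists. Nothing is executed here."""
    script_name = script_rel_path.rsplit("/", 1)[-1]
    return [
        # 1. Loop-mount the ISO read-only.
        ["mount", "-o", "loop,ro", iso_path, mount_point],
        # 2. Copy the rebuild script out of the ISO.
        ["cp", f"{mount_point}/{script_rel_path}", work_dir],
        # 3. Run the script (arguments are whatever the vendor docs call for).
        [f"{work_dir}/{script_name}", *script_args],
        # 4. Unmount when the rebuild has finished.
        ["umount", mount_point],
    ]


if __name__ == "__main__":
    steps = plan_rebuild(
        iso_path="/tmp/vendor-ofed.iso",      # hypothetical file name
        mount_point="/mnt/ofed",              # hypothetical mount point
        script_rel_path="rebuild-script.sh",  # hypothetical; check the ISO contents
        work_dir="/tmp/ofed-build",           # hypothetical work directory
    )
    for step in steps:
        print(shlex.join(step))
```

Printing the plan first lets you review the commands before running them by hand as root.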
But that's Mellanox, and you're concerned about QLogic. I don't know how
the QLogic drivers are bundled, but look around the files provided to
see if there's a utility script that does the same thing for QLogic, or
at least instructions on how to recompile against different kernel
versions. Since OFED is open source, QLogic should provide the source
code to their drivers.
The only disadvantage with using stuff directly from the kernel/OFED is
that if you have newer cards with new features, the software to support
those new features may not have trickled down into the official OFED
distro or kernel, and then into your Linux distro of choice.
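A quick way to see what your distro kernel already ships is to look for the driver module under /lib/modules. The sketch below is only a starting point: the module names mlx4_ib and ib_qib are examples I picked, so substitute the module for your own card. Keep in mind that finding the module does not prove it supports a newer card revision, which is exactly the lag described above.

```python
import os
import platform


def find_kernel_modules(names, modules_root=None):
    """Return {module_name: path or None} for module files under modules_root."""
    if modules_root is None:
        # Default to the module tree of the currently running kernel.
        modules_root = os.path.join("/lib/modules", platform.release())
    found = {name: None for name in names}
    for dirpath, _dirnames, filenames in os.walk(modules_root):
        for filename in filenames:
            for name in names:
                # Module files are <name>.ko, possibly compressed (.ko.gz, .ko.xz).
                is_match = filename == name + ".ko" or filename.startswith(name + ".ko.")
                if is_match and found[name] is None:
                    found[name] = os.path.join(dirpath, filename)
    return found


if __name__ == "__main__":
    # Example module names only; replace them with the driver for your card.
    for name, path in find_kernel_modules(["mlx4_ib", "ib_qib"]).items():
        print(name, "->", path or "not shipped with this kernel")
```

Pass modules_root explicitly to inspect a kernel other than the one currently running, for example the tree of a kernel you plan to upgrade to.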
For example, my current cluster was installed in the fall of 2008, using
RHEL 5. The software/drivers in RHEL 5 worked just fine. Last year, I
added a couple of new nodes with GPUs that had newer Mellanox HCAs. They
wouldn't work with RHEL 5. I needed the OFED software provided from the
Mellanox site. I was hoping Mellanox's additions in that OFED distro
had made it into RHEL 6, but I just upgraded my cluster nodes and still
needed to download the Mellanox OFED package. I'm sure by RHEL 7 or 8,
those HCAs will be supported directly by the distro.
This is a problem of multiple lags. The vendor makes changes to the
kernel/OFED software to support their latest technology, and then you
have your first lag as the vendor tries to get the changes merged into
the Linux kernel and the official OFED distro. Then there's a second lag
as those changes get merged into the different Linux distros. For a
conservative, stability-as-first-priority distro like RHEL, that second
lag can be really long. :(