[Beowulf] ibswinfo, a tool to monitor unmanaged Infiniband switches
darren at wisecorp.co.uk
Thu Apr 30 14:24:07 PDT 2020
Nice one. I own an HP Voltaire 4036, which is managed, but I'm still happy to check out the GitHub link.
Thanks very much for letting us know, as I'm sure it will be of huge use to other users, including myself.
On 30 April 2020 21:57:46 BST, Kilian Cavalotti <kilian.cavalotti.work at gmail.com> wrote:
>If your clusters use Infiniband, you know there are only two types of
>switches: managed or unmanaged. The former come with SSH, a web
>interface, SNMP and everything; the latter come with LEDs.
>The only (and officially recommended) way to monitor unmanaged
>switches is to go take a physical look at their PSU and fan LEDs from
>time to time. Which is obviously not ideal for remote administration,
>monitoring or getting an alert when something's wrong.
>To solve that problem, we made a little shell script that does just
>that: get inventory data, status info, and metrics like fan speeds,
>temperatures or power usage from unmanaged Infiniband switches:
>It took a little reverse-engineering and a good amount of guessing,
>but it seems to work, it fits the need, and well... it's free. So
>we're happy to share it with everyone, in case it could be useful to