[Beowulf] cluster authentication part II
Lux, Jim (337K)
james.p.lux at jpl.nasa.gov
Tue Jan 16 17:20:25 PST 2018
From: Beowulf [mailto:beowulf-bounces at beowulf.org] On Behalf Of Tony Brian Albers
Sent: Monday, January 15, 2018 10:04 PM
To: beowulf at beowulf.org
Subject: Re: [Beowulf] cluster authentication part II
On 01/16/2018 12:35 AM, Jörg Saßmannshausen wrote:
> Dear all,
> reading the Cluster Authentication (LDAP,AD) thread which was posted
> at the end of last year reminds me of a problem we are having.
> For our Ubuntu 14 virtual machines we are authenticating against AD
> and I am using the nslcd daemon to do that.
> This is working very well in a shell, i.e. when I am doing this in a shell:
> Does anybody have any ideas where to look? It somehow puzzles me.
> I am a bit inclined to say the problem is within Ubuntu 14 as the
> cluster is running CentOS and my Debian chroot environment is Stretch.
> All the best from London
This might be caused by latency in hostname lookups. How do the machines know one another? DNS is generally fine, but to check, I'd try putting the client in the hosts file of the server (the AD server, that is) and putting the AD server and any other machines contacted during login (maybe for autofs or something like that) in the client's hosts file. At least that will tell you whether the problem is DNS-related.
Also, when ssh'ing in from another machine, try putting both machines'
FQDNs and short names in their hosts files.
I know this might seem odd, since stuff just works once you are logged in, but a lot goes on during login that depends on hostname resolution if you have external services (AD authentication etc.).
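As a minimal sketch of the hosts-file suggestion above, the following shell function prints one hosts-file line carrying both the FQDN and the short name. The function name hosts_entry, the addresses, and the example.org names are made up for illustration and are not from the original thread.

```shell
# Print one hosts-file line: IP address, FQDN, then the short name.
# Usage: hosts_entry IP FQDN
hosts_entry() {
    ip=$1
    fqdn=$2
    short=${fqdn%%.*}   # everything before the first dot
    if [ "$short" = "$fqdn" ]; then
        # No domain part given, so there is only one name to list.
        printf '%s %s\n' "$ip" "$fqdn"
    else
        printf '%s %s %s\n' "$ip" "$fqdn" "$short"
    fi
}

# Hypothetical example values; substitute your own AD server and client.
hosts_entry 192.0.2.10 adserver.example.org
hosts_entry 192.0.2.21 client1.example.org
```

Appending lines like these to the hosts files on both sides and then comparing login times (for instance with `time ssh client1 true`) should show whether name resolution is the bottleneck.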
In my "BeagleBone cluster", I've found that this kind of thing has a huge effect even with vanilla ssh. By default, the Debian distro supports zeroconf ".local" hostname resolution (Bonjour in Apple-land), so you can get fooled into thinking that it knows how to resolve names (because it works sometimes), but then it mysteriously takes longer (e.g. some local cache of IP-address-to-hostname mappings is obsolete, but it keeps trying the old IP address for a while). Using my MacBook as the "cluster controller" and running pdsh, sometimes it would work and sometimes it wouldn't, depending on what I'd done before and what macOS "remembered".
So putting all those hostnames into hosts files, and appropriate rules in the sshd config files, etc. makes a world of difference. Now, "it just works".
$ pdsh -w beagle[1-8] some command
works just fine.
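A quick way to confirm that every node really is in the hosts file before running pdsh might look like the sketch below. The helper name missing_from_hosts is hypothetical; the node names beagle1 to beagle8 are taken from the pdsh example, and /etc/hosts is assumed to be the file in use.

```shell
# Print every given name that has no entry in the given hosts file.
# Usage: missing_from_hosts HOSTS_FILE NAME...
missing_from_hosts() {
    file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name as a whole field
        # after the IP address (field 1).
        if ! awk -v n="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit (found ? 0 : 1) }
        ' "$file"; then
            printf '%s\n' "$name"
        fi
    done
}

# Lists any of the eight nodes that still lack a hosts-file entry.
missing_from_hosts /etc/hosts beagle1 beagle2 beagle3 beagle4 \
    beagle5 beagle6 beagle7 beagle8
```

An empty result means every name is covered by the file; any name that is printed would still fall back to DNS or ".local" resolution.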