[Beowulf] automount on high ports

Wed Jul 2 05:31:15 PDT 2008

Carsten Aulbert wrote:

>> The clients are connecting from ports below 1024 because Berkeley set
>> up a hack in the original BSD stack so that only root could open ports
>> below 1024. This way, you could "know" the process on the remote host
>> was a root process, thus you could feel "secure" [sic]. It doesn't add
>> any real security any more, but it is also not the cause of any
>> problem you are experiencing.
> 
> We might run out of "secure" ports.

But you can force NFS to connect from ports above 1024, so this 
shouldn't be an issue.

[...]

> OK, we have 1342 nodes which act as servers as well as clients. Every

There is a short writeup on this with quotes from Bruce Allen in 
HPCwire.  Too bad you didn't opt for JackRabbits there :)

> node exports a single local directory and all other nodes can mount this.

Fine, nothing terrible.

> 
> What we do now to optimize the available bandwidth and IOs is spread
> millions of files according to a hash algorithm to all nodes (multiple
> copies as well) and then run a few 1000 jobs opening one file from one
> box then one file from the other box and so on. With a short autofs

Hmmm....  So you want to "track" spatial metadata (e.g. where the file 
is) according to some hash function that each node can execute, and then 
once this is known, perform IO.

So, for example (as a relatively naive/simple minded version) some quick 
Perl pseudo-code ...

	# ....
	use Digest::MD5 qw(md5_hex);
	# first 8 hex digits of the MD5 digest, taken as an integer
	my $hash	= hex(substr(md5_hex($filename), 0, 8));
	my $machine	= $hash % $Number_of_machines;
	my $machine_name= $name[$machine];
	my $full_path	= sprintf("/%s/%s",$machine_name,$filename);
	open(my $fh, ">", $full_path)
		or die "FATAL ERROR: unable to open $full_path: $!\n";
	# ....

Is this about right?
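
Folding in the "multiple copies" you mention, here is a rough Python 
sketch of the same idea.  The replica rule (copies go to the next nodes 
in list order), the node names and the example file name are all made 
up for illustration, not anything I know about your setup.

```python
import hashlib

def placement(filename, nodes, copies=1):
    """Return the node names that would hold `filename`, primary first."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    first = int(digest, 16) % len(nodes)
    # Hypothetical replica rule: the next `copies - 1` nodes in list order.
    return [nodes[(first + i) % len(nodes)] for i in range(copies)]

def full_path(node, filename):
    """Path under which the file would appear once autofs mounts `node`."""
    return "/%s/%s" % (node, filename)

# Made-up node names for a 1342 node cluster.
nodes = ["n%04d" % i for i in range(1, 1343)]
holders = placement("data/run42.dat", nodes, copies=2)
paths = [full_path(node, "data/run42.dat") for node in holders]
```

Every node that runs this with the same node list gets the same answer, 
so no lookup service is needed before the IO starts.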

> timeout that ought to work. Typically it is possible that a single
> process opens about 10-15 files per second, i.e. making 10-15 mounts per
> second. With 4 parallel processes per node that's 40-60 mounts/second.

Hmmm ... mount latency we have seen is ~0.1 seconds or so, so I can 
believe 10-14/second.  Note that due to strange latency effects in 
larger machines, we have also seen an automount take 0.5 seconds or 
more.  Some of the delays were due to name resolution.  We never fully 
traced it, but this was on a 32 node cluster.  You are talking a little 
bigger.

> With a timeout of 5 seconds we should roughly have 200-300 concurrent
> mounts (on average, no idea about the variance).

200-300 mounts across 1342 nodes, sure.  200-300 mounts of one file 
system on one server from 200-300 client machines?  I have some doubts ...
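
For the per-client number, the arithmetic is just mount rate times how 
long each mount is held.  A quick Python sketch, assuming every open 
lands on a share that is not already mounted and that each mount lives 
for roughly one timeout (both are my simplifications, so treat the 
result as closer to an upper bound):

```python
def concurrent_mounts(opens_per_sec, procs_per_node, timeout_s):
    """Steady-state mounts held by one client: arrival rate x hold time."""
    mounts_per_sec = opens_per_sec * procs_per_node
    return mounts_per_sec * timeout_s

# Figures quoted above: 10-15 opens/second, 4 processes, 5 second timeout.
low = concurrent_mounts(10, 4, 5)
high = concurrent_mounts(15, 4, 5)
print(low, high)  # prints "200 300"
```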

> Our tests so far have shown that sometimes a node keeps a few mounts
> open (autofs4 problems AFAIK) and at some point is not able to mount
> more shares. Usually this occurs at about 350 mounts and we are not yet
> 100% sure if we are running out of secure ports.

Older kernels couldn't do more than 256 mounts.  Not sure when/if this 
limit has been raised.  This is a different problem though.  If you have 
N machines mounting a file system, then you get N requests on port 
2049 or similar (the inbound NFS port).  You don't run out of secure ports.

If the issue is that you are running 200+ outgoing mount requests from 
one machine, you will likely hit a delay issue as you cross the 256 
mount mark (if your kernel hasn't been patched ... not sure if/when 
this has changed or will change).

> All our boxes export now with "insecure" option (NFSv3), but our clients
> all connect from a "secure" port, anyone here who might give us a hint
> how to force this in Linux?

See if you can get fewer than 256 mounts working well.  If so, and it 
only starts falling off above 256 mounts, this would be important to know.
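
To see where a client actually sits relative to 256, counting the NFS 
entries in /proc/mounts should be enough.  A rough Python sketch (the 
function name is made up; it takes the file contents as a string so it 
can be tried on sample text first):

```python
def count_nfs_mounts(mounts_text):
    """Count NFS entries in text laid out like /proc/mounts."""
    count = 0
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields are: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3 and fields[2] in ("nfs", "nfs4"):
            count += 1
    return count
```

On a node you would feed it the contents of /proc/mounts, e.g. 
count_nfs_mounts(open("/proc/mounts").read()), and log the value each 
time a mount attempt fails.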

Joe

> 
> Thanks a lot
> 
> Carsten

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
        http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615