nbd/nfs speeds - using buffers instead of network

Velocet math at velocet.ca
Mon Jun 4 12:41:07 PDT 2001

I've got our prototype cluster (8 nodes, expanding to 40 odd later)
booted on RPL+DHCP/BOOTP on these PcChips M810 boards... Linux 2.4.

Even with NFSv3 we're only getting about 3MB/s at max xfer across
to NFS. Not sure what the problem is (we're using a RAID but we're gonna
go to a buncha raid 0 ATA-66 drives for scratch files - the raid maxes
out at 6MB/s for now).

One thing we're gonna try is NBD. This lets a client attach a file on the
server (via a server side daemon listening on a port, or via inetd) as a
local block device, which can then hold a filesystem. There are many
interesting possibilities with this which are quite useful to beowulf
admins. (raid0 across a network! ;)

However the NBD device just freezes whenever I write to it... can't even
make a filesystem on it. I'm using Pavel's original nbd-client
and server stuff, and the only other package that I can find that might
do this stuff is the Enhanced NBD on Freshmeat, but the code won't compile.
(I have a 2.4.2 kernel and lots of my header files seem to be totally
incompatible with the code. I am not enough of a code guru to wade through
all the fixes required.)

The nbd-server stuff at Scyld (ironically enough) doesn't exist on their
FTP server ('file not found'). I've asked the maintainer of the code
for help (no reply yet, but then again I just emailed a few hours ago).

So a couple things to ask here:

1) Has someone got working nbd client and server code? Not Pavel's
versions, I know they don't work for me. Perhaps alternate versions won't
work either, but I don't know where to start to figure this out.

2) I need some other way to keep the scratch files off the network. I have
256MB in my prototype nodes, but we'll be increasing that to 512MB. I
found under FreeBSD the MFS filesystem (memory fs) works quite well
because it writes to RAM when needed (albeit via userland - costs about
1.5% CPU overall for my types of jobs), but when it goes beyond actual
core, it will swap - in my case it would be to the NFS server, which is fine -
only machines that have full MFSs would write at all, and we've calculated
things to be such that this is rare. I have the network set up to be able
to handle the average load (which is about .25 to .5 Mbps average per node),
i.e. the scratch files generally fit into buffers/local core.
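
As a rough check on the per-node figure above, the aggregate load on the
server grows linearly with node count. The following is a minimal sketch,
assuming every node sits at the quoted average at the same time; the result
stays in whatever units the per-node figure is in, and aggregate_load is a
hypothetical helper name, not part of any tool mentioned here.

```python
def aggregate_load(nodes, per_node_low, per_node_high):
    """Return (low, high) total load for a cluster of `nodes` machines.

    Assumes all nodes generate their average load simultaneously, so this
    is a planning estimate of sustained load, not of peaks.
    """
    return nodes * per_node_low, nodes * per_node_high


# Cluster sizes from the post: 8 prototype nodes, about 40 later.
for nodes in (8, 40):
    low, high = aggregate_load(nodes, 0.25, 0.5)
    print(f"{nodes} nodes: {low:g} to {high:g} (same units as per-node figure)")
```

With the figures quoted above this gives 2 to 4 for 8 nodes and 10 to 20 for
40 nodes.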

Aside: seems to me I should let g98 manage the RAM itself and avoid
needing to write scratch files anyway - but there will always be a situation
where the scratch files may come out larger than local core and I'll need
a backup system to handle it, and I need more than 3MB/s (or 600K/s which
is what I see g98 getting in trafshow).

There's no MFS in Linux unfortunately, but I need something like it.

Ramdisk doesn't work because when it's full, it's full and there's
catastrophic failure (unless g98 understands alternate scratch areas? I
haven't checked that now that I think of it).
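
One way to picture the "alternate scratch area" idea is a wrapper that picks
a scratch directory before the job starts. This is only a sketch under
stated assumptions: the mount points are hypothetical, pick_scratch_dir is
an invented helper, and nothing here reflects how g98 itself chooses scratch
space. It checks free space up front only, so a ramdisk could still fill up
mid-job.

```python
import os
import shutil
import tempfile


def pick_scratch_dir(candidates, needed_bytes):
    """Return the first existing directory with at least needed_bytes free."""
    for path in candidates:
        if not os.path.isdir(path):
            continue  # area not present or not mounted on this node
        if shutil.disk_usage(path).free >= needed_bytes:
            return path
    raise RuntimeError("no scratch area has enough free space")


if __name__ == "__main__":
    # Hypothetical layout: try a ramdisk first, then a network-backed
    # area, then the system temp directory as a last resort.
    areas = ["/mnt/ramdisk", "/mnt/nfs-scratch", tempfile.gettempdir()]
    print(pick_scratch_dir(areas, needed_bytes=1))
```

Listing the fastest area first means the network-backed area is only used
when the local one is already too full for the estimated scratch size.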

NFS or NBD could both work here - except NBD has the potential to run
far faster I think - the key to the whole thing however is ensuring local
buffers are used for r/w instead of syncing back to the server all the time
for a read on a recently written file. 

To compound things Linux 2.4 is bitching about locking on the FreeBSD 4.3
NFSv3 raid server - I found things will only work with -o nolock on my Linux
mounts - I think this may stop me from rereading recently written scratch
files out of local buffers, which is what would keep me from abusing the
network.

I guess I can fix this in one of several ways:

- get NFSv3 locking working to allow local buffers for writes/rewrites

- get nbd working with extremely long sync/kupdate times so it almost
never syncs back buffers across the net unless they're full

Anyone got other suggestions?

I'm doing some bonnie tests here on the raid as seen from one of the nodes
and I'm getting respectable speeds - 10MB/s for small files - which is higher
than I get on local bonnies on the raid server itself (6MB/s max write speed
of the disks). With larger test volumes (larger than free mem/max possible
buffers) I get slower speeds, which suggests some buffering is going on for
writing/rewriting, which is what I need.

(slightly odd stats below, but... could be the raid was being used... others
have access to it, I'm only storing log/source files there, scratch soon
moving as I said to raid 0 ata-66 or -100):


              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
n1   1*     5 10406 85.4  8857  3.5   910  0.2 11842 87.9 154421 90.5 623.4  5.5
n1   10*    5  1074  8.5  3527  1.6   837  0.6 11355 86.3 194485 87.4 693.0  5.9
n1   1*   100  3896 30.7  3800  2.2   835  0.6 11441 86.0 191955 84.4 459.5  3.4

and another try (hmm locking not an issue anymore?)


              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine    MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
n1    1*    5 10199 85.7  8833  6.9   906  0.7 11811 90.0 154012 60.2 587.2  5.3
n1    1*  100  2255 17.7  3447  2.0   834  0.6 11531 86.0 192423 86.4 474.0  3.9
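
To make the buffering argument easier to check, the rows above can be
compared against the 6MB/s disk write ceiling quoted for the raid server.
This is a rough sketch: the column meanings follow the bonnie header, the
"1*" / "10*" markers are treated as part of the machine label because the
post does not explain them, and parse_row and looks_buffered are
hypothetical helper names.

```python
DISK_MAX_KB_S = 6 * 1024  # about 6MB/s, the server disk write ceiling quoted above

ROWS = """\
n1   1*   5 10406 85.4  8857  3.5   910  0.2 11842 87.9 154421 90.5 623.4  5.5
n1   10*  5  1074  8.5  3527  1.6   837  0.6 11355 86.3 194485 87.4 693.0  5.9
n1   1* 100  3896 30.7  3800  2.2   835  0.6 11441 86.0 191955 84.4 459.5  3.4
n1    1*   5 10199 85.7  8833  6.9   906  0.7 11811 90.0 154012 60.2 587.2  5.3
n1    1* 100  2255 17.7  3447  2.0   834  0.6 11531 86.0 192423 86.4 474.0  3.9
"""


def parse_row(line):
    """Split one bonnie result row, reading the numeric columns from the right."""
    tokens = line.split()
    nums = [float(t) for t in tokens[-12:]]  # six (rate, %CPU) pairs
    return {
        "machine": " ".join(tokens[:-13]),   # label may contain spaces
        "size_mb": int(tokens[-13]),
        "char_out_kb_s": nums[0],
        "block_out_kb_s": nums[2],
        "rewrite_kb_s": nums[4],
        "char_in_kb_s": nums[6],
        "block_in_kb_s": nums[8],
        "seeks_per_s": nums[10],
    }


def looks_buffered(row):
    """True when block writes run faster than the server disks can absorb."""
    return row["block_out_kb_s"] > DISK_MAX_KB_S


for line in ROWS.splitlines():
    row = parse_row(line)
    mb_s = row["block_out_kb_s"] / 1024
    print(row["machine"], row["size_mb"], f"{mb_s:.1f} MB/s", looks_buffered(row))
```

Only block writes are compared here; the same test could be applied to the
per-char column, and the block read figures are far above anything the
network could deliver, which also points to local caching.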

Ken Chase, math at velocet.ca  *  Velocet Communications Inc.  *  Toronto, CANADA 
