Block Sizes

Robert G. Brown rgb at phy.duke.edu
Thu Oct 31 08:36:54 PST 2002


On Thu, 31 Oct 2002, Leandro Tavares Carneiro wrote:

> Hi,
> 
> 	I work here with a Beowulf cluster running a "home-made" parallel
> application, using MPI, and we have some problems with local disk IO.
> 	The application needs to load a lot of data to process the real
> data, and the development team doesn't know how to make the reads more
> efficient.  They want to change the block size used when reading and
> writing, but they are not "IO experts" and are having a lot of trouble
> with it.
> 	The application is written in Fortran, and the development team
> says that changing the block size is not possible in this programming
> language, but they believe it is possible in C.  Has anyone faced this
> kind of problem?
> 	This application also runs on an SGI machine with 32 CPUs, and the
> problem is the same...
> 	If anyone has a tip for me, I will be very grateful!
> 
> 	Regards,
> 
> PS: Sorry about my poor english.

You have at least a limited ability to tune filesystem block sizes and
other parameters for good high-bandwidth performance at the kernel
level, before Fortran or anything else sees it.  Read:

  man hdparm
  man tune2fs
  man mke2fs

I think you'll want to create your filesystems with -T largefile4 and a
4K blocksize (the maximum, which I recall is also the kernel page size
and probably WHY it is the maximum).  The largefile4 type allocates one
inode for every 4 MB of disk, which suits a filesystem holding a few
big data files rather than many small ones.
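A sketch of the relevant invocations (the device name /dev/sdb1 is a
placeholder, and mke2fs is destructive -- these lines only illustrate
the flags, so don't run them against a disk you care about):

```shell
# Create an ext2 filesystem tuned for large files: 4K blocks,
# one inode per 4 MB of disk via the largefile4 usage type.
mke2fs -b 4096 -T largefile4 /dev/sdb1

# Inspect what you got: block size, inode count, and so on.
tune2fs -l /dev/sdb1

# Measure raw sequential read throughput of the underlying device.
hdparm -t /dev/sdb1
```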

There are also rsize and wsize options that can be adjusted in
/etc/fstab for NFS mounts, which control the block sizes used when
reading data from and writing data back to NFS servers.
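An illustrative /etc/fstab line for such a mount (the server name
"fileserver" and the export path are hypothetical; 8K was a common
choice for rsize/wsize at the time):

```
fileserver:/data  /mnt/data  nfs  rsize=8192,wsize=8192,hard,intr  0 0
```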

If you want to improve and tune any system with this sort of thing, be
sure to really read all about it and proceed with extreme caution -- a
lot of things you might try will destroy your filesystem(s) or totally
wreck performance.  The default tuning is probably a lot more efficient
than you think, and the advantage of monkeying with it probably minimal.

Is Fortran really all that inefficient at reading in blocks from disk?
Could you do something like build a big ramdisk, preload the files into
it, and read them from there?  Disk itself is always "slow" in terms of
random-access latency and kernel overhead on a stat and open, but a
fast SCSI or IDE disk can deliver large data files to an application at
tens of MB/sec, which isn't totally horrible.  Except compared to
memory, of course.
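You can see the effect of userland block size from the shell with dd
(a sketch; the file names under /tmp are arbitrary).  The small-block
copy makes thousands of syscalls, the large-block copy only a handful,
and the timing difference is usually dramatic:

```shell
# Create a 16 MB test file.
dd if=/dev/zero of=/tmp/blkdemo_src bs=1M count=16 2>/dev/null

# Copy it with a tiny block size (one read/write syscall per 512 bytes)...
time dd if=/tmp/blkdemo_src of=/tmp/blkdemo_tiny bs=512 2>/dev/null

# ...and with a large block size (a handful of syscalls total).
time dd if=/tmp/blkdemo_src of=/tmp/blkdemo_big bs=4M 2>/dev/null

# Both copies are byte-identical to the source regardless of block size;
# only the overhead per transferred byte changes.
cmp /tmp/blkdemo_src /tmp/blkdemo_tiny && \
cmp /tmp/blkdemo_src /tmp/blkdemo_big && echo OK
```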

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu





