Large Files on IA32

Donald Becker becker@scyld.com
Sun Jul 29 19:08:32 PDT 2001


On Mon, 30 Jul 2001, Jakob Østergaard wrote:
> On Sun, Jul 29, 2001 at 04:18:48PM -0400, Donald Becker wrote:
> > On Sat, 28 Jul 2001, Jakob Østergaard wrote:
> > 
> > > The 2GB limit is *not* within ext2.  If the filesystem is all you change, the
> > > 2GB limit will stay.
> > > 
> > > The limitation is in VFS, the virtual file-system layer on top of *any*
> > > filesystem you put in Linux.
...

By the way, the LFS page was accidentally moved from its original location
  http://www.scyld.com/software/lfs.html
to
  http://www.scyld.com/products/beowulf/software/lfs.html

> What I wanted to say was: Just changing the FS won't do it.  I didn't
> mean to say "fixing VFS alone will solve all the problems in the
> world".  I'm sorry if it came out that way :)

> > > The 2.4 kernels do not have this limitation anymore.
> > The 2.4 kernels have the same 32 bit block offset limit.
> Now what does that mean ?

I should have been a little more clear.

We have "64 bit LFS support", but the "64 bit" part is only in the user
level code.  The current kernels support only 41 or 42 bits of file
offset -- 2TB.

This is really a pedantic point.  Very few people legitimately have a
single 2TB file.  By the time terabyte files are common, we will be
using 64 bit machines and "LFS" will be irrelevant.  But now that I've
brought it up...

The kernel filesystems work with block devices.  The offset parameter is
an 'int' (32 bits), but it refers to a whole block, not a byte.

Linux was originally based on the i386 and IDE disks.  The kernel block
size matches the IDE disk hardware block size of 512 bytes.  The first
file system paired two IDE blocks into a file system block size of 1024
bytes, so there is also code with a 1KB base block size.

Changing the FS-independent kernel code to use 64 bit offsets is
non-trivial, but it only needs to be done in one place.  There is a
whole slew of filesystems, though, and they make various internal
assumptions.  A common issue is that they put the block offset into a
32 bit value when doing physical device reads.  That limits the file
size to 32+9 or 32+10 bits.
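
Spelled out, that is 2^32 block numbers times the block size:

  2^32 blocks * 2^9  bytes/block = 2^41 bytes = 2TB   (512 byte blocks)
  2^32 blocks * 2^10 bytes/block = 2^42 bytes = 4TB   (1KB blocks)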


If only Merced had come out in 1996, as originally scheduled.  Or even
in 1999.  Every competitor would have a 64 bit architecture as well.
Anyone with a 2GB file would already be using a 64 bit architecture, and
we wouldn't need to talk about LFS support.  ...at least we will all be
dead before people start complaining about 64 bit file offsets.

> [root@phoenix /backup]# uname -a
> Linux phoenix 2.4.7 #1 SMP Sat Jul 21 17:32:55 CEST 2001 i686 unknown
> [root@phoenix /backup]# ll stuff_here
> -rw-r--r--    1 root     root     3153758271 Jul 30 01:36 stuff_here
> 
> Stock 2.4 kernel fresh off kernel.org, files in excess of 2GB no problem.

The previous limit wasn't 2GB (2^31), it was 4GB.
There is a signed/unsigned boundary at 2GB, so an LFS test suite should
have a few files +/- a few bytes and blocks around this boundary.
The real boundary is at 4GB.
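
Something like the following would exercise writes across the signed
boundary (an untested sketch: it assumes an LFS-aware glibc with
open64(), and the file name is just illustrative):

#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char block[1024];
    off64_t pos = 0;
    off64_t target = 0x80000000LL + 4096;   /* a little past 2GB */
    int fd = open64("lfs-boundary-test",
                    O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0) {
        perror("lfs-boundary-test");
        return 1;
    }
    memset(block, 0xAA, sizeof(block));
    /* Append 1KB blocks until just past 2GB, reporting where,
     * if anywhere, a write fails. */
    while (pos < target) {
        if (write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
            fprintf(stderr, "write failed at offset %lld\n",
                    (long long)pos);
            break;
        }
        pos += sizeof(block);
    }
    close(fd);
    return 0;
}

Writing every block keeps the file dense, which matters for the next
point.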

The other aspect that I alluded to was sparse vs. dense files.  The quick
way to create a 5GB file is to seek to +5GB and write a block of data.
The intermediate disk blocks are not actually allocated, and will read
as zeros.  This allows you to create a huge file even on a small
filesystem.
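
In code, the quick way looks roughly like this (again an untested
sketch assuming open64() and lseek64(); the name is illustrative):

#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    char block[512] = "end marker";
    off64_t five_gb = 5LL << 30;    /* 5 * 2^30 bytes */
    int fd = open64("lfs-sparse-test",
                    O_WRONLY | O_CREAT | O_TRUNC, 0644);

    /* Seek past 5GB, then write a single block.  The result is a
     * file of 5GB + 512 bytes that occupies almost no disk blocks. */
    if (fd < 0 || lseek64(fd, five_gb, SEEK_SET) < 0
        || write(fd, block, sizeof(block)) != (ssize_t)sizeof(block)) {
        perror("lfs-sparse-test");
        return 1;
    }
    close(fd);
    return 0;
}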

These "sparse" files take a completely different code path in the kernel
than regular dense files.  This is a case where validation testing
requires knowing the implementation details.  The test suite needs to
include both sparse and dense files around the different boundary cases
-- something that some of the other 2.2 LFS implementations apparently
forgot.
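
A quick way to check which kind of file the suite actually created is
to compare the nominal size against the allocated blocks (a sketch,
assuming stat64() is available; st_blocks counts the 512 byte blocks
actually allocated, so a sparse file reports far fewer allocated bytes
than its size):

#define _LARGEFILE64_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat64 st;

    if (argc != 2 || stat64(argv[1], &st) < 0) {
        perror("stat64");
        return 1;
    }
    printf("%s: size %lld bytes, %lld bytes allocated\n", argv[1],
           (long long)st.st_size, (long long)st.st_blocks * 512);
    return 0;
}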

Donald Becker				becker@scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993




