Because XFS is BETTER (Re: opinion on XFS)

Robert G. Brown rgb at phy.duke.edu
Thu May 9 09:08:08 PDT 2002


On Thu, 9 May 2002, Eray Ozkural wrote:

> How come you haven't heard many horror stories with EXT2? Almost anybody who's 
> done some heavy coding or busy servers on linux would know that EXT2 is very 
> likely to cause you more headaches, downtime, and corruption, than, say, FAT 
> filesystem.

Almost anybody?  Eray, I just don't get this one.  Until the start of
this thread, I'd NEVER heard people complain of ext2 file corruption
attributable to ext2's design itself, under any kind of load at all,
light or heavy, EXCEPT for specific, fairly well known, kernel snapshots
(like 2.2.0 and 2.4.0:-), in interaction with some other component like
NFS, or in interaction with questionable/unstable device drivers or
hardware.  I don't know what you mean by "heavy coding" or "busy
servers", so I don't know if we qualify on either count.  We run servers
of every description and sometimes their loads are "quite heavy" and we
have never encountered any such problem, nor has anybody I've ever
communicated with in the linux world, which by now includes, well, a
fairly large set (another one of those indeterminate metrics:-).

However, that too is anecdotal and "quite heavy" means as little as
"heavy coding" -- is there some objective metric of load (derived, for
example, from tiobench or lmbench) beyond which you would assert "almost
anybody" would encounter ext2 problems?  We routinely use tiobench to
test disk setups for speed and stability before putting them into
production, sometimes for days, both locally and over NFS -- our primary
NFS servers are under "quite heavy" (bursty, Poissonian) use by a lot of
workstations and we naturally have high requirements for stability and
performance, which ext2 has always delivered.  We drive the load
averages of the "servers" being tested up into the tens during the
tests.  We have had webservers with load averages up in the 100s (in
the last few days:-).  No corruption.

If you have a setup for which you CAN positively identify a tiobench
test that reproducibly leads to file corruption, you should report it on
the kernel list -- that's what it is for.  Sometimes when one does that
(and includes the details of one's hardware configuration) one learns
that there are known problems with some element of one's hardware
(motherboard, for example) that are likely responsible.  Other times one
does indeed report a new problem, which it is very important for
EVERYBODY that you do.
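
To make "reproducibly leads to file corruption" concrete, here is a
minimal sketch of a write-then-verify check.  It is NOT tiobench and
not anybody's actual test procedure; the function names, file names,
and sizes are all hypothetical, chosen only for illustration.  It
writes pseudo-random files from a fixed seed, records their SHA-256
digests, and later reports any file whose contents no longer match.

```python
import hashlib
import os
import random


def write_files(directory, n_files=4, size_bytes=1 << 20, seed=0):
    """Write n_files pseudo-random files; return {file name: SHA-256 hex digest}.

    A fixed seed makes the data (and so the test) reproducible.
    """
    rng = random.Random(seed)
    expected = {}
    for i in range(n_files):
        name = f"chunk_{i:04d}.bin"  # hypothetical naming scheme
        data = rng.getrandbits(8 * size_bytes).to_bytes(size_bytes, "little")
        with open(os.path.join(directory, name), "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # ask the kernel to push the data to disk
        expected[name] = hashlib.sha256(data).hexdigest()
    return expected


def verify_files(directory, expected):
    """Re-read every file and return the sorted names whose digest differs."""
    bad = []
    for name, digest in sorted(expected.items()):
        with open(os.path.join(directory, name), "rb") as fh:
            actual = hashlib.sha256(fh.read()).hexdigest()
        if actual != digest:
            bad.append(name)
    return bad
```

One would call write_files() on the filesystem under test, put the
machine under load, and call verify_files() afterwards.  Note that an
immediate re-read is likely served from the page cache rather than
the disk, so a meaningful run needs far more data than RAM, or a
remount or reboot between the two phases.  A non-empty result,
together with the seed and the hardware details, is the sort of thing
that belongs in a report.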

In still other cases the very act of reporting reveals that (for
example, I'm not suggesting that any of these describe your setup;-) the
motherboard(s) being used are overclocked, the cables are too long or
termination is suspect, or the kernel involved has some experimental
drivers added that could be corrupting things under load.  In cases like
this, the response is usually something like: Don't overclock your
system; use cables that are spec length; replace your terminators (which
should be a knee-jerk response for anybody who has trouble on a SCSI
chain); prove the problem exists for a stock vanilla kernel and not a
kernel with a custom addition that might be corrupting all sorts of
things including the ext2 stack.

At the very least, reporting the problem creates the possibility that
the problem will be fixed.  Note well that simply dropping in XFS and having
it work doesn't prove that there is any real problem with ext2.  A
computer is a complex system with all sorts of nonlinearly interacting
dynamic entities, and tiny changes in latencies can make big
differences, or XFS could indeed be more robust to problems ELSEWHERE in
the kernel or hardware configuration.  Even the order (ext2 built into
the kernel, xfs added as a module) creates a different arrangement in
kernelspace memory and corruption due to some other entity might be
differently reflected.

ext2 was indeed a pain because of fsck, especially for very large disks
with lots of small files (where it could take a LONG time) --
journalling is better.  I am therefore greatly enjoying ext3.  Still, I
have never heard anybody suggest that ext2 was more than momentarily
unstable when a kernel revision happened to break it (instantly reported
by a few hundred people and VERY quickly fixed).  One reason (or so I
have heard) that Tweedie took so long to release ext3 is that he was
absolutely religious about ensuring that it "inherit" the tremendous
stability of ext2, so that one could be assured of being able to go
ext2 <=> ext3 in both directions at will and without risking a loss of
data.

  rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu


More information about the Beowulf mailing list