[Beowulf] Big storage

Sat Sep 15 01:37:03 PDT 2007

[This will be my last message on fsprobe.]

According to Bogdan Costescu:
[...]
>
> I might be dense after holiday, but I still don't get the reasons for
> such an interest in running fsprobe. I can see it being used as a
> burn-in test and to prove that a running system can write then read
> data correctly, but what does it mean about the data that is already
> written or about the data that is in flight at the time fsprobe is
> run? (someone else asked this question earlier in the thread and
> didn't get an answer either)
>
I guess that would be me.

I did a "risk & cost analysis" for *my* environment and found that
the cost of fsprobe is not balanced by enough useful returns.
Therefore, we are not running fsprobe on our X4500s, since it is
actually less useful than "zpool scrub" for detecting corruption or
other problems with the data.
Even fsprobe's author mentions something along these lines in his slide
on possible solutions to silent data corruption ("ZFS has a
point").

There's nothing magic in fsprobe: there might be data corruption even
if it doesn't detect any.
The steady state of the filesystems on most of our disk servers
(including the X4500s) is 95% to 99% full.
fsprobe is not actually useful in such a case, since it will only test a
small portion of the disks (and due to unbalanced ZFS vdevs some disks
will probably not be tested *at all* on the X4500s).
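
As a rough illustration of the point above (this is not part of
fsprobe; the function names are made up for this sketch, and it assumes
a probe that can only write to, and read back from, currently free
space), the upper bound on what such a probe can exercise is simply the
free fraction of the filesystem:

```python
import os


def probe_coverage(total_bytes, free_bytes):
    """Upper bound on the fraction of a filesystem that a
    write-then-read probe can touch, assuming it only uses free space."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def mount_coverage(path):
    """Same bound for a mounted filesystem, using os.statvfs (Unix only)."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize   # filesystem size in bytes
    free = st.f_bavail * st.f_frsize    # bytes available to unprivileged users
    return probe_coverage(total, free)


if __name__ == "__main__":
    # Hypothetical filesystems that are 95% and 99% full.
    for used_pct in (95, 99):
        print(used_pct, probe_coverage(100, 100 - used_pct))
```

On a filesystem that is 95% to 99% full, this bound is between 5% and
1% of the space, and it says nothing about how that free space is
distributed across the individual disks.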

And I've probably not mentioned this on the list, but the CERN IT
department estimated that a 20% increase in CPU power is necessary to
cope with fsprobe (or similar tools).

>
> 			       How is fsprobe as a burn-in test better
> than, say, badblocks ?
> I am genuinely interested in these answers because I have written a
> somehow similar tool 5-6 years ago to test new storage, simply because
> I didn't trust enough the vendors' burn-in test(s). My interest was a
> bit larger in the sense that apart from data correctness I was also
> checking the behaviour of FS quota accounting (by creating randomly
> sized files with random ownership) and of the disk+FS in face of
> fragmentation (by measuring "instantaneous" speed). But I never saw
> the potential usage by other people mainly because I could not find
> answers to the above questions, so I never thought about making it
> public... and now it's too late ;-)
>
Same here, "been there done that".

> There is another issue that I could never find a good answer to: how
> much testing a storage device should withstand before the testing
> itself becomes dangerous or disturbing ? Access by the test tool
> requires usage of resources: sharing of connections, polluting of
> caches, heads that have to be moved. For example, for the 1.something
> GB/s figure that was mentioned earlier in this thread, would you
> accept a halving of the speed while the data integrity test is being
> run ? Or more generally, how much of the overall performance of the
> storage system would you be willing to give up for the benefit of
> knowing that data can still be written and then read correctly ? And
> sadly I miss some data in the results that Google and others published
> recently: how much were the disks seeking (moving heads) during their
> functioning ? I imagine that it's hard to get such data (should
> probably be from the disk as opposed to kernel, as firmware could
> still reorder), but IMHO is valuable for those designing multi-user
> storage systems where disks move heads frequently to access files
> belonging to different users (and therefore spread on the disk) that
> are used "simultaneously".
>
These are exactly some of my points.

Loïc.
-- 
| Loïc Tortay <tortay at cc.in2p3.fr> -     IN2P3 Computing Centre     |