[Beowulf] Small files
gmpc at sanger.ac.uk
Fri Jun 13 01:32:35 PDT 2014
> I want to ask this general question: how does your shop deal with the
> general problem of
> small files in filesystems on (beowulf) compute clusters?
We have this workload in spades. As others have mentioned, good user
education is the key.
We use inode quotas on Lustre (typically 150k soft to 1M hard per user)
as a safety net, to catch code that wants to generate billions of small
files before it poisons the filesystem.
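As a sketch, per-user inode quotas in that range can be set with
lfs(1); the username and mount point below are illustrative, not from
the original post:

```shell
# Set a soft limit of 150k inodes and a hard limit of 1M inodes for one
# user (username and mount point are made up for the example).
lfs setquota -u alice -i 150000 -I 1000000 /mnt/lustre

# Check that user's current inode usage against the limits.
lfs quota -u alice /mnt/lustre
```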
We encourage people to hash files into nested directories to limit the
number of files in a single directory. (Depressingly enough, we spent
the first part of this week tidying up an episode of
millions-of-files-in-a-directory, caused by our batch queueing system.)
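A minimal sketch of the hashing scheme: derive a couple of directory
levels from a hash of the filename, so entries spread evenly across
many small directories. The depth and the two-hex-digit bucket width
here are illustrative choices, not the scheme the post describes.

```python
import hashlib
import os


def hashed_path(root, filename, depth=2):
    """Return e.g. root/ab/cd/filename, using hex digits of an MD5
    digest of the filename to pick the nested bucket directories.

    With depth=2 and two hex characters per level, files fan out over
    256 * 256 = 65536 directories instead of piling into one.
    """
    digest = hashlib.md5(filename.encode()).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(depth)]
    return os.path.join(root, *parts, filename)


# Example: a hypothetical results file lands in a stable bucket.
path = hashed_path("/scratch/results", "sample_000123.dat")
```

Because the bucket is a pure function of the filename, any job can
recompute the path later without a lookup table.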
You should also ensure that the small files are not striped across
multiple OSTs; that really hurts performance. We set our filesystem
default to be "don't stripe".
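Setting that default can be sketched with lfs(1); a stripe count of 1
keeps each file on a single OST. The mount point and filename below are
illustrative:

```shell
# Set the filesystem-wide default layout to "don't stripe" (one OST
# per file) by applying it to the root directory.
lfs setstripe -c 1 /mnt/lustre

# Inspect the layout a file actually got (filename is made up).
lfs getstripe /mnt/lustre/somefile
```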
Once you've done all of that, we find that small-file performance is
reasonable (i.e. fast enough that people are not actively complaining).
Dr. Guy Coates, Informatics Systems Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 x 6925
Fax: +44 (0)1223 496802
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.