[Beowulf] Small files

John Hearns hearnsj at googlemail.com
Thu Jun 12 04:12:47 PDT 2014


Tom,
I agree with you regarding small files.
In my case, I manage a DMF (SGI Data Migration Facility) setup.
I was concerned about the number of small files we were storing, both in
terms of the size of the database files and of writing small files to tape.
SGI engineers reassured me that the system will happily cope with millions
of files, and does so on many sites.
DMF also waits until a large 'chunk' is ready to be written to tape, i.e.
small writes are queued up.

However, when watching the number of files being pushed to the tape tier
one day, I noticed something like 10 000 files or more from one user.
Cue the application of a LART.
Seriously though - I did have a word and he agreed to zip up all the small
PNG files his project was generating.
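
For anyone wanting to spot this sort of thing on an ordinary POSIX
filesystem, here is a minimal Python sketch. It is not how DMF reports its
queue; the function name count_small_files is made up for illustration,
and the 10KB default threshold is simply borrowed from Tom's figure. It
counts regular files below the threshold per owning uid.

```python
import os
import stat
from collections import Counter


def count_small_files(root, threshold=10 * 1024):
    """Count regular files under root smaller than threshold bytes, per owner uid.

    Hypothetical helper for illustration only; threshold defaults to 10KB.
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_size < threshold:
                counts[st.st_uid] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Print the owners with the most small files first.
    for uid, n in count_small_files(sys.argv[1]).most_common(10):
        print(uid, n)
```

Bear in mind that a walk like this stats every file, so it is itself a
metadata-heavy job and is best run off-peak.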

I have a general policy here that when lots of small files are generated
then the directory is zipped up and the zip file is stored.
We have codes which generate lots of PNG files which are stitched together
into movies, and we also store wind tunnel data which is again
lots of PNG files. It is unlikely that anyone would ever want the raw data
files again, but if they should do then an unzip is easy.
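
As a rough illustration of the zip-the-directory policy, here is a minimal
Python sketch using the standard zipfile module. It is not the script we
actually run; the name zip_directory and its arguments are assumptions
made for the example. It packs every file under a directory into a single
archive and leaves the originals in place.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path, compression=zipfile.ZIP_STORED):
    """Pack every file under src_dir into one zip archive; return the file count.

    Hypothetical helper for illustration. zip_path should live outside
    src_dir, otherwise the archive would try to include itself.
    ZIP_STORED (no compression) is the default here because PNG data is
    already compressed; pass zipfile.ZIP_DEFLATED for compressible data.
    """
    count = 0
    with zipfile.ZipFile(zip_path, "w", compression=compression) as zf:
        for dirpath, _dirnames, filenames in os.walk(src_dir):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                # Store paths relative to src_dir so unzip recreates the tree.
                zf.write(full, os.path.relpath(full, src_dir))
                count += 1
    return count
```

Getting the raw files back is then a plain unzip, or
zipfile.ZipFile(zip_path).extractall(target_dir) from Python.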


> Do you distinguish and segregate them (and/or the people that use them)
> on special hardware/filesystems?
Suggest you invest in a LART.  http://dictionary.reference.com/browse/lart




On 12 June 2014 11:43, Reuti <reuti at staff.uni-marburg.de> wrote:

> Hi,
>
> On 11.06.2014 at 21:03, Tom Harvill wrote:
>
> > This is my first time posting to this list, thanks in advance for any
> > time you spend replying.
> >
> > We've found that a large majority of our files (~40MM of ~50MM) are
> > less than 10KB. We believe our filesystem (lustre) is bottlenecked with
> > IOPs and locking related to jobs running against these files.  We have
> > ~700TB usable storage with ~500TB consumed, almost all consumption is
> > by a relatively small number of very very large files.
>
> What data is represented in 10KB: binary or ASCII data - would it work to
> put it in a database instead of all these single files? How do you access
> the files: by some kind of index, name, directory...?
>
> -- Reuti
>
>
> > I want to ask this general question: how does your shop deal with the
> > general problem of small files in filesystems on (beowulf) compute
> > clusters? Specifically, files that users expect to actively use for
> > read and write operations for their research.
> >
> > Do you distinguish and segregate them (and/or the people that use them)
> > on special hardware/filesystems?
> >
> > Thanks!
> > Tom
> >
> > Tom Harvill
> > Holland Computing Center
> > University of Nebraska
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>