[Beowulf] filesystem metadata mining tools
Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Sat Sep 12 16:02:10 PDT 2009
On 9/12/09 8:10 AM, "Rahul Nabar" <rpnabar at gmail.com> wrote:

> As the number of total files on our server was exploding (~2.5 million
> / 1 Terabyte) I wrote a simple shell script that used find to tell me
> which users have how many. So far so good.
>
> But I want to drill down more:
>
> * Are there lots of duplicate files? I suspect so. Stuff like job
>   submission scripts which users copy rather than link etc. (fdupes
>   seems puny for a job of this scale)
>
> * What is the most common file (or filename)?
>
> * A distribution of filetypes (executables; netcdf; movies; text) and
>   prevalence.
>
> * A distribution of file age and prevalence (to know how much of this
>   material is archivable). Same for frequency of access; i.e. maybe the
>   last access stamp.
>
> * A file size versus number plot. i.e. Is 20% of space occupied by 80%
>   of files? etc.

Another useful application for such a tool would be to get better KLOC counts of source code trees. I find that our trees have lots of duplication among branches (e.g., everyone has a "test.c" for unit tests in with their modules, and all of them are pretty similar).
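As a rough illustration of the duplicate-detection question, here is a minimal Python sketch. It is not a tool mentioned in this thread; the function names `file_digest` and `find_duplicates` are made up for the example. It assumes that grouping files by size first, and hashing only the files that share a size, keeps the amount of data read manageable on a tree of a few million files. It only finds byte-identical files, so it would not catch the "pretty similar" test.c case.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical.

    Hypothetical helper for illustration. Hard links to the same inode
    are reported as duplicates too, since only contents are compared.
    """
    # Pass 1: group regular files by size (metadata only, no reads).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                by_size[os.path.getsize(path)].append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it

    # Pass 2: hash only files that share a size with at least one other.
    by_digest = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            try:
                by_digest[(size, file_digest(path))].append(path)
            except OSError:
                continue

    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    import sys

    for group in find_duplicates(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(len(group), "copies:", *group, sep="\n  ")
```

The `by_size` dictionary built in the first pass could also feed the file size versus number plot asked about above, since it already holds every file's size without reading any contents.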
