[Beowulf] Small files

Joe Landman landman at scalableinformatics.com
Fri Jun 13 06:31:35 PDT 2014


On 06/13/2014 09:17 AM, Skylar Thompson wrote:
> We've recently implemented a quota of 1 million files per 1TB of
> filesystem space. And yes, we had to clean up a number of groups' and
> individuals' spaces before implementing that. There seems to be a trend
> in the bioinformatics community for using the filesystem as a database.

I wasn't going to say anything about this, but, yes, there are some 
significant abuses of file systems going on in this community.  Sadly, 
this is nothing new ... I've seen it since the late '90s.
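The "filesystem as a database" pattern Skylar mentions usually means one tiny file per record, which burns an inode and a metadata operation per record. A rough Python sketch (mine, not from the thread; record count and payload are arbitrary) of why packing records into one file reads so much faster:

```python
import os
import tempfile
import time

RECORDS = 2_000
PAYLOAD = b"ACGT" * 8  # a 32-byte stand-in for one "record"

with tempfile.TemporaryDirectory() as d:
    # Anti-pattern: one file per record (RECORDS inodes, RECORDS opens to read back).
    many = os.path.join(d, "many")
    os.mkdir(many)
    for i in range(RECORDS):
        with open(os.path.join(many, f"{i}.dat"), "wb") as f:
            f.write(PAYLOAD)

    # Alternative: the same records packed into a single file.
    packed = os.path.join(d, "packed.dat")
    with open(packed, "wb") as f:
        for _ in range(RECORDS):
            f.write(PAYLOAD)

    # Read back via per-file open/read/close ...
    t0 = time.perf_counter()
    parts = []
    for i in range(RECORDS):
        with open(os.path.join(many, f"{i}.dat"), "rb") as f:
            parts.append(f.read())
    data_many = b"".join(parts)
    t_many = time.perf_counter() - t0

    # ... versus one sequential read.
    t0 = time.perf_counter()
    with open(packed, "rb") as f:
        data_packed = f.read()
    t_packed = time.perf_counter() - t0

print(f"{RECORDS} tiny files: {t_many:.4f}s; one packed file: {t_packed:.4f}s")
```

Same bytes either way; the per-file metadata traffic is pure overhead, and on a networked filesystem each of those opens is a round trip to the metadata server.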

Even worse abuses were in parallelism with a specific subset of this 
community ... but I'd rather not go down those rabbit holes (again).


> I think it's enabled partly by a lack of knowledge of scaling and
> speedup in the community, since so much stuff still runs on laptops and
> desktops. I'd really like to teach a basic scientific computing class at
> work to address those concepts, but that would take more time than I
> have right now.

One of the more interesting things we run into is when we set up a big 
and insanely fast storage system, and then someone wants to do 
byte-sized I/O to it.  This isn't just the bioinfo community; it's 
across all disciplines.
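To make the byte-sized I/O point concrete, here's a quick Python sketch (my illustration, not from the post) of the cost: each unbuffered one-byte write pays a full system call, so the fast storage never sees a request big enough to stream.

```python
import os
import tempfile
import time

def write_tiny(path, n):
    """n separate one-byte writes; each os.write() is its own syscall."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    for _ in range(n):
        os.write(fd, b"x")
    os.close(fd)
    return os.path.getsize(path)

def write_batched(path, n):
    """The same n bytes issued as one buffered write."""
    with open(path, "wb") as f:
        f.write(b"x" * n)
    return os.path.getsize(path)

N = 100_000
with tempfile.TemporaryDirectory() as d:
    t0 = time.perf_counter()
    size_tiny = write_tiny(os.path.join(d, "tiny.dat"), N)
    t_tiny = time.perf_counter() - t0

    t0 = time.perf_counter()
    size_big = write_batched(os.path.join(d, "big.dat"), N)
    t_big = time.perf_counter() - t0

print(f"{N} one-byte writes: {t_tiny:.3f}s")
print(f"one {N}-byte write : {t_big:.3f}s")
```

The gap is orders of magnitude even on a local disk; on a parallel filesystem, where each request also crosses the network, it's worse.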

I did teach a graduate course on HPC programming at my alma mater about 
a decade ago.  It covered parallelism and optimization, and gave rough 
rubrics for how to write code that made effective use of the machine 
resources.  I had face-palm moments when one of the kids told me he 
didn't know C, but could work in C++.  Nowadays we'd be lucky to find 
anyone whose mind was not polluted by Java and other bad-for-HPC things.

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics, Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
twtr : @scalableinfo
phone: +1 734 786 8423 x121
cell : +1 734 612 4615

