[Beowulf] RE: [Bioclusters] FPGA in bioinformatics clusters (again?)

Robert G. Brown rgb at phy.duke.edu
Tue Jan 17 07:54:41 PST 2006


On Tue, 17 Jan 2006, Eugen Leitl wrote:

> On Mon, Jan 16, 2006 at 06:43:42PM -0500, Mike Davis wrote:
>
>> sequences (which it wants to be in one folder). A quiz for the Unix
>> geeks out there, what happens when a folder has 50,000 files in it. Can
>> you say SLOOOOOOOOOWWWW?
>
> Unix doesn't have folders. Are you a Mac person, perchance?
>
> You also seem to be using the wrong file system.
>
> If your application is needing 50 k files in one
> directory, your application should not be needing 50 k files
> in one directory. One trivial fix is to organize it
> into subdirectories, using parts of the file name or hashes
> as prefix.

Yeah, exactly.  Or hash in any of many other ways, ideally on the
actual FUNCTIONAL differentiators of the "DB lookup" if you are likely
to have to process multiple files in one pass, all linked by some common
elements.  There is no substitute for actually using computer science
and intelligent code design to create optimally scalable code.  There is
no end of ways to do something linearly and badly, in code written by
people who don't really know how to code OR how to look up efficient
solutions written by people who do.
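
A minimal sketch of the hashed-subdirectory idea, in Python.  Assumptions:
the bucket is chosen from the leading hex digits of a SHA-1 digest, taken
either of the file name or of a caller-supplied grouping key (the
"functional differentiator"), so files processed together land in the same
directory.  The names `sharded_path` and `key` are hypothetical, not from
any existing tool.  With one level of two hex digits, 50,000 files spread
over 256 subdirectories, roughly 195 per directory on average.

```python
import hashlib
from collections import Counter
from pathlib import Path
from typing import Optional


def sharded_path(root: Path, filename: str, key: Optional[str] = None,
                 levels: int = 1, width: int = 2) -> Path:
    """Return root/<h1>/.../<hN>/filename, where each <hi> is `width` hex
    digits of the SHA-1 of `key` (if given) or else of `filename`.

    Hypothetical helper: hashing on a shared key puts related files in the
    same bucket; hashing on the name spreads files roughly evenly.
    """
    if levels * width > 40:  # a SHA-1 hex digest has only 40 characters
        raise ValueError("levels * width must not exceed 40")
    source = key if key is not None else filename
    digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    # Slice consecutive chunks of the digest to form the directory levels.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, filename)


if __name__ == "__main__":
    root = Path("sequences")  # hypothetical top-level directory
    # Count how many of 50,000 synthetic names fall into each bucket.
    counts = Counter(
        sharded_path(root, f"seq_{i:05d}.fasta").parent for i in range(50_000)
    )
    print("buckets:", len(counts), "largest bucket:", max(counts.values()))
```

To actually store a file, create its bucket first, e.g.
`p.parent.mkdir(parents=True, exist_ok=True)`.  Hashing the name gives an
even spread; hashing a shared key trades evenness for locality.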

DB lookup and efficient storage have long since moved into the
transcendental psychic regime, with e.g. Google and other engines capable
of managing enormous databases.  Google can find a single file out of a
zillion or so (note well my usage of precise numbers;-) slightly before
you enter the search string, a thing that they manage by spinning their
servers so that the lookup engine travels slightly faster than the speed
of light.

    rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
