[Beowulf] dedupe filesystem

Nifty Tom Mitchell niftyompi at niftyegg.com
Thu Jun 4 11:53:47 PDT 2009


On Thu, Jun 04, 2009 at 05:33:32AM +0200, Bogdan Costescu wrote:
> On Wed, 3 Jun 2009, John Hearns wrote:
>
>> Quite often the architecture of storage is a secondary consideration,
>> in the rush to get a Shiny New Fast machine on site and working.
>
> Well, I've seen it ignored even outside of that rush - in the design  
> phase. And I confess of being guilty of doing this as well, but I learn 
> from mistakes :-)
>
.....
>
> I see duplication of data in almost all cases as a human behaviour
> problem, not a technical one, which needs human behaviour solutions and
> not technical ones, so policies are a good solution.

Take this list for example.   We each get our own copy
and at times get multiple copies as a side effect of
replies.

One key here is the lack of 'caching' tools for mail and
for HPC in the I/O filesystem space.

There are multiple issues that make this hard: some technical,
some social, some habitual.

On the habitual side, I was recently looking at a CS homework assignment
and noticed that the primary instruction began with "copy" both code
and "data", and then ended with "copy code and data" to submit the
homework assignment result.

The low-budget answer today is a human behaviour solution. Longer-term
solutions will need a much better understanding of the "data flow" and
"data state" of a lot of replicated things (for example, mail and
attachments), including the "off line" state, multiple keyboards
(home/work), and connectivity and connectivity-quality state.

It is possible that HPC tools and mail could evolve toward a Mercurial
(revision control) view of data. This in turn implies a longer reach
for access control and access policy tools.
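
As a rough illustration of what a revision-control style view buys you,
here is a minimal sketch of a content-addressed store, where each
payload is keyed by a hash of its bytes so identical copies are stored
once. The class name, the method names and the mailbox paths are all
made up for the example, and it deliberately ignores the access control
and off-line questions raised above.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical payloads are kept once."""

    def __init__(self):
        self.blobs = {}  # digest -> payload bytes (stored once per digest)
        self.refs = {}   # logical name -> digest

    def put(self, name, data):
        # Key the payload by a hash of its content, not by its name.
        digest = hashlib.sha256(data).hexdigest()
        # Only the first copy of a given payload is actually stored.
        self.blobs.setdefault(digest, data)
        self.refs[name] = digest
        return digest

    def get(self, name):
        # Resolve the name to its digest, then the digest to the bytes.
        return self.blobs[self.refs[name]]


# Hypothetical example: one attachment delivered to three mailboxes.
store = ContentStore()
attachment = b"results.tar contents" * 1000
for reader in ("alice", "bob", "carol"):
    store.put(f"{reader}/inbox/results.tar", attachment)
```

Each reader still sees their own name for the attachment, but only one
copy of the bytes is held. Who may resolve which name is exactly the
access policy problem that remains.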




-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?



