[Beowulf] Doing i/o at a small cluster

Vincent Diepeveen diep at xs4all.nl
Sat Aug 18 08:05:35 PDT 2012


On Aug 18, 2012, at 3:19 PM, Ellis H. Wilson III wrote:

> On 08/17/2012 12:04 PM, Vincent Diepeveen wrote:
>> Yes i realize that. In principle you're looking at a 1050 files or so
>> that get effectively generated
>> and to generate each file half a dozen of huge files get created.
>>
>> Now in theory generating them is embarrassingly parallel except  
>> that to
>> generate 1 large Set of EGTBs
>> requires around a 3TB of working set size.
>>
>> Now comes the achillesheel. The start of the generation it needs to
>> access real quickly some earlier
>> generated sets; the initial generation of the conversion bitmap  
>> needs to
>> access other EGTBs,
>> as pieces can promote especially. Lucky this is a single pass, but  
>> it's
>> a slow and intensive pass.
>>
>> In such case accessing over the network is important.
>>
>> So there is a huge locality except for 1 pass. The real fast  
>> generation
>> that hammers onto the drives and reads quick and writes
>> fast, that can be done entirely local. The first pass generating a
>> single 'exchange bitmap', needs to lookup to
>> EGTBs earlier generate. For example if we have the EGTB KQRP KRP  
>> then it
>> has to lookup to the much larger
>> EGTB that holds KQRB KRP and a few others.
>>
>> So the cohesion to the other nodes drives is limited to say a few
>> percent of the total i/o getting done.
>>
>> As we speak about complex file management here, it's difficult to do
>> this by hand.
>>
>> In other words, efficient usage of the available harddrive space is
>> important.
>
> Without knowing more about your workload (can you avoid
> read-modify-writes?  how small or large are your individual I/Os  
> and can
> you adjust them?) this really seems to be ideal for Hadoop...
>
> I've dug into the code and have a reasonably firm understanding of the
> nitty-gritty for PVFS 1 and 2, PanFS, NFS and HDFS, and I can promise
> you this, on the surface at least, appears to be well suited for the
> MapReduce paradigm.
>
> I understand your hesitation about Java (not the security ones since
> this cluster should not be exposed directly to the internet  
> anyhow...but
> that's beside the point), but let me put it to you this way:  I've  
> got a
> cluster with 50 individual drives all in separate 50, just dual core
> machines, which are just connected via plain old 1Gb ethernet, and  
> I can
> push close to 4.5GB/s when doing embarrassingly parallel, mainly
> sequential workloads with them.  Obviously if I start to do anything

The important question is: how much system time do you have left if  
you run java?

I need 100% of the CPU power. So a filesystem that doesn't require much
of a system time is most interesting.

I really need each core of every node badly. Having the filesystem  
compress
things for me is something i just can't afford. Finished EGTBs i can  
wr.............ite
a small script for to get compressed after a while...

Now the EGTBs themselves aren't secret anyhow, as they will get spreaded
amongst Diep users, but i feel you seriously underestimate the  
security risk of
java. Once something evil is on the inside of your building, it WILL  
get out
somehow. So to speak because of a car driving closeby and someone who
swings open a door for half a minute.

A chain of software that controls high frequency and low frequency
you are total powerless against.

the hard reality of todays society is that it's cheap to have people  
do this and
it gets massively done. HPC being the most attractive target, and  
they don't care
if they only find tic-tac-toe type software, as long as someone pays  
their bill it'll
keep happening.

Enough about that.

So i just skip as much javasoftware as i can avoid that's the cheap  
way out.

> more random or less embarrassingly parallel this number is halved  
> if not
> worse.  If your work allows you to operate in the environment that
> MapReduce and HDFS allows and encourages, I would strongly suggest you
> pursue that route.  That's the only distributed environment I can  
> think
> of off of the top of my head that can properly handle (out of the box)
> the division you strike between local and remote accesses.
>
>> Compress it with 7-zip and move it away indeed. It'll compress to 3TB
>
> As a side-note -- Hadoop provides support for compression on transfers
> that might help you immensely.  You can pick from a few, but LZO tends
> to be the best one for speed/compression for my workloads.  This could
> really help you when you need to do that 1 pass where all nodes are
> exchanging with each other.
>
> Best of luck!
>
> ellis
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin  
> Computing
> To change your subscription (digest mode or unsubscribe) visit  
> http://www.beowulf.org/mailman/listinfo/beowulf




More information about the Beowulf mailing list