[Beowulf] Doing i/o at a small cluster

Sat Aug 18 07:55:12 PDT 2012

On Aug 18, 2012, at 1:04 PM, Andrew Holway wrote:

> 2012/8/17 Vincent Diepeveen <diep at xs4all.nl>:
>> The homepage looks very commercial and they have a free trial on it.
>> Are you referring to the free trial?
>
> http://nexentastor.org/ - Sorry, wrong link. It's a commercially backed
> open source project.
>
>> That means buying a RAID controller, which is an extra cost. Whether
>> that matters depends on what it costs.
>
> You just need to attach the disks to some SATA ports. ZFS does all the
> RAID work internally in software.
>
>> But it does mean that every node, and every disk read and write you do,
>> all hammer that single basket at the same time.
>
> ZFS seems to handle this kind of thing elegantly. As with hardware
> RAID, each disk can be accessed individually. NFS over RDMA (NFSoRDMA)
> would ensure speedy access times.
>
>>
>> I don't see how you can get good performance out of one cheap box like
>> that.
>
> Try it and see. It would certainly be much less of a headache than
> some kind of distributed filesystem, which, in my opinion, is complete
> overkill for a 4-node cluster. All of the admins I know who look
> after these systems have the haunted look of a village elder who must
> choose which of the village's daughters must be fed to the Lustre
> monster every 6 months.
>
> Don't forget to put in as much memory as you can afford and, ideally, an
> SSD for read cache (assuming that you access the same blocks over and
> over in some fashion).
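The read-cache caveat above ("assuming that you access the same blocks over and over") can be illustrated with a minimal sketch of a fixed-size block cache. This is a simplification under stated assumptions: it uses plain LRU eviction, whereas ZFS itself uses its ARC (with L2ARC when an SSD is added as a cache device); the class `LRUBlockCache` and the helper `fake_disk` are hypothetical names for illustration only.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Hypothetical fixed-capacity read cache keyed by block number (pure LRU)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block number -> data, oldest first
        self.hits = 0
        self.misses = 0

    def read(self, block_no, read_from_disk):
        if block_no in self.blocks:
            self.blocks.move_to_end(block_no)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_no]
        self.misses += 1
        data = read_from_disk(block_no)
        self.blocks[block_no] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def fake_disk(block_no):
    """Stand-in for a real disk read (hypothetical)."""
    return b"block-%d" % block_no


# Working set (4 blocks) fits in the cache: only the first pass misses.
cache = LRUBlockCache(4)
for _ in range(10):
    for b in range(4):
        cache.read(b, fake_disk)
print(cache.hits, cache.misses)
```

The design point the sketch shows: a cache only pays off when the re-read working set fits in it. With pure LRU, a cyclic scan over even one block more than the cache holds evicts each block just before it is needed again, so every read misses.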

I once designed a data structure of my own that, according to someone then
working for Sun in Bangalore, was close to ZFS. This was before ZFS was
popular, or perhaps even before it was introduced (I'm not sure; it was
2001-2002 or so), but I'm not aware of how the filesystem has been expanded
since then to satisfy professional needs :)

Note that I wasn't aware it has since become available on Linux as open
source. Does it?

My workload streams a dataset of around 1.3TB over and over again, and
each time something in the dataset gets modified. So the output is a
bitstream that gets stored, and with all cores together this bitstream
amounts to 1.3TB or so.

Note that when I write TB, I mean terabytes. All those RAID cards quote
Gb, which means gigabits.
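Since the bits-versus-bytes confusion is a factor of eight, a minimal conversion sketch may help when comparing RAID card specs with disk throughput. It assumes decimal SI prefixes (1 GB = 1000 MB) and raw line rates; real throughput is lower because of line-encoding and protocol overhead. The function names are hypothetical, and the 6Gb/s figure is just an example port speed.

```python
BITS_PER_BYTE = 8


def gbit_to_mbyte_per_s(gbit_per_s):
    """Gigabits/s -> megabytes/s, decimal prefixes, raw line rate (no overhead)."""
    return gbit_per_s * 1000 / BITS_PER_BYTE


def mbyte_to_gbit_per_s(mbyte_per_s):
    """Megabytes/s -> gigabits/s, decimal prefixes, raw line rate (no overhead)."""
    return mbyte_per_s * BITS_PER_BYTE / 1000


print(gbit_to_mbyte_per_s(6))    # a 6Gb/s port: 750.0 MB/s before overhead
print(mbyte_to_gbit_per_s(800))  # 800MB/s corresponds to 6.4 Gb/s
```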

1.3TB of SSD per node would speed it up considerably, but that's too
expensive.

I do agree about maintenance, but my cluster isn't larger than 8 nodes,
and I do want a performance of 0.5GB/s per node, so with 8 nodes that
should be 4GB/s of aggregated I/O bandwidth, not the nearly 800MB/s that
most RAID cards that are cheap on eBay seem to deliver.
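The bandwidth argument can be checked with a quick back-of-the-envelope sketch. It reads the per-node target as 0.5GB/s, which is what the stated 4GB/s aggregate for 8 nodes implies, takes the ~800MB/s single-card figure at face value, and counts only reading the 1.3TB dataset once per pass (writing the output bitstream would add to this if it hits the same storage). All constant and function names are hypothetical.

```python
NODES = 8
PER_NODE_GB_S = 0.5       # target bandwidth per node, GB/s
DATASET_TB = 1.3          # dataset streamed on every pass
SINGLE_CARD_GB_S = 0.8    # "nearly 800MB/s" from one cheap RAID card


def seconds_per_pass(dataset_tb, bandwidth_gb_s):
    """Time to stream the whole dataset once at a given aggregate bandwidth."""
    return dataset_tb * 1000 / bandwidth_gb_s


aggregate = NODES * PER_NODE_GB_S  # 4.0 GB/s
print(f"aggregate target: {aggregate} GB/s")
print(f"pass at target:   {seconds_per_pass(DATASET_TB, aggregate):.0f} s")
print(f"pass at one card: {seconds_per_pass(DATASET_TB, SINGLE_CARD_GB_S):.0f} s")
```

At these numbers one pass takes roughly 325 s at the 4GB/s target versus roughly 1625 s through a single ~800MB/s card, a factor of five.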

So some sort of distributed file system seems the best option: a lot
cheaper and a lot faster than a dedicated fileserver that would not be
able to keep up.