[Beowulf] Suggestions to what DFS to use

Tony Brian Albers tba at kb.dk
Tue Feb 14 05:14:08 PST 2017


On 2017-02-14 03:00, Douglas Eadline wrote:
>
>> Hi guys,
>>
>> So, we're running a small (as in a small number of nodes (10), not
>> storage (170 TB)) Hadoop cluster here. Right now we're on IBM Spectrum
>> Scale (GPFS), which works fine and has POSIX support. On top of GPFS we
>> have a GPFS transparency connector so that HDFS uses GPFS.
>>
>> Now, if I'd like to replace GPFS with something else, what should I use?
>> It needs to be a fault-tolerant DFS with POSIX support (so that users
>> can move data to and from it with standard tools).
>
> HDFS does have an NFSv3 gateway, which helps users move
> data around in a familiar fashion (without the -put and -get commands).
> If you need HDFS for big-block local streaming performance,
> that feature can be useful. If you are doing Spark or MapReduce, where data
> locality is important, then HDFS is a low-cost alternative
> to other file systems. Plus, if you use something like
> Ambari/Hortonworks, the management is somewhat integrated
> in the web GUI. (Hortonworks is open source and RPM-based.)
> If you don't care about locality, then another file system
> will work.
>
> As an aside, having done a handful of Hadoop/Spark workshops
> in the last year, I have found that the single most difficult
> aspect of Hadoop/HDFS, and of Spark on Hadoop/HDFS, is understanding
> the "remote" or non-local aspect of HDFS, i.e. the fact that
> a copy of the data must be loaded into HDFS before it
> can be used. The NFS gateway helps because files can
> be seen in a user's local file system. But I digress...
>
> --
> Doug
>
>>
>> I've looked at MooseFS which seems to be able to do the trick, but are
>> there any others that might do?
>>
>> TIA
>>
>> --
>> Best regards,
>>
>> Tony Albers
>> Systems administrator, IT-development
>> Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
>> Tel: +45 2566 2383 / +45 8946 2316
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> To change your subscription (digest mode or unsubscribe) visit
>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>>
>
>

Some very good points there. No doubt the NFS gateway can be useful,
but the NFS gateway by itself is not enough for our purposes.
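To make the NFS gateway workflow discussed above more concrete, here is a minimal Python sketch contrasting two ways of staging a local file into HDFS: the classic "hdfs dfs -put" upload and a plain sequential copy through an NFS gateway mount. It assumes the gateway has been mounted at the hypothetical path /hdfs (the mount options in the comment follow the Hadoop NFS gateway documentation). The names put_command, copy_via_nfs and HDFS_NFS_MOUNT are illustrative only, and the strictly sequential write reflects the assumption that the gateway does not support random writes.

```python
from pathlib import Path

# Hypothetical mount point where an admin might have mounted the HDFS NFSv3
# gateway, e.g. with:
#   mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync <gateway-host>:/ /hdfs
HDFS_NFS_MOUNT = Path("/hdfs")

CHUNK_SIZE = 1 << 20  # write in 1 MiB chunks


def put_command(local_path: str, hdfs_path: str) -> list[str]:
    """Build (but do not run) the argv for the classic HDFS shell upload."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_path]


def copy_via_nfs(local_path: str, hdfs_path: str,
                 mount: Path = HDFS_NFS_MOUNT) -> Path:
    """Copy a local file into HDFS through the NFS gateway mount.

    The HDFS path is mapped onto the mount point, missing parent directories
    are created, and the data is written strictly front to back, because the
    gateway is assumed not to support random writes.
    """
    dest = mount / hdfs_path.lstrip("/")
    dest.parent.mkdir(parents=True, exist_ok=True)
    with open(local_path, "rb") as src, open(dest, "wb") as dst:
        # Sequential read/write loop: no seeks on the destination file.
        while chunk := src.read(CHUNK_SIZE):
            dst.write(chunk)
    return dest
```

With the gateway mounted, copy_via_nfs("results.csv", "/user/alice/results.csv") (a hypothetical file and path) should leave the same file in HDFS as running the argv from put_command, while the user only needs an NFS client rather than a Hadoop client. It only covers moving data in and out, though; it does not turn HDFS into a fully POSIX file system.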

-- 
Best regards,

Tony Albers
Systems administrator, IT-development
Royal Danish Library, Victor Albecks Vej 1, 8000 Aarhus C, Denmark.
Tel: +45 2566 2383 / +45 8946 2316

