[Beowulf] Clustering vs Hadoop/spark

Tue Nov 24 08:19:52 UTC 2020

Hi Ben,

Re-added the list.

I think where I’m confused is this: isn’t that what Hadoop/Spark does? It distributes the data for computation, then aggregates the results back into a single data set.
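
To make the distribute-then-aggregate pattern concrete, here is a minimal Python sketch of it. It is not how Hadoop or Spark is implemented; the function names are made up for illustration, and a thread pool stands in for the cluster nodes that a real framework would schedule work onto.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_into_chunks(records, n_chunks):
    """Scatter: split the records into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(records) // n_chunks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def count_words(chunk):
    """Map: each worker counts words in its own chunk only."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce: combine two partial results into one."""
    return left + right


def distributed_word_count(records, n_workers=4):
    chunks = split_into_chunks(records, n_workers)
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(count_words, chunks))
    # Gather: fold the partial counts back into a single result.
    return reduce(merge_counts, partials, Counter())
```

The point of the sketch is that the application has to be written as a map step plus a merge step for the framework to distribute it.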

Correct me if I am wrong here. 

Another thing I can’t seem to understand is how a Java-based platform manages to get such good performance when crunching large data sets for big data analytics.

Regards,
Jonathan

-----Original Message-----
From: Benjamin Redling <benjamin.rampe at uni-jena.de> 
Sent: 24 November 2020 09:03
To: Jonathan Aquilina <jaquilina at eagleeyet.net>
Subject: Re: [Beowulf] Clustering vs Hadoop/spark

Hello Jonathan,

On 24/11/2020 06.22, Jonathan Aquilina via Beowulf wrote:
> I am just wondering what advantages setting up a cluster has
> in relation to big data analytics vs using something like Hadoop/Spark?

Can you distribute any application without programming against a framework?

We distribute a lot of data-parallel tasks via SLURM with the source code unchanged.
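
As an illustration only, and not necessarily the setup described here, one common way to do this is a SLURM job array with a small wrapper that picks each task's share of the input files and calls the existing program on them. The sketch below assumes array indices run from 0 to N-1; the helper names and the `./analyze` command are hypothetical. `SLURM_ARRAY_TASK_ID` and `SLURM_ARRAY_TASK_COUNT` are environment variables that SLURM sets for array tasks.

```python
import os
import subprocess


def shard_for_task(inputs, task_id, task_count):
    """Return the inputs this array task should handle (round-robin split)."""
    return inputs[task_id::task_count]


def task_identity(env=os.environ):
    """Read the array task index and array size that SLURM exports to each task.

    Falls back to a single task (0 of 1) when run outside a job array.
    """
    task_id = int(env.get("SLURM_ARRAY_TASK_ID", "0"))
    task_count = int(env.get("SLURM_ARRAY_TASK_COUNT", "1"))
    return task_id, task_count


def run_shard(inputs, command, env=os.environ):
    """Run the unmodified program once per input file in this task's shard."""
    task_id, task_count = task_identity(env)
    for path in shard_for_task(sorted(inputs), task_id, task_count):
        # `command` is the existing application, e.g. ["./analyze"] (hypothetical).
        subprocess.run(command + [path], check=True)
```

A batch script that calls `run_shard` could then be submitted with something like `sbatch --array=0-9 job.sh`, so that ten tasks each process a tenth of the files while the application itself stays untouched.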

Regards,
Benjamin
-- 
FSU Jena | JULIELab.de/Staff/Redling
☎  +49 3641 9 44323