[Beowulf] Spark, Julia, OpenMPI etc. - all in one place

Douglas Eadline deadline at eadline.org
Tue Oct 13 10:31:32 PDT 2020


> On Tue, Oct 13, 2020 at 9:55 AM Douglas Eadline <deadline at eadline.org>
> wrote:
>
>>
>> Spark is a completely separate code base that has its own Map Reduce
>> engine. It can work stand-alone, with the YARN scheduler, or with
>> other schedulers. It can also take advantage of HDFS.
>>
>
> Doug, this is correct. I think for all practical purposes Hadoop and Spark
> get lumped into the same bag because the underlying ideas are coming from
> the same place. A lot of people saw Spark (esp. at the beginning) as a
> much
> faster, in-memory Hadoop.

And then an "all or nothing, either/or" notion develops.
That is, Spark is better than Hadoop, so Hadoop must be dead.

The reality is that almost all analytics projects require
multiple tools. For instance, Spark is great, but if you do
some data munging of CSV files and want to store your results
at scale, you can't just write a single file to your local
file system. Often you write the result as a Hive table to
HDFS (e.g. in Parquet format) so it is available for Hive SQL
queries or for other tools to use.
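As a rough sketch of that last step (the HDFS path and table
name are placeholders, and it assumes a Spark build with Hive
support enabled), the PySpark side looks something like this:

  from pyspark.sql import SparkSession

  # Hive support lets Spark register the table in the metastore
  spark = (SparkSession.builder
           .appName("csv-to-hive-parquet")
           .enableHiveSupport()
           .getOrCreate())

  # Read the munged CSV data (hypothetical HDFS path)
  df = spark.read.csv("hdfs:///data/munged/",
                      header=True, inferSchema=True)

  # Save as a Parquet-backed Hive table (hypothetical name) so
  # Hive SQL and other tools can query it later
  df.write.mode("overwrite").format("parquet") \
      .saveAsTable("results_parquet")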

--
Doug


> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
>




