Spark Performance Tuning – Part 3 | Data Exposed


This post has been republished via RSS; it originally appeared at: Channel 9.

This week's Data Exposed show welcomes back Maxim Lukiyanov to talk more about Spark performance tuning with Spark 2.x. Maxim is a Senior PM on the big data HDInsight team and is in the studio today to present Part 3 of his 4-part series.

Topics in today's video:

[00:45] - Recap and overview of the first two videos

[03:40] - Join Types (SortMerge and Broadcast)

[09:30] - Cost-based Optimizer

[21:35] - Outliers and Data Skew
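The two join strategies covered at 03:40 differ in how they find matching keys. Below is a minimal single-machine sketch, in plain Python, of the idea behind each one. It is not Spark's implementation, and the function names `broadcast_hash_join` and `sort_merge_join` are hypothetical. A broadcast join ships the small table to every executor and probes a hash table built from it. A sort-merge join sorts both sides by key and walks them in lockstep. Spark generally prefers the broadcast strategy when one side's estimated size is below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default), which is one reason accurate table statistics matter to the cost-based optimizer.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical row shape for this sketch: (join key, payload)
Row = Tuple[Any, str]
Joined = Tuple[Any, str, str]


def broadcast_hash_join(large: List[Row], small: List[Row]) -> List[Joined]:
    """Build a hash table from the small side (the copy each executor would
    receive in a broadcast join), then stream the large side through it."""
    table: Dict[Any, List[str]] = defaultdict(list)
    for key, value in small:
        table[key].append(value)
    out: List[Joined] = []
    for key, value in large:
        # .get() does not insert missing keys into the defaultdict
        for match in table.get(key, []):
            out.append((key, value, match))
    return out


def sort_merge_join(left: List[Row], right: List[Row]) -> List[Joined]:
    """Sort both sides by key, then advance two cursors together, emitting
    the cross product of each run of equal keys. Keys must be comparable."""
    left = sorted(left, key=lambda r: r[0])
    right = sorted(right, key=lambda r: r[0])
    i = j = 0
    out: List[Joined] = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the end of the run of equal keys on each side
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == rk:
                j_end += 1
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


if __name__ == "__main__":
    orders = [(1, "o1"), (2, "o2"), (1, "o3"), (3, "o4")]
    customers = [(1, "alice"), (2, "bob")]
    # Both strategies yield the same rows; only the order can differ
    print(sorted(broadcast_hash_join(orders, customers))
          == sorted(sort_merge_join(orders, customers)))  # True
```

The hash join avoids sorting the large side but needs the small side to fit in memory on every executor. The sort-merge join scales to two large inputs at the cost of a shuffle and sort. A heavily repeated key, as in the data-skew segment, produces one very large run of equal keys in either strategy.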

Spark 2.2 rc4 on Azure HDInsight: Script action https://github.com/hdinsight/script-action/tree/master/install-spark2-2


These articles are republished, so there may be more discussion at the original link. If you found this helpful, you're more than welcome to let us know!
