This post has been republished via RSS; it originally appeared at: Channel 9.
This week's Data Exposed show welcomes back Maxim Lukiyanov to talk more about Spark performance tuning with Spark 2.x. Maxim is a Senior PM on the big data HDInsight team and is in the studio today to present the final part of his 4-part series.
Topics in today's video:
[00:45] - Intro
[02:15] - Advanced Partitioning and Bucketing
[10:30] - Advanced Joins: Joining Large Tables
[19:00] - Debugging and Recap
Spark 2.2 RC4 on Azure HDInsight (script action): https://github.com/hdinsight/script-action/tree/master/install-spark2-2