Spark Performance Tuning – Part 2 | Data Exposed

This post has been republished via RSS; it originally appeared at: Channel 9.

This week's Data Exposed show welcomes back Maxim Lukiyanov to talk more about Spark performance tuning with Spark 2.x. Maxim is a Senior PM on the big data HDInsight team and is in the studio today to present Part 2 of his 4-part series.

Topics in today's video:

[01:40] - Datasets vs. DataFrames vs. RDDs

[10:45] - Garbage Collection Overhead and Executor Size

[18:20] - Data Formats

[22:35] - Data Partitioning

[26:25] - Caching

This article is republished; there may be more discussion at the original link. If you found it helpful, you're more than welcome to let us know!
