Interactive Spark on Azure | Data Exposed

This post has been republished via RSS; it originally appeared at: Channel 9.

It is hard to believe that it has been almost two years since we last had Maxim on our show, and I can tell you we are extremely excited he's back. Maxim is a Senior Program Manager in the Big Data team at Microsoft, and he joins us to talk about Interactive Spark on Azure.

Maxim begins our discussion by walking us through the process and challenges data scientists face when processing data. He explains that data science is an iterative process, but that data scientists are typically less productive than they could be because they spend a lot of time waiting for jobs to complete. Two of the big factors behind the long wait times, Maxim explains, are the size and the cleanliness of the data.

At the [05:20] mark, Maxim shows us how Spark on Azure addresses this problem by shortening each iteration, which helps you be more productive. Maxim walks us through how that is accomplished. He first introduces us to Apache Spark, and then discusses how Spark on Azure makes data exploration even better.

At the [08:38] mark, it's DEMO TIME: Maxim spends a few minutes showing us how to spin up a Spark HDInsight cluster, then spends the remaining 10 minutes demoing how to use Spark in HDInsight to execute jobs efficiently. I won't give anything away here, so be sure to watch to see Maxim work his Spark magic! Awesome show!

We definitely look forward to having him back!