Can cross-cloud data analytics be easy? | Microsoft Fabric

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Work with your data in place, wherever it resides, with Microsoft Fabric, our next generation data analytics service powered by one of the first true multi-cloud data lakes, called OneLake. Go from raw data to meaningful insights over data spread across your organization and in other clouds in seconds — without moving it. 



Microsoft Fabric provides a single integrated service that includes data integration capabilities, data engineering for shaping your data, data warehousing, the ability to build data science models, real-time analytics, and business intelligence. Data from these different experiences is brought together in one unified data lake for your organization, OneLake, and is accessible regardless of the engine used.


Justyna Lucznik, Principal Group PM for Microsoft Fabric, joins Jeremy Chapman to share how to make your organizational data more accessible. 


Work with your data in place, wherever it lives. 


Check out Microsoft Fabric, a next-gen data analytics service powered by OneLake, a unified foundation for all your data.


Bring data from Azure storage accounts & other clouds into one unified data lake.


See how to link data without duplication or movement with shortcuts in OneLake, part of Microsoft Fabric.


Use large language models, like GPT, to speed up model creation. 


See how Copilot is coming to Microsoft Fabric.


Watch our video here.


00:00 — Introduction 

01:47 — Unified data in OneLake 

03:05 — Personalized experiences for data professionals 

04:43 — How to get started with Microsoft Fabric 

06:06 — Shortcuts capability in OneLake 

07:35 — Native Windows Explorer integration 

09:15 — Collaboration and shared work space 

10:04 — LLMs and Copilot for model creation 

11:33 — SQL Analyst experience 

12:53 — Business user experience 

14:06 — Wrap up


Link References: 

Start your free trial at 

Check out Microsoft Fabric at 


Unfamiliar with Microsoft Mechanics? 

Microsoft Mechanics is Microsoft’s official video series for IT. Watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft.


To keep getting this insider knowledge, join us on social:

Video Transcript:

- Can you go from raw data to meaningful insights over your data spread across your organization and in other clouds in seconds, without moving it? Well, today we’re going to take a closer look at Microsoft Fabric, our next generation data analytics service that lets you work with your data in place, wherever it resides, powered by one of the first true multi-cloud data lakes, called OneLake. And joining me today is lead product manager Justyna Lucznik, a fan favorite on Microsoft Mechanics. Welcome back.


- Thank you. It’s been too long. So happy to be back. And in the new studio.


- It has been too long. And thanks so much for joining us today. You know, first off, congratulations on the tech preview launch for Microsoft Fabric. We’re seeing a huge focus on data access and quality these days. It goes beyond traditional BI, as we’re all interacting with our data through large language models and generative AI, and in those cases the source data is really key to generating accurate responses. So how does Microsoft Fabric aim to make data accessibility and working with data easier?


- Sure, so as you alluded to earlier, it’s really about removing that pain and drudgery from accessing your organizational data. This could be sitting in many disparate data sources across your entire data estate, and you’ll typically use many different tools to transform it and make it accessible. With Microsoft Fabric, that path from raw, fragmented data to meaningful insights is significantly reduced. It gives you a single integrated service that provides data integration capabilities, data engineering for shaping your data, data warehousing, data science model building, real-time analytics, and business intelligence. And the data from these different experiences is brought together by one unified data lake for your organization called OneLake, which is automatically provisioned for your tenant.


- So what makes it possible then to unify all that data into OneLake?


- Well, we’ve modernized the data architecture. All the different service engines store their data using the open Delta Parquet format in OneLake. Thanks to this open format, the data in OneLake is accessible regardless of the engine that you use. For example, this way data from the Spark engine and the lakehouse can be read by the data warehouse, and vice versa. And by the way, we provision the necessary compute on demand, which can scale infinitely to support everything from the smallest self-service reporting instance to the largest petabyte-scale big data analytics jobs requiring Spark compute. So there’s no dependency on having to get resources set up for you to be able to work with your data. Now, one thing that sets OneLake apart is that you can even bring data from your Azure Storage accounts and other clouds into OneLake, then leverage it across all the engines, using a technique called shortcuts, which instantly links data without any duplication or movement. This creates virtual files of your data that you can view from OneLake. And if the data’s already in the open Delta Parquet format, it’ll automatically appear as a virtual table in OneLake that you can query, again without the data ever leaving your data store.
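Conceptually, a shortcut is just a named pointer from a OneLake path to data that stays in its original store. The sketch below is not the Fabric API; the class and method names are made up purely for illustration. It only models the idea that creating a shortcut records a link, and reading through it resolves to the external location, with no copy ever made.

```python
# Hypothetical sketch of the shortcut idea -- not a real Fabric API.
class Shortcut:
    """A virtual link from a OneLake folder name to an external store."""
    def __init__(self, name: str, target_uri: str):
        self.name = name
        self.target_uri = target_uri  # e.g. an ADLS or S3 location


class Lakehouse:
    def __init__(self):
        self._shortcuts = {}

    def create_shortcut(self, name, target_uri):
        # The data never moves; we only record where it lives.
        self._shortcuts[name] = Shortcut(name, target_uri)

    def resolve(self, path):
        # "reviews/2023/notes.csv" -> the external URI, read in place.
        top, _, rest = path.partition("/")
        sc = self._shortcuts[top]
        return f"{sc.target_uri}/{rest}" if rest else sc.target_uri


lake = Lakehouse()
lake.create_shortcut("reviews", "s3://marketing-bucket/reviews")
print(lake.resolve("reviews/2023/notes.csv"))
# s3://marketing-bucket/reviews/2023/notes.csv
```

The point of the sketch is the absence of any copy step: creating a shortcut is a metadata write, and every read resolves through it to the original location.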


- So you mentioned that we’re spinning up resources that you need as you work with your data. So what are the different roles then that we support and how is it scoped in terms of their experiences?


- Yeah, so I’ve shown you all the different workloads that we support. This translates to personalized experiences for whatever your role is as a data professional. For example, if we take a look at the Microsoft Fabric homepage, we’re able to choose our service entry point. I’ll choose a data warehouse, and from there I’m presented with easy-to-access core capabilities central to my job. For example, I can easily create a new data warehouse right from here. I just need to give it a name and then hit create, and I’m instantly navigated to the warehouse experience where I can start my project. Importantly, since this warehouse gets saved in a shared workspace, I can collaborate with people from other data disciplines on my team. For example, I can see my data scientist has been building a conversion model and the BI analyst has put together a marketing report. And not only is the workspace shared, we want to make sure all the experiences in Microsoft Fabric are unified. For example, we have a shared monitoring hub to view and track recent activities, and CI/CD for versioning and automated deployment. Microsoft Fabric respects data classification and sensitivity labels, protecting your data as it travels. And we’ve also designed a OneLake Data Hub for discovering, managing, and accessing the data across your organization as part of the experience. Additionally, whether you’re a data engineer, data scientist, data analyst, or business end user, you can securely work with the data in OneLake from your tool of choice.


- Okay, so if you’re a new user what does it take to get all this up and running?


- So let me show you the starting experience of a brand new user. The cool thing is I don’t have to sign into the Azure portal or configure any Azure resources. I don’t even need an Azure subscription to get started. I can just use the OrgID that I use with Power BI or Microsoft 365. Now, let’s assume I’m a data engineer and my aim is to ingest and prepare the data for my organization. I’m going to start by navigating to the Fabric data engineering experience. And since I want to start a new project, I can simply go to the left nav and create a new workspace. Now I’ll just give it a name, marketing workspace, and hit apply. It’s about as easy as creating a folder on my desktop. And when I create a new Fabric item, such as a new notebook, I’m automatically enrolled in a free trial. Once the trial has been successfully kicked off, you’ll see that I automatically land inside the notebook, ready to write some code. I did not get asked about Spark cluster configurations, network settings, or any other setup. So let’s go ahead and just print “Hello World,” and you’ll see that I get my results back instantly. What’s happening behind the scenes is that Fabric pre-provisions live pools of Spark compute; they’re ready to go without any setup necessary, and you can get started in seconds.


- And that’s pretty fast, especially compared to those five minute or so wait times you might have when you spin up a new Spark job. So what’s the experience then like working with OneLake?


- Sure, so let’s jump back into our workspace, and I’m actually going to create a new item called a lakehouse. There are many ways you can ingest data into the lakehouse. For example, you could use data integration tools like dataflows and pipelines, but I’m going to use our new shortcuts capability so I can work with data in external storage accounts natively in Microsoft Fabric, without the data ever moving from its location. In the lakehouse, I’m going to choose the option to create a new shortcut. And you can see I can create shortcuts to data that’s already in OneLake. I can browse the OneLake Data Hub and select the lakehouse I would like to create a shortcut from. I can easily select the table I need, and in just a few seconds my marketing table automatically shows up, ready to be used. But the other amazing thing about shortcuts is they also work with external storage, and they’re actually multi-cloud. So this time round, let’s bring in some unstructured data; instead of choosing OneLake, not only can I choose Azure, but even an AWS S3 bucket. All I have to do is select it as a source, specify its location, and populate all of my account information. On the next screen I can give my shortcut a name, and that’s it. Within seconds I can see a shortcut created in the files section, which is the messy, unstructured data lake portion of the lakehouse. I can even open up the files and explore them directly inside the lakehouse.


- So it’s a powerful experience from the browser. But what if I want to explore my data with a local app? Do I use something like Azure Data Studio?


- We can do even better: native Windows integration with the File Explorer. This makes it super simple to access the data lake and just work with it as you would with OneDrive. Here I’ve opened up my local File Explorer, and you can see that alongside my OneDrive folder I also have a OneLake folder synced. Navigating through it, I can open up my marketing workspace, and the lakehouse I was just working in is right here; I can navigate to the tables and files folders too. If I jump into the files section, I can easily navigate through the folder structure and open up the reviews folder sitting in my Amazon S3 bucket directly from the File Explorer. I can even open up locally the same file that I had opened in the lakehouse, despite the fact it’s still sitting in my Amazon storage account. So now let’s jump back into files so that I can bring some local data into OneLake. I have folders containing CSV files saved on my desktop, so I’m just going to drag and drop a folder inside, and you’ll see that the folder automatically shows up. Now I’ll navigate back to the lakehouse and refresh it, and we can see our CustomerSurveys folder shows up in the lakehouse as well. I can navigate through it and quickly look through my data. One last thing I’ll do while I’m here is convert one of my CSV files into a Delta table so that it’s easily accessible across the entire platform. All I have to do is right-click the CSV and choose to load it to a table, and my customer surveys data automatically shows up as a Delta table.
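Behind the “load it to a table” gesture, a raw CSV file is promoted into a managed, queryable table (in Fabric, a Delta table in OneLake). As a local stand-in for that idea, this sketch loads a tiny CSV into a SQLite table using only Python’s standard library; the file contents, table name, and columns are invented for illustration, and SQLite substitutes for the Delta/SQL machinery Fabric provides.

```python
# Illustrative only: promote a raw CSV into a queryable table.
# In Fabric this produces a Delta table in OneLake; SQLite stands in here.
import csv
import io
import sqlite3

# Hypothetical survey data standing in for a dropped-in CSV file.
csv_text = """customer_id,score,comment
1,4,Great service
2,2,Slow response
3,5,Very helpful
"""

rows = list(csv.DictReader(io.StringIO(csv_text)))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CustomerSurveys (customer_id INT, score INT, comment TEXT)"
)
# Named-parameter insert: each CSV row dict maps onto the table columns.
conn.executemany(
    "INSERT INTO CustomerSurveys VALUES (:customer_id, :score, :comment)", rows
)

# Once loaded, the raw file is queryable like any other table.
count, avg = conn.execute(
    "SELECT COUNT(*), AVG(score) FROM CustomerSurveys"
).fetchone()
print(count, avg)  # 3 rows, average score 11/3 ≈ 3.67
```

The key difference in Fabric is that the resulting table lands in the open Delta format, so the same promoted data is immediately readable by Spark, the SQL endpoint, and Power BI without another conversion.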


- And I got to say I really love the Shell integration with Windows in this case it’s kind of like working with OneDrive in the File Explorer except it’s actually working across multiple clouds. Now earlier you showed us this kind of shared workspace concept where all the different data disciplines can work together to collaborate. So can we take a closer look at that shared workspace?


- Sure. And again, because all of the data is in the open Delta format, it’s recognized by all the engines in Microsoft Fabric. Now that I have my tables ready, let me show you how different data professionals can collaborate on the same copy of the data. To get started, I’m going to put on the hat of a data scientist. Here I’m in the notebook we created previously. I can add my existing lakehouse, the marketing one over here, by browsing through the OneLake Data Hub. Once my lakehouse appears in the notebook view, I can simply drag and drop my data onto the notebook canvas, and now I can use Spark to process it and start building my predictive model.


- This is the year of Large Language Models with a huge amount of interest for things like GPT. And by the way, we’re hosting separate instances of them in the Microsoft Clouds. Is this something that you could use maybe to start creating predictive models?


- Sure. Actually, let me give you a first look at some of the capabilities we’re natively integrating into notebooks inside Microsoft Fabric. Now that I’ve loaded my data, I can ask Copilot to help me figure out what would be a good machine learning model to build. I can ask my question directly in the notebook cell. Now, as this processes, the awesome thing to note here is that the Copilot in Fabric is data aware. This means that when I refer to my data, it understands that we’re talking about the tables and files in the lakehouse. After a couple of seconds, Copilot generates an answer in natural language, and it recommends I build a binary classification model based on my data. I can now ask Copilot to help me generate a logistic regression predicting whether a customer opens up a new account. Notice that I started my prompt with %code instead of %chat, and this is because I want code as a response instead of natural language. And we can see that in a few seconds, Copilot generated the code for a predictive model, importing the libraries I need and using all the fields in my dataset. I can go ahead and run the code like I would with any other notebook cell. And once things are done, I can see the accuracy of the model is pretty good, at over 82%.


- And it’s really amazing to see how the AI can even speed up model creation from scratch. So switching gears though to the analyst. So what are the day-to-day experiences then for a SQL analyst?


- Well, SQL analysts will feel right at home in Microsoft Fabric, so let’s take a look at their experience as well. Back inside my workspace, I can navigate to the SQL endpoint. This opens up a T-SQL experience, which is powered by the SQL engine. I can see familiar constructs like schemas, stored procedures, views, and queries. So I’m going to create a new SQL query to explore the data. I can write a query to look at the average age broken down by job type and education, and you’ll see that I get an instant response back. And for users who prefer a low-code approach, they can also construct SQL queries using our immersive visual editor. The important thing here is that we’re using the SQL engine directly on top of the data in the lake; no data movement has happened. From here, I can simply go to the built-in modeling view and start developing my BI data model directly in the same warehouse experience. I can easily create a relationship between my two tables, and by combining the data warehouse directly with Power BI, we’re making it really easy for data engineers and business analysts to collaborate over the same data.
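The query described (“average age broken down by job type and education”) is a plain GROUP BY aggregate. Fabric runs it as T-SQL against the SQL endpoint; the sketch below runs the same shape of query against an in-memory SQLite database so it stays self-contained, and the table name, columns, and sample rows are all assumptions for illustration.

```python
# Illustrative GROUP BY aggregate; SQLite stands in for the Fabric SQL endpoint.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (age INT, job TEXT, education TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (34, "admin", "secondary"),
        (42, "admin", "secondary"),
        (29, "technician", "tertiary"),
        (51, "technician", "tertiary"),
    ],
)

# Average age broken down by job type and education.
rows = conn.execute(
    """
    SELECT job, education, AVG(age) AS avg_age
    FROM Customers
    GROUP BY job, education
    ORDER BY job
    """
).fetchall()
for row in rows:
    print(row)
# ('admin', 'secondary', 38.0)
# ('technician', 'tertiary', 40.0)
```

In Fabric the same statement runs directly over the Delta data in the lake, which is what makes the “no data movement” point above possible.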


- And the cool thing here is that even roles that wouldn’t typically use a data lake are now using one. So it’s kind of democratizing the whole analytics experience. Speaking of which, what is the experience then if I’m a business user?


- Yeah, let’s take a look. So after all of my data has been prepared, I can move on to creating a brand new Power BI report. I can launch my new report directly from inside the warehouse experience and can immediately start dragging and dropping my fields in. And as I start building up my new report, Power BI will use a new capability called Direct Lake mode to natively read data in the Delta format stored in OneLake. This unlocks a lot of performance for Power BI, and again, no data has been duplicated to achieve this. I can take this even further: Excel can also benefit from data stored in OneLake. Here I’m in Excel, connected to my lakehouse, and from my spreadsheet I can easily build out a pivot table directly on top of the same data without ever leaving Excel. And you’ll notice that my highly confidential sensitivity label from my lakehouse has automatically flowed all the way into Excel, ensuring all my work is encrypted and secure.


- And that’s really a great point with data protection. And you know right now making your data more accessible is going to open up a whole lot of new app experiences around your data. And you also have more opportunity to use your AI models, as we saw before, with domain specific knowledge. And you also mentioned that Microsoft Fabric is now in preview. So for everyone who’s watching right now, especially given all the options, what do you recommend that they do to get started?


- First, sign up for the preview and start rolling up your sleeves and using Microsoft Fabric. All you need is a Microsoft account to start your free trial. And start where you’re comfortable based on how you normally work with your data, whether you’re a business user, analyst, data engineer, or data scientist. And of course, let us know what you think. Today we focused on the core experience of Microsoft Fabric. There’s so much more to cover, from enterprise scale and real-time analytics to pure querying performance, and much more that we’re going to need to save for another show.


- Looking forward to that. And thanks so much for joining us today, Justyna, and sharing a first look at Microsoft Fabric. Of course, keep watching Microsoft Mechanics for all the latest updates, and be sure to subscribe if you haven’t already. And as always, thank you for watching.

