PowerBI: Screen scrape to Gorgeous Visual in 5 Easy Steps

This post has been republished via RSS; it originally appeared at: Healthcare and Life Sciences Blog articles.

titles.jpg
Have you ever wanted to scrape data from a website and create a dashboard with the data?  Maybe you want to get insights from the metrics? Or maybe you just want to make pretty visuals? 
 
 
Well, so did we :D! 
 
We wanted to track metrics from the Healthcare and Life Sciences blog .  Ideally, we wanted a dashboard showing at least views but possibly other KPIs. And what do you know! Each blog conveniently shows how many views we have per blog and we wanted to capitalize on that feature. 
 

So Write Code? 

Initially we thought about screen scrape-ing, using open source screen scraping libraries. The problems with this approach: 

  • Time to result: we probably wouldn’t get to our end goal (a pretty dashboard) in a day. 
  • Liability: we would be stuck maintaining code until we didn’t need the dashboard 
  • Hosting: even if we created a SPA we’d still need to host it, and share links etc 
  • Authorization: we expected we would come to a point where we could not make all the data we were collecting public- so we’d need to add Authorization 

With all these problems even finding a library to start from seemed like a daunting task. 

 

Hello, Power BI!

This is where Power BI (desktop) comes in. It has built in intelligence to scrape a web page and it tries to guess different tablature data that exist on the page. (And it does a pretty decent job at it, too!) 
 
This is our final pie chart- showing views per author, filtered for the last 45 days.

finalpiechart.png

 

 
It will take you less than 10 minutes from here to get to this chart.  Shall we get started? 
 

Let's start the magic:

First open Power BI. Click GetData’. Search ‘Web’ in the modal dialog and click ‘Connect’:
getdata_web.png

 

  
Enter in the URL you want Power BI to scrape: 

DanielKim_8-1643729754053.png

 

 Since this is a public site, select ‘Anonymous 

DanielKim_9-1643729754054.png

 

 
This will now show all the possible tables that exist on the page: 

selectTable.png

 

For our example, we will select Table 1. (This table includes the information we need, such as, author, views, post date and etc.). Check the checkbox next to Table 1 (or relevant table you need) and click Load’ 

 
That’s it!  
 
Now, it’s all just aesthetics and grabbing and filtering on the correct fields. 
 
Beautifying our data 
Let’s go and change the column names for our sanity :D. As of now, it’s just Column 1, 2, 3, and etc. 
 
Click on the second icon (one that looks like a table) and once the data loads, right click on the columns and ‘rename’: 

rename column.png

 

 

 Go back to the dashboard view: 
gobackto_Dashboard.png

 

 

 
 
 
Choose your preferred visual, then fields:
selectVisual_Filters.png

 

And Voila! 

piechart.png

 

 
 
Oh, you want more? 

You can use FILTERS to hone in and get more detailed metrics (ie. Our final result). We dragged the Post Date column to ‘Filter on this page’ section. We then filtered the chart to only show view counts in the last 45 days.  

DanielKim_21-1643729754073.png

 

 
  

Note: As you might have figured, the notable limitation to this ‘cheat’ is how much data the website actually offers for the viewers. For our usage, it was a perfect use case to leverage an existing feature on an existing solution without having to reinvent the wheel :smiling_face_with_smiling_eyes:. We still believe this is a great trick to keep it in your sleeves as there are plethora of sites where you can extract views, likes, dates, and descriptions right from the page!
 

Thanks for reading :)

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.