Ingest Healthcare Open Data into Azure and Power BI using New GitHub Repository

This post has been republished via RSS; it originally appeared at: Healthcare and Life Sciences Blog articles.

Numerous government agencies make Healthcare Open Data available to the public at no cost. The CDC, CMS, FDA, World Bank, US Census, USDA, and many others provide rich sources of valuable data. These sources are free to use, but they come in different file formats, table structures, contexts, and data granularities. Ingesting all of this data into a common place where it can be used and shared is often time-consuming and challenging. I’ve put together a GitHub repository called Power Pop Health to help with these challenges.


Power Pop Health is a collection of content intended to simplify the process of ingesting and prepping Healthcare Open Data for Analytics, Business Intelligence, Data Science, and more. Power Pop Health has a simple mission: Make it easy for you to ingest, transform and format Healthcare Open Data and common reference tables so that you can achieve more. The GitHub repository can be viewed at this link.


How does Power Pop Health work? I’ve tried to make it simple with low code/no code/no PowerShell deployment so that anyone can use it with nothing more than an Azure subscription and Power BI. Where code is necessary, there are cut-and-paste scripts with tutorial videos for the deployment:

  • Step 1 – Ingest Raw Data into an Azure Data Lake
  • Step 2 – Make the Data usable in Azure and/or Power BI
  • Step 3 – You take it from here! The data is ready to blend with your Organizational data, use for training, create demos, analyze to find trends, etc.
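In practice, blending open data with organizational data in Step 3 usually comes down to joining on a shared key. The Python below is a minimal sketch of that idea, not code from the repository. It assumes both sides are keyed by a five-digit county FIPS code (two-digit state plus three-digit county), and the row layouts, column names, and values are hypothetical. Leading zeros in these codes are often lost when a file is read as numbers, so the helper pads them back.

```python
def county_fips(state_code, county_code):
    """Build a 5-digit county FIPS key: 2-digit state + 3-digit county, zero-padded."""
    return f"{int(state_code):02d}{int(county_code):03d}"


def blend(open_rows, org_rows):
    """Left-join organizational rows to open-data rows on the county FIPS key."""
    by_fips = {row["fips"]: row for row in open_rows}
    blended = []
    for org in org_rows:
        match = by_fips.get(org["fips"], {})  # empty dict when no open data exists
        blended.append({**match, **org})      # organizational columns win on name clashes
    return blended


# Hypothetical open-data rows; the measurements are made up for illustration.
open_rows = [
    {"fips": county_fips("6", "37"), "pm25_annual_avg": 12.0},
    {"fips": county_fips("36", "61"), "pm25_annual_avg": 9.5},
]

# Hypothetical organizational rows.
org_rows = [
    {"fips": "06037", "member_count": 1200},
    {"fips": "48201", "member_count": 300},
]

blended = blend(open_rows, org_rows)
```

Organizational rows without a matching county are kept with only their own columns, so gaps in open-data coverage stay visible rather than silently dropping records.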

What data is currently available in the first release of Power Pop Health?

Over the last few years I have accumulated examples and tutorials that leverage public Healthcare data. This first release is a repository to share those examples in a unified format, and in one place. Future additions to this repository will be based on feedback from the community, with an initial plan to focus primarily on Population Health data such as Social Determinants of Health. Below is a chart of the data available in this first release:



Here's a quick summary of each data set in the initial release. Before using these data sources, I'd also recommend reading the licensing terms from the data providers to ensure that you are using the data appropriately:

1. CDC Daily PM 2.5 Concentrations – Air Quality measurements at the level of States and Counties for 2001-2016.
2. CDC Population Weighted UV Irradiance – Ultraviolet Radiation measurements at the level of States and Counties for 2004-2015.
3. CMS DRG / MDC / Surgical Class v38.1 – Diagnosis Related Groups (DRGs), Major Diagnostic Categories (MDCs), and Surgical Class, version 38.1.
4. CMS ICD10 CM 2021 – 2021 ICD-10-CM diagnosis codes for the US.
5. CMS ICD10 PCS 2021 – 2021 ICD-10-PCS procedure codes for the US.
6. Date Table (DataFlows) – A custom Date Table that can be deployed to Power BI DataFlows.
7. Date Table (Power Query) – A custom Date Table that can be deployed to Power BI Power Query.
8. Time Table (DataFlows) – A custom Time Table that can be deployed to Power BI DataFlows.
9. Time Table (Power Query) – A custom Time Table that can be deployed to Power BI Power Query.
10. FCC State & County FIPS – A reference table for State and County FIPS geographical mapping codes provided by the FCC.
11. FDA Food Recall Enforcement Reports – Foods that have been recalled.
12. FDA CAERS Reports (Food Events) – Adverse events attributed to Foods.

13. Medicare Part D Provider Utilization and Payment Data 2013-2018 – This data will be available in the next release. For now it is available in an end-to-end Azure Synapse and Power BI solution at this link: 
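For readers unfamiliar with the Date Table entries in the list above, a date table is simply one row per calendar day with precomputed attributes for filtering and grouping. The repository's versions are built for Power BI DataFlows and Power Query. The Python below is only a rough sketch of the concept, and its column names and date range are assumptions rather than the repository's actual schema.

```python
from datetime import date, timedelta


def build_date_table(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "day": current.day,
            "day_of_week": current.isoweekday(),  # 1 = Monday ... 7 = Sunday
            "is_weekend": current.isoweekday() >= 6,
        })
        current += timedelta(days=1)
    return rows


# Hypothetical range: one row for every day of 2021.
date_table = build_date_table(date(2021, 1, 1), date(2021, 12, 31))
```

A Time Table follows the same pattern at a finer grain, for example one row per minute of the day.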


What’s coming next in Power Pop Health?

I’ll change the roadmap based upon feedback, popularity of data sources, and updates to Azure and Power BI. Tentatively the plan is to roll out three phases:



  1. Current Phase – Roll out a framework for ingesting several sources of healthcare, population health, reference tables and other Open Data into Azure Data Lake and Power BI DataFlows/Power Query.
  2. Phase 2 – Add more data sources. Introduce an Azure SQL DB layer where larger tables can be curated and queried with high performance. Also add some Power BI PBIX files with examples of data visualization.
  3. Phase 3 and Beyond – Add more data sources. Introduce an Azure Synapse layer including Azure ML.

How can you get started?

Read the landing page on the GitHub site at this link, and follow the instructions in the videos at the bottom of that page. Each source of Healthcare Open Data also has a folder containing specific instructions with links to videos describing how to deploy those datasets.


Suggestions and Questions

This launch is the first time I'm sharing the Power Pop Health content for feedback, so please pass along suggestions that can help make this repository better and more useful. Are there different data sets that would offer value? Would additional data transformations into other formats be helpful? Please direct suggestions and questions to my LinkedIn or Twitter accounts:


