
Student project showcase: Weather Impact on Air Quality using Data Analytics and Power BI

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.



Have you ever wondered what causes poor air quality? Or why the air feels fresh during one season but is hard to breathe in another? Has it ever occurred to you that there are particles in the atmosphere that contribute to poor air quality, and that the concentration of these particles is affected by weather conditions?


This project analyzes the air quality of Beijing using data on its weather conditions from 2010 to 2014, and explores how the weather affects air quality.


The tools I used to perform this analysis are Python and Power BI.

If you are new to Python or Power BI, Microsoft Learn has some excellent resources:

Get started with Power BI - Training | Microsoft Learn

Python for beginners - Training | Microsoft Learn

Case Study


A company in the environmental consulting industry is seeking to analyze the air quality in a specific city during hot and cold weather, during high-wind conditions and during precipitation. They are interested in making recommendations to the government and businesses in the region on how to mitigate the impact of weather conditions on air quality.


As a Data Analyst, you are expected to analyze the data provided, seek insights and make recommendations to achieve the set objectives.

Additionally, kindly use this dataset to analyze the historical impact of weather conditions on air quality, and make predictions on air quality during specific weather conditions. This information could be used to inform emergency response plans and prepare for potential air quality issues. 


More details can be found here.


Dataset Information

This dataset was sourced from the Machine Learning Repository of the University of California, Irvine (UC Irvine), where it is listed as the Beijing PM2.5 Data Set. The dataset can be found here and the field description can be found here.


Data Analysis Process 

To discover patterns in the raw data and draw valuable information from them, I followed a set of procedures that served as crucial steps for the successful completion of this project. They are:

  1. Background Study
  2. Data Gathering 
  3. Data Assessment and Cleaning
  4. Exploratory Data Analysis
  5. Data Visualization
  6. Insights and Recommendation


Background Study

According to the Department of Health, New York, fine particulate matter (PM2.5) is an air pollutant that is a concern for people’s health when levels in the air are high. PM2.5 particles are tiny particles in the air that reduce visibility and cause the air to appear hazy when levels are elevated.

Air quality in this dataset is determined by the concentration (in µg/m3) of particulate matter (PM2.5) in the atmosphere. According to Breeze Technologies, PM2.5 levels over 55 µg/m3 indicate poor air quality and levels above 110 µg/m3 indicate severe air quality. For this analysis, a threshold of 100 µg/m3 was set to signal that the air quality is approaching a severe level.
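The bands above can be written as a small helper. This is only a sketch of the thresholds quoted in this section: the function name and the labels "approaching severe" and "below poor" are my own illustrative choices, not terms used by Breeze Technologies.

```python
POOR_LIMIT = 55           # µg/m3, "poor" per the figure quoted above
SEVERE_LIMIT = 110        # µg/m3, "severe" per the same source
ANALYSIS_THRESHOLD = 100  # µg/m3, the working threshold chosen for this analysis


def classify_pm25(level):
    """Return a coarse air-quality label for a PM2.5 reading in µg/m3.

    The label names are hypothetical; only the numeric limits come from the text.
    """
    if level > SEVERE_LIMIT:
        return "severe"
    if level > ANALYSIS_THRESHOLD:
        return "approaching severe"
    if level > POOR_LIMIT:
        return "poor"
    return "below poor"
```

For example, `classify_pm25(105)` falls between the analysis threshold and the severe limit, so it is labelled "approaching severe".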

The bulk of the analysis is centered on how the concentration of PM2.5 changes as atmospheric conditions change.


Data Gathering

The dataset was obtained from here and loaded into a Jupyter Notebook to begin the data assessment and cleaning process using Python.


Data Assessment and Cleaning

The dataset was assessed for issues with its quality and issues with its structure. A snapshot of the data is shown below.



The dataset was assessed visually and programmatically (using code). The appropriate steps were then taken to clean the data. The steps are:

  1. Created a Date column using the year, month and day columns, then corrected its datatype
  2. Handled the null values in the pm2.5 column
  3. In the cbwd column, replaced ‘cv’ with ‘SW’, representing South West
  4. The PRES column, which represents atmospheric pressure, is in hectopascals (hPa). I converted the unit to atm (standard atmospheres) and saved it in a new column, atm_pressure
  5. Classified the months into four seasons
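The five steps above could be sketched in pandas roughly as follows. This is not the notebook's actual code. The column names year, month, day, pm2.5, cbwd, PRES and atm_pressure come from the steps above, while the date and season column names, the choice to drop null rows in step 2, and the Spring and Autumn month ranges are assumptions made for illustration.

```python
import pandas as pd

HPA_PER_ATM = 1013.25  # 1 standard atmosphere = 1013.25 hPa

# Winter (Dec-Feb) and Summer (Jun-Aug) follow the post; Spring and Autumn are assumed.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Autumn", 10: "Autumn", 11: "Autumn",
}


def clean_air_quality(raw):
    """Apply the five cleaning steps to a raw Beijing PM2.5 DataFrame."""
    df = raw.copy()
    # 1. Build a datetime column from the year, month and day columns.
    df["date"] = pd.to_datetime(df[["year", "month", "day"]])
    # 2. Handle nulls in pm2.5 (assumption: drop the affected rows).
    df = df.dropna(subset=["pm2.5"]).copy()
    # 3. Replace the 'cv' wind direction code with 'SW'.
    df["cbwd"] = df["cbwd"].replace("cv", "SW")
    # 4. Convert pressure from hPa to atm and store it in a new column.
    df["atm_pressure"] = df["PRES"] / HPA_PER_ATM
    # 5. Classify each month into one of four seasons.
    df["season"] = df["month"].map(SEASON_BY_MONTH)
    return df
```

Dropping rows is only one way to handle the missing pm2.5 values; interpolation or imputation would change step 2 without affecting the other steps.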


The full data cleaning procedure is documented here.


The cleaned dataset is shown below.



Exploratory Data Analysis

In this section, the “Question-Visualization-Observations” framework is used. This framework involves asking a question of the data, creating a visualization to find answers, and then recording observations.


The Questions asked of the data are


The main purpose of exploring the data is to find patterns, identify anomalies, test hypotheses, and verify presumptions with the aid of summary statistics and graphical representations. 


The exploratory data analysis process is extensively documented here.


Data Visualization

The next step is to translate the information obtained from the data into a visual context, which makes it easier to communicate my findings.


This was done using Power BI after I exported my cleaned and pre-processed data from my Jupyter Notebook. The dashboard created using Power BI is shown below.


Dashboard page 1


Dashboard page 2



After exploring the dataset, researching this case study, and visualizing the data, I discovered the following insights:


The mean value of PM2.5 across all years and all atmospheric conditions is 97.80 µg/m3, which is very close to the threshold (100 µg/m3). This suggests that the city’s air quality is not at its best, with the average PM2.5 level falling in the poor range.


Checking the PM2.5 level for each month and season shows that it is highest during the winter season (December to February), at approximately 110 µg/m3 on average.


The average PM2.5 level during the autumn season also surpassed the threshold, at 101.58 µg/m3. The average PM2.5 levels during the spring and summer seasons are 88.24 µg/m3 and 91.74 µg/m3, respectively.


These observations show that the PM2.5 level is worst during the winter season, followed by the autumn season.
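A seasonal comparison like this one can be reproduced with a pandas group-by. The sketch below assumes the cleaned data has a season column (a hypothetical name) alongside pm2.5. The toy rows are made up to show the shape of the output and are not values from the real dataset.

```python
import pandas as pd

THRESHOLD = 100  # µg/m3, the working threshold used in this analysis


def seasonal_pm25_summary(df):
    """Mean PM2.5 per season, sorted from worst to best, with a threshold flag."""
    summary = (
        df.groupby("season")["pm2.5"]
        .mean()
        .sort_values(ascending=False)
        .to_frame("mean_pm25")
    )
    summary["above_threshold"] = summary["mean_pm25"] > THRESHOLD
    return summary


# Toy rows for illustration only; these are not readings from the Beijing dataset.
toy = pd.DataFrame({
    "season": ["Winter", "Winter", "Summer", "Summer"],
    "pm2.5": [120.0, 100.0, 90.0, 80.0],
})
print(seasonal_pm25_summary(toy))
```

Sorting in descending order puts the worst season in the first row, which matches how the seasons are ranked above.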


A wind rose plot was created to observe the direction in which the wind most often goes. It is shown below.


From the plot, we can see that the wind mostly goes in the South East direction, but the higher wind speeds go in the North West direction.
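The numbers behind a wind rose are a frequency count and a speed summary for each direction. The sketch below tabulates them with pandas. It assumes the direction column is cbwd (as in the cleaning steps) and that wind speed is stored in a column named Iws, which is my reading of the UCI field description and should be checked against the actual data. The toy rows are made up.

```python
import pandas as pd


def wind_direction_table(df, dir_col="cbwd", speed_col="Iws"):
    """Count of hourly records and mean wind speed for each wind direction."""
    return (
        df.groupby(dir_col)[speed_col]
        .agg(hours="count", mean_speed="mean")
        .sort_values("hours", ascending=False)
    )


# Toy rows for illustration only; not readings from the real dataset.
toy_wind = pd.DataFrame({
    "cbwd": ["SE", "SE", "NW"],
    "Iws": [1.0, 3.0, 10.0],
})
print(wind_direction_table(toy_wind))
```

In this toy table, SE has the most records while NW has the higher mean speed, the same kind of contrast the plot shows.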


Higher levels of PM2.5 occur most often when the wind is headed towards the South West (SW), followed by the South East. Levels get extremely high during the winter period when the wind is headed towards the South East and South West.



Observing the relationship between the wind speed and the PM2.5 level shows that the lower the wind speed, the higher the PM2.5 level.
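One simple way to quantify an inverse relationship like this is a correlation coefficient, where a negative value means PM2.5 tends to fall as wind speed rises. The sketch below uses the Pearson correlation from pandas. The Iws column name is an assumption based on the UCI field description, and the toy values are made up, so the notebook may have used a different method.

```python
import pandas as pd


def wind_pm25_correlation(df, wind_col="Iws", pm_col="pm2.5"):
    """Pearson correlation between wind speed and PM2.5 concentration."""
    return df[wind_col].corr(df[pm_col])


# Toy rows for illustration only; not readings from the real dataset.
toy_speed = pd.DataFrame({
    "Iws": [1.0, 2.0, 3.0, 4.0, 5.0],
    "pm2.5": [200.0, 150.0, 120.0, 80.0, 50.0],
})
r = wind_pm25_correlation(toy_speed)
print(f"correlation: {r:.2f}")
```

Pearson correlation only captures a linear trend, so a scatter plot is still worth checking alongside the number.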


The average hours of precipitation (rainfall) and snowfall were observed for each month. From the charts, rainfall occurs most often during the summer season (June - August), while snowfall occurs most often during the winter season (December - February).




In periods with few hours of precipitation (rainfall), the PM2.5 levels are extremely high. When there are longer hours of rainfall, the PM2.5 levels are low in comparison to when there are shorter hours of rainfall.



After gathering insights from the data, I would love to make a few recommendations






For Further Exploration

The Case study for this Project stated:


“Additionally, kindly use this dataset to analyze the historical impact of weather conditions on air quality, and make predictions on air quality during specific weather conditions. This information could be used to inform emergency response plans and prepare for potential air quality issues.”

The analysis I carried out explored the historical impact of weather conditions on air quality. It was used to generate insights and make recommendations to curb the effect of weather conditions on air quality and reduce the concentration of PM2.5 in the atmosphere.


However, the analysis does not contain the predictive analysis needed to forecast air quality in the coming days, months or years. The project could therefore be continued from its current stage and extended with a model to perform predictive analysis.


The full analysis has been recorded and documented on my GitHub repository.


Thank you for reading and I can be contacted on LinkedIn and Twitter



  1. Olanrewaju Oyinbooke [LinkedIn, Twitter]  
  2. Tina Okonkwo [LinkedIn, Twitter]