Aviation flight data analytics with Azure Synapse Analytics

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

In this blog, we cover a flight data analytics use case specific to the aviation industry and discuss how cloud analytics platforms can serve as the foundation for it. Every aircraft an airline operates deals with huge amounts of data, generated by aircraft and engine legacy sensors at various frequencies: some legacy sensors produce a signal once every four seconds (0.25 hertz), while others produce four signals per second (4 hertz). This is an example of time series data. Most of the data captured during flight is stored in the black box. Fun fact: the black box is neither black nor a box; it is orange and partly cylindrical. Thousands of signals are captured as a flight travels from origin to destination, generating terabyte-scale data in a single trip. They include the aircraft's rotation about the x, y, and z axes (also known as roll, pitch, and yaw), airspeed, radio altitude, engine fuel flow, and even information such as the seatbelt sign status. This information is stored in a recording device known as the flight data recorder (FDR), and it helps us understand aircraft performance, engine fuel efficiency, aviation risk, and many other aspects. Another part of the black box, the cockpit voice recorder (CVR), records cockpit audio. When a plane crashes, the first things aviation investigators look for are the FDR and CVR, which the aviation industry then analyzes for actionable insights, root causes, and preventive measures.
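Because sensors sample at different rates, signals usually need to be placed on a common time base before they can be analyzed together. The following is a minimal sketch, assuming a 1 Hz target grid: the 4 Hz signal is averaged over each second, and the 0.25 Hz signal is held constant until its next sample. The function names are hypothetical, and production FDR decoding tools may resample differently.

```python
from typing import List


def downsample_mean(samples: List[float], factor: int) -> List[float]:
    """Average consecutive blocks of `factor` samples (4 Hz -> 1 Hz with factor=4)."""
    usable = len(samples) - len(samples) % factor  # ignore a trailing partial block
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


def upsample_hold(samples: List[float], factor: int) -> List[float]:
    """Repeat each sample `factor` times (0.25 Hz -> 1 Hz with factor=4)."""
    return [value for value in samples for _ in range(factor)]


# Hypothetical 8-second window: 32 samples at 4 Hz and 2 samples at 0.25 Hz.
fast_4hz = [float(i) for i in range(32)]
slow_025hz = [100.0, 104.0]

fast_1hz = downsample_mean(fast_4hz, 4)
slow_1hz = upsample_hold(slow_025hz, 4)

# Both signals now have one value per second and can share one table row per second.
rows = list(zip(range(len(fast_1hz)), fast_1hz, slow_1hz))
print(rows[:2])  # [(0, 1.5, 100.0), (1, 5.5, 100.0)]
```

Averaging is a simple choice for the fast signal; for peak-sensitive parameters, keeping the per-second maximum may be more appropriate.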

Note that, even today, aviation black box data is not transmitted to on-premises systems in real time. While there is still room for improvement, for now the FDR and CVR data is kept onboard in fireproof, waterproof recorders. Operations staff retrieve it manually from the aircraft after the flight ends and send it to the airline's technical team for decoding and computation. In recent years, cloud computing has added numerous capabilities for advanced computation on such data. In this blog, we look at several business analytics use cases built on aviation black box data and related data, and at how to ingest and analyze flight black box data and similar time series data using the Microsoft Azure Synapse Analytics platform. We briefly touch on the following topics:

  • Potential flight data analytics use cases, with data sources such as aviation black box data and aviation weather data.
  • Techniques for ingesting aviation data into the Microsoft Azure Synapse platform.
  • Data exploration techniques for black box data using the Synapse Data Explorer cloud service.
  • A sample Power BI visualization of an aviation incident database.
  • The Synapse Spark user experience using black box data.

Let us start by looking at potential use cases and the actionable insights that FDR data can unlock.

Aviation business use cases

The aviation industry wants to minimize operational risk and incidents, and ultimately to provide passengers with a seamless and safe journey. Expert pilots might want to see whether any of the following occurred while they were flying:

  • Unstabilized approach.
  • Engine icing.
  • High vertical energy at touchdown.
  • Strong tailwind.
  • Cruise overspeed and alerts.
  • Prescriptive guidance on pilot skills and competency.
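Flight data monitoring programs commonly flag events like these by checking whether a recorded parameter stays beyond a limit for a minimum duration. Below is a minimal sketch of such exceedance detection, assuming a 1 Hz signal; the airspeed values, the 340 kt limit, and the two-sample minimum duration are illustrative assumptions, not operational values.

```python
from typing import List, Optional, Tuple


def find_exceedances(values: List[float], limit: float, min_samples: int) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, where values stay above `limit`
    for at least `min_samples` consecutive samples."""
    events = []
    start: Optional[int] = None
    for i, value in enumerate(values):
        if value > limit and start is None:
            start = i  # an exceedance begins
        elif value <= limit and start is not None:
            if i - start >= min_samples:
                events.append((start, i))
            start = None  # the exceedance has ended
    # Close an exceedance that lasts until the end of the recording.
    if start is not None and len(values) - start >= min_samples:
        events.append((start, len(values)))
    return events


# Hypothetical 1 Hz airspeed trace (knots) checked against an illustrative 340 kt limit.
airspeed = [320, 335, 342, 345, 343, 338, 341, 336]
print(find_exceedances(airspeed, limit=340, min_samples=2))  # [(2, 5)]
```

The single-sample spike at index 6 is ignored because it is shorter than the minimum duration; the same function could be applied to other parameters, such as vertical acceleration at touchdown, with limits defined by the airline's safety team.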

Traditionally, aviation safety teams use Boeing simulators and similar devices to recreate and simulate flight paths, which are also used for training purposes.

On the other hand, the flight operations team might be interested in the following topics.

  • Fuel efficiency and related causal factors, such as engine dirt.
  • Fuel planning and optimization (additional fuel, alternate fuel, contingency fuel, etc.).
  • Real-time weather information.
  • Real-time air traffic information.
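Fuel efficiency analysis typically starts from the recorded engine fuel flow. The sketch below, assuming fuel flow samples in kilograms per hour at a fixed sampling interval, integrates them with the trapezoidal rule to estimate fuel burned, then divides by distance for a simple efficiency indicator. The units, interval, and distance are assumptions for illustration.

```python
from typing import List


def fuel_burned_kg(fuel_flow_kg_per_h: List[float], interval_s: float) -> float:
    """Estimate total fuel burned by trapezoidal integration of fuel flow over time."""
    total = 0.0
    for a, b in zip(fuel_flow_kg_per_h, fuel_flow_kg_per_h[1:]):
        # Average flow over the interval (kg/h) multiplied by the interval length in hours.
        total += (a + b) / 2 * (interval_s / 3600.0)
    return total


def fuel_per_distance(fuel_kg: float, distance_nm: float) -> float:
    """Fuel burned per nautical mile, one simple efficiency indicator."""
    return fuel_kg / distance_nm


# Hypothetical cruise segment: constant 2400 kg/h sampled once per second for one hour.
flow = [2400.0] * 3601
burned = fuel_burned_kg(flow, interval_s=1.0)
print(round(burned, 1), round(fuel_per_distance(burned, 450.0), 2))
```

Comparing this indicator across flights on the same route is one way to surface causal factors such as engine degradation, provided weather and payload are accounted for.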

The engineering team, meanwhile, might be interested in:

  • Predictive maintenance alerts for engineering parts.
  • Automated workflows triggered by such alerts.
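One common predictive maintenance pattern is to trend a health indicator across flights and raise an alert when the trend is projected to cross a limit. This sketch fits a least-squares line to a hypothetical per-flight margin (for example, an engine temperature margin), estimates how many flights remain before an assumed limit is reached, and calls a placeholder function where an automated workflow would be triggered. All names, readings, and thresholds are illustrative.

```python
from typing import Callable, List, Optional


def flights_until_limit(values: List[float], limit: float) -> Optional[float]:
    """Fit y = a + b*x by least squares (x = flight index) and return how many flights
    after the last one the fitted line reaches `limit`, or None if it is not declining."""
    n = len(values)
    if n < 2:
        return None  # a trend needs at least two points
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    slope = sxy / sxx
    if slope >= 0:
        return None  # margin is stable or improving
    intercept = y_mean - slope * x_mean
    crossing_x = (limit - intercept) / slope
    return crossing_x - (n - 1)


def check_and_alert(values: List[float], limit: float, horizon: int,
                    trigger: Callable[[str], None]) -> bool:
    """Call `trigger` (a stand-in for a workflow hook) if the limit is projected within `horizon` flights."""
    remaining = flights_until_limit(values, limit)
    if remaining is not None and remaining <= horizon:
        trigger(f"Projected to reach limit in {remaining:.1f} flights")
        return True
    return False


# Hypothetical margin readings losing 0.5 per flight, with an assumed limit of 10.
margins = [20.0, 19.5, 19.0, 18.5, 18.0]
alerts: List[str] = []
print(check_and_alert(margins, limit=10.0, horizon=20, trigger=alerts.append), alerts)
```

In a real deployment, the trigger would be replaced by whatever mechanism starts the maintenance workflow; a linear trend is the simplest choice and may not fit every degradation pattern.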

Leadership, in turn, will be interested in:

  • Aviation risk insurance optimization.
  • Pilot fatigue.
  • Flight tooling.

Different personas on an airline's team have different KPIs, so a wide variety of potential use cases are built on FDR data. To answer the questions above, the aviation industry combines various data sources to build an analytics platform: the black box or flight data recorder, real-time weather data, real-time air traffic data, in-house flight incident data, aircraft workshop data, and many more. IATA also provides external data, such as passenger traffic, passenger survey, cargo, and safety/accident data (https://www.iata.org/en/services/statistics/). All such data typically goes through the following steps:

  • Ingest the decoded airline black box data into a cloud data lake.
  • Integrate it with all other internal and external data sources to build out the cloud data lake.
  • Analyze the black box data, weather time series data, incident database, and all other related sources.
  • Apply meaningful data engineering and data science experimentation on the data lake.
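Integrating sources often means joining records captured at different times, for example attaching the most recent weather observation to each FDR sample. The following is a minimal as-of join sketch using only the standard library, assuming both inputs are sorted by a timestamp in seconds; the record layouts and values are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def asof_join(fdr: List[Tuple[float, float]],
              weather: List[Tuple[float, str]]) -> List[Tuple[float, float, Optional[str]]]:
    """For each (timestamp, value) FDR row, attach the latest weather report at or before it.
    Both inputs must be sorted by timestamp."""
    weather_times = [t for t, _ in weather]
    joined = []
    for t, value in fdr:
        idx = bisect_right(weather_times, t) - 1  # index of the last report with time <= t
        report = weather[idx][1] if idx >= 0 else None
        joined.append((t, value, report))
    return joined


# Hypothetical inputs: altitude samples and two weather reports.
fdr_rows = [(0.0, 1000.0), (1800.0, 3500.0), (2000.0, 4000.0)]
weather_rows = [(600.0, "wind 270/15"), (1800.0, "wind 280/20")]
for row in asof_join(fdr_rows, weather_rows):
    print(row)
```

When pandas is available, pandas.merge_asof provides the same backward-looking behavior on DataFrames.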

In the following section, we use a cloud analytics service (Microsoft Azure Synapse Data Explorer) to ingest sample flight black box data (FDR/QAR data).

Flight Data Ingestion

In this section, we load sample flight black box data into the Microsoft cloud platform. Aviation organizations retrieve black box data manually from the aircraft and then load and decode it for further analysis. For academic purposes, similar sample flight data records are available on NASA's website in MATLAB format, and Python packages can convert them to CSV. For simplicity, I have provided a similar decoded sample file; use this storage endpoint and attach it in the Azure Storage Explorer tool to explore the data further. Here, we use the Synapse Data Explorer database aviation and load the data into the empty table aviationtable2. To reload, you can use the following command to clear the existing table.

.clear table aviationtable2 data
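Returning to the conversion step mentioned above: once the recorded parameters are in memory (for example, read from a MATLAB .mat file with scipy.io.loadmat), writing them to CSV for ingestion needs only the standard library. This sketch assumes the parameters have already been extracted into a dictionary of equally long lists on a common time base; it does not assume any particular layout of the NASA files, and the RowIndex column and values are hypothetical.

```python
import csv
import io
from typing import Dict, List


def signals_to_csv(signals: Dict[str, List[float]], index_name: str = "RowIndex") -> str:
    """Write equally long parameter lists as CSV columns, prefixed with a row index."""
    lengths = {len(v) for v in signals.values()}
    if len(lengths) != 1:
        raise ValueError("all signals must have the same length; resample them first")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    names = list(signals)  # keep the dictionary's column order
    writer.writerow([index_name] + names)
    for i, row in enumerate(zip(*(signals[n] for n in names))):
        writer.writerow([i] + list(row))
    return buffer.getvalue()


# Hypothetical decoded parameters on a common time base.
decoded = {"BODYROLLRATE": [0.1, -0.2], "PITCHATTITUDERATE": [0.0, 0.3]}
print(signals_to_csv(decoded))
```

The resulting text can be saved as a .csv file in the storage account and ingested into the table.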

The following screenshot shows one-click data ingestion in Synapse Data Explorer using the direct storage link. Learn more about one-click ingestion in the Further reading section.

Fig 1.1 Loading decoded black box sample data into Synapse Data Explorer using the one-click ingestion mechanism

Now we have loaded sample FDR data into the Azure Synapse Data Explorer database. Let's move on to the next step and learn how to analyze this data using Kusto Query Language (KQL).

Flight data observational analytics

Aviation safety management experts want to analyze the data for crucial signals and patterns. Aircraft roll, pitch, and yaw rates are among the key signals that safety experts need to cross-examine.

In the following KQL code, we use the subframe counter (read it as the equivalent of the sensor signal timestamp) and the body roll rate. We retrieve the data from the previously loaded table and visualize it as a line chart in a KQL script in the Synapse Studio Develop hub. A quick look at the chart shows the subframe counters (timestamps) at which the aircraft had a higher body roll rate during the cruise phase.

aviationtable2
| project SubframeCounter, BODYROLLRATE
// Sort by the x-axis column so the line connects points in order
| order by SubframeCounter asc
| render linechart

The following screenshot shows the output of the previous query in Synapse Studio. Learn more about this feature in the Further reading section.

Fig 1.2 Analyzing body roll rate in Synapse Studio
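Instead of eyeballing the chart, the same question can be answered programmatically. Assuming the two projected columns have been exported (for example, as CSV) into Python, this sketch lists the subframe counters where the absolute body roll rate exceeds a chosen threshold; the threshold and sample values are illustrative. In KQL, a where filter on abs(BODYROLLRATE) would serve the same purpose.

```python
import csv
import io
from typing import List


def high_roll_rate_counters(csv_text: str, threshold: float) -> List[int]:
    """Return SubframeCounter values whose absolute BODYROLLRATE is above `threshold`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        int(row["SubframeCounter"])
        for row in reader
        if abs(float(row["BODYROLLRATE"])) > threshold
    ]


# Hypothetical export of the two projected columns.
sample = "SubframeCounter,BODYROLLRATE\n1,0.2\n2,-3.1\n3,0.4\n4,2.7\n"
print(high_roll_rate_counters(sample, threshold=2.5))  # [2, 4]
```

Using the absolute value catches both left and right roll, since roll rate is signed.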

Likewise, simple Kusto queries and visualizations can be used for further basic flight data exploration. In the following code snippet, we analyze the pitch attitude rate in a similar way and plot it as a scatter chart.

aviationtable2
| project SubframeCounter, PITCHATTITUDERATE
| render scatterchart

While Azure Synapse Data Explorer charts are useful for operational dashboards, we can also use Power BI from Synapse Studio for a unified analytics dashboard, and its rich visualization capabilities help us explore further questions. Let us try to observe flight stability in the following visualization. During the landing phase, the flight path has to follow an almost textbook pattern to avoid incidents: before the touchdown point, the aircraft must follow almost exactly the prescribed distance, speed, and height at every second. Any abnormality or instability might be correlated with, or a causal factor of, landing incidents such as high vertical energy when the aircraft touches down on the runway.

Incident occurrence data is another, separate data source; it usually contains internal aviation incidents from around the globe. A sample Power BI chord diagram that plots flight origin-destination pairs against incident occurrences per pair can give quick insights into incident patterns on a specific route, which can lead us to investigate aircraft condition, weather, pilot skill set, and so on for that route. In the following diagram, we used the last twenty seconds of data before landing and plotted the altitude of different flights in Power BI.

Fig 1.3 Power BI line chart to analyze the stabilized approach
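The chart supports a visual comparison, and a simple numeric check can complement it. This sketch compares the last twenty seconds of each flight's altitude against a reference profile and flags flights whose root-mean-square (RMS) deviation exceeds a tolerance. The reference profile, flight identifiers, and 50 ft tolerance are hypothetical; real stabilized-approach criteria involve several parameters, such as speed, sink rate, and configuration, not altitude alone.

```python
import math
from typing import Dict, List


def rms_deviation(actual: List[float], reference: List[float]) -> float:
    """Root-mean-square difference between two equally long profiles."""
    return math.sqrt(sum((a - r) ** 2 for a, r in zip(actual, reference)) / len(reference))


def flag_unstable(flights: Dict[str, List[float]], reference: List[float],
                  window_s: int, tolerance_ft: float) -> List[str]:
    """Flag flights whose final `window_s` altitude samples (1 Hz, ending at touchdown)
    deviate from the reference by more than `tolerance_ft` RMS."""
    ref = reference[-window_s:]
    flagged = []
    for flight_id, altitudes in flights.items():
        last = altitudes[-window_s:]
        # Flights with too few samples are skipped rather than flagged.
        if len(last) == len(ref) and rms_deviation(last, ref) > tolerance_ft:
            flagged.append(flight_id)
    return flagged


# Hypothetical reference: descending 15 ft per second to touchdown over 20 seconds.
ref_profile = [15.0 * (20 - t) for t in range(1, 21)]
flights = {
    "FLT001": [h + 5.0 for h in ref_profile],    # close to the reference
    "FLT002": [h + 120.0 for h in ref_profile],  # consistently high on the approach
}
print(flag_unstable(flights, ref_profile, window_s=20, tolerance_ft=50.0))  # ['FLT002']
```

RMS deviation summarizes the whole window in one number; checking the maximum deviation as well would also catch brief excursions.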

In the following section, we walk through time series analysis in Synapse Data Explorer using sample weather data.

Synapse data explorer-based analytics for weather data

Aviation analytics also needs to integrate aircraft data with weather, air traffic, passenger, cargo (hazardous or not), and many other kinds of information. Here we use sample weather data for academic purposes, performing the following steps to understand how to analyze time series weather data using the Synapse Data Explorer service:

  • Add the sample weather data provided by the Azure Data Explorer (ADX) web UI to the ADX environment.
  • Perform a weather event count analysis.

Various free and commercial weather APIs are available that we could use to bring weather data into the cloud data lake. For ease of demonstration, we use the sample time series data available in the Azure Data Explorer tool. From Synapse Studio, open the Azure Data Explorer web UI, select Home in the left-hand menu, and then select Explore sample data with KQL. Next, select the Basic data option under Home to import the weather data into the help database.

Now we run the following Kusto query to get a high-level overview of storm events.

// Visualize the data by rendering charts.
// The render operator visualizes the query result as a chart; it must be the last operator in the query.
StormEvents
| summarize EventCount = count() by State
| where EventCount > 100
| render piechart

The following screenshot shows the outcome of the previous code snippet; the red dots highlighted in the line chart are outliers within the time window used in the snippet.

Such weather information during landing, takeoff, or cruise can correlate with specific incidents, which need further analysis using domain and data science knowledge.
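Outliers like those highlighted in such a chart can also be computed directly. As a simple stand-in for the anomaly detection functions available in Synapse Data Explorer, this sketch applies Tukey's fences (values more than 1.5 times the interquartile range beyond the quartiles) to a hypothetical series of daily storm event counts.

```python
import statistics
from typing import List, Tuple


def tukey_outliers(series: List[float], k: float = 1.5) -> List[Tuple[int, float]]:
    """Return (index, value) pairs lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(series, n=4)  # quartiles of the series
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(series) if v < low or v > high]


# Hypothetical daily storm event counts for one state over ten days.
daily_counts = [10, 12, 11, 13, 12, 11, 60, 12, 10, 11]
print(tukey_outliers(daily_counts))  # [(6, 60)]
```

Tukey's fences ignore seasonality and trend; for time series with strong daily or weekly cycles, decomposing the series first and testing the residuals is more robust.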

Aviation machine learning engineering tool: Synapse notebook

Let us now shift our focus from the weather and incident data back to the flight black box data. For industry use cases, we may also need to use several rich machine learning and deep learning packages in Python. The Synapse Analytics platform provides Synapse Spark, which we can use for PySpark-based data science experimentation. Synapse Studio can generate the initial Spark code that loads data into a DataFrame without any hand coding, for example by right-clicking a table in the Data hub. Note that the following code was generated this way rather than written from scratch; in this way, Azure Synapse provides a low-code experience for Spark developers.

%%pyspark

# Read data from Azure Data Explorer table(s)
# Full sample code available at: https://github.com/Azure/azure-kusto-spark/blob/master/samples/src/main/python/SynapseSample.py

kustoDf = spark.read \
    .format("com.microsoft.kusto.spark.synapse.datasource") \
    .option("spark.synapse.linkedService", "AzureDataExplorer6") \
    .option("kustoDatabase", "aviation") \
    .option("kustoQuery", "aviationtable2 | take 10") \
    .load()

# display() renders the DataFrame as a table in the Synapse notebook
display(kustoDf)

Once the data is loaded into a DataFrame in the session, from a cloud storage account or Data Explorer, the data science process follows the usual experimentation steps to find incident root causes. As highlighted earlier, this requires both aviation domain knowledge and data science knowledge for feature removal, ranking, and extraction. We then apply the right algorithm and parameter tuning to find the root causes of aviation risks such as the following, among many others:

  1. Runway incursion
  2. Runway excursion
  3. High vertical energy
  4. Engine icing
  5. Tail strike
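As an illustration of the feature extraction step, the sketch below splits a signal into fixed, non-overlapping windows and computes summary statistics for each window, producing rows that could feed a model. The window length, the chosen statistics, and the sample values are assumptions; which features matter for a given risk event is a domain decision.

```python
import statistics
from typing import Dict, List


def window_features(signal: List[float], window: int) -> List[Dict[str, float]]:
    """Compute mean, population standard deviation, min, and max for each full,
    non-overlapping window; a trailing partial window is dropped."""
    features = []
    for start in range(0, len(signal) - window + 1, window):
        chunk = signal[start:start + window]
        features.append({
            "start": start,
            "mean": statistics.fmean(chunk),
            "std": statistics.pstdev(chunk),
            "min": min(chunk),
            "max": max(chunk),
        })
    return features


# Hypothetical 1 Hz vertical acceleration samples (g) around touchdown.
vert_accel = [1.0, 1.0, 1.0, 1.0, 1.1, 1.6, 1.2, 1.0]
features = window_features(vert_accel, window=4)
for row in features:
    print(row)
```

In Synapse Spark, the same per-window aggregation would typically be expressed as a groupBy over a window column so it scales across many flights.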

We have shared a few research papers on QAR/FDR data in the Further reading section; industry aviation risk use cases might build on such papers to create the required analytics products.

Aviation analytics: cloud data and AI architecture diagram

Overall, let us quickly recap the cloud data and AI architecture used in this blog; it can be further improved based on different aviation analytics use cases. Data retrieved from black boxes is usually stored by the aviation team in on-premises infrastructure after it is downloaded manually from the aircraft, and is then batch transferred to Azure Data Lake Storage Gen2 through a Synapse pipeline. Data generated in real time, such as air traffic and weather data, can be ingested directly into Azure using Azure IoT Hub or Azure Event Hubs. On top of this, we built the Synapse data lake capability in the cloud: Synapse Spark handles heavyweight data science experimentation, and Synapse Data Explorer provides fast, low-latency processing of huge volumes of data in real time. Operational dashboards, Power BI dashboards, and intelligent apps use this data lake foundation to consume the descriptive, predictive, and prescriptive analytics and render the outcome to users.

Fig 1.4 Cloud data and AI architecture

Further reading

To read more on the topics covered in this blog, you can refer to the following resources: