Geospatial Analytics for Connected Vehicle with Synapse Data Explorer

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Tech Community.



Connected car technology is evolving quickly and has become a standard part of digital transformation in the automobile industry. A connected car can communicate with other devices and exchange signals with them over a network. The U.S. government fact sheet on connected vehicles (connected_vehicles_work.pdf) describes them as follows.



Cars, trucks, buses, and other vehicles will be able to “talk” to each other with in-vehicle or aftermarket devices that continuously share important safety and mobility information with each other. Connected vehicles can also use wireless communication to “talk” to traffic signals, work zones, toll booths, school zones, and other types of infrastructure. The vehicle information communicated is anonymous, so vehicles cannot be tracked, and the system is secure against tampering.


In a connected vehicle platform, cars use IoT sensors to send signals to smart devices on the same network. Both consumers and car manufacturers use these signals widely. A car owner can see vehicle health status, such as battery and tyre pressure information, through the manufacturer's super app and other digital touch points. Owners can also see the vehicle lock status and other telemetry, such as the distance travelled on each trip and other trip details. This type of platform is built on IoT and 4G/5G network capability and depends heavily on analytics capabilities such as geospatial analytics, which most mobile super apps and other digital platforms use today. Geospatial analytics works with Geographic Information System (GIS) data, which captures locations on the earth's surface and related information. Once the data is collected, the analytics platform processes it and applies relevant algorithms to generate insights.

Why is geospatial analytics so important? Consider vehicle theft management, a sensitive use case for car owners. We want an alert notification whenever the vehicle moves or leaves its designated zone. In a super app built on a connected vehicle platform, owners can use geofencing to tag a vehicle's current location and generate an alert on any movement. Connected vehicle telematics has many other use cases that involve geospatial problems. The following figure illustrates a geofencing mechanism for a connected vehicle.



                           Fig 1.1 Illustrative example of Geofencing in connected vehicle platform
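
The geofencing check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fence is a circle around the tagged location, distances use an equirectangular approximation (reasonable for fences of a few hundred meters), and the names distance_m and left_geofence as well as the 100 m radius are hypothetical, not part of any connected vehicle platform.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters


def distance_m(lon1, lat1, lon2, lat2):
    """Approximate distance in meters using an equirectangular projection."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)


def left_geofence(anchor, position, radius_m):
    """Return True when position (lon, lat) is farther than radius_m from anchor."""
    return distance_m(*anchor, *position) > radius_m


# (longitude, latitude) tagged by the owner when the car was parked
parked_at = (-74.0352, 40.7135)

print(left_geofence(parked_at, (-74.0351, 40.7135), 100))  # small GPS jitter
print(left_geofence(parked_at, (-74.0300, 40.7135), 100))  # several hundred meters east
```

A production system would run this kind of check on the analytics platform against streaming telemetry rather than in client code.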


Sometimes industrial versions of eSIMs are integrated during car manufacturing to provide this connectivity. For any organization, this type of IoT-based connected vehicle information, when integrated with parking lot, computer vision, and other geospatial information, can help in many scenarios, including the following.

  • Car driving assistance utilities
  • Traffic congestion related real time info
  • Parking lot availability inside commercial building
  • Car accident based preventive features

In this blog we focus on how similar geospatial analytics platforms for connected vehicles can be designed with Azure Synapse Analytics Data Explorer, using Kusto Query Language (KQL) and its built-in functions. Readers who are new to the Azure analytics platform can refer to the articles in the Further read section at the end for a conceptual understanding. Synapse Data Explorer KQL provides a rich set of functions for geospatial queries; we will focus on the key topics listed here.


  • Geo distance calculation
  • Geo area inside polygon
  • GeoMap Visualization
  • Geospatial clustering


Developers’ prerequisite for academic purpose


We will use the debsdx table, hosted inside a Synapse Data Explorer cluster/database, as the source data in the next sections. This table is preloaded with car pickup and drop-off locations from a data file kept at a public path. We will use its location details, such as longitude and latitude, for our geospatial calculations. Readers who want to upload this data for learning purposes can follow the steps below.

  1. Download the data using the Azure Storage Explorer desktop tool. In the tool, go to the Connect to Azure Storage option on the left side, click the ADLS Gen2 container or directory option, select the Anonymously (my blob allows public access) radio button, and enter the public path mentioned earlier as the blob container URL. Optionally, download the data to the local desktop.
  2. If we cannot provision Synapse Data Explorer due to logistical issues, we can use a free ADX cluster (see Create a free Azure Data Explorer cluster | Microsoft Docs) and then follow the one-click ingestion method to ingest the data from the local machine or directly from the storage account. Refer to the Further read section to learn more.

To illustrate basic geospatial concepts, we have plotted a map in Power BI, which is connected to the Synapse Data Explorer table loaded in the previous steps. The following figure uses the drop-off longitude and latitude columns of the table to render the map.


                                  Fig 1.2 Power BI map using synapse data explorer


In this scenario, a medical service company that provides both ambulance and air ambulance services uses the connected vehicle data platform and the historical signals already collated in its cloud analytics platform. In an emergency, the company uses an air ambulance to fly the patient from a designated pickup point to a designated drop-off location near the hospital. For this use case, the company therefore wants the straight-line (Euclidean-style) distance between the two points, not the Manhattan or road distance, computed from the signals transmitted by the connected car platform.


Calculate geo distance using geospatial analytics capabilities in Synapse data explorer


To find the shortest distance, we use the debsdx table, which stores pickup locations, drop-off locations, and the road trip distance as attributes. We are interested in the shortest distance between a pickup location and a drop-off location, so we first retrieve the row identifier, the latitude and longitude of the pickup and drop-off locations, and the trip distance with the following query.





debsdx | project MedallionID, PickupLatitude, PickupLongitude, DropoffLatitude, DropoffLongitude, TripDistanceMiles | sort by TripDistanceMiles desc





Let us note the pickup latitude, pickup longitude, drop-off latitude, and drop-off longitude, because we will use them in the following steps. From the previous result, we see that the road distance by car from the pickup location to the drop-off location is 67.77 miles. Now we want the geo-distance between these two locations. The following code uses the geo_distance_2points() function to calculate the shortest geo-distance in meters and then converts it to miles.





print shortest_geodistance_miles = geo_distance_2points(-73.7851, 40.6458, -73.6503, 41.467) * 0.000621371





The output of the preceding query shows that the shortest distance is 57.1 miles. Note that geo_distance_2points() expects its input in the following order and returns the distance in meters.





geo_distance_2points (longitude of location 1, latitude of location 1, longitude of location 2, latitude of location 2)
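
As a sanity check outside KQL, the same distance can be approximated in Python with the haversine formula. This is a sketch under the assumption of a spherical Earth with a mean radius of 6,371,008.8 m; it is not how geo_distance_2points() is implemented, so the result may differ slightly from the KQL output. The function name haversine_m is hypothetical, and the argument order mirrors the KQL function (longitude first, then latitude).

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_008.8):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


METERS_TO_MILES = 0.000621371

# Pickup and drop-off points from the query above, as (longitude, latitude)
miles = haversine_m(-73.7851, 40.6458, -73.6503, 41.467) * METERS_TO_MILES
print(round(miles, 1))
```

The result is about 57 miles, which agrees with the KQL output and is noticeably shorter than the 67.77-mile road distance.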





Now let us look at the area inside a geo polygon and the geo_polygon_area() function.


Calculate geo area inside polygon using geospatial analytics capabilities of Synapse data explorer.

Let’s understand what a geo polygon is. Esri defines a polygon as “a GIS object that stores its geographic representation—a series of x and y coordinate pairs that enclose an area—as one of its properties (or fields) in the row in the database”. Typically, it is used to find objects, individuals, and other relevant entities inside a defined geospatial polygon and to build actionable insights from the findings. In this section we focus on how to define a geospatial polygon and then find the area inside it. A defined geo polygon and connected vehicle telemetry can be combined to build insights for use cases ranging from crime scenarios to parking lots. The following figure is a conceptual diagram of a geospatial polygon, for illustration only.



                                  Fig 1.3 Illustrative picture of geo polygon.

In the following code snippet, we select a few sample geolocations for the polygon area computation.



debsdx | project MedallionID, PickupLatitude, PickupLongitude, DropoffLatitude, DropoffLongitude, TripDistanceMiles | sort by TripDistanceMiles desc



Let us note these results, that is, the pickup and drop-off latitudes and longitudes, for the subsequent computation.

The following code uses the geo_polygon_area() function to calculate the area inside a geo polygon whose vertices are the locations selected in the previous step.



let triparea = dynamic({"type":"Polygon","coordinates":[[[-74.0352,40.7135],[-73.92,40.7721],[-73.9551,40.7888],[-73.9561,40.8036],[-74.0352,40.7135]]]});
print area = geo_polygon_area(triparea)



The following block is the output of the preceding code snippet.







Note that geo_polygon_area() expects its input in the format shown below and returns the area in square meters. The coordinates can be listed in either clockwise or counterclockwise order.



geo_polygon_area (polygon), where polygon is a GeoJSON value of the form {"type":"Polygon","coordinates":[[[longitude of location 1, latitude of location 1], [longitude of location 2, latitude of location 2], ..., [longitude of location n, latitude of location n], [longitude of location 1, latitude of location 1]]]}
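
To build intuition for what the area function computes, the Python sketch below approximates the area of the same ring on a sphere. It is an illustration only: it uses a common spherical trapezoid approximation with an assumed mean Earth radius, not the algorithm behind geo_polygon_area(), so small differences from the KQL result are expected. The name polygon_area_m2 is hypothetical.

```python
import math


def polygon_area_m2(ring, radius_m=6_371_008.8):
    """Approximate area in square meters of a closed ring of [lon, lat] pairs.

    Sums (lon2 - lon1) * (2 + sin(lat1) + sin(lat2)) over the edges; the
    absolute value makes the result independent of vertex orientation.
    """
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(ring, ring[1:]):
        total += math.radians(lon2 - lon1) * (
            2 + math.sin(math.radians(lat1)) + math.sin(math.radians(lat2))
        )
    return abs(total) * radius_m ** 2 / 2


# Same vertices as the KQL polygon; the first point is repeated to close the ring
trip_ring = [
    [-74.0352, 40.7135],
    [-73.92, 40.7721],
    [-73.9551, 40.7888],
    [-73.9561, 40.8036],
    [-74.0352, 40.7135],
]

area = polygon_area_m2(trip_ring)
print(f"{area / 1e6:.1f} km^2")
```

For this ring the approximation gives an area on the order of 24 to 25 square kilometers, and reversing the vertex order yields the same value.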



The area inside such geo shapes can be calculated with the geo_polygon_area() function. Now let us move on to the map visualization capability of the Data Explorer tool.

Visualize map in data explorer web dashboard

Positional data calls for extensive spatial analysis, which is incomplete without visualization. We now walk through the map rendering capability of Data Explorer. Note that this feature currently works only in the Data Explorer web dashboard, so from Synapse Studio we switch to the Azure Data Explorer web user interface. In the code snippet below, we use the longitude and latitude of a few sample pickup locations to render a geospatial map inside the Data Explorer web user interface.



debsdx | project PickupLongitude, PickupLatitude | render scatterchart with (kind=map)



The query renders a geospatial map inside the Data Explorer web user interface.

We have now covered map visualization within the Data Explorer dashboard and will move on to the Geohash topic in the next section.


Geospatial clustering.

Geospatial clustering means grouping spatial data using geospatial tools or other techniques. In data science, we can apply clustering algorithms to geospatial data to obtain such groups. Synapse Data Explorer KQL supports the following geospatial clustering methods.

  • Geographic location string-based encoding: Geohash
  • Quadrilateral hierarchical cells: S2 cells
  • Hexagonal hierarchical space index: H3 cells
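
All three methods share one idea: map each point to a cell identifier and then aggregate per cell. The Python sketch below shows that idea with a plain latitude/longitude grid as a simplified stand-in for Geohash, S2, or H3 cells. The grid_cell helper, the 0.01 degree cell size, and two of the sample points are hypothetical values chosen for illustration.

```python
import math
from collections import Counter


def grid_cell(lon, lat, cell_deg=0.01):
    """Map a point to the integer (column, row) of the grid cell that contains it."""
    return (math.floor(lon / cell_deg), math.floor(lat / cell_deg))


# Sample pickup points as (longitude, latitude); the second and last are made up
pickups = [
    (-74.0352, 40.7135),
    (-74.0349, 40.7131),
    (-73.9551, 40.7888),
    (-73.9561, 40.8036),
    (-74.0358, 40.7139),
]

# Count points per cell; cells with many points are the dense clusters
clusters = Counter(grid_cell(lon, lat) for lon, lat in pickups)
for cell, count in clusters.most_common():
    print(cell, count)
```

Three of the five points fall into the same cell, which is the kind of grouping a per-cell count in KQL would surface.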

Here we focus on Geohash-based geospatial clustering. A geohash has an accuracy level that determines the area it covers, and each geohash represents a rectangular area on a plane surface. In the following code snippet, we retrieve the geohash string of a coordinate pair with accuracy level 18. Note that each accuracy level corresponds to an area coverage range: level 1 covers the largest area and level 18 the smallest.



print pickuplocationgeohash = geo_point_to_geohash (-74.0352,40.7135, 18)



The following result is the outcome of the previous code snippet using geo_point_to_geohash () function.






Note that geo_point_to_geohash () function needs input in the following format.

  geo_point_to_geohash (longitude of location, latitude of location, accuracy)
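
To show what the accuracy parameter does, here is a minimal Python sketch of the standard geohash encoding: it repeatedly halves the longitude and latitude ranges, interleaves the resulting bits starting with longitude, and emits one base-32 character per five bits. It assumes geo_point_to_geohash() follows this standard scheme; geohash_encode is a hypothetical helper, not a KQL or library function.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lon, lat, accuracy):
    """Encode a (longitude, latitude) point as a geohash of `accuracy` characters."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    chars, bits, value = [], 0, 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < accuracy:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = (value << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            value <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


print(geohash_encode(-74.0352, 40.7135, 4))   # dr5r
print(geohash_encode(-74.0352, 40.7135, 18))  # same prefix, much smaller cell
```

Each extra character narrows the cell further, so a short geohash is always a prefix of the longer geohash for the same point. That prefix property is what makes geohashes convenient for clustering.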

Note that we can convert this geohash back to a Bing Maps URL. The following code snippet shows a geohash_to_map_url function. Here we use another geohash location and fetch the corresponding Bing Maps URL for it.




// Use string concatenation to create Bing Map URL from a geohash
// The first strcat argument holds the Bing Maps URL prefix
let point_to_map_url = (_point: dynamic, _title: string)
{
    strcat('', _point.coordinates[1], '_', _point.coordinates[0], '_', url_encode(_title))
};
// Convert geohash to center point, and then use 'point_to_map_url' to create Bing Map deep-link
let geohash_to_map_url = (_geohash: string, _title: string)
{
    point_to_map_url(geo_geohash_to_central_point(_geohash), _title)
};
print geohash = 'w21zsgkr5qu8n2mq7b'
| extend url = geohash_to_map_url(geohash, "Deb’s location")



The preceding code generates a Bing Maps deep-link URL for the location.
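
The step that turns a geohash back into a point can also be sketched in Python. The hypothetical geohash_center helper below reverses the standard geohash encoding and returns the center of the resulting cell as (longitude, latitude), which is the role geo_geohash_to_central_point() plays in the query above. It assumes the standard geohash scheme and is not the KQL implementation.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_center(geohash):
    """Return the (longitude, latitude) center of the cell a geohash describes."""
    lon_range, lat_range = [-180.0, 180.0], [-90.0, 90.0]
    use_lon = True  # bits alternate between longitude and latitude
    for ch in geohash:
        index = BASE32.index(ch)
        for shift in range(4, -1, -1):  # five bits per character, high bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (index >> shift) & 1:
                rng[0] = mid  # bit 1: keep the upper half
            else:
                rng[1] = mid  # bit 0: keep the lower half
            use_lon = not use_lon
    return (sum(lon_range) / 2, sum(lat_range) / 2)


lon, lat = geohash_center("w21zsgkr5qu8n2mq7b")
print(round(lon, 4), round(lat, 4))
```

For the geohash used in the query, the center point lies at roughly 104 degrees east and between 1 and 2 degrees north.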


Data Architecture

The following reference architecture shows how to build alert notifications in a connected vehicle super app. Although it focuses on a vehicle-based real-time alert system, it can be extended to similar real-time streaming use cases. Overall, it represents an Azure-based analytics platform for the connected car super app and can be adjusted case by case. It covers end-to-end vehicle telematics processing, from IoT ingestion to a geospatial visualization dashboard.

In this reference architecture, vehicle stream data is ingested by the Azure Event Hubs service. This service can stream millions of records per second and integrates with various streaming endpoints and protocols. Once streaming data is ingested into the event hub, we can use the native Data Connections feature of Synapse Data Explorer in the portal blade to connect the event hub to Synapse Data Explorer. Connected vehicle scenarios usually ingest terabytes of data almost every day, so data life cycle management is an important part of the design. Developers need to choose the Synapse Data Explorer retention policy and caching policy accordingly to optimize the cost of the data platform. Refer to the Further read section to learn how to configure these policies.

At this point the streaming data lands in Synapse Data Explorer. As the next step, we analyze the data to build observational analytics, and then build an effective alert mechanism that sends notifications through the right channel. One recommended approach is to use a Data Explorer update policy to detect abnormal signals and write the corresponding alerts to a dedicated alert table.


                                                      Fig 1.4- Architecture of geospatial analytics based super App


The Azure platform provides services such as Logic Apps, which can be integrated to generate notifications when entries are added to the alert table. By combining Azure Communication Services and Logic Apps, we can send an SMS to the consumer based on an event. Using Power Apps and Power BI, we can build real-time visualization and notification apps for target devices. Note that this architecture can be adjusted to fit an organization's current on-premises deployment. For example, if an organization already runs a Kafka cluster on premises or in the cloud, we may use a Kafka sink connector deployed in Azure Kubernetes Service. Learn more about these technical details from the following references.

Further read


