Geospatial joins are now supported by Azure Data Explorer

This post has been republished via RSS; it originally appeared at: Azure Data Explorer articles.

This blog post covers a new geo function and the basics of geospatial joins.

 

geo_polygon_to_s2cells

geo_polygon_to_s2cells takes a polygon or multipolygon and a S2 cell level as input and creates an array of all token strings which cover it. Be aware that covering a large-area polygon with small-area cells can lead to a huge amount of covering cells. As a result, the query might return null.

 

KQL-Query:

 

let polygon = dynamic({"type":"Polygon","coordinates":[[[0,0],[0,50],[100,50],[0,0]]]}); print s2_cell_token_count = array_length(geo_polygon_to_s2cells(polygon, 5));

 

 Result:

s2_cell_token_count
286

 

Geospatial joins

This query pattern is oftentimes used in various mobility solutions (geospatial telemetry and static reference data), geospatial risk analysis and agriculture optimization using weather data. It is based on the three-dimensional S2 geometry and the functions geo_polygon_to_s2cells and geo_point_in_polygon. By use of this functionality a geospatial join consists of a coarse-grained join using the S2 cell coverage and the exact validation using the geo_point_in_polygon function.

 

The four main steps:

  1. Converting polygons to S2 cells of level k,
  2. Converting points to the same S2 cells level k,
  3. Joining on S2 cells,
  4. Filtering by geo_point_in_polygon().

 

The following picture explains the flow of the entire KQL query. First you need to choose the right S2 cell level. It should not be too big and not too small (it is not recommended to use more than 10.000 cells, 65535 are possible in theory). Second you are creating the S2 cell tokens for the static dataset. Next you are joining the polygons with the timeseries based on the tokens. This provides you with a dataset which might still has some false positives (S2 cell overlap) and that is why you need to end the statement with a check if the actual point is in the polygon.

geospatialJoin.png

 

This is a concrete example of a geospatial join based on the StormEvents and US_States table located on the help cluster. It calculates the damage in $ by state.

 

KQL-Query:

 

let join_level = 4; US_States | project State = features.properties.NAME, polygon = features.geometry | extend covering = geo_polygon_to_s2cells(polygon, join_level) | mv-expand covering to typeof(string) | join kind = inner hint.strategy = broadcast ( StormEvents | project BeginLon, BeginLat , DamageProperty | extend covering = geo_point_to_s2cell(BeginLon, BeginLat, join_level) ) on covering | where geo_point_in_polygon(BeginLon, BeginLat, polygon) | summarize CountOfEvents=count(), DamageInDollar=sum(DamageProperty) by tostring(State) | top 3 by DamageInDollar desc

 

Result:

State CountOfEvents DamageInDollar

Kansas

2298

533352500

Texas

3889

488525500

Ohio

852

371630500

REMEMBER: these articles are REPUBLISHED. Your best bet to get a reply is to follow the link at the top of the post to the ORIGINAL post! BUT you're more than welcome to start discussions here:

This site uses Akismet to reduce spam. Learn how your comment data is processed.