How to: Handle duplicate records in Azure Data Explorer

This post has been republished via RSS; it originally appeared at: Azure Data Explorer articles.

Azure Data Explorer is an append-only database that isn’t designed to support frequent data deletion. If you accidentally ingest your data into Azure Data Explorer multiple times, the following tips can help you handle the duplicate records:

  1. Filter out the duplicate rows at query time. The arg_max() aggregation function can be used to filter out the duplicate records and return the last record based on the timestamp (or another column).
  2. Filter duplicates during the ingestion process.
  3. Drop extents with duplicated records and re-ingest the data.

         // Create a table with the IDs of the extents that include the duplicate data.
         // Add the specific date: StartDate and EndDate are placeholders for the date range to inspect.
         .set ExtentsToCompress <|
             bla // original table name
             | extend eid = extent_id()
             | extend dt = ingestion_time() // one option to find the date
             | where dt between (StartDate .. EndDate) // alternative option to find the date
             | summarize by eid

         // Present the extent IDs.
         ExtentsToCompress

         // Ingest the distinct rows into a temp table (increases performance).
         .set BlaTmp <|
             bla
             | extend eid = extent_id()
             | where eid in (ExtentsToCompress)
             | project-away eid
             | distinct *

         // Drop the extents with duplicate values.
         .drop extents <|
             .show table bla extents
             | where ExtentId in (ExtentsToCompress)

         // Re-ingest the distinct values.
         .set-or-append bla <| BlaTmp

  4. For a small number of records, use the purge command to remove specific records. Note that data deletion using the .purge command is designed to protect personal data and should not be used in other scenarios. It is not designed to support frequent delete requests or deletion of massive quantities of data, and it may have a significant performance impact on the service.
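
As an illustration of tip 1, here is a minimal sketch of query-time deduplication with arg_max(). The table name `SampleEvents` and the columns `DeviceId`, `EventId`, `Timestamp`, and `Value` are hypothetical and not taken from the original post; the sketch also assumes that `DeviceId` and `EventId` together identify a record.

```kusto
// Hypothetical sample data: the second row is an accidental re-ingestion of the first.
let SampleEvents = datatable(DeviceId:string, EventId:long, Timestamp:datetime, Value:real)
[
    "d1", 1, datetime(2020-01-01 10:00:00), 1.5,
    "d1", 1, datetime(2020-01-01 10:00:00), 1.5,
    "d1", 2, datetime(2020-01-01 11:00:00), 2.0,
    "d2", 1, datetime(2020-01-01 10:30:00), 3.2
];
SampleEvents
// Keep one row per (DeviceId, EventId): the row with the latest Timestamp.
| summarize arg_max(Timestamp, *) by DeviceId, EventId
```

The query returns three rows, one per (DeviceId, EventId) pair. Against a real table, replace the `datatable` definition with the table name and group by whichever columns identify a record in your data.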

 

For more information on handling queries with duplicated records, read: Handle duplicate data in Azure Data Explorer
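
Tip 2 (filtering duplicates during ingestion) could look something like the sketch below. This is one possible approach under stated assumptions, not necessarily what the product team recommends: `StagingEvents` is a hypothetical table that receives the newly landed data and has the same schema as the target table `bla`, and `DeviceId` and `EventId` are hypothetical columns assumed to identify a record.

```kusto
.set-or-append bla <|
    StagingEvents
    // Collapse exact duplicates inside the new batch.
    | distinct *
    // Keep only rows whose key does not already exist in the target table.
    | join kind=leftanti (bla | project DeviceId, EventId) on DeviceId, EventId
```

The leftanti join returns only the left-side columns, so the result keeps the staging table's schema. On a large target table the join can be expensive, so restricting the right side to a recent time window may be worth considering.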

 

 

Learn more about Azure Data Explorer (Kusto):

  1. Azure Data Explorer
  2. Documentation
  3. Course – Basics of KQL
  4. Query explorer
  5. Azure Portal
  6. User Voice
  7. Cost Estimator

Join us to share questions, thoughts, or ideas about Azure Data Explorer (Kusto) and receive answers from the diverse and knowledgeable Azure Data Explorer community.

 

Azure Data Explorer product team

“Join the conversation on the Azure Data Explorer community”.
