This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.
The Blob Inventory feature provides an overview of your containers, blobs, snapshots, and blob versions within a storage account. You can make use of the inventory report to understand various attributes of blobs and containers such as your total data size, age etc. depending upon the available fields for the inventory reports.
When it comes to gaining insights on breakup of blobs based on the type, access tier etc., there are certain limitations when it comes to the current available metrics. However, you can make use of the inventory report and merging that with the power of running SQL queries in Synapse help you with further drilldown by parsing these reports to that can help you taking further decisions ahead.
By default, the Blob Inventory feature is disabled. You need to enable the feature and based on our requirement, we can either enable the feature for blob or container level. You can refer to the link for the detailed steps that you need to configure the inventory report and you can opt only for the fields that will help with your use case:
Azure Storage blob inventory | Microsoft Learn
Now, before we move further in this blog, there are 2 pre-requisites:
- Enable the Blob Inventory feature for blobs/container along with all the required fields.
- Create an Azure Synapse Workspace.
Please note that the Blob Inventory and Azure Synapse Workspace (used in this blog) are paid features.
Let’s take a look at sample use cases to gain insights on objects inside a container. You will need a blob inventory report having a prefix rule match of the target container. In case you want to use this for account level, that’s also feasible as well.
Case1: We need a count breakup based on Access Tier
Query
SELECT AccessTier AS Tier, COUNT(*) As TotalBlobCount FROM OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) AS Source GROUP BY AccessTier
|
Output:
Case 2: We need a count breakup based on Blob Type
Query
SELECT BlobType AS BlobType, COUNT(*) As TotalBlobCount FROM OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) AS Source GROUP BY BlobType
|
Output:
Case 3: We need to identify blobs over a particular size. In the below sample we took that as 2 MB for now
SELECT Name, [Content-Length] FROM OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) AS Source WHERE [Content-Length]>2000000
|
Output:
Case 4: We need to count the blobs that were created in the year 2023.
SELECT LEFT([Name], CHARINDEX('/', [Name]) - 1) AS Container, COUNT(Name) As TotalBlobCount FROM OPENROWSET( bulk '<URL to your inventory CSV file>', format='csv', parser_version='2.0', header_row=true ) AS Source where "Creation-Time" like '%2023-%' |
Output
Similarly, if you want breakup of counts on a particular field of the report, it can be done is similar fashion. In case, you need details on the account level then you can tweak your report configurations and leverage the same queries ahead.
Hope this helps!