
How to aggregate the Azure Storage Blob Logs with Python

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Background

This article describes how to aggregate the Azure Storage logs collected through Diagnostic settings in Azure Monitor when an Azure Storage account is selected as the destination. This approach downloads the logs and aggregates them on your local machine.

Please keep in mind that in this article, we only copy the logs from the destination storage account. They will remain on that storage account until you delete them.

 

At the end of this article, you will have a CSV file containing the information from the current log structure (e.g., timeGeneratedUTC, resourceId, category, operationName, statusText, callerIpAddress, etc.).

This script was developed and tested using the following versions, but it is expected to work with previous versions:

 

Approach

 

This article is divided into two steps:

  1. Create Diagnostic Settings to capture the Storage Logs and send them to an Azure Storage Account

  2. Use AzCopy and Python to download and to aggregate the logs

Each step has a theoretical introduction and a practical example.

 

1. Create a Diagnostic Setting to capture the storage logs and send them to a storage account

 

Theoretical introduction

Critical and business processes that rely on Azure resources can be monitored for availability, performance, and operation using Diagnostic Settings. Please review the documentation Monitoring Azure Blob Storage to learn more.

 

To collect resource logs, you must create a diagnostic setting. When creating a diagnostic setting, you can specify the categories of operations for which you want to collect logs (please see more information in Collection and routing).

Support documentation:

As mentioned above, in this article we will explore the scenario of using an Azure Storage account as the destination. Please keep the following in mind:

To understand how to create a Diagnostic Setting, please review the documentation Create a diagnostic setting. That documentation shows how to create a diagnostic setting that sends the logs to a Log Analytics workspace. To follow this article, under Destination details, select "Archive to a storage account" instead of "Send to Log Analytics workspace". If you prefer to send the logs to a Log Analytics workspace instead, you can follow that documentation as written.

 

An important remark: in this article we only copy the logs to the local machine; we do not delete any data from your storage account.

 

Practical example

Following this documentation (Create a diagnostic setting), I created a diagnostic setting and selected the Logs categories ['StorageRead', 'StorageWrite', 'StorageDelete'] and the Metrics category 'Transaction'. Please keep in mind that for this article I only create a diagnostic setting for Blob, although it is also possible to create one for Queue, Table, and File.
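If you prefer to automate this step from Python instead of using the portal, the same diagnostic setting can be created with the Azure SDK. The sketch below is only an illustration, assuming the azure-identity and azure-mgmt-monitor packages are installed and that the caller has permissions on both storage accounts; the resource IDs and the setting name are hypothetical placeholders.

# Sketch: create the diagnostic setting programmatically (assumes azure-identity
# and azure-mgmt-monitor; IDs and names below are hypothetical placeholders).
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"
monitored_blob_service = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<monitored-account>/blobServices/default"
)
destination_storage_account_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<destination-account>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)
client.diagnostic_settings.create_or_update(
    resource_uri=monitored_blob_service,
    name="blob-logs-to-storage",
    parameters={
        "storage_account_id": destination_storage_account_id,
        "logs": [
            {"category": "StorageRead", "enabled": True},
            {"category": "StorageWrite", "enabled": True},
            {"category": "StorageDelete", "enabled": True},
        ],
        "metrics": [{"category": "Transaction", "enabled": True}],
    },
)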

 

 

Please note that on the storage account defined as the "Destination", you should see the following containers: ['insights-logs-storagedelete', 'insights-logs-storageread', 'insights-logs-storagewrite']. Also, it can take some time for the containers to be created; each container appears only if you selected the corresponding category and after the first log for that category is written.
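If you want to confirm the containers from Python rather than the portal, a minimal check such as the one below can help. It is only a sketch, assuming the azure-identity and azure-storage-blob packages and that your identity can list containers on the destination account; the account URL is a placeholder.

# Sketch: list the insights-logs-* containers on the destination account
# (assumes azure-identity and azure-storage-blob; the account URL is a placeholder).
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://<destination-account>.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)
for container in service.list_containers(name_starts_with="insights-logs-"):
    print(container.name)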

 

2. Use AzCopy and Python to download and aggregate the logs

 

Theoretical introduction

 

In this step, we will use AzCopy to retrieve the logs from the storage account, and then we will use Python to consolidate them.

 

AzCopy is a command-line tool that moves data into and out of Azure Storage. Please review the documentation Get started with AzCopy, which explains how to download, run, and authorize AzCopy.
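Before running the script, it can be useful to confirm that Python can reach the AzCopy executable. A minimal sketch (the path below is a hypothetical placeholder, not from the original article):

# Sketch: verify the AzCopy executable is reachable from Python (path is a placeholder)
import subprocess

azcopy_path = r"C:\tools\azcopy\azcopy.exe"  # hypothetical location of azcopy.exe
result = subprocess.run([azcopy_path, "--version"], capture_output=True, text=True)
print(result.stdout.strip())  # prints something like "azcopy version 10.x.x"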

 

Practical example

For this practical example, we need two storage accounts: the storage account being monitored (where the Diagnostic Setting is enabled) and the storage account used as the destination for the logs.

 

Prerequisites

 

Python script explained

 

Please find below each component of the Python script, explained. The full script is available at the end of the article.


Imports needed for the script

 

import os
import subprocess
import shutil
import pandas as pd

 

Auxiliary functions

 

Function to list all files under a specific directory:

 

# Inputs:
#   dirName - Directory path to get all the files
# Returns:
#   A list of all files under the dirName
def getListOfFiles(dirName):
    # create a list of file and sub directories
    # names in the given directory
    listOfFile = os.listdir(dirName)
    allFiles = list()
    # Iterate over all the entries
    for entry in listOfFile:
        # Create full path
        fullPath = os.path.join(dirName, entry)
        # If entry is a directory then get the list of files in this directory
        if os.path.isdir(fullPath):
            allFiles = allFiles + getListOfFiles(fullPath)
        else:
            allFiles.append(fullPath)
    return allFiles
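As a side note, the standard library's os.walk can express the same recursion more compactly. A sketch of a hypothetical alternative helper (the resulting list may come back in a slightly different order, which does not matter here because the logs are sorted later):

# Sketch: an equivalent helper built on os.walk (alternative, not part of the original script)
def get_list_of_files_walk(dirName):
    # Walk the directory tree and collect the full path of every file found
    return [os.path.join(root, name) for root, _, files in os.walk(dirName) for name in files]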

 

Function to retrieve the logs using AzCopy:

 

# Inputs:
#   azcopy_path: Path to the AzCopy executable
#   storageEndpoint: Storage endpoint
#   sasToken: SAS token to authorize the AzCopy operations
#   path: Path where the logs are on the Azure Storage Account
#   localStorage: Path where the logs will be stored on the local machine
# Returns:
#   The logs as they are on the Azure Storage Account
def getLogs(azcopy_path, storageEndpoint, sasToken, path, localStorage):
    # Define any additional AzCopy command-line options as needed
    options = "--recursive"
    # Construct the source URL
    source_url = storageEndpoint + path + sasToken
    # Construct the AzCopy command
    azcopy_command = azcopy_path + " copy " + '"' + source_url + '" ' + localStorage + " " + options
    # Execute the AzCopy command
    subprocess.run(azcopy_command, shell=True)
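One design note: because the command is built as a single string and executed with shell=True, the SAS token and paths must be quoted carefully. A hypothetical alternative sketch that passes the arguments as a list avoids the quoting concerns and raises an error if AzCopy fails:

# Sketch: the same AzCopy call without shell=True, passing the arguments as a list
def getLogsListArgs(azcopy_path, storageEndpoint, sasToken, path, localStorage):
    # Build the source URL and hand the arguments directly to subprocess
    source_url = storageEndpoint + path + sasToken
    subprocess.run([azcopy_path, "copy", source_url, localStorage, "--recursive"], check=True)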

 

Parameters definition

 

Please see below the parameters that we need to specify (information needed during the script execution):

 

# -------------------------------------------------------------------------------------------------------
# AzCopy path
# -------------------------------------------------------------------------------------------------------
azcopy_path = "C:\\XXX\\azcopy_windows_amd64_10.19.0\\azcopy.exe"

# -------------------------------------------------------------------------------------------------------
# Storage account information where the logs are being stored (storage account logs destination info):
storageAccountName = "XXX"
storageEndpoint = "https://{0}.blob.core.windows.net/".format(storageAccountName)
sasToken = "XXXX"

# -------------------------------------------------------------------------------------------------------
# Storage account to be logged. Information regarding the storage account where we enabled the Diagnostic Setting logs
subscriptionID = "XXX"
resourceGroup = "XXXX"
storageAccountNameGetLogs = "XXXX"
start = "XXXX"

# The next variables are composed based on the information presented above
storageDeleteLogs = "insights-logs-storagedelete/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start
storageReadLogs = "insights-logs-storageread/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start
storageWriteLogs = "insights-logs-storagewrite/resourceId=/subscriptions/" + subscriptionID + "/resourceGroups/" + resourceGroup + "/providers/Microsoft.Storage/storageAccounts/" + storageAccountNameGetLogs + "/blobServices/default/" + start

# -------------------------------------------------------------------------------------------------------
# Local machine information - Path on local machine where to store the logs
# -------------------------------------------------------------------------------------------------------
search = "logs_" + start.replace("/", "_")
logsDest = "C:\\XXX\\XXX\\Desktop\\XXX\\" + storageAccountNameGetLogs + "\\" + search + "\\"

# The next variables are composed based on the information presented above.
# The following folders will store temporarily all the individual logs. They will be deleted after all the logs are consolidated
localStorageDeleteLogs = logsDest + "storagedeleteLogs"
localStorageReadLogs = logsDest + "storagereadLogs"
localStorageWriteLogs = logsDest + "storagewriteLogs"
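To make the placeholders more concrete, here is a hypothetical set of example values, not taken from the original article. The start prefix assumes the usual y=/m=/d= folder layout that Diagnostic Settings use when archiving logs to a storage account; adjust it to the time window you want to download.

# Hypothetical example values for illustration only (replace with your own environment)
azcopy_path = "C:\\tools\\azcopy_windows_amd64_10.19.0\\azcopy.exe"
storageAccountName = "logsdestinationacct"          # destination account (receives the logs)
sasToken = "?sv=...&sig=..."                        # SAS token with read/list permissions
subscriptionID = "00000000-0000-0000-0000-000000000000"
resourceGroup = "my-resource-group"
storageAccountNameGetLogs = "monitoredstorageacct"  # account being monitored
start = "y=2023/m=06/d=26/"                         # assumed archive layout: y=/m=/d=/h=/m=00/PT1H.json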

 

To download all the logs

 

If you want to download all the logs (Delete, Read, and Write operations), keep the code below as it is. To skip a category, comment out the corresponding line by adding # at the beginning of the line.

 

print("\n")
print("#########################################################")
print("Downloading logs from the requests made on the storage account name: {0}".format(storageAccountNameGetLogs))
print("\n")

getLogs(azcopy_path, storageEndpoint, sasToken, storageDeleteLogs, localStorageDeleteLogs)
getLogs(azcopy_path, storageEndpoint, sasToken, storageReadLogs, localStorageReadLogs)
getLogs(azcopy_path, storageEndpoint, sasToken, storageWriteLogs, localStorageWriteLogs)

 

Merge all log files into a single file

 

To merge all the logs into a single file (CSV format), please run the following code:

 

# Inputs:
#   logsDest: Path on local machine (where the logs are stored)
# Returns:
#   A csv file sorted by time asc, and some expanded fields

print("#########################################################")
print("Merging the log files")
print("\n")

read_files = getListOfFiles(logsDest)
destinationFileJson = logsDest + "logs.json"

with open(destinationFileJson, "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            outfile.write(infile.read())

# Read the JSON file into a DataFrame
df = pd.read_json(destinationFileJson, lines=True)

# Sort by time asc
df = df.sort_values('time')

# Change time format
df['time'] = pd.to_datetime(df['time'])

# Split resourceId to create three new columns (subscription, resourceGroup, provider)
df['subscription'] = df['resourceId'].apply(lambda row: row.split("/")[2])
df['resourceGroup'] = df['resourceId'].apply(lambda row: row.split("/")[4])
df['provider'] = df['resourceId'].apply(lambda row: row.split("/")[6])

# Split properties column to create a column for each property
df = pd.concat([df.drop('properties', axis=1), df['properties'].apply(pd.Series)], axis=1)

# Split identity column to create a column for each identity field
df = pd.concat([df.drop('identity', axis=1), df['identity'].apply(pd.Series)], axis=1)

df = df.rename(columns={'time': 'timeGeneratedUTC', 'type': 'authenticationType', 'tokenHash': 'authenticationHash'})
df = df.reset_index(drop=True)

# Save log file in csv format
destinationFileCSV = logsDest + "logs.csv"
df.to_csv(destinationFileCSV, sep=",", index=False)

print("######################################################### \n")
print("Clean temporary files \n")

if os.path.exists(destinationFileJson):
    os.remove(destinationFileJson)
    print(f"{destinationFileJson} has been deleted.")
else:
    print(f"{destinationFileJson} does not exist.")
print("\n")

try:
    shutil.rmtree(localStorageDeleteLogs)
    print(f"{localStorageDeleteLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageDeleteLogs} and its contents cannot be deleted. {e}")
print("\n")

try:
    shutil.rmtree(localStorageReadLogs)
    print(f"{localStorageReadLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageReadLogs} and its contents cannot be deleted. {e}")
print("\n")

try:
    shutil.rmtree(localStorageWriteLogs)
    print(f"{localStorageWriteLogs} and its contents have been deleted.")
except OSError as e:
    print(f"Error: {localStorageWriteLogs} and its contents cannot be deleted. {e}")

print("\n ######################################################### \n")
print("Script finished. The logs from the requests made on the storage account name {0} are merged.".format(storageAccountNameGetLogs))
print("Please see below resources created. \n")
print("Local machine storage merged logs location:")
print("- csv file: ", destinationFileCSV)
print("\n#########################################################")
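Once the script finishes, a quick sanity check on the merged CSV can confirm that the aggregation worked. This is only a sketch, using the destinationFileCSV variable and the column names produced by the script above:

# Sketch: quick look at the merged CSV produced by the script above
import pandas as pd

logs = pd.read_csv(destinationFileCSV, parse_dates=["timeGeneratedUTC"])
print(logs.shape)  # (number of log entries, number of columns)
# Top 10 most frequent operations per log category
print(logs.groupby(["category", "operationName"]).size().sort_values(ascending=False).head(10))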

 

The full Python script is attached to this article.


Output

To better understand the fields included in the logs after executing the full script, please review Azure Monitor Logs reference - StorageBlobLogs.

 

Disclaimer:

 
