MSTICPy and Jupyter Notebooks in Azure Sentinel, an update

We published an overview of MSTICPy 18 months ago and a lot has happened since then with many changes and new features. We recently released 1.0.0 of the package (it’s fashionable in Python circles to hang around in “beta” for several years) and thought that it was time to update the aging Overview article.


 


What is MSTICPy?


MSTICPy is a package of Python tools for security analysts to assist them in investigations and threat hunting, and is primarily designed for use in Jupyter notebooks. If you’ve not used notebooks for security analysis before we’ve put together a guide on why you should.


 


The goals of MSTICPy are to:



  1. Simplify the process of creating and using notebooks for security analysis by providing building-blocks of key functionality.

  2. Improve the usability of notebooks by reducing the amount of code needed in notebooks.

  3. Make the functionality open and available to all, to both use and contribute to.


MSTICPy is organized into several functional areas:



  • Data Acquisition – is all about getting security data into the notebook. It includes data providers and pre-built queries that allow easy access to several security data stores including Azure Sentinel, Microsoft Defender, Splunk and Microsoft Graph. There are also modules that deal with saving and retrieving files from Azure blob storage and uploading data to Azure Sentinel and Splunk.

  • Data Enrichment – focuses on components such as threat intelligence and geo-location lookups that provide additional context to events found in the data. It also includes Azure APIs to retrieve details about Azure resources such as virtual machines and subscriptions.

  • Data Analysis – packages here focus on more advanced data processing: clustering, time series analysis, anomaly identification, base64 decoding and Indicator of Compromise (IoC) pattern extraction. Another component that we include here but really spans all of the first three categories is pivot functions – these give access to many MSTICPy functions via entities (for example, all IP address related functions are accessible as methods of the IpAddress entity class.)

  • Visualization – this includes components to visualize data or results of analyses such as: event timelines, process trees, mapping, morph charts, and time series visualization. Also included under this heading are a large number of notebook widgets that help speed up or simplify tasks such as setting query date ranges and picking items from a list. Also included here are a number of browsers for data (like the threat intel browser) or to help you navigate internal functionality (like the query and pivot function browsers).


There are also some additional benefits that come from packaging these tools in MSTICPy:



  • The code is easier to test when in standalone modules, so they are more robust.

  • The code is easier to document, and the functionality is more discoverable than having to copy and paste from other notebooks.

  • The code can be used in other Python contexts – in applications and scripts.


 


Companion Notebook


Like many of our blog articles, this one has a companion notebook. This is the source of the examples in the article and you can download and run the notebook for yourself. The notebook has some additional sections that are not covered in the article.


The notebook is available here.


 


Documentation and Resources


Since the original Overview article we have invested a lot of time in improving and expanding the documentation – see msticpy ReadTheDocs. There are still some gaps but most of the package functionality has detailed user guidance as well as the API docs. We do also try to document our code well so that even the API documents are often informative enough to work things out (if you find examples where this isn’t the case, please let us know). 


 


In most cases we also have example notebooks providing an interactive illustration of the use of a feature (these often mirror the user guides since this is how we write most of the documentation). They are often a good source of starting code for projects. These notebooks are on our GitHub repo.


 


Getting Started Guides


If you are new to MSTICPy and use Azure Sentinel the first place to go is the Use Notebooks with Azure Sentinel document. This will introduce you the the Azure Sentinel user interface around notebooks and walk you through process of setting up an Azure Machine Learning (AML) workspace (which is, by default, where Azure Sentinel notebooks run). One note here – when you get to the Notebooks tab in the Azure Sentinel portal, you need to hit the Save notebook button to save an instance of one of the template notebooks. You can then launch the notebook in the AML notebooks environment.


 


The next place to visit is our Getting Started for Azure Sentinel notebook. This covers some basic introductory notebook material as well as essential configuration. More advanced configuration is covered in Configuring Notebook Environment notebook – this covers configuration settings  in more detail and includes a section on setting up a Python environment locally to run your notebooks.


 


Although this article is aimed primarily at Azure Sentinel users, you can use MSTICPy with other data sources (e.g. Splunk or anything you can get into a pandas DataFrame) and in any Jupyter notebook environment. The Azure Sentinel notebooks can be found in our Notebooks GitHub repo.


 


Notebook Initialization


Assuming that you have a blank notebook running (in either AML or elsewhere) what do you do next?


 


Most of our notebooks include a more-or-less identical setup sections at the beginning. These do three things:



  1. Checks the Python and MSTICPy versions and updates the latter if needed.

  2. Imports MSTICPy components.

  3. Loads and authenticates a query provider to be able to start querying data.


setup_cell.png


If you see warnings in the output from the cell about configuration sections missing you should revisit the previous Getting Started Guides section. This cell includes the first two functions in the list above. The first one – running utils.check_versions() – is not essential in most cases once you have your environment up and running but it does do a few useful tweaks to the notebook environment, especially if you are running in AML.


 


The init_notebook function automates a lot of import statements and checks to see that the configuration looks healthy.


 


The third part of the initialization loads the Azure Sentinel data provider (which is the interface to query data) and authenticates to your Azure Sentinel workspace. Most data providers will require authentication.


data_connect.png


Assuming you have your configuration set up correctly, this will usually take you through the authentication sequence, including any two-factor authentication required.


 


Data Queries


Once this setup is complete, we’re at the stage where we can start doing interesting things!


MSTICPy has many pre-defined queries for Azure Sentinel (as well as for other providers). You can choose to run one of these predefined queries or write your own. This list of queries documented here is usually up-to-date but the code itself is the real authority (since we add new queries frequently). The easiest way to see the available queries is with the query browser. This shows the queries grouped by category and lets you view usage/parameter information for each query.



Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.