TheWindowsUpdate.com

Why Use Jupyter for Security Investigations?

Follow @ianhellen on Twitter.


 


What is Jupyter?


Jupyter is an interactive development and data manipulation environment hosted in a browser. It takes code that you type into a cell, executes it and returns the output to you. Here is an example:
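For instance, a cell might contain a few lines of Python like the following. The account names and counts are made-up sample values. When you run the cell, the printed line appears directly beneath it.

```python
# Hypothetical sample data: number of logons seen per account
logon_counts = {"alice": 3, "bob": 12, "carol": 1}

# Find the account with the most logons and report it
busiest = max(logon_counts, key=logon_counts.get)
print(f"Most active account: {busiest} ({logon_counts[busiest]} logons)")
```

Running this cell prints `Most active account: bob (12 logons)`. The variables `logon_counts` and `busiest` remain available to any later cell in the notebook.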


 


For more introductory information and sample notebooks, go to jupyter.org and the Jupyter introductory documentation.


Why Jupyter?


“Why would I use Jupyter notebooks to work with Azure Sentinel data rather than the built-in query and investigation tools?” might be your first question. And the first answer is that, usually, you wouldn’t. In most cases, the scenario and data that you are investigating can be handled perfectly well with the upcoming graphical investigation tool, Log Analytics queries and cool case features like Bookmarks.


The second point to make is that it is not an either/or question. You should think of Jupyter notebooks as a supplement to the built-in, and growing, capabilities of the Azure Sentinel portal.


 


One reason that you might want to reach for Jupyter is when the complexity of what you are looking for becomes too high. “How complex is too complex?” is a difficult question to answer, but some guidelines might be:



Some of the other benefits of working in Jupyter are outlined in the following sections.


 


Data Persistence, Repeatability and Backtracking


One of the painful things about working on a more complex security investigation is keeping track of what you have done. You can easily end up with tens of queries and result sets, many of which turned out to be dead ends. Which ones do you keep? How easy is it to backtrack and re-run the queries with different values or date ranges? How do you accumulate the useful results in a single report? What if you want to re-run the same pattern on a future investigation?


With most data-querying environments, the answer is a lot of manual work and heavy reliance on good short-term memory. Jupyter, on the other hand, gives you a linear progression through the investigation, saving queries and data as you go. Using variables throughout the queries (e.g. for time ranges, account names, IP addresses) also makes it much easier to backtrack, re-run queries and reuse the entire workflow in future investigations.
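As a minimal sketch of this pattern, the investigation parameters can be defined once near the top of the notebook, and every query can be built from them. To re-run the workflow for a different account or time window, you change the values and re-run the cells. The date and account name below are hypothetical. The query uses the standard `SecurityEvent` table and its `TimeGenerated`, `EventID` and `TargetUserName` columns. Actually sending the query to the workspace would need a query client library, which is not shown.

```python
from datetime import datetime, timedelta, timezone

# Investigation parameters, defined once (values are hypothetical)
end_time = datetime(2019, 6, 1, tzinfo=timezone.utc)
start_time = end_time - timedelta(days=1)
account_name = "alice"

def kql_time(dt):
    # Format a datetime as a KQL datetime literal, e.g. datetime(2019-06-01T00:00:00Z)
    return f"datetime({dt.strftime('%Y-%m-%dT%H:%M:%SZ')})"

# Query text is built from the parameters, so changing them changes every query that uses them
logon_query = f"""
SecurityEvent
| where TimeGenerated between ({kql_time(start_time)} .. {kql_time(end_time)})
| where EventID == 4624 and TargetUserName =~ "{account_name}"
"""
print(logon_query)
```

Keeping the parameters in one place is what makes backtracking cheap. A later query for the same account and window reuses `start_time`, `end_time` and `account_name` instead of repeating the literal values.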


 


Scripting and Programming Environment


In Jupyter, you are not limited to querying and viewing results: you have the full power of a programming language. Although you can do a lot in a flexible declarative language like KQL (or others like SQL), being able to split your logic into procedural chunks is often helpful and sometimes essential. With a declarative language, you need to encode your logic in a single (possibly complex) statement, while procedural languages allow you to execute logic in a series of steps.
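To illustrate the difference, here is a small procedural sketch that works through a hypothetical set of logon events one step at a time. Each intermediate result (`failures`, `failure_counts`, `suspects`) is an ordinary variable that you can inspect, print or reuse in a later cell. The event tuples and the threshold value are made-up assumptions for the example.

```python
from collections import Counter

# Hypothetical sample events: (account, host, outcome)
events = [
    ("alice", "host1", "failure"),
    ("alice", "host1", "failure"),
    ("alice", "host2", "success"),
    ("bob", "host3", "failure"),
    ("alice", "host1", "failure"),
]

# Step 1: keep only failed logons
failures = [e for e in events if e[2] == "failure"]

# Step 2: count failures per account
failure_counts = Counter(account for account, _, _ in failures)

# Step 3: flag accounts with more failures than an (arbitrary) threshold
THRESHOLD = 2
suspects = [acct for acct, n in failure_counts.items() if n > THRESHOLD]

# Step 4: for each suspect, list the hosts where it also had a successful logon
for acct in suspects:
    success_hosts = {h for a, h, o in events if a == acct and o == "success"}
    print(acct, failure_counts[acct], sorted(success_hosts))
```

With this sample data, only `alice` crosses the threshold, and the loop prints `alice 3 ['host2']`. Expressing all four steps as one declarative statement is possible, but the step-by-step form is easier to debug and modify.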


Being able to use procedural code lets you:



 


Joining to External Data


Most of your telemetry/event data will be in Azure Sentinel workspace tables but there will often be exceptions:



Any data that is accessible over your network or from a file can be linked with Azure Sentinel data via Python and Jupyter.
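As a minimal sketch, the following enriches alert records with fields from a hypothetical asset inventory held in CSV form. The inventory contents, the alert records and the `hostname` join key are all assumptions for illustration. In a real notebook, the CSV would come from a file or network location, and the alerts from a workspace query.

```python
import csv
import io

# Hypothetical asset inventory (in practice, a file or a web service)
inventory_csv = """hostname,owner,criticality
host1,finance,high
host2,it,low
"""
# Index inventory rows by hostname for fast lookup
inventory = {row["hostname"]: row for row in csv.DictReader(io.StringIO(inventory_csv))}

# Hypothetical alert records, e.g. returned from a workspace query
alerts = [
    {"alert": "Suspicious logon", "hostname": "host1"},
    {"alert": "Unusual process", "hostname": "host9"},
]

# Left join: keep every alert and add inventory fields where the host is known
enriched = []
for alert in alerts:
    asset = inventory.get(alert["hostname"], {})
    enriched.append({**alert,
                     "owner": asset.get("owner", "unknown"),
                     "criticality": asset.get("criticality", "unknown")})

for row in enriched:
    print(row)
```

A left join is used so that alerts on hosts missing from the inventory (`host9` here) are kept and marked `unknown` rather than silently dropped. With pandas, `DataFrame.merge(..., how="left")` gives the same behavior on tabular data.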


 


Access to Sophisticated Data Processing, Machine Learning and Visualization


Azure Sentinel and the Kusto/Log Analytics data store underlying it have a lot of options for visualization and advanced data processing (even clustering, windowed statistical and machine learning functions) and more capabilities are being added all the time. However, there may be times when you need something different: specialized visualizations, machine learning libraries or even just data processing and transformation facilities not available in the Azure Sentinel platform. You can see examples of these in some of the Azure Sentinel sample notebooks (see References at the end of the document).
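As a small, hedged example of custom analysis you can write yourself in a notebook, the sketch below flags unusual hours in a hypothetical series of hourly connection counts. It uses a simple z-score test: how many standard deviations each value is from the mean. It uses only the Python standard library so that it runs anywhere. In practice, you would more likely reach for a dedicated data analysis or machine learning library. The data and the threshold of 2 are illustrative assumptions.

```python
import statistics

# Hypothetical hourly counts of outbound connections from one host
hourly_connections = [12, 15, 11, 14, 13, 12, 16, 95, 14, 13]

mean = statistics.mean(hourly_connections)
stdev = statistics.stdev(hourly_connections)  # sample standard deviation

# Flag hours whose count is more than 2 standard deviations from the mean
outliers = [(hour, count) for hour, count in enumerate(hourly_connections)
            if abs(count - mean) / stdev > 2]
print(outliers)
```

Here the mean is 21.5 and the sample standard deviation is about 25.9, so only hour 7 (95 connections) is flagged, and the output is `[(7, 95)]`. A single large spike inflates the standard deviation itself, which is one reason more robust methods are often preferred on real data.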


Some well-known examples of these in the Python language are:



 


Why Python?


Jupyter can be used with many different languages – what makes Python a good choice?


 


Popularity


It is very likely that you already have Python coders in your organization. It is now the most widely taught language in Computer Science courses and is used widely in many scientific fields. It is also frequently used by IT Pros, among whom it has largely replaced Perl as the go-to language for scripting and systems management, and by web developers (many popular services such as Dropbox and Instagram are almost entirely written in Python).


 


Ecosystem


Driven by this popularity, there is a vast repository of Python libraries available on PyPI and nearly 1 million Python repos on GitHub. For many of the tools that you need as a security investigator (data manipulation, data analysis, visualization, machine learning and statistical analysis), no other language ecosystem has comparable tools.


One remarkable point here is that pretty much every major Python package, and the core language itself, is open source and written and maintained by volunteers.


 


Running Python Code in a Kusto Query


An interesting feature of the Azure Sentinel Kusto (a.k.a. Log Analytics) data store, recently released to public preview, is the ability to run Python code as part of a Kusto query. For more information, see this document.


 


Alternatives to Python


You can use other language kernels with Jupyter, and you can mix and match languages (to a degree) within the same notebook using ‘magics’, which allow individual cells to be executed using another language. For example, you could retrieve data using a PowerShell script cell, process the data in Python and use JavaScript to render a visualization. In practice, this can be a little trickier than it sounds, but it is certainly possible with a bit of hand-wiring.


 


References

