Microsoft Purview- Paint By Numbers Series (Part 5) – Premium eDiscovery Overview and Settings


This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.




Before we start, please note that if you want to see a table of contents for all the sections of this blog and their various Purview topics, you can find it at the following link:

Microsoft Purview- Paint By Numbers Series (Part 0) - Overview - Microsoft Tech Community



This document is not meant to replace any official documentation.  Those documents are continually updated and maintained by Microsoft Corporation.  If there is a discrepancy between this document and what you find in the Compliance User Interface (UI) or in the official documentation, you should always defer to that official documentation and contact your Microsoft Account team as needed.  Links to the data will be referenced both in the document steps as well as in the appendix.

All of the following steps should be done with test data, and where possible, testing should be performed in a test environment.  Testing should never be performed against production data.


Target Audience

The Advanced eDiscovery (AeD) section of this blog series is aimed at legal and HR officers who need to understand how to perform a basic investigation.


Document Scope

We will create a case and configure the settings for this case in this section of the blog.




This document does not cover any other aspect of Microsoft E5 Compliance, including:

  • Data Classification
  • Information Protection
  • Data Loss Prevention (DLP) for Exchange, OneDrive, Devices
  • Data Lifecycle Management (retention and disposal)
  • Records Management (retention and disposal)
  • Premium eDiscovery
    • Data Sources and Collections
    • Review Sets
    • Communications
    • Holds
    • Processing
    • Exports
    • Jobs
  • Insider Risk Management (IRM)
  • Priva
  • Advanced Audit
  • Microsoft Cloud App Security (MCAS)
  • Information Barriers
  • Communications Compliance
  • Licensing

It is presumed that you have a pre-existing understanding of what Microsoft E5 Compliance does and how to navigate the User Interface (UI).

It is also presumed you are using an existing Sensitive Information Type (SIT) or a SIT you have created for your testing.


If you wish to set up and test any of the other aspects of Microsoft E5 Compliance, please refer to Part 1 of this blog series (listed in the link below) for the latest entries to this blog.  That webpage will be updated with any new walkthroughs or Compliance-relevant information, as time allows.


Microsoft Compliance - Paint By Numbers Series (Part 1) - Sensitive Information Types - Microsoft Tech Community


Use Case


There are many use cases for Advanced eDiscovery.  For the sake of simplicity, we will use the following: Your organization has a Human Resources investigation against a specific user.



  • Data Sources – These are the locations (EXO, SPO, OneDrive) where searches will be performed.  These are all the custodians (users) being investigated.  These are not the users performing the investigation.
  • Collections – This is the actual search being performed.  Collections include user, keyword, data, etc.
  • Review Sets – Once a collection/search has been performed, the data must be reviewed.  This tab is where you can run secondary searches and review the data.
  • Communications – If the HR or legal team wishes, they can notify the user that they are under investigation.  You can also set up reminder notifications in this section of the UI. 
    • Note - This task is optional.
  • Hold – Once the data has been collected/searched or reviewed, either all or part of the data can be placed on legal hold.  This means that the end user cannot truly delete the data: if they delete it, only their reference to the data is removed, and the data itself is placed into a hidden hold directory.
  • Processing – This tab is related to the indexing of data in your production environment.  You would use this if you are not finding data that you expect and you need to re-run indexing activities.
    • Note - This task is optional.
  • Exports #1 – When referring to the tab, this is where data from the case is exported to a laptop or desktop.
  • Exports #2 – This is also the term used for exporting a .CSV report.
  • Jobs – This provides a list of every job run in eDiscovery (for example, Collection, Review, Processing, Export, etc.).  It is useful if you launch an activity and want to monitor its status in real time.
  • Settings – High-level analytics, settings, reports, etc.
  • Custodian – This is the individual being investigated.




  • Core vs Advanced eDiscovery (high level overview)
    • Core eDiscovery – This allows for searching and export of data only.  It is perfect for basic “search and export” needs of data.  It is not the best tool for data migration or HR and/or Legal case management and workflows.
    • Advanced eDiscovery – This tool is best used as a first and second pass tool to cull the data before handing that same data to outside counsel or another legal entity.  This tool provides a truer workflow for discovery, review, and export of data along with reporting and redacting of data.
  • If you are not familiar with the Electronic Discovery Reference Model (EDRM), I recommend you learn more about it as it is a universal workflow for eDiscoveries in the United States.  The link is in the appendix.
  • For my test, I am using a file named “1-MB-Test-SSN-1-AeD” with the phrase “Friedrich Conrad Rontgen invented the X-Ray” inside it. This file name stands for 1MB file with SSN information for Advanced eDiscovery testing.
  • We will not be using all of the tabs available in an AeD case.
  • How do user deletes of data work with AeD?
    • If the end user deletes the data on their end and there IS NO Hold, then the data will be placed into the recycle bin on the corresponding applications.
    • If the end user deletes the data on their end and there IS a Hold, then the data will NOT be placed into the recycle bin on the corresponding applications.  However, the user reference to the data will be deleted so they will believe that the data is deleted.
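
The two deletion outcomes described above can be summarized in a small toy model. This is only an illustrative sketch: the function name and the location labels are hypothetical and do not correspond to any Purview API or setting.

```python
def delete_item(item_name: str, on_hold: bool) -> dict:
    """Toy model of an end-user delete, with and without a hold.

    Hypothetical illustration only; this is not a Purview API.
    """
    if on_hold:
        # With a hold, the user's reference disappears, but the data is
        # preserved in a hidden hold location rather than the recycle bin.
        location = "hidden hold directory"
    else:
        # Without a hold, the item follows the normal path to the recycle bin.
        location = "recycle bin"
    return {
        "item": item_name,
        "user_believes_deleted": True,  # true in both cases
        "location": location,
        "preserved_by_hold": on_hold,
    }


print(delete_item("1-MB-Test-SSN-1-AeD", on_hold=True)["location"])
print(delete_item("1-MB-Test-SSN-1-AeD", on_hold=False)["location"])
```

In both cases the user sees the item as gone; the only difference is where the data ends up and whether it remains available to the case.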



If you have performed Part 1 of this blog series (creating a Sensitive Information Type), then you have everything you need.  If you have not done that part of the blog, you will need to populate your test environment with test data for the steps to follow.


Create a Case

  1. Click Create Case

  2. Give the case a Name, Case Number (if applicable), and Case Description, and then click No, just go to the home page.
    1. Note – the more you put in the description, the better for reporting later on.  So, if you have received an email from HR, Legal, outside counsel, etc., you can cut and paste that information into the Case Description.

  3. You will now find yourself in the Case Overview

  4. With the case created, we will now run an investigation
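
For readers who prefer scripting, the same case could in principle be created through Microsoft Graph instead of the UI. The sketch below only builds the request for the `POST /security/cases/ediscoveryCases` endpoint as I understand it; it does not send anything or handle authentication. The helper name `build_create_case_request` and the sample values are hypothetical, and mapping the Case Number to the `externalId` field is my assumption. Verify the endpoint and field names against the official Graph documentation before relying on it.

```python
import json

# Assumed Graph endpoint for eDiscovery (Premium) cases; verify before use.
GRAPH_CASES_URL = "https://graph.microsoft.com/v1.0/security/cases/ediscoveryCases"


def build_create_case_request(name, case_number=None, description=None):
    """Build (but do not send) a request that would create a case.

    Hypothetical helper: it mirrors the Name, Case Number, and
    Case Description fields from the steps above.
    """
    if not name:
        raise ValueError("A case name is required")
    body = {"displayName": name}
    if case_number:
        body["externalId"] = case_number  # assumption: Case Number maps here
    if description:
        body["description"] = description
    return {
        "method": "POST",
        "url": GRAPH_CASES_URL,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(body),
    }


request = build_create_case_request(
    "HR investigation (test)",  # sample values, test data only
    case_number="HR-0001",
    description="Text pasted from the HR request email.",
)
print(request["body"])
```

Sending the request would additionally require an access token with the appropriate eDiscovery permissions, which is outside the scope of this sketch.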



Before we start collecting, reviewing and exporting data, we need to be sure the settings for the case are configured to your needs.

When you click on the case, you will see the Settings tab on the far right.  In that tab you will find three tabs:

  • Case Information
  • Access & permissions
  • Search & analytics

We will go through each of these three tabs.



Case Information

If you click Select, you can change the case name, number, description, or change the status of the case. 




Under Actions, you can close the case, delete the case, or copy information to hand to Microsoft support if needed.





Access & Permissions

Under Access & Permissions, if you click Select, you can add and/or remove users to manage this case.  Please note that a user must have other eDiscovery permissions configured in Purview for these case-specific permissions to take effect.  Please reference the information on permissions in the Appendix and Links section below.




Search and Analytics

If you click Select, you can change the settings for email threads and other analytics functions.  Here are the official explanations for the top sections of this tab.

  • “Near duplicates/email threading: When turned on, duplicate detection, near duplicate detection, and email threading are included as part of the workflow when you run analytics on the data in a review set.


  • Document and email similarity threshold: If the similarity level for two documents is above the threshold, both documents are put in the same near duplicate set.


  • Minimum/maximum number of words: These settings specify that near duplicates and email threading analysis are performed only on documents that have at least the minimum number of words and at most the maximum number of words.”


  • Themes

“How does a person write a document? They generally start with one or more ideas they want to convey in the document, and compose using words that align with the ideas. The more prevalent an idea is, the more frequent the words that are related to that idea tend to be. This informs how people consume documents as well. The important thing to understand from reading a document is the ideas that the document is trying to convey, which ideas appear where, and what the relationships between the ideas are.


This can be extended to how a person wants to consume a set of documents. They want to see which ideas are present in the sets, and which documents are talking about those ideas. Also, if they find a particular document of interest, they want to be able to see documents that discuss similar ideas.


The Themes functionality in eDiscovery (Premium) attempts to mimic how humans reason about documents, by analyzing the themes that are discussed in a review set and assigning a theme to documents in the review set. In eDiscovery (Premium), Themes goes one step further and identifies the dominant theme in each document. The dominant theme is the one that appears the most often in a document.”
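
To make the idea of a "dominant theme" concrete, here is a minimal sketch. It assumes each document has already been broken into passages that carry a theme label (a hypothetical input format); it does not reproduce how eDiscovery (Premium) actually discovers themes, only the "appears most often" rule from the quote above.

```python
from collections import Counter


def dominant_theme(passage_themes):
    """Return the theme label that occurs most often, or None if empty.

    passage_themes: list of theme labels, one per passage (hypothetical input).
    Ties go to the label that was seen first.
    """
    if not passage_themes:
        return None
    return Counter(passage_themes).most_common(1)[0][0]


# Sample labels for one test document (made-up data).
labels = ["payroll", "benefits", "payroll", "scheduling", "payroll"]
print(dominant_theme(labels))
```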





At the bottom of this tab, you can enable and configure Optical Character Recognition (OCR) for the case at hand.  For more details on this, I recommend you look at the official links listed in the Appendix and Links section below.





See the “Configure search and analytics settings” URL in the Appendix and Links section below for more detail on these settings.
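
To illustrate how the similarity threshold and the minimum/maximum word settings described above interact, here is a rough sketch. It uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure and a simple "compare against the first document in each set" rule; both are my assumptions for illustration and are not how eDiscovery (Premium) computes near duplicates. The function name, default values, and sample documents are also hypothetical.

```python
from difflib import SequenceMatcher


def near_duplicate_sets(docs, threshold=0.65, min_words=10, max_words=500000):
    """Group documents into near duplicate sets.

    docs: dict mapping a document id to its text.
    A document joins the first set whose first member it resembles with a
    similarity above `threshold`; otherwise it starts a new set.
    Documents outside the min/max word range are skipped entirely.
    """
    sets = []
    for doc_id, text in docs.items():
        words = text.split()
        if not (min_words <= len(words) <= max_words):
            continue  # excluded from near duplicate analysis
        for group in sets:
            pivot_words = docs[group[0]].split()
            similarity = SequenceMatcher(None, pivot_words, words).ratio()
            if similarity > threshold:
                group.append(doc_id)
                break
        else:
            sets.append([doc_id])
    return sets


sample = {
    "a": "the quarterly payroll report for the finance team is attached to this email",
    "b": "the quarterly payroll report for the finance team is attached to this message",
    "c": "lunch menu options include pasta salad soup and fresh bread every single friday",
    "d": "see attached",  # too short, so it is skipped
}
print(near_duplicate_sets(sample))
```

Raising the threshold makes the sets smaller and stricter; lowering it pulls more loosely related documents into the same set.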



Appendix and Links








