Discover, Classify, and Protect Sensitive Data with the AIP Scanner

Most modern organizations have terabytes (or petabytes) of unstructured data sitting in their on-premises data repositories and SharePoint libraries. Managing this data the way you manage other corporate resources is a daunting but achievable task using tools that you likely already own. In this article, we will walk you through the discovery of sensitive data and show you options to classify and protect that data.


 


The AIP scanner allows you to scan your on-premises data repositories against the standard Office 365 sensitive information types and custom types you build with keywords or regular expressions. Once the data is discovered, the AIP scanner(s) can aggregate the findings and display them in Analytics reports so you can begin visualizing your data risk and see recommendations for setting up protection rules based on the content.


1.png


To configure the scanner, there are a few steps you need to follow:



  • Configure on-premises prerequisites

    • Server

    • SQL

    • Installer account permissions

    • Local Service Account

    • Open required network locations

    • AIP scanner binaries



  • Configure Azure prerequisites

    • Global admin credentials

    • Cloud service account (creation or sync)

    • Create Azure AD applications for service authentication

    • Configuring AIP Azure Log Analytics (Optional)



  • AIP scanner profile configuration

  • AIP scanner installation


Now, this may seem like a lot of steps, but don’t worry. We will walk you through the whole process so that it is as painless as possible. However, before production deployment we recommend that you read through the official documentation at https://docs.microsoft.com/en-us/azure/information-protection/deploy-aip-scanner to ensure that you will not run into any issues.


 


On-Premises Prerequisites



  • At least one Server (Physical or Virtual) capable of running the AIP scanner


    • The official specifications are listed in the docs here, but at least 4 cores and 8 GB of RAM are required (more is highly recommended), along with at least 10 GB of free storage space for temporary files (again, more = better)


  • A SQL Server Instance to store configuration and scanned file list (Microsoft uses SQL Server Express Edition installed locally on each of our AIP scanner servers but any supported version will work)

  • An installer account with sysadmin rights to the SQL instance and local admin rights on the Server (these requirements are often missed)

  • An on-premises user account to run the AIP scanner service (e.g. Contoso\AIPScanner)


    • No special rights are needed for configuration, but this account will need read rights to all configured repositories to do discovery and read/write for labeling and protection


  • Internet connectivity that allows the following URLs over HTTPS (port 443):

    • *.aadrm.com

    • *.azurerms.com

    • *.informationprotection.azure.com

    • informationprotection.hosting.portal.azure.net

    • *.aria.microsoft.com
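
The hardware minimums and the outbound HTTPS requirement listed above can be spot-checked from the prospective scanner server before anything is installed. The following is only a rough pre-flight sketch, not an official tool. The function names Get-ScannerSpecGaps and Test-ScannerEndpoint are hypothetical, the thresholds are the ones quoted in this article, and the free-space check assumes temporary files live on the system drive. Because most of the required URLs are wildcards, only the one fully qualified hostname from the list is tested here; substitute the hostnames your own tenant uses for the rest.

```powershell
# Hypothetical helper: compare measured values with the minimums quoted in this article.
function Get-ScannerSpecGaps {
    param(
        [int]$Cores,
        [double]$MemoryGB,
        [double]$FreeDiskGB
    )
    $gaps = @()
    if ($Cores -lt 4)       { $gaps += "Cores: $Cores (minimum 4)" }
    if ($MemoryGB -lt 8)    { $gaps += "RAM: $MemoryGB GB (minimum 8)" }
    if ($FreeDiskGB -lt 10) { $gaps += "Free disk: $FreeDiskGB GB (minimum 10)" }
    # The leading comma keeps the result an array even when it is empty.
    return ,$gaps
}

# Hypothetical helper: try a TCP connection to port 443 on each host name.
function Test-ScannerEndpoint {
    param([string[]]$HostName)
    foreach ($h in $HostName) {
        $result = Test-NetConnection -ComputerName $h -Port 443 -WarningAction SilentlyContinue
        [pscustomobject]@{ Host = $h; Port443Open = $result.TcpTestSucceeded }
    }
}

# Gather the local values (Windows only) and report anything below the minimums.
$cores  = (Get-CimInstance -ClassName Win32_Processor |
           Measure-Object -Property NumberOfCores -Sum).Sum
$system = Get-CimInstance -ClassName Win32_ComputerSystem
$disk   = Get-CimInstance -ClassName Win32_LogicalDisk -Filter "DeviceID='$($env:SystemDrive)'"

Get-ScannerSpecGaps -Cores $cores `
    -MemoryGB ([math]::Round($system.TotalPhysicalMemory / 1GB)) `
    -FreeDiskGB ([math]::Round($disk.FreeSpace / 1GB, 1))

# Only the non-wildcard entry from the list above is checked by default.
Test-ScannerEndpoint -HostName 'informationprotection.hosting.portal.azure.net'
```

An empty result from Get-ScannerSpecGaps means the server meets the quoted minimums. A successful TCP connection only shows that port 443 is reachable; a proxy that inspects or blocks HTTPS traffic could still cause problems later.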




Installing Scanner Binaries


Installing the AIP scanner binaries is a straightforward process as they are included with the AIP client. Navigate to https://aka.ms/AIPClient and click the Download button. When presented with the download options, check the box next to AzInfoProtection.exe (NOTE: The AzInfoProtection_ul.exe client does NOT contain the Scanner binaries in the current version) and click the Next button. The download should start automatically. Once complete, double-click on the file and run through the quick setup on the prepared AIP scanner server.


 


Azure Prerequisites



  • Global Admin permissions for the tenant

  • Synchronized or created cloud service account


    • This is typically done using Azure AD Sync after the on-premises service account is created.  If you are not using Azure AD Sync to synchronize your on-premises service account, you will need to create an account in the cloud.  This can be accomplished manually by logging into the Azure Portal, or via PowerShell script. 
      For convenience, we have created an example named New-CloudServiceAccount.ps1 which you may review and download from https://aka.ms/MIPFiles/Scripts.


  • Configure Azure Applications necessary for AIP scanner authentication. 
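
If you are creating the cloud service account by script rather than syncing it, the general shape might look like the sketch below. This is not the content of New-CloudServiceAccount.ps1; it is an illustration that assumes the AzureAD PowerShell module is installed and that you sign in with an account allowed to create users. The display name, user principal name, and mail nickname are placeholders.

```powershell
# Assumes the AzureAD module is installed; sign in when prompted.
Connect-AzureAD

# Read the password without echoing it, then convert it for the password profile.
$secure = Read-Host -Prompt 'Password for the cloud service account' -AsSecureString
$passwordProfile = New-Object -TypeName Microsoft.Open.AzureAD.Model.PasswordProfile
$passwordProfile.Password = [System.Net.NetworkCredential]::new('', $secure).Password
# A service account cannot respond to a forced password change at sign-in.
$passwordProfile.ForceChangePasswordNextLogin = $false

# Placeholder names; replace with values that fit your tenant.
New-AzureADUser -DisplayName 'AIP Scanner' `
                -UserPrincipalName 'AIPScanner@contoso.onmicrosoft.com' `
                -MailNickName 'AIPScanner' `
                -PasswordProfile $passwordProfile `
                -AccountEnabled $true
```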


 


Creating Azure AD Applications


We must create Azure AD Applications for AIP Authentication to allow the scanner to protect files non-interactively. You only need to do this the first time you set up the AIP scanner; the command generated at the end can be reused to authenticate additional AIP scanner servers. The official documentation for creating these applications is found here.


For convenience, we have created an example named New-AIPAuthToken.ps1 which you may review and download from https://aka.ms/MIPFiles/Scripts. This will create a file named Set-AIPAuthentication.txt on the desktop that will contain the command needed to authenticate your AIP scanner server(s).


 


Configuring AIP Azure Log Analytics (Optional)


Although this step is technically optional, we recommend configuring analytics prior to running your first scan so you can begin to visualize your data risk as shown in the initial image in this article. In the AIP blade of the Azure Portal, you will see Configure analytics (preview) under the Manage section. Click on this and you should see a page like the one below.


ala.png


If you already have an Azure Log Analytics (ALA) workspace configured for this purpose, check the box next to it and press OK. Otherwise, click the + Create new workspace link.


 


Fill in the items shown in the image below:



  • Log Analytics Workspace (must be unique across Azure)

  • Azure Subscription (If this is not populated, you will need to get access or have someone with access to the subscription create the workspace)

  • A new or existing Resource group

  • The Location closest to your users (usually this will be in the same geography as your tenant)

  • A Pricing tier (usually Per GB or Standalone. Free tier only stores logs for 7 days)

  • Press OK.
    ALANew.png
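
The same workspace could also be created from PowerShell instead of the portal. The sketch below assumes the Az.Resources and Az.OperationalInsights modules are installed and that you have already signed in with Connect-AzAccount against the right subscription. The resource group name, workspace name, and location are placeholders, and 'PerGB2018' is assumed to be the SKU name that corresponds to the Per GB pricing tier. You would still select the workspace in the Configure analytics (preview) blade afterwards.

```powershell
# Assumes Az.Resources and Az.OperationalInsights are installed and Connect-AzAccount has been run.
# All names below are placeholders.
$resourceGroup = 'AIP-Analytics-RG'
$workspaceName = 'contoso-aip-analytics'
$location      = 'eastus'

# Create the resource group only if it does not already exist.
if (-not (Get-AzResourceGroup -Name $resourceGroup -ErrorAction SilentlyContinue)) {
    New-AzResourceGroup -Name $resourceGroup -Location $location | Out-Null
}

# Create the Log Analytics workspace in that resource group.
New-AzOperationalInsightsWorkspace -ResourceGroupName $resourceGroup `
    -Name $workspaceName -Location $location -Sku 'PerGB2018'
```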


Finally, back in the Configure analytics (preview) blade, check the box next to the workspace and click OK.


ala2.png


NOTE: Checking the box next to Enable deeper analytics allows the actual matched content to be stored in the Log Analytics workspace. This could include many types of sensitive information such as PII, Credit Card Numbers, and Banking Information. This option is typically used during testing of automatic conditions and not widely used in production settings due to the sensitive nature of the collected data. If this is used in a production setting, extreme caution should be taken with securing access to this workspace.


 


AIP Scanner Profile Configuration


As introduced in our previous blog post, configuration of the AIP scanner is now done via the Central Management User Interface in the Azure Portal. The PowerShell configuration commands for AIP scanner will be deprecated. We will quickly walk through the minimum configuration elements to install a functional scanner in discover mode.


 


The AIP scanner Profiles (Preview) blade can be found in the AIP blade on the left side in the Scanner section. Follow the steps below to create and configure an AIP scanner profile.



  • Click on Profiles (Preview) and click the + Add button at the top to create a new profile.
    Profiles.png

  • In the Add a new profile blade, provide a Profile name.

  • Under Policy enforcement, set Enforce to Off.

  • Click Save.
    NewProfile.png

  • Once the profile is saved, under Profile settings, click on Configure repositories.
    ConfRepo.png

  • In the Repositories blade, click + Add.

  • In the Repository blade, enter the path to your repository (\\Fileserver\Documents or https://SharePointServer/Documents)
    Repo.png

  • Leave the settings as Profile default and click Save.
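
Before adding a file share as a repository, it can help to confirm that it is reachable and readable from the scanner server. The sketch below is one way to do that; Test-RepositoryAccess is a hypothetical helper, not part of the AIP client. It checks access as whoever is running the session, so to test the scanner's own rights you would need to run it from a PowerShell window started as the AIP scanner service account. It applies to UNC and local paths only, not SharePoint URLs.

```powershell
# Hypothetical helper: report whether each file-share repository is reachable and listable.
function Test-RepositoryAccess {
    param([string[]]$Path)
    foreach ($p in $Path) {
        $reachable = Test-Path -LiteralPath $p
        $listable  = $false
        if ($reachable) {
            try {
                # Listing a single item is enough to prove read access to the folder.
                Get-ChildItem -LiteralPath $p -ErrorAction Stop |
                    Select-Object -First 1 | Out-Null
                $listable = $true
            } catch {
                $listable = $false
            }
        }
        [pscustomobject]@{ Path = $p; Reachable = $reachable; Listable = $listable }
    }
}

# Example path taken from the step above; replace with your own repositories.
Test-RepositoryAccess -Path '\\Fileserver\Documents'
```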


There are many other options and settings for the Scanner profiles and we will dive into those further in another blog post. If you would like more information today, please see our official documentation.


 


Installing the AIP Scanner


We should now have all prerequisites in place to install the AIP scanner. To do this, type the command below in an Administrative PowerShell window.


Install-AIPScanner -Profile "ProfileName"

You will be prompted to enter the local AIP scanner service account credentials in Domain\AccountName format and to provide the SQL Server instance name (This will be ServerName or ServerName\SQLExpress depending on the version you installed).
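
If you would rather supply these values up front than answer prompts, the installation might be written as in the sketch below. This assumes that the Install-AIPScanner cmdlet in your client version accepts -ServiceUserCredentials and -SqlServerInstance alongside -Profile; confirm with Get-Help Install-AIPScanner on your server before relying on it. The account, SQL instance, and profile names are placeholders.

```powershell
# Placeholder values; replace with your own service account, SQL instance, and profile name.
$serviceCred = Get-Credential -UserName 'Contoso\AIPScanner' -Message 'AIP scanner service account'

# Parameter names are assumed for this client version; verify with Get-Help Install-AIPScanner.
Install-AIPScanner -ServiceUserCredentials $serviceCred `
                   -SqlServerInstance 'ServerName\SQLExpress' `
                   -Profile 'ProfileName'
```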


 


If you encounter any errors, please validate that the installer account has the permissions mentioned in the On-Premises Prerequisites and you do not have any firewall issues reaching the SQL server or Azure.


 


Now that you have the AIP scanner service installed, you can run the Set-AIPAuthentication command to get the non-interactive authentication token.


 


In the Admin PowerShell prompt, run the code below. It prompts for the credentials of the on-premises AIP scanner service account, runs the saved Set-AIPAuthentication command as that account, and then restarts the service so it can use the new token and pull down policy.


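# Prompt for the on-premises AIP scanner service account credentials
$scannercred = Get-Credential
# Read the Set-AIPAuthentication command generated by New-AIPAuthToken.ps1
$AuthCommand = Get-Content ~\Desktop\Set-AIPAuthentication.txt
# Run that command in a PowerShell process started as the service account
Start-Process powershell.exe -Credential $scannercred -NoNewWindow -ArgumentList $AuthCommand
# Restart the scanner service so it picks up the new token
Restart-Service AIPScanner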

This command can be modified to point to the txt file in other locations if needed for use on future instances of the AIP scanner.  The same Set-AIPAuthentication command can be used on multiple servers in your tenant even if they use different profiles.


 


The scanner should be fully functional at this point and you can run the commands below to verify the state (should be idle) and start the initial discovery scan.


Get-AIPScannerStatus
Start-AIPScan

After a few minutes you will begin seeing data start to flow into your Data discovery (Preview) dashboard in the Azure portal. Since you are only doing discovery, you will not see any labeled or protected files (unless you have been using AIP before running the scanner), but you will see the identified files and the sensitive data types found in the configured repositories.


discovery.png


There is also a new blade under Analytics named Recommendations (Preview) that will be populated by this data. Any sensitive information types discovered that do not have associated automatic classification conditions will display in this blade.


 


You may then click on the sensitive information type and a fly-out panel will allow you to assign the information type to a classification label. This allows you to quickly map your sensitive information to classification labels.


recommendations.png


NOTE: The AIP scanner will only trigger on conditions which are set to Automatic.


 


Once you have configured these conditions, you can return to the profile in the Azure portal and change the settings to the ones below.



  • Schedule: Always

  • Info types to be discovered: Policy only

  • Enforce: On

  • Save
    Profile2.png


Because we set the schedule to Always, the scanner will begin monitoring the files automatically within 5 minutes. If you want to start the scan yourself, follow the instructions below.



  • In the AIP blade, under Scanner, click on Nodes.

  • Select your AIP Scanner server and click Start in the toolbar.


The result will be similar to the image shown below with labeled and protected files and the distribution graph showing in the Data discovery (Preview) dashboard.


1.png


Please let us know in the comments if you have any questions on this approach. If you are interested in how Microsoft uses the AIP scanner, please see the MSIT showcase at https://aka.ms/ScannerShowcase.


 


Thanks,


 


The Information Protection Customer Experience Engineering Team
