Introducing Native Document Support for PII Detection (Public Preview)


This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Azure AI Language’s PII (Personally Identifiable Information) detection service has helped many customers protect sensitive information. It identifies, extracts, and redacts PII entities, such as person names, addresses, and credit card details, within unstructured text. The service covers a wide range of PII categories, including region-specific ones, and supports 79 languages. Until now, the service has accepted mainly text as input, and we have been working to extend what it can do.


Protecting sensitive information inside complex document structures is a significant challenge. Conventional approaches rely on manual pre-processing (such as extracting the text from each document) and post-processing. These steps are time-consuming and error-prone, which puts compliance at risk.


Today, we are excited to announce the public preview of native document support for PII detection. The service can now identify, categorize, and redact sensitive information in unstructured text directly within documents, so users can meet data privacy requirements in a single, streamlined workflow. The supported formats are currently .pdf, .docx, and .txt.


Here is a glimpse of what this new feature is capable of:


Image caption: Comparison of a source PDF document with PII (left) and the redacted output document (right).


Why use this feature?

  • Skip pre-processing: Detect and redact PII directly in your documents, without extracting the text first.
  • Preserve layout and structure: The output document is returned in the same format as the input, with its original layout and structure intact.
  • Perform multiple tasks in one request: Run several tasks, drawn from different Azure AI Language capabilities, in a single request.
  • Configure redaction to your needs: Choose which entity types to redact and which characters to use for masking.
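To make the last point concrete, here is a small local sketch of what character-mask redaction with an entity-type filter does. It is not the service's code or API (the service performs the redaction for you); the Entity class and mask_entities function are hypothetical, and the entity shape (category, character offset, length) is only loosely modeled on the fields PII detection returns for text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass
class Entity:
    # Hypothetical entity record: PII category plus its character span in the text.
    category: str
    offset: int
    length: int


def mask_entities(
    text: str,
    entities: Iterable[Entity],
    categories: Optional[Set[str]] = None,
    mask_char: str = "*",
) -> str:
    """Overwrite each selected entity span with mask_char; the text length is unchanged."""
    if len(mask_char) != 1:
        raise ValueError("mask_char must be a single character")
    chars = list(text)
    for entity in entities:
        if categories is not None and entity.category not in categories:
            continue  # this entity type was not selected for redaction
        for i in range(entity.offset, entity.offset + entity.length):
            chars[i] = mask_char
    return "".join(chars)


if __name__ == "__main__":
    text = "Call Ana at 555-0100."
    found = [Entity("Person", 5, 3), Entity("PhoneNumber", 12, 8)]
    print(mask_entities(text, found))                                  # Call *** at ********.
    print(mask_entities(text, found, {"PhoneNumber"}, mask_char="#"))  # Call Ana at ########.
```

Masking character by character keeps every unredacted character at its original position, which is one simple way to keep the surrounding text intact.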

How does this work?

  1. Upload your documents to an Azure Blob Storage container.
  2. Send a PII detection/redaction request to the service.
  3. Check the job status.
  4. Retrieve the completed results from the storage account.
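The four steps above follow an asynchronous job pattern: submit a job that points at your stored documents, poll its status, then read the output from storage. The following is a minimal standard-library Python sketch of steps 2 and 3. The endpoint path, the api-version value, the request field names, the Operation-Location header, and the job status strings are all assumptions and should be checked against the documentation before use. The source and target locations are assumed to be SAS URLs for the input file and the output container in your Blob Storage account.

```python
import json
import time
import urllib.request
from typing import Callable

# ASSUMPTIONS: path, api-version, field names and status strings below are
# illustrative only; verify them against the current documentation.
API_PATH = "/language/analyze-documents/jobs"
API_VERSION = "2023-11-15-preview"
IN_PROGRESS = {"notStarted", "running", "cancelling"}


def build_pii_job(source_url: str, target_url: str, language: str = "en-US") -> dict:
    """Step 2 payload: one document in Blob Storage, one PII detection task."""
    return {
        "displayName": "Redact PII in documents",
        "analysisInput": {
            "documents": [
                {
                    "id": "doc-1",
                    "language": language,
                    "source": {"location": source_url},  # SAS URL of the input file
                    "target": {"location": target_url},  # SAS URL of the output container
                }
            ]
        },
        "tasks": [
            {
                "kind": "PiiEntityRecognition",
                "taskName": "redact-pii",
                # Redaction options (entity types, mask character) would go here;
                # their exact field names are not shown in this sketch.
                "parameters": {},
            }
        ],
    }


def submit_job(endpoint: str, key: str, body: dict) -> str:
    """Step 2: POST the job and return the status URL from the Operation-Location header."""
    url = f"{endpoint.rstrip('/')}{API_PATH}?api-version={API_VERSION}"
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]


def get_status(status_url: str, key: str) -> dict:
    """Step 3: GET the job status document and parse it as JSON."""
    request = urllib.request.Request(status_url, headers={"Ocp-Apim-Subscription-Key": key})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 5.0,
                 timeout: float = 600.0) -> dict:
    """Step 3: poll until the job leaves the in-progress states or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status()
        if job.get("status") not in IN_PROGRESS:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("PII job did not finish before the timeout")
        time.sleep(interval)


# Usage (placeholder names, not real values):
# body = build_pii_job(SOURCE_SAS_URL, TARGET_SAS_URL)
# status_url = submit_job(ENDPOINT, KEY, body)
# job = wait_for_job(lambda: get_status(status_url, KEY))
# Step 4: if the job succeeded, read the redacted file from the target container.
```

wait_for_job takes a callable rather than a URL so the polling logic can be exercised without network access. Because tasks is a list, the "multiple tasks in one request" point suggests further tasks could be appended to the same body, though their exact shape is not shown here.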

What’s more: Native document support for other Azure AI Language capabilities

Alongside native document support for PII detection, customers can now use document inputs for summarization.

Watch for further updates as we extend native document support to more Azure AI Language skills.


Stay tuned: available in public preview in December 2023

Visit our documentation to learn how to integrate this feature into your workflow once it's available.


Have thoughts or questions? We value your feedback! Feel free to share your comments and insights below – we're eager to hear from you.
