Document Intelligence preview adds more prebuilts, support for images and figures, and more!

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

Azure AI Document Intelligence, formerly known as Form Recognizer, is an AI service for all your document understanding needs. The latest update previews new features, including image and figure extraction and new prebuilt models for the US tax 1040 form and other common tax and mortgage forms. Custom models are also updated with confidence scores for tables, rows, and cells, support for overlapping fields, and classification model updates that add incremental training and Office file types.


In today's fast-paced digital world, businesses are drowning in a sea of documents that require manual review. Document Intelligence makes it easy to extract insights from documents. You can use the Layout API to extract content and structure, then query the documents for insights with the RAG (retrieval augmented generation) pattern. As tax season approaches in the US, you may need to process tax forms like the 1040 or 1099 with the prebuilt models, or you could build custom models in minutes to classify and extract specific fields from any form or document.


Gone are the days of tedious manual data entry. With Document Intelligence, your team can automate document processing, freeing up valuable time to focus on what really matters. Boost productivity, streamline operations, and uncover hidden insights—all with Azure AI Document Intelligence.


What is new in Preview? 

Document Intelligence continues to evolve, adding new models and updating existing ones.


Layout Model

The Layout API extracts content and structure from PDFs, images, HTML, and Office file types like Word, PowerPoint, and Excel. The most recent updates to Layout are:




Figure and Image Detection

Documents like business plans, financial reports, and manuals usually contain graphs and figures as well. For more complete ingestion of these document types, Layout has added figure and image detection, which includes extracting the bounding region of the image along with its associated captions and context. When you use the content of a document to extract insights with a large language model (LLM), Layout now enables the extraction and processing of information in embedded images and figures. Pair this feature with the formula add-on and you have a simple solution for extracting all the information from academic papers.
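As a rough illustration of working with figure output, the sketch below walks a result dictionary and lists each figure's page, bounding polygon, and caption. The key names used here (figures, boundingRegions, pageNumber, polygon, caption, content) are an assumption about the shape of the preview's JSON result, and the sample data is invented; check the documentation for the actual schema.

```python
from typing import Any, Dict, List


def list_figures(analyze_result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten detected figures into one row per bounding region.

    The key names are an assumed result shape, not a confirmed schema.
    """
    rows = []
    for index, figure in enumerate(analyze_result.get("figures", [])):
        # A figure may have no caption, so fall back to an empty string.
        caption = (figure.get("caption") or {}).get("content", "")
        for region in figure.get("boundingRegions", []):
            rows.append(
                {
                    "figure": index,
                    "page": region.get("pageNumber"),
                    "polygon": region.get("polygon"),
                    "caption": caption,
                }
            )
    return rows


# Invented sample data, for illustration only.
sample_result = {
    "figures": [
        {
            "boundingRegions": [
                {"pageNumber": 2, "polygon": [1.0, 1.0, 4.0, 1.0, 4.0, 3.0, 1.0, 3.0]}
            ],
            "caption": {"content": "Figure 1: Quarterly revenue"},
        },
        {"boundingRegions": [{"pageNumber": 5, "polygon": [0.5, 2.0, 3.0, 2.0, 3.0, 4.0, 0.5, 4.0]}]},
    ]
}

for row in list_figures(sample_result):
    print(row["figure"], row["page"], row["caption"] or "(no caption)")
```

The bounding polygon can then be used to crop the figure from the page image before passing it to a downstream model.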


Hierarchical Document Structure

One of the challenges in document ingestion is not only extracting all the elements but also maintaining meaningful structure and semantic relationships. This understanding is vital for extracting meaningful insights, summarization, and contextual analysis. In the latest preview, Layout added support for section hierarchies, where the paragraphs, sections, tables, and figures are grouped according to the document structure. You can request the output in markdown format to easily get the document structure and its associated content in markdown.
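One way to use the markdown output in a RAG pipeline is to split it into chunks along section boundaries, so each chunk keeps its place in the hierarchy. The sketch below is a minimal, service-independent example: it assumes the markdown uses "#"-style headings, and the function name and sample text are made up for illustration.

```python
import re
from typing import Dict, List

# Matches markdown headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(markdown: str) -> List[Dict[str, object]]:
    """Split markdown into sections, recording each section's heading path."""
    sections: List[Dict[str, object]] = []
    stack = []  # (level, title) pairs for the currently open headings
    path: List[str] = []
    body: List[str] = []

    def flush() -> None:
        text = "\n".join(body).strip()
        # Skip an empty preamble that has neither a heading nor any text.
        if path or text:
            sections.append({"path": list(path), "text": text})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match is None:
            body.append(line)
            continue
        flush()
        level = len(match.group(1))
        # Close headings at the same or a deeper level than the new one.
        while stack and stack[-1][0] >= level:
            stack.pop()
        stack.append((level, match.group(2)))
        path = [title for _, title in stack]
        body = []
    flush()
    return sections


sample_markdown = "# Report\nIntro\n## Revenue\nUp 5%\n## Costs\nFlat\n"
for section in split_markdown_sections(sample_markdown):
    print(" > ".join(section["path"]), "|", section["text"])
```

Keeping the heading path with each chunk lets a retriever return the section context alongside the matching text.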



Prebuilt Models

Prebuilt models offer an out-of-the-box solution that provides the fields for a known document type with a simple API call. Tax and mortgage processing in the US just got easier with the addition of prebuilt models for the 1040 and 1099 tax forms and for the 1003 URLA, 1008, and closing disclosure mortgage forms. Need to extend the schema of a prebuilt model to meet your specific needs? Just add the fields you need as query fields to extract the expanded schema.
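To make the query fields idea concrete, here is a small sketch that only builds the request URL for an analyze call and sends nothing. The URL path, the features and queryFields parameter names, the model ID prebuilt-tax.us.1040, the endpoint, and the example field names are all assumptions for illustration; confirm them against the REST API reference before use.

```python
from typing import Iterable
from urllib.parse import urlencode

# Preview API version named in this post.
API_VERSION = "2024-02-29-preview"


def build_analyze_url(endpoint: str, model_id: str, query_fields: Iterable[str] = ()) -> str:
    """Build an analyze URL, optionally requesting extra query fields.

    The path and parameter names are assumptions, not a confirmed contract.
    """
    params = {"api-version": API_VERSION}
    fields = list(query_fields)
    if fields:
        params["features"] = "queryFields"
        params["queryFields"] = ",".join(fields)
    base = endpoint.rstrip("/")
    return f"{base}/documentintelligence/documentModels/{model_id}:analyze?{urlencode(params)}"


# Hypothetical endpoint and field names, for illustration only.
url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/",
    "prebuilt-tax.us.1040",
    ["PreparerFirmName", "SpouseOccupation"],
)
print(url)
```

The document itself would be sent in the body of a POST to this URL, with the resource key supplied in a request header.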




New 1040 tax form  

Support for US tax forms expands with the introduction of the US tax 1040 form and its many variations, including: 1040, SR, ES, Schedules A, B, C, C-EZ, D, E, EIC, F, H, J, R, SE, 8812, 1, 2, 3. The 1040 model supports various scenarios, from tax processing to income verification alongside the ID prebuilt model.



Try the new US tax 1040 model in the Document Intelligence Studio 


New US Mortgage forms  

With support for the 1003 Uniform Residential Loan Application (URLA) form, the 1008 loan summary form, and the closing disclosure, Document Intelligence now supports mortgage processing and loan origination scenarios.



Try the new US Mortgage closing disclosure model in the Document Intelligence Studio 


New - Marriage certificates and Credit cards

With Document Intelligence, you can now extract data from marriage certificates and credit/debit cards with prebuilt models. Pair the credit card model with W-2 and 1099 models to create a workflow for storing personal finance documents or use the marriage certificates with contracts for completing legal document ingestion scenarios.



Try the new Marriage certificates model in the Document Intelligence Studio 




Try the new Credit cards model in the Document Intelligence Studio 



The invoice model has added support for the tax items field in the de, es, pt, and en-CA locales, and expanded currency code support to cover the Bosnia-Herzegovina Convertible Mark (BAM), Bulgarian Lev (BGN), Israeli New Shekel (ILS), Macedonian Denar (MKD), Russian Ruble (RUB), Thai Baht (THB), Turkish Lira (TRY), Ukrainian Hryvnia (UAH), and Vietnamese Dong (VND). Along with these new fields, there are many AI quality improvements.




Identity Documents and Receipt

The ID model added fields to extract the information needed from EU IDs and driver licenses, along with AI quality improvements. The Receipt model now supports non-thermal receipts in various formats, such as invoice-like and ticket-like formats. Digital or screenshot receipts from diverse categories like transportation, communication, entertainment, flight, training, pharmacy, and medical are also supported.


Custom Models


Custom Classification

Custom classification models, a type of deep-learning model, blend layout and language features to precisely detect and identify documents processed within your application. Previously, custom classification only supported PDF, TIFF, and images. With the added support for Office document types like Word, PowerPoint, and Excel files, you can now train a single model to classify documents irrespective of the document type.

A common challenge when training classifier models is the need to maintain the training dataset so you can update the model with new classes or improve its confidence with additional samples for an existing class. With incremental training, you can now provide only the additional samples for a new or existing class, without needing to maintain the entire training dataset.


Try out the updated custom classification model in the Document Intelligence Studio 


Custom Extraction

Need to extract fields from a document type not supported with prebuilt models? Train a custom model in minutes to extract the fields you need with grounding (ensuring the fields are contained within the document) and confidence. Custom models now produce a confidence score for tables, rows and cells. This is a key capability to support the human in the loop (HITL) pattern by using confidence thresholds to trigger human review to balance the extraction accuracy and review costs.
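The human-in-the-loop pattern described above can be sketched as a simple threshold check. The flattened cell list, its key names, and the 0.8 threshold below are hypothetical and chosen only for illustration; a real result nests confidence values inside the extracted table fields, so you would flatten it first.

```python
from typing import Any, Dict, List, Tuple


def cells_needing_review(cells: List[Dict[str, Any]], threshold: float = 0.8) -> List[Dict[str, Any]]:
    """Return the cells whose confidence falls below the threshold.

    A cell with no confidence value is treated as needing review.
    """
    return [cell for cell in cells if cell.get("confidence", 0.0) < threshold]


def route_document(cells: List[Dict[str, Any]], threshold: float = 0.8) -> Tuple[str, List[Dict[str, Any]]]:
    """Decide whether a document can be accepted automatically."""
    flagged = cells_needing_review(cells, threshold)
    decision = "human_review" if flagged else "auto_accept"
    return decision, flagged


# Hypothetical, simplified cell records for illustration only.
sample_cells = [
    {"row": 0, "column": "Item", "content": "Widget", "confidence": 0.97},
    {"row": 0, "column": "Amount", "content": "12.50", "confidence": 0.62},
    {"row": 1, "column": "Item", "content": "Gadget", "confidence": 0.91},
]

decision, flagged = route_document(sample_cells)
print(decision, [(cell["row"], cell["column"]) for cell in flagged])
```

Raising the threshold sends more documents to reviewers and lowers the risk of accepting a bad value; lowering it does the reverse, which is the accuracy and cost trade-off the paragraph describes.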

Custom neural models now also support overlapping fields for when you need to extract a span of text as two distinct fields, for instance, the execution date and effective date of a contract.




Document Intelligence adds container support for the Read and Layout models in the latest GA version (2023-07-31). Install and run the containers when you need the capabilities of Document Intelligence but must run the service locally or fully disconnected. See the documentation for more information on getting started with containers.



Get started with the preview features! 

The preview updates are available in only a few select regions: East US, West US 2, and West Europe. The API version is 2024-02-29-preview. Check for the updated SDK in the documentation.

  • Visit the what's new page to learn more about all the new capabilities in Azure AI Document Intelligence


