Analyze complex documents with Azure Document Intelligence Markdown Output and Azure OpenAI

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

Understanding Azure Document Intelligence Layout Model

The Azure Document Intelligence Layout model is a powerful tool within the Azure AI ecosystem designed to understand and interpret the layout and structure of documents. It can analyze various elements, such as text, tables, and selection marks, making it an invaluable asset for processing complex documents. Especially extracting tables is a key requirement for processing documents containing large volumes of data typically formatted as tables. The Layout model extracts tables in the pageResults section of the JSON output. Extracted table information includes the number of columns and rows, row span, and column span. Each cell with its bounding polygon is output along with information whether the area is recognized as a columnHeader or not. The model supports extracting tables that are rotated. Each table cell contains the row and column index and bounding polygon coordinates. For the cell text, the model outputs the span information containing the starting index (offset). The model also outputs the length within the top-level content that contains the full text from the document.

{
    "tables": [
        {
            "rowCount": 9,
            "columnCount": 4,
            "cells": [
                {
                    "kind": "columnHeader",
                    "rowIndex": 0,
                    "columnIndex": 0,
                    "columnSpan": 4,
                    "content": "(In millions, except earnings per share)",
                    "boundingRegions": [],
                    "spans": []
                    },
            ]
        }
    ]
}

However this format can be difficult to use if you need to further harness this data by feeding it to Azure OpenAI, for large complex tables it may be quite verbose to be used inside a prompt. On the other hand if we use the plain text output the tables structure is getting lost.

Markdown as a Bridge

Markdown, a lightweight markup language with plain-text formatting syntax, can serve as an intermediary format to bridge the gap between raw document data and structured data analysis. By converting document layouts into markdown, we can simplify the process of structuring document information before feeding it into AI models for extraction.

Step-by-Step Guide to Extracting Information

1. Preparation of Documents: Start with gathering the documents you wish to analyze. These could be in various formats, such as PDFs, Word documents, or images.

2. Document Analysis with Azure Document Intelligence Layout Model: Utilize the Azure Document Intelligence Layout model to analyze the document structure. This model will identify and categorize different elements within your documents, such as paragraphs, tables, and headings.

3. Conversion to Markdown: The Layout API can output the extracted text in markdown format. Use the outputContentFormat=markdown to specify the output format in markdown. The markdown content is output as part of the content section.

"analyzeResult": {
"apiVersion": "2024-02-29-preview",
"modelId": "prebuilt-layout",
"contentFormat": "markdown",
"content": "# CONTOSO LTD...",
}

We can do the same in Document Intelligence Studio -> Layout Model-> Analyze Options.

Choosing markdown as output inside Document Intelligence Studio

4. Information Extraction with Azure AI: With the document information now structured in markdown, you can leverage various Azure AI services to extract specific information. This method signs when used with Azure OpenAI because when instructing the model to read the markdown tables as such in the prompt then you can easily and accurately query the information in the tables.

Markdown representation of the extracted table

Querying markdown text in Azure OpenAI Studio

5. Post-Extraction Processing: After extraction, the data can be further processed or analyzed based on your business needs. This might involve aggregating data from multiple documents, performing data visualization, or integrating the extracted information into business workflows.

Advantages

The use of markdown as an intermediary format offers several advantages:

- Simplified Data Structure : Markdown simplifies the document’s layout, making it easier for AI models to process the information.
- Flexibility: Markdown is widely supported and can be easily converted into other formats or displayed on different platforms.
- Efficiency : This approach can handle documents with dynamic tables and varying layouts, reducing manual preprocessing work.

Conclusion

Azure Document Intelligence Layout model with markdown output presents a sophisticated approach to processing and extracting information from complex documents. Azure AI’s capabilities help businesses can unlock valuable insights hidden within their documents, enhancing decision-making and operational efficiency. This process not only streamlines data extraction but also opens new avenues for automating and optimizing document-intensive workflows.

Sources: Document layout analysis — Document Intelligence (formerly Form Recognizer) — Azure AI services | Microsoft Learn

Leave a Reply Cancel reply