Azure AI Document Intelligence
Azure AI Document Intelligence (formerly Form Recognizer) uses AI to extract text, key-value pairs, tables, and structures from documents. It handles invoices, receipts, IDs, tax forms, contracts, and custom document types.
Key Capabilities
- Prebuilt Models — Ready-to-use models for common document types
- Custom Models — Train on your own document formats
- Document Classification — Automatically sort documents by type
- Add-on Features — Handwriting, barcodes, formulas, font styling
Prerequisites
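- An Azure subscription
- A Document Intelligence resource, created in the Azure portal (note its endpoint and key)
- For the Python examples, the azure-ai-documentintelligence package (pip install azure-ai-documentintelligence)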
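The resource's endpoint and key are needed by every code sample in the later steps. A minimal sketch of reading them from environment variables instead of hardcoding them follows; the variable names `DOCUMENTINTELLIGENCE_ENDPOINT` and `DOCUMENTINTELLIGENCE_API_KEY` and the helper `load_settings` are assumptions made for illustration, not names the service requires.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DocIntelSettings:
    endpoint: str
    key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> DocIntelSettings:
    """Read the resource endpoint and key from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    endpoint = env.get("DOCUMENTINTELLIGENCE_ENDPOINT", "").strip()
    key = env.get("DOCUMENTINTELLIGENCE_API_KEY", "").strip()
    if not endpoint.startswith("https://"):
        raise ValueError("DOCUMENTINTELLIGENCE_ENDPOINT must be an https:// URL")
    if not key:
        raise ValueError("DOCUMENTINTELLIGENCE_API_KEY is not set")
    # Normalize to a single trailing slash so the value has one predictable form.
    return DocIntelSettings(endpoint=endpoint.rstrip("/") + "/", key=key)
```

The returned `endpoint` and `key` can then be passed to the client constructor in place of the literal strings used in the samples.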
Step 1: Try the Studio
Document Intelligence Studio lets you test models visually.
- Go to documentintelligence.ai.azure.com
- Choose a prebuilt model (e.g., Invoice)
- Upload a sample document or use the provided samples
- Click Analyze and review extracted fields
Step 2: Use Prebuilt Models
Invoice Model
from azure.ai.documentintelligence import DocumentIntelligenceClient
from azure.core.credentials import AzureKeyCredential

# Replace the endpoint and key with the values from your own resource
client = DocumentIntelligenceClient(
    endpoint="https://my-doc-intel.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("your-key")
)

# Submit the local file; analysis runs as a long-running operation
with open("invoice.pdf", "rb") as f:
    poller = client.begin_analyze_document(
        model_id="prebuilt-invoice",
        body=f
    )
result = poller.result()  # blocks until the analysis completes

def field_text(fields, name):
    # Fields the model did not find are absent, so avoid indexing directly
    field = fields.get(name)
    return field.content if field else None

for invoice in result.documents or []:
    fields = invoice.fields or {}
    print(f"Vendor: {field_text(fields, 'VendorName')}")
    print(f"Invoice #: {field_text(fields, 'InvoiceId')}")
    print(f"Total: {field_text(fields, 'InvoiceTotal')}")
    print(f"Due Date: {field_text(fields, 'DueDate')}")
    # Line items
    for item in (fields.get("Items") or {}).get("valueArray", []):
        row = item.get("valueObject", {})
        desc = (row.get("Description") or {}).get("content")
        amount = (row.get("Amount") or {}).get("content")
        print(f"  - {desc}: {amount}")
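Each extracted field also carries a confidence score, which is useful for deciding what needs human review. The sketch below works on plain dictionaries shaped like the service's JSON (`content`, `confidence`); the helper name `summarize_fields`, the 0.8 threshold, and the sample values are all assumptions for illustration. The SDK's field objects appear to accept the same keys through `.get(...)`, but that is worth confirming against your SDK version.

```python
from typing import Any, Dict, List, Mapping, Tuple


def summarize_fields(
    fields: Mapping[str, Mapping[str, Any]],
    wanted: List[str],
    min_confidence: float = 0.8,  # arbitrary threshold chosen for illustration
) -> Tuple[Dict[str, str], List[str]]:
    """Return (values, needs_review) for the requested field names."""
    values: Dict[str, str] = {}
    needs_review: List[str] = []
    for name in wanted:
        field = fields.get(name)
        if field is None or field.get("content") is None:
            needs_review.append(name)  # field missing from this document
            continue
        values[name] = field["content"]
        if (field.get("confidence") or 0.0) < min_confidence:
            needs_review.append(name)  # present, but the model was unsure
    return values, needs_review


# Made-up sample shaped like the fields of one analyzed invoice
sample = {
    "VendorName": {"content": "Contoso Ltd.", "confidence": 0.97},
    "InvoiceTotal": {"content": "$1,250.00", "confidence": 0.62},
}
values, review = summarize_fields(sample, ["VendorName", "InvoiceTotal", "DueDate"])
print(values)
print(review)
```

Here `DueDate` is flagged because it is absent and `InvoiceTotal` because its confidence falls below the threshold.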
Available Prebuilt Models
| Model | Use Case |
|---|---|
| prebuilt-invoice | Invoices and bills |
| prebuilt-receipt | Receipts from retail/restaurants |
| prebuilt-idDocument | Passports, driver's licenses |
| prebuilt-tax.us.w2 | US W-2 tax forms |
| prebuilt-healthInsuranceCard.us | Insurance cards |
| prebuilt-contract | Contracts and agreements |
| prebuilt-layout | General document structure |
| prebuilt-read | OCR text extraction |
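When the document type is known up front, the model IDs listed above can be kept in a small lookup so calling code does not scatter string literals. This is only a sketch: the short labels on the left and the `pick_model` helper are hypothetical, and falling back to `prebuilt-layout` for unknown types is one possible design choice, not a service rule.

```python
# Hypothetical short labels mapped to the prebuilt model IDs listed above
PREBUILT_MODELS = {
    "invoice": "prebuilt-invoice",
    "receipt": "prebuilt-receipt",
    "id": "prebuilt-idDocument",
    "w2": "prebuilt-tax.us.w2",
    "insurance_card": "prebuilt-healthInsuranceCard.us",
    "contract": "prebuilt-contract",
}


def pick_model(kind: str, default: str = "prebuilt-layout") -> str:
    """Return the model ID for a document kind, or a general-purpose fallback."""
    return PREBUILT_MODELS.get(kind.strip().lower(), default)


print(pick_model("Invoice"))
print(pick_model("memo"))
```

The result would be passed as `model_id` when starting an analysis.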
Step 3: Build Custom Models
For documents unique to your business:
1. Gather Training Data
- At least 5 sample documents (15 or more recommended)
- Ensure variety in layouts and content
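The sample counts above can be checked before uploading anything. The following sketch takes a list of file names and reports whether there are enough usable samples; the helper name and the set of accepted file extensions are assumptions (consult the service documentation for the current list), and variety in layouts still has to be judged by eye.

```python
from pathlib import PurePath
from typing import Iterable, List

# Extensions assumed to be acceptable training inputs; verify against the service docs.
SUPPORTED_SUFFIXES = {".pdf", ".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}
MIN_SAMPLES = 5    # minimum stated above
RECOMMENDED = 15   # recommended count stated above


def check_training_set(filenames: Iterable[str]) -> List[str]:
    """Return human-readable warnings; an empty list means the counts look fine."""
    usable = [n for n in filenames if PurePath(n).suffix.lower() in SUPPORTED_SUFFIXES]
    warnings: List[str] = []
    if len(usable) < MIN_SAMPLES:
        warnings.append(f"only {len(usable)} usable sample(s); at least {MIN_SAMPLES} are required")
    elif len(usable) < RECOMMENDED:
        warnings.append(f"{len(usable)} usable samples; {RECOMMENDED} or more are recommended")
    return warnings


print(check_training_set(["po1.pdf", "po2.PDF", "po3.png", "notes.txt"]))
```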
2. Label Your Documents
- Go to Document Intelligence Studio
- Create a new Custom extraction model project
- Upload your training documents
- Label the fields you want to extract (drag to select text regions)
3. Train the Model
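from azure.ai.documentintelligence import DocumentIntelligenceAdministrationClient
from azure.ai.documentintelligence.models import AzureBlobContentSource, BuildDocumentModelRequest
# Training goes through the administration client, not the analysis client from Step 2
# (assumes the 1.0 release of the Python SDK; AzureKeyCredential is imported in Step 2)
admin_client = DocumentIntelligenceAdministrationClient(
    "https://my-doc-intel.cognitiveservices.azure.com/", AzureKeyCredential("your-key"))
poller = admin_client.begin_build_document_model(BuildDocumentModelRequest(
    model_id="my-purchase-order",
    build_mode="template",  # or "neural" for varied layouts
    # The service must be able to read this container (for example via a SAS URL)
    azure_blob_source=AzureBlobContentSource(
        container_url="https://mystorage.blob.core.windows.net/training-data"),
))
model = poller.result()
print(f"Model ID: {model.model_id}")
print(f"Fields: {list(model.doc_types['my-purchase-order'].field_schema)}")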
Template vs Neural
| Feature | Template | Neural |
|---|---|---|
| Layout | Fixed/similar | Varied |
| Training data | 5+ docs | 15+ docs |
| Speed | Fast | Slower |
| Best for | Structured forms | Semi-structured docs |
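The comparison table can be expressed as a small decision helper. This sketch simply encodes the rows above; the function name, and the choice to raise an error rather than fall back to template mode when layouts vary but samples are scarce, are assumptions made for illustration.

```python
MIN_SAMPLES = 5      # "Training data" value in the Template column
NEURAL_SAMPLES = 15  # "Training data" value in the Neural column


def choose_build_mode(layouts_vary: bool, sample_count: int) -> str:
    """Map the comparison table onto a build mode: "template" or "neural"."""
    if sample_count < MIN_SAMPLES:
        raise ValueError(f"need at least {MIN_SAMPLES} labeled samples, got {sample_count}")
    if not layouts_vary:
        return "template"  # fixed or similar layouts; faster to train
    if sample_count < NEURAL_SAMPLES:
        raise ValueError(
            f"varied layouts call for neural mode, listed at {NEURAL_SAMPLES}+ samples; "
            f"got {sample_count}"
        )
    return "neural"


print(choose_build_mode(layouts_vary=False, sample_count=6))
print(choose_build_mode(layouts_vary=True, sample_count=20))
```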
Step 4: Automate with Logic Apps / Power Automate
Power Automate Example
Trigger: When a file is added to SharePoint "Invoices" folder
→ Extract invoice fields using Document Intelligence
→ If amount > $10,000: Send approval to manager
→ Create row in Excel with extracted data
→ Archive the processed document
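The branching in this flow is simple enough to prototype outside Power Automate. The sketch below parses the extracted total and returns the ordered actions the flow would take; the action names and helper functions are hypothetical, and the parser assumes US-style formatting (comma as thousands separator, period as decimal point). Where the service returns a structured numeric amount for the total, using that value would be preferable to parsing display text.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import List

APPROVAL_THRESHOLD = Decimal("10000")  # the $10,000 limit from the flow above


def parse_amount(text: str) -> Decimal:
    """Parse a displayed amount such as '$12,345.67' into a Decimal."""
    cleaned = re.sub(r"[^\d.\-]", "", text)  # drop currency symbols, commas, spaces
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"cannot parse amount: {text!r}") from exc


def plan_actions(invoice_total: str) -> List[str]:
    """Return the flow's actions, in order, for one extracted invoice total."""
    actions: List[str] = []
    if parse_amount(invoice_total) > APPROVAL_THRESHOLD:
        actions.append("send_approval_to_manager")
    actions += ["create_excel_row", "archive_document"]
    return actions


print(plan_actions("$12,345.67"))
print(plan_actions("$980.00"))
```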
.NET Integration Example
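using Azure;
using Azure.AI.DocumentIntelligence;

var client = new DocumentIntelligenceClient(
    new Uri("https://my-doc-intel.cognitiveservices.azure.com/"),
    new AzureKeyCredential("your-key"));

// Note: AnalyzeDocumentContent is the request type in the prerelease packages;
// later releases renamed it, so match this to the package version you install.
var content = new AnalyzeDocumentContent
{
    UrlSource = new Uri("https://example.com/invoice.pdf")
};

// WaitUntil.Completed makes the call return only after analysis has finished
var operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed, "prebuilt-invoice", content);
var result = operation.Value;

foreach (var doc in result.Documents)
{
    // Indexing throws if a field is absent; use TryGetValue for optional fields
    Console.WriteLine($"Vendor: {doc.Fields["VendorName"].Content}");
    Console.WriteLine($"Total: {doc.Fields["InvoiceTotal"].Content}");
}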
Resources
Video: Search "Azure AI Document Intelligence" on Microsoft Azure YouTube for the latest walkthroughs.

