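Content Understanding analyzers define how your content is processed and which information is extracted. They apply uniform processing and a consistent output structure across all your content, so results are reliable and predictable. Prebuilt analyzers are available for common use cases. This guide shows how to customize these analyzers to better fit your needs.
This guide uses the cURL command-line tool. If it isn't installed, download the version appropriate for your development environment.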
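Define an analyzer schema
To create a custom analyzer, define a field schema that describes the structured data you want to extract. The following example creates an analyzer for processing receipts, based on the prebuilt document analyzer.
Create a JSON file named request_body.json with the following content: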
{
  "description": "Sample receipt analyzer",
  "baseAnalyzerId": "prebuilt-documentAnalyzer",
  "config": {
    "returnDetails": true,
    "enableFormula": false,
    "disableContentFiltering": false,
    "estimateFieldSourceAndConfidence": true,
    "tableFormat": "html"
  },
  "fieldSchema": {
    "fields": {
      "VendorName": {
        "type": "string",
        "method": "extract",
        "description": "Vendor issuing the receipt"
      },
      "Items": {
        "type": "array",
        "method": "extract",
        "items": {
          "type": "object",
          "properties": {
            "Description": {
              "type": "string",
              "method": "extract",
              "description": "Description of the item"
            },
            "Amount": {
              "type": "number",
              "method": "extract",
              "description": "Amount of the item"
            }
          }
        }
      }
    }
  }
}
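If you prefer to generate request_body.json rather than edit it by hand, the schema above can be assembled from a small script. The following is a minimal sketch, not part of the service tooling: the helper names field, build_receipt_analyzer, and write_request_body are hypothetical, and the field names and config values are copied from the example above.

```python
import json


def field(type_, description, method="extract"):
    """Return one field definition in the shape used by the schema above."""
    return {"type": type_, "method": method, "description": description}


def build_receipt_analyzer():
    """Assemble the same analyzer definition shown in request_body.json."""
    return {
        "description": "Sample receipt analyzer",
        "baseAnalyzerId": "prebuilt-documentAnalyzer",
        "config": {
            "returnDetails": True,
            "enableFormula": False,
            "disableContentFiltering": False,
            "estimateFieldSourceAndConfidence": True,
            "tableFormat": "html",
        },
        "fieldSchema": {
            "fields": {
                "VendorName": field("string", "Vendor issuing the receipt"),
                # An array field describes each element under "items".
                "Items": {
                    "type": "array",
                    "method": "extract",
                    "items": {
                        "type": "object",
                        "properties": {
                            "Description": field("string", "Description of the item"),
                            "Amount": field("number", "Amount of the item"),
                        },
                    },
                },
            }
        },
    }


def write_request_body(path="request_body.json"):
    """Write the analyzer definition to disk and return it."""
    body = build_receipt_analyzer()
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(body, handle, indent=2)
    return body
```

Calling write_request_body() produces a file equivalent to the one above; adding another field is then a one-line change in the "fields" dictionary.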
Create the analyzer
PUT request
curl -i -X PUT "{endpoint}/contentunderstanding/analyzers/{analyzerId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d @request_body.json
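The same PUT request can be prepared from Python using only the standard library. This is a rough sketch under stated assumptions: build_create_request and send are hypothetical helper names, the URL path, headers, and API version are taken from the cURL command above, and error handling is omitted.

```python
import json
import urllib.request

API_VERSION = "2025-05-01-preview"  # same version string as the cURL command


def build_create_request(endpoint, key, analyzer_id, body):
    """Build (but do not send) the PUT request that creates an analyzer."""
    url = (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzers/"
        f"{analyzer_id}?api-version={API_VERSION}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
    )


def send(req):
    """Send the request; return the HTTP status and Operation-Location header."""
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.headers.get("Operation-Location")
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before any call reaches the service.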
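PUT response
The 201 Created response includes an Operation-Location header containing a URL that you can use to track the status of this asynchronous analyzer creation operation.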
201 Created
Operation-Location: {endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview
Once the operation completes, an HTTP GET on the operation location URL returns "status": "succeeded".
curl -i -X GET "{endpoint}/contentunderstanding/analyzers/{analyzerId}/operations/{operationId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}"
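When scripting this step, the operation URL has to be pulled out of the response headers and the status read from the operation body. The sketch below shows one way to do both on captured text, for example the output of curl -i. The function names are hypothetical, the comparison is case-insensitive because the samples in this guide show both lowercase and capitalized status values, and treating "failed" as a terminal status is an assumption not stated in this guide.

```python
import json


def find_operation_location(raw_headers):
    """Return the Operation-Location value from raw header text, or None."""
    for line in raw_headers.splitlines():
        # Split on the first colon only, so the "https://" in the value survives.
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "operation-location":
            return value.strip()
    return None


def operation_state(body_text):
    """Classify an operation body as 'succeeded', 'failed', or 'pending'."""
    status = str(json.loads(body_text).get("status", "")).lower()
    if status == "succeeded":
        return "succeeded"
    if status == "failed":  # assumption: the service reports failures this way
        return "failed"
    return "pending"
```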
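Analyze a file
Send the file
You can now use the custom analyzer you created to process files and extract the fields defined in the schema.
Before running the cURL command, make the following changes to the HTTP request:
- Replace {endpoint} and {key} with the endpoint and key values from your Azure AI services instance in the Azure portal.
- Replace {analyzerId} with the name of the custom analyzer you created earlier.
- Replace {fileUrl} with a publicly accessible URL of the file to analyze, such as the path to an Azure Storage blob with a shared access signature (SAS), or the sample URL https://github.com/Azure-Samples/azure-ai-content-understanding-python/raw/refs/heads/main/data/receipt.png.
POST request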
curl -i -X POST "{endpoint}/contentunderstanding/analyzers/{analyzerId}:analyze?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}" \
-H "Content-Type: application/json" \
-d "{\"url\":\"{fileUrl}\"}"
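The hand-escaped JSON in the -d argument works for simple URLs, but SAS URLs often contain characters that are awkward to quote in a shell. As an illustrative sketch, the request body and URL can be generated with a JSON serializer instead; analyze_url and analyze_body are hypothetical names, and the path and API version mirror the cURL command above.

```python
import json

API_VERSION = "2025-05-01-preview"  # same version string as the cURL command


def analyze_url(endpoint, analyzer_id):
    """Return the :analyze URL for the given analyzer."""
    return (
        f"{endpoint.rstrip('/')}/contentunderstanding/analyzers/"
        f"{analyzer_id}:analyze?api-version={API_VERSION}"
    )


def analyze_body(file_url):
    """Return the JSON request body; json.dumps handles all escaping."""
    return json.dumps({"url": file_url})
```

The string returned by analyze_body can be saved to a file and passed to cURL with -d @file, which avoids shell quoting entirely.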
POST response
The 202 Accepted response includes the {resultId}, which you can use to track the status of this asynchronous operation.
{
"id": {resultId},
"status": "Running",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-05-01-preview",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": []
}
}
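Get the analysis result
- Replace {endpoint} and {key} with the endpoint and key values from your Azure AI services instance in the Azure portal.
- Replace {resultId} with the resultId from the POST response.
GET request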
curl -i -X GET "{endpoint}/contentunderstanding/analyzerResults/{resultId}?api-version=2025-05-01-preview" \
-H "Ocp-Apim-Subscription-Key: {key}"
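GET response
The 200 OK response includes a status field that shows the progress of the operation.
- status is Succeeded if the operation completed successfully.
- If status is running or notStarted, call the API again, either manually or with a script, and wait at least one second between requests.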
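The polling guidance above can be sketched as a small loop. This is one possible shape, not an official client: poll_result is a hypothetical name, fetch stands for any callable that performs the GET request and returns the parsed JSON body, and the default timeout of 120 seconds is an arbitrary choice. The status comparison is case-insensitive because the samples in this guide show both "Running" and "running".

```python
import time


def poll_result(fetch, interval=1.0, timeout=120.0,
                sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until the status is no longer running or notStarted.

    Waits at least one second between calls, as recommended above.
    Raises TimeoutError if the operation is still pending after `timeout`.
    """
    deadline = clock() + timeout
    while True:
        result = fetch()
        status = str(result.get("status", "")).lower()
        if status not in ("running", "notstarted"):
            return result  # Succeeded, or any other terminal status
        if clock() >= deadline:
            raise TimeoutError("analysis still pending after timeout")
        sleep(max(interval, 1.0))  # never poll faster than once per second
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; in normal use the defaults are enough.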
Sample response
{
"id": {resultId},
"status": "Succeeded",
"result": {
"analyzerId": {analyzerId},
"apiVersion": "2025-05-01-preview",
"createdAt": "YYYY-MM-DDTHH:MM:SSZ",
"warnings": [],
"contents": [
{
"markdown": "Contoso\n\n123 Main Street\nRedmond, WA 98052\n\n987-654-3210\n\n6/10/2019 13:59\nSales Associate: Paul\n\n\n<table>\n<tr>\n<td>2 Surface Pro 6</td>\n<td>$1,998.00</td>\n</tr>\n<tr>\n<td>3 Surface Pen</td>\n<td>$299.97</td>\n</tr>\n</table> ...",
"fields": {
"VendorName": {
"type": "string",
"valueString": "Contoso",
"spans": [{"offset": 0,"length": 7}],
"confidence": 0.996,
"source": "D(1,774.0000,72.0000,974.0000,70.0000,974.0000,111.0000,774.0000,113.0000)"
},
"Items": {
"type": "array",
"valueArray": [
{
"type": "object",
"valueObject": {
"Description": {
"type": "string",
"valueString": "2 Surface Pro 6",
"spans": [ { "offset": 115, "length": 15}],
"confidence": 0.423,
"source": "D(1,704.0000,482.0000,875.0000,482.0000,875.0000,508.0000,704.0000,508.0000)"
},
"Amount": {
"type": "number",
"valueNumber": 1998,
"spans": [{ "offset": 140,"length": 9}
],
"confidence": 0.957,
"source": "D(1,952.0000,482.0000,1048.0000,482.0000,1048.0000,508.0000,952.0000,509.0000)"
}
}
}, ...
]
}
},
"kind": "document",
"startPageNumber": 1,
"endPageNumber": 1,
"unit": "pixel",
"pages": [
{
"pageNumber": 1,
"angle": -0.0848,
"width": 1743,
"height": 878,
"spans": [
{
"offset": 0,
"length": 375
}
],
"words": [
{
"content": "Contoso",
"span": {"offset": 0,"length": 7 },
"confidence": 0.995,
"source": "D(1,774,72,974,70,974,111,774,113)"
}, ...
],
"lines": [
{
"content": "Contoso",
"source": "D(1,774,71,973,70,974,111,774,113)",
"span": {"offset": 0,"length": 7}
}, ...
]
}
],
"paragraphs": [
{
"content": "Contoso",
"source": "D(1,774,71,973,70,974,111,774,113)",
"span": {"offset": 0,"length": 7}
}, ...
],
"sections": [
{
"span": {"offset": 0,"length": 374 },
"elements": ["/paragraphs/0","/paragraphs/1", ...]
}
],
"tables": [
{
"rowCount": 2,
"columnCount": 2,
"cells": [
{
"kind": "content",
"rowIndex": 0,
"columnIndex": 0,
"rowSpan": 1,
"columnSpan": 1,
"content": "2 Surface Pro 6",
"source": "D(1,691,471,911,470,911,514,691,515)",
"span": {"offset": 115,"length": 15},
"elements": ["/paragraphs/4"]
}, ...
],
"source": "D(1,759,593,1056,592,1057,741,760,742)",
"span": {"offset": 223,"length": 151}
}
]
}
]
}
}
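The extracted values sit several levels deep in the response, so a short helper can make them easier to consume. The following is a sketch based only on the sample response above. The function names are hypothetical. Two points are assumptions: that scalar values always appear under a "value" key followed by the capitalized type name, as valueString and valueNumber do here, and that a source string lists a page number followed by x,y pairs of a bounding polygon.

```python
def simplify(field):
    """Convert one field from the response into a plain Python value."""
    kind = field.get("type")
    if not kind:
        return None
    if kind == "array":
        return [simplify(item) for item in field.get("valueArray", [])]
    if kind == "object":
        return {name: simplify(sub)
                for name, sub in field.get("valueObject", {}).items()}
    # Assumption: scalars use "value<Type>", e.g. valueString, valueNumber.
    return field.get("value" + kind[:1].upper() + kind[1:])


def extract_fields(response):
    """Return one {field name: value} dict per entry in result.contents."""
    contents = response.get("result", {}).get("contents", [])
    return [
        {name: simplify(f) for name, f in content.get("fields", {}).items()}
        for content in contents
    ]


def parse_source(source):
    """Split 'D(page,x1,y1,...)' into (page, [(x1, y1), ...])."""
    if not (source.startswith("D(") and source.endswith(")")):
        raise ValueError(f"unexpected source format: {source!r}")
    parts = source[2:-1].split(",")
    coords = [float(p) for p in parts[1:]]
    return int(parts[0]), list(zip(coords[0::2], coords[1::2]))
```

For the sample response, extract_fields would yield a dictionary with VendorName set to "Contoso" and Items as a list of objects holding Description and Amount.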