
How to deploy and run inference on a managed compute deployment with code

2025-06-05

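The model catalog in the Azure AI Foundry portal offers more than 1,600 models. A common way to deploy these models is the managed compute deployment option, sometimes also called a managed online deployment.

Deploying a large language model (LLM) makes it available for use in a website, an application, or another production environment. Deployment typically involves hosting the model on a server or in the cloud and creating an API or other interface for users to interact with the model. You can invoke the deployment for real-time inference in generative AI applications such as chat and copilot.

This article shows how to deploy models by using the Azure Machine Learning SDK. It also shows how to perform inference on the deployed model.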

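Prerequisites

An Azure subscription with a valid payment method. Free or trial Azure subscriptions won't work. If you don't have an Azure subscription, create a paid Azure account first.
If you don't have a project, create a hub-based project.
Marketplace purchases enabled for your Azure subscription.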

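Get the model ID

You can deploy managed compute models by using the Azure Machine Learning SDK, but first, browse the model catalog and get the model ID you need for deployment.

Tip

Because you can customize the left pane in the Azure AI Foundry portal, you might see different items than those shown in these steps. If you don't see what you're looking for, select ... More at the bottom of the left pane.

Sign in to Azure AI Foundry and go to the "Home" page.
Select "Model catalog" from the left sidebar.
In the "Deployment options" filter, select "Managed compute".
Select a model.
Copy the model ID from the details page of the model you selected. It looks like this: azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16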
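
Before passing a copied model ID to the SDK, it can help to check that it has the expected shape. The following is a minimal sketch, assuming every ID follows the azureml://registries/<registry>/models/<name>/versions/<version> pattern shown above; the `ModelId` type and `parse_model_id` helper are hypothetical names for illustration and are not part of the Azure SDK.

```python
import re
from typing import NamedTuple


class ModelId(NamedTuple):
    """Parts of a registry model ID (hypothetical helper type)."""
    registry: str
    name: str
    version: str


# Assumed shape, based on the example ID shown in this article.
_MODEL_ID_PATTERN = re.compile(
    r"^azureml://registries/(?P<registry>[^/]+)"
    r"/models/(?P<name>[^/]+)"
    r"/versions/(?P<version>[^/]+)$"
)


def parse_model_id(model_id: str) -> ModelId:
    """Split a model ID into registry, model name, and version."""
    match = _MODEL_ID_PATTERN.match(model_id.strip())
    if match is None:
        raise ValueError(f"Unexpected model ID format: {model_id!r}")
    return ModelId(**match.groupdict())


if __name__ == "__main__":
    example = (
        "azureml://registries/azureml/models/"
        "deepset-roberta-base-squad2/versions/16"
    )
    print(parse_model_id(example))
```

A copy-and-paste error, such as a missing version segment, then fails immediately with a `ValueError` instead of surfacing later during deployment.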

Deploy the model

Install the Azure Machine Learning SDK.

pip install azure-ai-ml
pip install azure-identity

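Authenticate with Azure Machine Learning and create a client object. Replace the placeholders with your subscription ID, resource group name, and Azure AI Foundry project name.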

from azure.ai.ml import MLClient
from azure.identity import InteractiveBrowserCredential

workspace_ml_client = MLClient(
    credential=InteractiveBrowserCredential(),  # pass an instance, not the class
    subscription_id="your subscription ID goes here",
    resource_group_name="your resource group name goes here",
    workspace_name="your project name goes here",
)
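
Instead of editing placeholder strings in the script, you might prefer to read the three values from environment variables. The sketch below is one way to do that. The variable names `AZURE_SUBSCRIPTION_ID`, `AZURE_RESOURCE_GROUP`, and `AZURE_AI_PROJECT_NAME`, and the `load_client_settings` helper, are assumptions chosen for this example; the SDK does not require them.

```python
import os
from typing import Mapping, Optional

# Maps MLClient keyword arguments to environment variable names.
# The variable names are assumptions made for this sketch.
REQUIRED_SETTINGS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group_name": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_AI_PROJECT_NAME",
}


def load_client_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return MLClient keyword arguments read from environment variables."""
    env = os.environ if env is None else env
    missing = [var for var in REQUIRED_SETTINGS.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in REQUIRED_SETTINGS.items()}
```

With the variables set, the client could then be created as `MLClient(credential=InteractiveBrowserCredential(), **load_client_settings())`.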

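Create an endpoint. For the managed compute deployment option, you need to create an endpoint before you deploy a model. Think of the endpoint as a container that can hold multiple model deployments. Endpoint names must be unique within a region, so this example uses a timestamp to create a unique endpoint name.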

import time
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    ProbeSettings,
)

# Make the endpoint name unique
timestamp = int(time.time())
online_endpoint_name = "customize your endpoint name here" + str(timestamp)

# Create an online endpoint
endpoint = ManagedOnlineEndpoint(
    name=online_endpoint_name,
    auth_mode="key",
)
workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).wait()
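
The placeholder prefix in the code above contains spaces, which an endpoint name probably cannot include. The following sketch builds a timestamped name from an arbitrary prefix. It assumes that endpoint names must start with a letter, contain only letters, digits, and hyphens, and be at most 32 characters long; verify these limits against the current Azure Machine Learning documentation. `make_endpoint_name` is a hypothetical helper, not an SDK function.

```python
import re
import time
from typing import Optional

MAX_NAME_LENGTH = 32  # assumed upper limit for endpoint names


def make_endpoint_name(prefix: str, timestamp: Optional[int] = None) -> str:
    """Build a unique, sanitized endpoint name from a free-form prefix."""
    suffix = str(int(time.time()) if timestamp is None else timestamp)
    # Replace each run of disallowed characters with a single hyphen.
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "-", prefix).strip("-")
    # Make sure the name starts with a letter.
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "ep-" + cleaned if cleaned else "ep"
    # Leave room for the hyphen and the timestamp suffix.
    room = MAX_NAME_LENGTH - len(suffix) - 1
    cleaned = cleaned[:room].rstrip("-")
    return f"{cleaned}-{suffix}"


if __name__ == "__main__":
    print(make_endpoint_name("my demo endpoint"))
```

The result can be assigned to `online_endpoint_name` in place of the string concatenation used above.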

Create a deployment. In the following code, replace the model ID with the one you copied from the model's details page in the "Get the model ID" section.

model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16" 

demo_deployment = ManagedOnlineDeployment(
    name="demo",
    endpoint_name=online_endpoint_name,
    model=model_name,
    instance_type="Standard_DS3_v2",
    instance_count=2,
    liveness_probe=ProbeSettings(
        failure_threshold=30,
        success_threshold=1,
        timeout=2,
        period=10,
        initial_delay=1000,
    ),
    readiness_probe=ProbeSettings(
        failure_threshold=10,
        success_threshold=1,
        timeout=10,
        period=10,
        initial_delay=1000,
    ),
)
workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
endpoint.traffic = {"demo": 100}
workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).result()
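
The last two lines above route 100% of the endpoint traffic to the demo deployment. When an endpoint holds more than one deployment, a small check before the update can catch typos in the traffic map. This sketch assumes that traffic values are integer percentages that must add up to 100 (or to 0 to stop all traffic); `validate_traffic` and the deployment names in the usage comment are hypothetical.

```python
from typing import Dict, Iterable


def validate_traffic(traffic: Dict[str, int], deployments: Iterable[str]) -> Dict[str, int]:
    """Check a traffic map against known deployment names and return a copy."""
    unknown = set(traffic) - set(deployments)
    if unknown:
        raise ValueError("Unknown deployments: " + ", ".join(sorted(unknown)))
    for name, percent in traffic.items():
        if not isinstance(percent, int) or not 0 <= percent <= 100:
            raise ValueError(f"Invalid percentage for {name!r}: {percent!r}")
    total = sum(traffic.values())
    # Assumption: the percentages must total 100, or 0 to disable traffic.
    if total not in (0, 100):
        raise ValueError(f"Traffic must sum to 0 or 100, got {total}")
    return dict(traffic)


# Usage with hypothetical deployment names:
# endpoint.traffic = validate_traffic({"demo": 90, "canary": 10}, {"demo", "canary"})
```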

Run inference on the deployment

You need sample JSON data to test inference. Create sample_score.json with the following example.

{
  "inputs": {
    "question": [
      "Where do I live?",
      "Where do I live?",
      "What's my name?",
      "Which name is also used to describe the Amazon rainforest in English?"
    ],
    "context": [
      "My name is Wolfgang and I live in Berlin",
      "My name is Sarah and I live in London",
      "My name is Clara and I live in Berkeley.",
      "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
    ]
  }
}
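
If you would rather generate the scoring file from your own question and context pairs, the following sketch writes a file with the same structure as the sample above. The `write_scoring_file` helper is a hypothetical name; the "inputs", "question", and "context" keys are taken from the sample, and other models may expect a different payload.

```python
import json
from pathlib import Path
from typing import List, Tuple


def write_scoring_file(pairs: List[Tuple[str, str]], path: str = "sample_score.json") -> dict:
    """Write (question, context) pairs in the layout used by the sample file."""
    if not pairs:
        raise ValueError("At least one (question, context) pair is required")
    payload = {
        "inputs": {
            # The two lists are parallel: question[i] is asked about context[i].
            "question": [question for question, _ in pairs],
            "context": [context for _, context in pairs],
        }
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return payload


if __name__ == "__main__":
    write_scoring_file(
        [
            ("Where do I live?", "My name is Wolfgang and I live in Berlin"),
            ("What's my name?", "My name is Clara and I live in Berkeley."),
        ]
    )
```

Building both lists from pairs keeps each question aligned with its context, which is easy to get wrong when editing the JSON by hand.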

Run inference with sample_score.json. Change the location of the scoring file in the following code to match where you saved the sample JSON file.

import json

scoring_file = "./sample_score.json"
response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=online_endpoint_name,
    deployment_name="demo",
    request_file=scoring_file,
)
response_json = json.loads(response)
print(json.dumps(response_json, indent=2))
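
Applications that don't use the SDK usually call the endpoint over HTTPS instead. The following sketch only builds such a request, using the Python standard library. It assumes key authentication (as configured with auth_mode="key" above), with the key sent as a bearer token, and it uses the azureml-model-deployment header to target a specific deployment. The scoring URI and key are placeholders; with the SDK, they should be available through `online_endpoints.get(...)` (the `scoring_uri` attribute) and `online_endpoints.get_keys(...)`. `build_scoring_request` is a hypothetical helper.

```python
import json
import urllib.request
from typing import Optional


def build_scoring_request(
    scoring_uri: str,
    api_key: str,
    payload: dict,
    deployment_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but don't send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }
    if deployment_name:
        # Route to one deployment instead of following the traffic split.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


# Sending the request (requires a live endpoint and a valid key):
# with urllib.request.urlopen(request) as response:
#     print(json.dumps(json.load(response), indent=2))
```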

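Configure autoscaling

To configure autoscaling for your deployment, go to the Azure portal, locate the Azure resource of type Machine learning online deployment in the resource group of the AI project, and use the "Scaling" menu under "Settings". For more information about autoscaling, see Autoscale online endpoints in the Azure Machine Learning documentation.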

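Delete the deployment endpoint

To delete a deployment in the Azure AI Foundry portal, select the "Delete" button on the top panel of the deployment details page.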
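
The same cleanup can also be scripted. The sketch below assumes an `MLClient` like the one created earlier in this article and calls `online_endpoints.begin_delete`, which removes the endpoint together with the deployments it contains. The `delete_endpoint` wrapper is a hypothetical helper; it takes the client as a parameter so that it does not depend on a global variable.

```python
def delete_endpoint(ml_client, endpoint_name: str, wait: bool = True):
    """Start deleting an online endpoint and optionally wait for completion.

    ml_client is expected to be an azure.ai.ml.MLClient (or an object that
    exposes the same online_endpoints.begin_delete method).
    """
    poller = ml_client.online_endpoints.begin_delete(name=endpoint_name)
    if wait:
        # Block until the long-running delete operation finishes.
        poller.result()
    return poller


# Usage, assuming the names used earlier in this article:
# delete_endpoint(workspace_ml_client, online_endpoint_name)
```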

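Quota considerations

To deploy and perform inference with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.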
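
As a rough illustration of how a deployment draws on core quota, the sketch below multiplies the instance count by the number of vCPUs per instance. The vCPU table contains only Standard_DS3_v2 (4 vCPUs), the size used in this article; extend it for other sizes. The `reserve_fraction` parameter is a hypothetical knob for any extra capacity the platform might hold back, for example during upgrades, so treat the result as an estimate and confirm actual usage on your quota page.

```python
import math

# vCPUs per VM size. Only the size used in this article is listed.
VCPUS_PER_INSTANCE = {
    "Standard_DS3_v2": 4,
}


def required_cores(instance_type: str, instance_count: int, reserve_fraction: float = 0.0) -> int:
    """Estimate the cores of quota a deployment consumes."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    if instance_type not in VCPUS_PER_INSTANCE:
        raise ValueError(f"Unknown VM size: {instance_type!r}")
    vcpus = VCPUS_PER_INSTANCE[instance_type]
    # reserve_fraction is an assumption: extra headroom as a share of the base cores.
    return math.ceil(vcpus * instance_count * (1 + reserve_fraction))


if __name__ == "__main__":
    # The demo deployment in this article: 2 instances of Standard_DS3_v2.
    print(required_cores("Standard_DS3_v2", 2))
```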

Learn more about what you can do in Azure AI Foundry
Get answers to frequently asked questions in the Azure AI FAQ article
