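Continuously evaluate AI agents (preview)

2025-06-05

Important

Items marked "(preview)" in this article are currently in public preview. This preview is provided without a service-level agreement and isn't recommended for production workloads. Certain features might not be supported or might have constrained capabilities. For more information, see Supplemental Terms of Use for Microsoft Azure Previews.

Continuous evaluation of agents provides near-real-time observability and monitoring for AI applications. Once enabled, the feature continuously evaluates a sample of agent interactions at a set sampling rate and surfaces quality, safety, and performance metrics in the Foundry observability dashboard. With continuous evaluation, you can identify and troubleshoot issues early, optimize agent performance, and maintain safety. Evaluations are also linked to traces, which enables detailed debugging and root-cause analysis.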

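Get started

Prerequisites

Note

This feature requires a Foundry project. Hub-based projects aren't supported. See How do I know which type of project I have? and Create a Foundry project.

An agent created in the project
An Azure Monitor Application Insights resource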

Steps to connect Application Insights

In Azure AI Foundry, go to your project.
On the left menu, select Monitoring, and then go to Application analytics.
Connect your Application Insights resource to the project.

Set up continuous evaluation with the Azure AI Projects client library

pip install azure-ai-projects azure-identity azure-monitor-query pandas

Create an agent run

import os

from azure.ai.projects import AIProjectClient
from azure.identity import DefaultAzureCredential

# Connect to the Foundry project (connection string read from the environment)
project = AIProjectClient.from_connection_string(
    credential=DefaultAzureCredential(), conn_str=os.environ["PROJECT_CONNECTION_STRING"]
)

# Create the agent. To give it tools such as file search, also pass
# tools=... and tool_resources=... (for example, from a FileSearchTool).
agent = project.agents.create_agent(
    model=os.environ["MODEL_DEPLOYMENT_NAME"],
    name="my-assistant",
    instructions="You are a helpful assistant",
)

# Create a thread and process a user message
thread = project.agents.create_thread()
project.agents.create_message(thread_id=thread.id, role="user", content="Hello, what Contoso products do you know?")
run = project.agents.create_and_process_run(thread_id=thread.id, agent_id=agent.id)

# Handle the run status
if run.status == "failed":
    print(f"Run failed: {run.last_error}")

# Print the thread messages
for message in project.agents.list_messages(thread_id=thread.id).text_messages:
    print(message)

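Select evaluators

Next, define the set of evaluators to run continuously. To learn more about the supported evaluators, see What are evaluators?

from azure.ai.projects.models import EvaluatorIds

# Map a display name to each built-in evaluator (a dict, not a tuple: no trailing comma)
evaluators = {
    "Relevance": {"Id": EvaluatorIds.Relevance.value},
    "Fluency": {"Id": EvaluatorIds.Fluency.value},
    "Coherence": {"Id": EvaluatorIds.Coherence.value},
}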

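Continuously evaluate agent runs by creating an `AgentEvaluationRequest`

from azure.ai.projects.models import AgentEvaluationRequest

# Submit the run for evaluation; results are written to Application Insights
project.evaluation.create_agent_evaluation(
    AgentEvaluationRequest(
        thread=thread.id,
        run=run.id,
        evaluators=evaluators,
        app_insights_connection_string=project.telemetry.get_connection_string(),
    )
)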

Note

Application Insights must be connected to the project; otherwise, the service doesn't run. To connect an Application Insights resource, see Steps to connect Application Insights.
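Because the service can't run without an Application Insights connection, it can help to fail fast before submitting any evaluation request. The following is a minimal sketch, assuming the standard Application Insights connection string format (semicolon-separated `Key=Value` pairs that include `InstrumentationKey`). The helpers `parse_connection_string` and `require_app_insights` are hypothetical and aren't part of the Azure SDK.

```python
from typing import Dict, Optional


def parse_connection_string(conn_str: str) -> Dict[str, str]:
    """Split a 'Key=Value;Key=Value' connection string into a dict."""
    parts: Dict[str, str] = {}
    for segment in conn_str.split(";"):
        segment = segment.strip()
        if not segment:
            continue  # tolerate empty segments, such as a trailing semicolon
        # partition splits on the first '=' only, so URL values stay intact
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"Malformed segment: {segment!r}")
        parts[key.strip()] = value.strip()
    return parts


def require_app_insights(conn_str: Optional[str]) -> Dict[str, str]:
    """Hypothetical guard: raise if no usable connection string is available."""
    if not conn_str:
        raise RuntimeError("Application Insights is not connected to the project.")
    parts = parse_connection_string(conn_str)
    if "InstrumentationKey" not in parts:
        raise ValueError("Connection string has no InstrumentationKey.")
    return parts


# Sample value only; in practice pass project.telemetry.get_connection_string()
sample = "InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://example.invalid/"
parts = require_app_insights(sample)
print(parts["IngestionEndpoint"])  # https://example.invalid/
```

Checking locally costs nothing and turns a silent "no results" situation into an immediate, readable error.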

Get evaluation results by using Application Insights


import os
from datetime import timedelta
from pprint import pprint

import pandas as pd
from azure.core.exceptions import HttpResponseError
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient, LogsQueryStatus

credential = DefaultAzureCredential()
client = LogsQueryClient(credential)

# Evaluation results are logged as traces tagged with the agent run ID
# (run comes from the "Create an agent run" step)
query = f"""
traces
| where message == "gen_ai.evaluation.result"
| where customDimensions["gen_ai.thread.run.id"] == "{run.id}"
"""

try:
    response = client.query_workspace(os.environ["LOGS_WORKSPACE_ID"], query, timespan=timedelta(days=1))
    if response.status == LogsQueryStatus.SUCCESS:
        data = response.tables
    else:
        # LogsQueryPartialResult: report the error and use the partial data
        error = response.partial_error
        data = response.partial_data
        print(error)

    for table in data:
        df = pd.DataFrame(data=table.rows, columns=table.columns)
        key_value = df.to_dict(orient="records")
        pprint(key_value)
except HttpResponseError as err:
    print("something fatal happened")
    print(err)
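The query returns one row per evaluation result. To turn rows like `key_value` into a per-evaluator average, here is a minimal sketch. It assumes each row's `customDimensions` holds an evaluator name and a numeric score under the keys `gen_ai.evaluator.name` and `gen_ai.evaluation.score`; those key names are assumptions, so inspect your own `pprint` output and adjust them. `summarize_scores` is a hypothetical helper.

```python
import json
from collections import defaultdict
from statistics import mean

# ASSUMPTION: illustrative attribute names; check the customDimensions
# of your own gen_ai.evaluation.result traces.
NAME_KEY = "gen_ai.evaluator.name"
SCORE_KEY = "gen_ai.evaluation.score"


def summarize_scores(records, name_key=NAME_KEY, score_key=SCORE_KEY):
    """Average evaluation scores per evaluator across result rows."""
    scores = defaultdict(list)
    for record in records:
        dims = record.get("customDimensions") or {}
        if isinstance(dims, str):
            # Dynamic columns can arrive as JSON text instead of a dict
            dims = json.loads(dims)
        name = dims.get(name_key)
        score = dims.get(score_key)
        if name is None or score is None:
            continue  # skip rows that don't carry a score
        scores[name].append(float(score))
    return {name: mean(values) for name, values in scores.items()}


# Made-up sample rows in the shape produced by df.to_dict(orient="records")
rows = [
    {"customDimensions": {"gen_ai.evaluator.name": "Relevance", "gen_ai.evaluation.score": "4"}},
    {"customDimensions": '{"gen_ai.evaluator.name": "Relevance", "gen_ai.evaluation.score": 5}'},
    {"customDimensions": {"gen_ai.evaluator.name": "Fluency", "gen_ai.evaluation.score": 3}},
]
print(summarize_scores(rows))  # {'Relevance': 4.5, 'Fluency': 3.0}
```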

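Capture reasoning explanations for evaluation results

AI-assisted evaluators use chain-of-thought reasoning to generate an explanation for each score in the evaluation results. To capture these explanations, set redact_score_properties to False in an AgentEvaluationRedactionConfiguration object and pass it as part of the request.

The explanations help you understand the reasoning behind each metric score.

Note

Reasoning explanations might mention sensitive information, depending on the content of the conversation.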


from azure.ai.projects.models import AgentEvaluationRedactionConfiguration, AgentEvaluationRequest

project.evaluation.create_agent_evaluation(
    AgentEvaluationRequest(
        thread=thread.id,
        run=run.id,
        evaluators=evaluators,
        # False keeps the score explanations in the results instead of redacting them
        redaction_configuration=AgentEvaluationRedactionConfiguration(
            redact_score_properties=False,
        ),
        app_insights_connection_string=project.telemetry.get_connection_string(),
    )
)
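Because explanations can contain sensitive conversation content, you might want to mask them yourself before sharing exported results. The following is a local post-processing sketch only; it doesn't change what the service stores. The key names in `EXPLANATION_KEYS` and the `mask_explanations` helper are hypothetical, so adjust the keys to whatever your exported records actually contain.

```python
from copy import deepcopy

# HYPOTHETICAL key names for explanation properties in an exported record
EXPLANATION_KEYS = {"gen_ai.evaluation.reasoning", "gen_ai.evaluation.explanation"}


def mask_explanations(record, keys=EXPLANATION_KEYS, placeholder="[REDACTED]"):
    """Return a copy of an evaluation record with explanation fields masked."""
    masked = deepcopy(record)  # leave the caller's record untouched
    for key in keys:
        if key in masked:
            masked[key] = placeholder
    return masked


record = {
    "gen_ai.evaluator.name": "Relevance",
    "gen_ai.evaluation.score": 4,
    "gen_ai.evaluation.reasoning": "The reply lists Contoso products the user asked about.",
}
safe = mask_explanations(record)
print(safe["gen_ai.evaluation.reasoning"])  # [REDACTED]
print(record["gen_ai.evaluation.reasoning"] == safe["gen_ai.evaluation.reasoning"])  # False
```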

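Customize the sampling configuration

You can customize sampling by defining an AgentEvaluationSamplingConfiguration that specifies your preferred sampling percentage and maximum request rate. The maximum request rate can't exceed the system limit of 1,000 requests per hour.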


from azure.ai.projects.models import AgentEvaluationRequest, AgentEvaluationSamplingConfiguration

sampling_config = AgentEvaluationSamplingConfiguration(
    name=agent.id,
    sampling_percent=15,  # Percentage of runs to sample (0-100)
    max_request_rate=250,  # Maximum evaluation requests per hour (0-1000)
)
project.evaluation.create_agent_evaluation(
    AgentEvaluationRequest(
        thread=thread.id,
        run=run.id,
        evaluators=evaluators,
        sampling_configuration=sampling_config,
        app_insights_connection_string=project.telemetry.get_connection_string(),
    )
)
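This article doesn't describe the service's exact sampling algorithm. To build intuition for how a sampling percentage and an hourly cap interact, here is a local model that assumes each run is sampled independently with probability `sampling_percent / 100` and that at most `max_request_rate` runs are accepted in any sliding one-hour window. `HourlySampler` is a hypothetical class, not part of `azure.ai.projects`.

```python
import random
from collections import deque

MAX_REQUESTS_PER_HOUR = 1000  # system limit stated in this article


class HourlySampler:
    """Hypothetical local model: percentage sampling plus an hourly cap."""

    def __init__(self, sampling_percent, max_request_rate, rng=None):
        if not 0 <= sampling_percent <= 100:
            raise ValueError("sampling_percent must be between 0 and 100")
        if not 0 <= max_request_rate <= MAX_REQUESTS_PER_HOUR:
            raise ValueError(f"max_request_rate must be between 0 and {MAX_REQUESTS_PER_HOUR}")
        self.sampling_percent = sampling_percent
        self.max_request_rate = max_request_rate
        self.rng = rng or random.Random()
        self._accepted = deque()  # timestamps (seconds) of accepted runs

    def should_evaluate(self, now):
        # Forget acceptances that fall outside the sliding one-hour window
        while self._accepted and now - self._accepted[0] >= 3600:
            self._accepted.popleft()
        if len(self._accepted) >= self.max_request_rate:
            return False  # hourly cap reached
        if self.rng.random() * 100 >= self.sampling_percent:
            return False  # run not selected by the sampling percentage
        self._accepted.append(now)
        return True


# 100% sampling with a cap of 3: the fourth run in the same hour is rejected,
# and capacity frees up once the first acceptance is an hour old.
sampler = HourlySampler(100, 3)
decisions = [sampler.should_evaluate(t) for t in (0, 1, 2, 3, 3600)]
print(decisions)  # [True, True, True, False, True]
```

Under these assumptions, with `sampling_percent=15` and `max_request_rate=250` as in the example above, the cap only starts to bind when more than about 1,667 runs arrive per hour (250 / 0.15).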

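Note

If multiple AI applications send continuous evaluation data to the same Application Insights resource, we recommend using a service name to distinguish each application's data. For more information, see Azure AI tracing.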
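One way to keep applications apart is to give each one its own OpenTelemetry service name (for example, through the standard `OTEL_SERVICE_NAME` environment variable) and filter on it when you query. The sketch below assumes the service name appears in the `cloud_RoleName` column, which is how the Azure Monitor OpenTelemetry exporter typically maps `service.name`; verify this against your own data. `kql_string` and `evaluation_query` are hypothetical helpers, and the service name and run ID are placeholders.

```python
from typing import Optional


def kql_string(value: str) -> str:
    """Quote a value as a KQL string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def evaluation_query(service_name: str, run_id: Optional[str] = None) -> str:
    """Build a query for one application's evaluation results."""
    lines = [
        "traces",
        '| where message == "gen_ai.evaluation.result"',
        # ASSUMPTION: service.name is surfaced as cloud_RoleName
        f"| where cloud_RoleName == {kql_string(service_name)}",
    ]
    if run_id is not None:
        lines.append(f'| where customDimensions["gen_ai.thread.run.id"] == {kql_string(run_id)}')
    return "\n".join(lines)


query = evaluation_query("contoso-support-agent", run_id="run_abc123")
print(query)
```

Quoting values through `kql_string` instead of pasting them directly into an f-string keeps an unexpected quote character in a name or ID from breaking the query.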

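View continuous evaluation results

After you deploy your application to production with continuous evaluation set up, you can monitor your agent's quality and safety by using Azure AI Foundry and Azure Monitor.

Evaluate AI agents locally with the Azure AI Evaluation SDK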

通过