Run ONNX models using Windows ML

2025-05-20

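Important

Windows ML APIs are currently experimental and are not supported for use in production environments. Apps that try out these APIs should not be published to the Microsoft Store.

Using the Windows Machine Learning (ML) types in the Microsoft.Windows.AI.MachineLearning namespace, you can run ONNX models locally in your Windows app without manually managing the underlying execution provider (EP) packages. The API handles downloading, updating, and initializing EPs; after that, you can continue to use Microsoft.Windows.AI.MachineLearning and/or ONNX Runtime.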

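Prerequisites

A Windows 11 PC running version 24H2 (build 26100) or later.

In addition to the prerequisite above, there are language-specific prerequisites, depending on the language your app is written in.

.NET 8 or later
Target a TFM of windows10.0.26100 or later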

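Step 1: Install the WinML runtime package and the NuGet package

The runtime package is distributed through the Microsoft Store. Run the following command in Windows Terminal to install it (the identifier is the Store catalog ID of the runtime package):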

winget install --id 9MVL55DVGWWW

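Then follow the steps below for the programming language your app uses.

In your .NET project, add the Microsoft.Windows.AI.MachineLearning NuGet package (if you use the NuGet Package Manager, make sure to include prerelease packages).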

dotnet add package Microsoft.Windows.AI.MachineLearning --prerelease

Then import the namespaces in your code.

using Microsoft.ML.OnnxRuntime;
using Microsoft.Windows.AI.MachineLearning;

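In your Visual Studio project, use the NuGet Package Manager to search for the Microsoft.Windows.AI.MachineLearning NuGet package and add it to your project (make sure to include prerelease packages in the search).

Then add the ONNX Runtime header to your code.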

#include <win_onnxruntime_cxx_api.h>

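Windows ML provides Python bindings named onnxruntime-winml, which support EP acquisition and configuration. Once set up, Python applications can use ONNX Runtime features, such as automatic EP selection, as usual.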

pip install --index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple --extra-index-url https://pypi.org/simple onnxruntime-winml

Then import the onnxruntime module in your code.

import onnxruntime as ort

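Step 2: Download and register the latest EPs

Next, we'll use Windows ML to make sure the latest execution providers (EPs) are available on the device and registered with ONNX Runtime.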

// First we create a new instance of EnvironmentCreationOptions
EnvironmentCreationOptions envOptions = new()
{
    logId = "WinMLDemo", // Use an ID of your own choice
    logLevel = OrtLoggingLevel.ORT_LOGGING_LEVEL_ERROR
};

// And then use that to create the ORT environment
using var ortEnv = OrtEnv.CreateInstanceWithOptions(ref envOptions);

// Then, initialize Windows ML infrastructure
Infrastructure infrastructure = new();

// Ensure the latest execution providers are available (downloads them if they aren't)
await infrastructure.DownloadPackagesAsync();

// And register the EPs with ONNX Runtime
await infrastructure.RegisterExecutionProviderLibrariesAsync();

// First we need to create an ORT environment
Ort::Env env(ORT_LOGGING_LEVEL_ERROR, "WinMLDemo"); // Use an ID of your own choice

// Then, initialize Windows ML infrastructure
winrt::Microsoft::Windows::AI::MachineLearning::Infrastructure infrastructure{};

// Ensure the latest execution providers are available (downloads them if they aren't)
co_await infrastructure.DownloadPackagesAsync();

// And register the EPs with ONNX Runtime
co_await infrastructure.RegisterExecutionProviderLibrariesAsync();

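Step 3: Configure execution providers

ONNX Runtime lets apps configure execution providers (EPs) either through device policies or explicitly. Explicit configuration gives more control over provider options and over which devices are used.

We recommend starting with explicit EP selection so that results are more predictable. Once that works, you can experiment with device policies to select execution providers in a natural, outcome-oriented way.

To explicitly select one or more EPs, use the GetEpDevices function on OrtApi, which enumerates all available devices. SessionOptionsAppendExecutionProvider_V2 can then be used to append specific devices explicitly and to pass custom provider options to the desired EP.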

using Microsoft.ML.OnnxRuntime;

// Get all available EP devices from the environment
var epDevices = ortEnv.GetEpDevices();

// Accumulate devices by EpName
// Passing all devices for a given EP in a single call allows the execution provider
// to select the best configuration or combination of devices, rather than being limited
// to a single device. This enables optimal use of available hardware if supported by the EP.
var epDeviceMap = epDevices
    .GroupBy(device => device.EpName)
    .ToDictionary(g => g.Key, g => g.ToList());

// For demonstration, list all found EPs, vendors, and device types
foreach (var epGroup in epDeviceMap)
{
    var epName = epGroup.Key;
    var devices = epGroup.Value;

    Console.WriteLine($"Execution Provider: {epName}");
    foreach (var device in devices)
    {
        string deviceType = GetDeviceTypeString(device.HardwareDevice.Type);
        Console.WriteLine($" | Vendor: {device.EpVendor,-16} | Device Type: {deviceType,-8}");
    }
}

// Configure and append each EP type only once, with all its devices
var sessionOptions = new SessionOptions();
foreach ((var epName, var devices) in epDeviceMap)
{
    Dictionary<string, string> epOptions = new();
    switch (epName)
    {
        case "VitisAIExecutionProvider":
            // Demonstrating passing no options for VitisAI
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        case "OpenVINOExecutionProvider":
            // Configure threading for OpenVINO EP, pick the first device found
            epOptions["num_of_threads"] = "4";
            sessionOptions.AppendExecutionProvider(ortEnv, [devices.First()], epOptions);
            Console.WriteLine($"Successfully added {epName} EP (first device only)");
            break;

        case "QNNExecutionProvider":
            // Configure performance mode for QNN EP
            epOptions["htp_performance_mode"] = "high_performance";
            sessionOptions.AppendExecutionProvider(ortEnv, devices, epOptions);
            Console.WriteLine($"Successfully added {epName} EP");
            break;

        default:
            Console.WriteLine($"Skipping EP: {epName}");
            break;
    }
}

#include <win_onnxruntime_cxx_api.h>

// Get all available EP devices from the environment
std::vector<Ort::ConstEpDevice> ep_devices = env.GetEpDevices();

// Accumulate devices by ep_name
// Passing all devices for a given EP in a single call allows the execution provider
// to select the best configuration or combination of devices, rather than being limited
// to a single device. This enables optimal use of available hardware if supported by the EP.
std::unordered_map<std::string, std::vector<Ort::ConstEpDevice>> ep_device_map;
for (const auto& device : ep_devices)
{
    ep_device_map[device.EpName()].push_back(device);
}

// For demonstration, list all found EPs, vendors, and device types
for (const auto& [ep_name, devices] : ep_device_map)
{
    std::cout << "Execution Provider: " << ep_name << std::endl;
    for (const auto& device : devices)
    {
        std::cout << " | Vendor: " << std::setw(16) << device.EpVendor() << " | Device Type: " << std::setw(8)
                    << ToString(device.Device().Type()) << std::endl;
    }
}

// Configure and append each EP type only once, with all its devices
Ort::SessionOptions session_options;
for (const auto& [ep_name, devices] : ep_device_map)
{
    Ort::KeyValuePairs ep_options;
    if (ep_name == "VitisAIExecutionProvider")
    {
        // Demonstrating passing no options for VitisAI
        session_options.AppendExecutionProvider_V2(env, devices, ep_options);
    }
    else if (ep_name == "OpenVINOExecutionProvider")
    {
        // Configure threading for OpenVINO EP, pick the first device found.
        ep_options.Add("num_of_threads", "4");
        session_options.AppendExecutionProvider_V2(env, {devices.front()}, ep_options);
    }
    else if (ep_name == "QNNExecutionProvider")
    {
        // Configure performance mode for QNN EP
        ep_options.Add("htp_performance_mode", "high_performance");
        session_options.AppendExecutionProvider_V2(env, devices, ep_options);
    }
    else
    {
        std::cout << "Skipping EP: " << ep_name << std::endl;
    }
}

# This example shows how to register a specific EP.
# Note that EPs registered by Windows ML cannot be accessed via the old "providers" option

import onnxruntime as ort

# Select a specific EP.
def add_ep_for_device(session_options, ep_name, device_type, ep_options=None):
    ep_devices = ort.get_ep_devices()
    for ep_device in ep_devices:
        if ep_device.ep_name == ep_name and ep_device.device.type == device_type:
            session_options.add_provider_for_devices([ep_device], {} if ep_options is None else ep_options)

options = ort.SessionOptions()
add_ep_for_device(options, "QNNExecutionProvider", ort.OrtHardwareDeviceType.NPU)  # example for QNN NPU
assert options.has_providers()

session = ort.InferenceSession(
    "path_to_your_model.onnx",
    sess_options=options,
)
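
The C# and C++ samples above group devices by EP name before appending them, whereas the Python sample appends one matching device at a time. The following is a minimal sketch of the same grouping step in Python. The helper name group_devices_by_ep is hypothetical; it relies only on the ep_name attribute used in the Python sample above, so it is shown here with stand-in objects rather than real devices.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_devices_by_ep(ep_devices):
    """Group EP devices by their ep_name attribute, keeping discovery order."""
    grouped = defaultdict(list)
    for ep_device in ep_devices:
        grouped[ep_device.ep_name].append(ep_device)
    return dict(grouped)


# Stand-in objects so the sketch runs without onnxruntime installed.
# In an app, the input would be the result of ort.get_ep_devices().
fake_devices = [
    SimpleNamespace(ep_name="QNNExecutionProvider", kind="NPU"),
    SimpleNamespace(ep_name="OpenVINOExecutionProvider", kind="GPU"),
    SimpleNamespace(ep_name="OpenVINOExecutionProvider", kind="NPU"),
]

for ep_name, devices in group_devices_by_ep(fake_devices).items():
    print(f"{ep_name}: {len(devices)} device(s)")
```

Each group could then be passed in a single add_provider_for_devices call, in the same way the sample above passes a one-element list.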

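For more information, see the ONNX Runtime OrtApi documentation. To learn about the versioning strategy for EPs, see the execution provider versioning documentation.

Step 4: Compile the model

An ONNX model must be compiled into an optimized representation that can execute efficiently on the device's underlying hardware. The execution providers you configured in step 3 help perform this transformation.

Starting with version 1.22, ONNX Runtime introduced new APIs that better encapsulate the compilation step. More details are available in the ONNX Runtime compile documentation (see the OrtCompileApi struct).

Note

Compilation can take several minutes to complete. To keep any UI responsive, consider performing it as a background operation in your application.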

// Prepare compilation options using our session we configured in step 3
OrtModelCompilationOptions compileOptions = new(sessionOptions);
compileOptions.SetInputModelPath(modelPath);
compileOptions.SetOutputModelPath(compiledModelPath);

// Compile the model
compileOptions.CompileModel();

const OrtCompileApi* compileApi = ortApi.GetCompileApi();

// Prepare compilation options
OrtModelCompilationOptions* compileOptions = nullptr;
OrtStatus* status = compileApi->CreateModelCompilationOptionsFromSessionOptions(env, sessionOptions, &compileOptions);
status = compileApi->ModelCompilationOptions_SetInputModelPath(compileOptions, modelPath.c_str());
status = compileApi->ModelCompilationOptions_SetOutputModelPath(compileOptions, compiledModelPath.c_str());

// Compile the model
status = compileApi->CompileModel(env, compileOptions);

// Clean up
compileApi->ReleaseModelCompilationOptions(compileOptions);

import os

input_model_path = "path_to_your_model.onnx"
output_model_path = "path_to_your_compiled_model.onnx"

model_compiler = ort.ModelCompiler(
    options,
    input_model_path,
    embed_compiled_data_into_model=True,
    external_initializers_file_path=None,
)
model_compiler.compile_to_file(output_model_path)
assert os.path.exists(output_model_path)
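
Because compilation can take minutes, one option is to run it on a worker thread and to skip it when a compiled model file already exists. The sketch below illustrates that idea under stated assumptions: compile_in_background and compile_fn are hypothetical names, and compile_fn stands in for whatever performs the compilation (for example, a function that wraps the ModelCompiler call above). The demo uses a stand-in that only copies bytes.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps compilation off the calling (for example, UI) thread.
_executor = ThreadPoolExecutor(max_workers=1)


def compile_in_background(input_path, output_path, compile_fn):
    """Return a Future that resolves to output_path once the compiled file exists."""
    def work():
        # Skip the expensive step if a compiled model is already on disk.
        if not os.path.exists(output_path):
            compile_fn(input_path, output_path)
        return output_path
    return _executor.submit(work)


def fake_compile(src, dst):
    """Stand-in for real compilation: copies the input file to the output path."""
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        f_out.write(f_in.read())


workdir = tempfile.mkdtemp()
model_path = os.path.join(workdir, "model.onnx")
compiled_path = os.path.join(workdir, "model_compiled.onnx")
with open(model_path, "wb") as f:
    f.write(b"placeholder")

future = compile_in_background(model_path, compiled_path, fake_compile)
print(future.result())  # blocks here only for the demo
```

A real app would likely attach a callback to the Future instead of blocking on it, and would also need its own rule for deciding when a cached compiled file is stale.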

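Step 5: Run model inference

Once the model is compiled for the local hardware on the device, we can create an inference session and run inference on the model.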

// Create inference session using compiled model
using InferenceSession session = new(compiledModelPath, sessionOptions);

// Create inference session using compiled model
Ort::Session session(env, compiledModelPath.c_str(), sessionOptions);

# Create inference session using compiled model
session = ort.InferenceSession(output_model_path, sess_options=options)

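Step 6: Distribute your app

Before distributing your app, C# and C++ developers need to take additional steps to ensure that the Windows ML runtime is installed on the user's device when the app is installed. See the Distribute your app page for details.

Model compilation

An ONNX model is represented as a graph, where nodes correspond to operators (such as matrix multiplication, convolution, and other mathematical operations) and edges define the flow of data between them.

This graph-based structure enables efficient execution and optimization through transformations such as fusion (combining multiple related operations into a single optimized operation) and graph pruning (removing redundant nodes from the graph).

Model compilation is the process of using an execution provider (EP) to convert an ONNX model into an optimized representation that can execute efficiently on the device's underlying hardware.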
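
To make the idea of graph pruning concrete, here is a toy sketch that removes nodes whose results never reach a graph output. It is an illustration only, not how ONNX Runtime or any EP implements optimization; the dictionary-based graph format and the name prune_unused_nodes are assumptions made for this example.

```python
def prune_unused_nodes(graph, outputs):
    """Return a copy of graph keeping only nodes reachable from outputs.

    graph maps each node name to the list of node names it reads from.
    """
    keep = set()
    stack = list(outputs)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        # Walk backwards along the data-flow edges to the node's inputs.
        stack.extend(graph.get(node, []))
    return {name: inputs for name, inputs in graph.items() if name in keep}


toy_graph = {
    "input": [],
    "conv": ["input"],
    "relu": ["conv"],
    "debug_stats": ["conv"],  # nothing consumes this node
    "output": ["relu"],
}

pruned = prune_unused_nodes(toy_graph, ["output"])
print(sorted(pruned))  # ['conv', 'input', 'output', 'relu']
```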

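Designing for compilation

Here are some ideas for how to handle compilation in your application.

Compilation performance. Compilation can take several minutes to complete. To keep any UI responsive, consider performing it as a background operation in your application.
User interface updates. Consider letting users know when your application is doing compilation work, and notify them when it completes.
Graceful fallback. If there's a problem loading the compiled model, try to capture diagnostic data about the failure, and have your application fall back to the original model where possible so that its AI features remain usable.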
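
The graceful fallback idea can be sketched as follows. This is a minimal illustration under stated assumptions: load_with_fallback and create_session are hypothetical names, and create_session stands in for whatever builds a session from a model path (in a Python app that might be a small function wrapping ort.InferenceSession with your session options). The demo uses a stand-in loader that simulates a failure.

```python
import logging

logger = logging.getLogger("model_loader")


def load_with_fallback(compiled_path, original_path, create_session):
    """Try the compiled model first; on failure, log details and use the original."""
    try:
        return create_session(compiled_path), compiled_path
    except Exception:
        # logger.exception records the traceback, which serves as diagnostic data.
        logger.exception("Loading compiled model %s failed; falling back", compiled_path)
        return create_session(original_path), original_path


def fake_create_session(path):
    """Stand-in loader that fails for the compiled model to exercise the fallback."""
    if path.endswith("_compiled.onnx"):
        raise RuntimeError("simulated load failure")
    return f"session({path})"


session, used_path = load_with_fallback(
    "model_compiled.onnx", "model.onnx", fake_create_session
)
print(used_path)  # model.onnx
```

If loading the original model also fails, the exception propagates to the caller, which seems a reasonable default because at that point the feature cannot run at all.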

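Using device policies for execution provider selection

In addition to selecting EPs explicitly, you can use device policies, which provide a natural, outcome-oriented way to specify how you want your AI workload to run. To do so, use the SessionOptions.SetEpSelectionPolicy function on OrtApi, passing in an OrtExecutionProviderDevicePolicy value. Several values are available for automatic selection, such as MAX_PERFORMANCE, PREFER_NPU, MAX_EFFICIENCY, and more. For other values you can use, see the ONNX OrtExecutionProviderDevicePolicy documentation.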

// Configure the session to select an EP and device for MAX_EFFICIENCY which typically
// will choose an NPU if available with a CPU fallback.
var sessionOptions = new SessionOptions();
sessionOptions.SetEpSelectionPolicy(ExecutionProviderDevicePolicy.MAX_EFFICIENCY);

// Configure the session to select an EP and device for MAX_EFFICIENCY which typically
// will choose an NPU if available with a CPU fallback.
Ort::SessionOptions sessionOptions;
sessionOptions.SetEpSelectionPolicy(OrtExecutionProviderDevicePolicy_MAX_EFFICIENCY);

# Configure the session to select an EP and device for MAX_EFFICIENCY which typically
# will choose an NPU if available with a CPU fallback.
options = ort.SessionOptions()
options.set_provider_selection_policy(ort.OrtExecutionProviderDevicePolicy.MAX_EFFICIENCY)
assert options.has_providers()

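Provide feedback about Windows ML

We'd love to hear your feedback about using Windows ML! If you run into any issues, use the Feedback Hub app on Windows to report them.

Feedback should be submitted under the Developer Platform -> Windows Machine Learning category.

See also

Windows ML APIs in Microsoft.Windows.AI.MachineLearning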
