
Integrate inferencing SDKs with Foundry Local

Important

  • Foundry Local is available in preview. Public preview releases provide early access to features that are in active deployment.
  • Before general availability (GA), features, approaches, and processes might change or have constrained capabilities.

Foundry Local integrates with various inferencing SDKs, such as OpenAI, Azure OpenAI, LangChain, and more. This guide shows you how to connect your application to locally running AI models using popular SDKs.

Prerequisites

  • Foundry Local installed. For installation instructions, see the Get started with Foundry Local article.

Install pip packages

Install the following Python packages:

pip install openai
pip install foundry-local-sdk

Tip

It's recommended that you use a virtual environment to avoid package conflicts. You can create a virtual environment using venv or conda.
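
For example, a minimal venv workflow looks like the following (the environment name .venv is just a convention, not a requirement):

python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install openai foundry-local-sdk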

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.

Copy and paste the following code into a Python file named app.py:

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device. 
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)
# Set the model to use and generate a response
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}]
)
print(response.choices[0].message.content)

Run the code using the following command:

python app.py

Streaming responses

If you want to receive streaming responses, you can modify the code as follows:

import openai
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device.
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry 
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

# The remaining code uses the OpenAI Python SDK to interact with the local model.

# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)

# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True
)

# Print the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

You can run the code using the same command as before:

python app.py

Use requests with Foundry Local

If you prefer to call the REST endpoint directly using the requests library, you can do so as follows:

# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded 
# to your end-user's device. 
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"

payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ]
}

headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
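
The service streams responses over server-sent events in the same OpenAI-compatible format shown in the Fetch API example later in this article. If you want streaming with requests as well, the following is a minimal sketch; it assumes each event arrives as a "data:" line and that the stream ends with "data: [DONE]":

# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager

# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"

# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"

payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ],
    "stream": True
}

# stream=True on the request keeps the connection open and lets us
# read the response body incrementally instead of all at once.
with requests.post(url, json=payload, stream=True) as response:
    for line in response.iter_lines(decode_unicode=True):
        # Each event is a line of the form "data: {json}"; blank lines
        # separate events and "data: [DONE]" marks the end of the stream.
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="", flush=True)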

Install Node.js packages

Install the following Node.js packages:

npm install openai
npm install foundry-local-sdk

The Foundry Local SDK allows you to manage the Foundry Local service and models.

Use the OpenAI SDK with Foundry Local

The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response using the OpenAI SDK.

Copy and paste the following code into a JavaScript file named app.js:

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function generateText() {
  const response = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [
      {
        role: "user",
        content: "What is the golden ratio?",
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

generateText();

Run the code using the following command:

node app.js

Streaming responses

If you want to receive streaming responses, you can modify the code as follows:

import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function streamCompletion() {
    const stream = await openai.chat.completions.create({
      model: modelInfo.id,
      messages: [{ role: "user", content: "What is the golden ratio?" }],
      stream: true,
    });
  
    for await (const chunk of stream) {
      if (chunk.choices[0]?.delta?.content) {
        process.stdout.write(chunk.choices[0].delta.content);
      }
    }
}
  
streamCompletion();

Run the code using the following command:

node app.js

Use the Fetch API with Foundry Local

If you prefer to use an HTTP client such as fetch, you can do so as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

async function queryModel() {
    const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
        },
        body: JSON.stringify({
            model: modelInfo.id,
            messages: [
                { role: "user", content: "What is the golden ratio?" },
            ],
        }),
    });

    const data = await response.json();
    console.log(data.choices[0].message.content);
}

queryModel();

Streaming responses

If you want to receive streaming responses using the Fetch API, you can modify the code as follows:

import { FoundryLocalManager } from "foundry-local-sdk";

// By using an alias, the most suitable model will be downloaded 
// to your end-user's device.
// TIP: You can find a list of available models by running the 
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";

// Create a FoundryLocalManager instance. This will start the Foundry 
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()

// Initialize the manager with a model. This will download the model 
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)

async function streamWithFetch() {
    const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            Accept: "text/event-stream",
        },
        body: JSON.stringify({
            model: modelInfo.id,
            messages: [{ role: "user", content: "What is the golden ratio?" }],
            stream: true,
        }),
    });

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        const chunk = decoder.decode(value);
        const lines = chunk.split("\n").filter((line) => line.trim() !== "");

        for (const line of lines) {
            if (line.startsWith("data: ")) {
                const data = line.substring(6);
                if (data === "[DONE]") continue;

                try {
                    const json = JSON.parse(data);
                    const content = json.choices[0]?.delta?.content || "";
                    if (content) {
                        // Print to console without line breaks, similar to process.stdout.write
                        process.stdout.write(content);
                    }
                } catch (e) {
                    console.error("Error parsing JSON:", e);
                }
            }
        }
    }
}

// Call the function to start streaming
streamWithFetch();

Next steps