Important
- Foundry Local is available in preview. Public previews provide early access to features that are in active deployment.
- Features, approaches, and processes can change or have constrained capabilities before general availability (GA).
Foundry Local integrates with various inference SDKs, such as OpenAI, Azure OpenAI, LangChain, and others. This guide shows you how to connect your application to locally running AI models by using popular SDKs.
Prerequisites
- Foundry Local installed. For installation instructions, see the Get started with Foundry Local article.
Install pip packages
Install the following Python packages:
pip install openai
pip install foundry-local-sdk
Tip
We recommend using a virtual environment to avoid package conflicts. You can create a virtual environment with venv or conda.
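For example, you can create and activate a virtual environment with venv as follows (the directory name .venv is just a common convention):
# Create the environment, then activate it
python -m venv .venv
source .venv/bin/activate    # On Windows, run .venv\Scripts\activate instead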
Use the OpenAI SDK with Foundry Local
The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response by using the OpenAI SDK.
Copy and paste the following code into a Python file named app.py:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.
# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)

# Set the model to use and generate a response
response = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}]
)

print(response.choices[0].message.content)
Run the code by using the following command:
python app.py
Stream the response
If you want to receive streaming responses, you can modify the code as follows:
import openai
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
# The remaining code uses the OpenAI Python SDK to interact with the local model.

# Configure the client to use the local Foundry service
client = openai.OpenAI(
    base_url=manager.endpoint,
    api_key=manager.api_key  # API key is not required for local usage
)

# Set the model to use and generate a streaming response
stream = client.chat.completions.create(
    model=manager.get_model_info(alias).id,
    messages=[{"role": "user", "content": "What is the golden ratio?"}],
    stream=True
)

# Print the streaming response
for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
You can run the code by using the same command as before:
python app.py
Use requests with Foundry Local
If you prefer to use an HTTP client like requests, you can call the local endpoint directly as follows:
# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager
# By using an alias, the most suitable model will be downloaded
# to your end-user's device.
alias = "phi-3.5-mini"
# Create a FoundryLocalManager instance. This will start the Foundry
# Local service if it is not already running and load the specified model.
manager = FoundryLocalManager(alias)
url = manager.endpoint + "/chat/completions"
payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [
        {"role": "user", "content": "What is the golden ratio?"}
    ]
}
headers = {
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
print(response.json()["choices"][0]["message"]["content"])
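requests can also consume a streaming response. The sketch below is a minimal example; it assumes the service emits standard server-sent events (data: ... lines ending with data: [DONE]), which is the same format the Fetch API streaming example later in this article parses:
# Install with: pip install requests
import requests
import json
from foundry_local import FoundryLocalManager

alias = "phi-3.5-mini"

# Start the Foundry Local service (if needed) and load the model
manager = FoundryLocalManager(alias)

url = manager.endpoint + "/chat/completions"
payload = {
    "model": manager.get_model_info(alias).id,
    "messages": [{"role": "user", "content": "What is the golden ratio?"}],
    "stream": True  # Request a streaming (SSE) response
}

# stream=True tells requests not to read the whole body at once
with requests.post(url, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # Each event arrives as a line of the form "data: {...}"
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        chunk = json.loads(data)
        content = chunk["choices"][0].get("delta", {}).get("content")
        if content:
            print(content, end="", flush=True)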
Install Node.js packages
You need to install the following Node.js packages:
npm install openai
npm install foundry-local-sdk
The Foundry Local SDK lets you manage the Foundry Local service and models.
Use the OpenAI SDK with Foundry Local
The following example shows how to use the OpenAI SDK with Foundry Local. The code initializes the Foundry Local service, loads a model, and generates a response by using the OpenAI SDK.
Copy and paste the following code into a JavaScript file named app.js:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function generateText() {
  const response = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [
      {
        role: "user",
        content: "What is the golden ratio?",
      },
    ],
  });

  console.log(response.choices[0].message.content);
}

generateText();
Run the code by using the following command:
node app.js
Stream the response
If you want to receive streaming responses, you can modify the code as follows:
import { OpenAI } from "openai";
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
const openai = new OpenAI({
  baseURL: foundryLocalManager.endpoint,
  apiKey: foundryLocalManager.apiKey,
});

async function streamCompletion() {
  const stream = await openai.chat.completions.create({
    model: modelInfo.id,
    messages: [{ role: "user", content: "What is the golden ratio?" }],
    stream: true,
  });

  for await (const chunk of stream) {
    if (chunk.choices[0]?.delta?.content) {
      process.stdout.write(chunk.choices[0].delta.content);
    }
  }
}

streamCompletion();
Run the code by using the following command:
node app.js
Use the Fetch API with Foundry Local
If you prefer to use an HTTP client like fetch, you can do so as follows:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
async function queryModel() {
  const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: modelInfo.id,
      messages: [
        { role: "user", content: "What is the golden ratio?" },
      ],
    }),
  });

  const data = await response.json();
  console.log(data.choices[0].message.content);
}

queryModel();
Stream the response
If you want to receive streaming responses by using the Fetch API, you can modify the code as follows:
import { FoundryLocalManager } from "foundry-local-sdk";
// By using an alias, the most suitable model will be downloaded
// to your end-user's device.
// TIP: You can find a list of available models by running the
// following command in your terminal: `foundry model list`.
const alias = "phi-3.5-mini";
// Create a FoundryLocalManager instance. This will start the Foundry
// Local service if it is not already running.
const foundryLocalManager = new FoundryLocalManager()
// Initialize the manager with a model. This will download the model
// if it is not already present on the user's device.
const modelInfo = await foundryLocalManager.init(alias)
console.log("Model Info:", modelInfo)
async function streamWithFetch() {
  const response = await fetch(foundryLocalManager.endpoint + "/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "text/event-stream",
    },
    body: JSON.stringify({
      model: modelInfo.id,
      messages: [{ role: "user", content: "What is the golden ratio?" }],
      stream: true,
    }),
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    const chunk = decoder.decode(value);
    const lines = chunk.split("\n").filter((line) => line.trim() !== "");

    for (const line of lines) {
      if (line.startsWith("data: ")) {
        const data = line.substring(6);
        if (data === "[DONE]") continue;

        try {
          const json = JSON.parse(data);
          const content = json.choices[0]?.delta?.content || "";
          if (content) {
            // Print to console without line breaks, similar to process.stdout.write
            process.stdout.write(content);
          }
        } catch (e) {
          console.error("Error parsing JSON:", e);
        }
      }
    }
  }
}

// Call the function to start streaming
streamWithFetch();