Once you deploy a model in Azure AI Foundry, you can consume its capabilities through the Azure AI Foundry APIs. There are two different endpoints and APIs you can use to work with models in Azure AI Foundry Models.
Models inference endpoint
The models inference endpoint (usually of the form https://<resource-name>.services.ai.azure.com/models) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the Azure AI Model Inference API, which all the models in Foundry Models support. It supports the following modalities:
- Text embeddings
- Image embeddings
- Chat completions
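For example, the text embeddings modality is served from the same endpoint with the same credentials; the following is a minimal Python sketch, where text-embedding-3-small is a hypothetical deployment name you would replace with one of your own:
import os
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

# Same endpoint and key as for chat completions; only the client type changes.
client = EmbeddingsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

# "text-embedding-3-small" is a hypothetical deployment name used for illustration.
response = client.embed(
    input=["Explain Riemann's conjecture in 1 paragraph"],
    model="text-embedding-3-small",
)
print(len(response.data[0].embedding))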
Routing
The inference endpoint routes requests to a given deployment by matching the value of the model parameter in the request to the name of the deployment. A deployment therefore works as an alias for a given model under a particular configuration. This flexibility allows you to deploy the same model multiple times in the service under different configurations if needed.
For example, if you create a deployment named Mistral-large, then that deployment can be invoked as follows:
Install the package azure-ai-inference using your package manager, like pip:
pip install azure-ai-inference
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
client = ChatCompletionsClient(
endpoint="https://<resource>.services.ai.azure.com/models",
credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
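If you prefer keyless authentication, Microsoft Entra ID also works with this client; a minimal sketch, assuming the azure-identity package is installed and your identity has been granted access to the resource:
from azure.ai.inference import ChatCompletionsClient
from azure.identity import DefaultAzureCredential

# Authenticate with Microsoft Entra ID instead of an API key.
client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)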
Explore our samples and read the API reference documentation to get started.
Install the package @azure-rest/ai-inference using npm:
npm install @azure-rest/ai-inference
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import ModelClient from "@azure-rest/ai-inference";
import { isUnexpected } from "@azure-rest/ai-inference";
import { AzureKeyCredential } from "@azure/core-auth";
const client = ModelClient(
    "https://<resource>.services.ai.azure.com/models",
    new AzureKeyCredential(process.env.AZURE_INFERENCE_CREDENTIAL)
);
Explore our samples and read the API reference documentation to get started.
Install the Azure AI inference library with the following command:
dotnet add package Azure.AI.Inference --prerelease
Import the following namespaces:
using Azure;
using Azure.Identity;
using Azure.AI.Inference;
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
ChatCompletionsClient client = new ChatCompletionsClient(
new Uri("https://<resource>.services.ai.azure.com/models"),
new AzureKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);
Explore our samples and read the API reference documentation to get started.
Add the package to your project:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-ai-inference</artifactId>
<version>1.0.0-beta.1</version>
</dependency>
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
ChatCompletionsClient client = new ChatCompletionsClientBuilder()
.credential(new AzureKeyCredential("{key}"))
.endpoint("https://<resource>.services.ai.azure.com/models")
.buildClient();
Explore our samples and read the API reference documentation to get started.
Use the reference section to explore the API design and which parameters are available. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions. Notice that the path /models is appended to the root of the URL:
Request
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json
from azure.ai.inference.models import SystemMessage, UserMessage
response = client.complete(
messages=[
SystemMessage(content="You are a helpful assistant."),
UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
],
model="mistral-large"
)
print(response.choices[0].message.content)
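The same client can also stream the response as it's generated; a minimal sketch, reusing the client and deployment from the example above:
# Setting stream=True returns incremental updates instead of a single response.
response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large",
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")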
var messages = [
{ role: "system", content: "You are a helpful assistant" },
{ role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];
var response = await client.path("/chat/completions").post({
body: {
messages: messages,
model: "mistral-large"
}
});
console.log(response.body.choices[0].message.content)
var requestOptions = new ChatCompletionsOptions()
{
    Messages = {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph")
    },
    Model = "mistral-large"
};

var response = client.Complete(requestOptions);
Console.WriteLine($"Response: {response.Value.Content}");
List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));
ChatCompletionsOptions options = new ChatCompletionsOptions(chatMessages);
options.setModel("mistral-large"); // Route the request to the deployment by name.
ChatCompletions chatCompletions = client.complete(options);
for (ChatChoice choice : chatCompletions.getChoices()) {
ChatResponseMessage message = choice.getMessage();
System.out.println("Response:" + message.getContent());
}
Request
POST https://<resource>.services.ai.azure.com/models/chat/completions?api-version=2024-05-01-preview
api-key: <api-key>
Content-Type: application/json
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "Explain Riemann's conjecture in 1 paragraph"
}
],
"model": "mistral-large"
}
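If you'd rather call the REST API directly without an SDK, here's a minimal sketch using the Python requests library, assuming the key is stored in the AZURE_INFERENCE_CREDENTIAL environment variable:
import os
import requests

# Note the /models path segment before /chat/completions.
url = "https://<resource>.services.ai.azure.com/models/chat/completions"
response = requests.post(
    url,
    params={"api-version": "2024-05-01-preview"},
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"},
        ],
        "model": "mistral-large",
    },
)
print(response.json()["choices"][0]["message"]["content"])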
Tip
Deployment routing isn't case sensitive.
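For example, reusing the Python ChatCompletionsClient created earlier, both of the following calls route to the same Mistral-large deployment:
from azure.ai.inference.models import UserMessage

# Matching between the model parameter and the deployment name ignores case.
client.complete(messages=[UserMessage(content="Hello")], model="Mistral-large")
client.complete(messages=[UserMessage(content="Hello")], model="mistral-large")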
Azure OpenAI inference endpoint
Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used for compatible functionalities.
Azure OpenAI endpoints (usually of the form https://<resource-name>.openai.azure.com) work at the deployment level, and each deployment has its own URL associated with it. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the Azure OpenAI API.
Each deployment has a URL that is the concatenation of the Azure OpenAI base URL and the route /deployments/<model-deployment-name>. For example, a deployment named deepseek-v3-0324 is reachable at https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324.
Install the package openai using your package manager, like pip:
pip install openai --upgrade
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import os
from openai import AzureOpenAI
client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    api_key=os.getenv("AZURE_INFERENCE_CREDENTIAL"),
    api_version="2024-10-21",
)
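Microsoft Entra ID authentication is also supported by this client through a token provider; a minimal sketch, assuming the azure-identity package is installed and your identity has access to the resource:
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Exchange an Entra ID token instead of using an API key.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21",
)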
Install the package openai using npm:
npm install openai
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
import { AzureOpenAI } from "openai";

const endpoint = "https://<resource>.services.ai.azure.com";
const apiKey = process.env.AZURE_INFERENCE_CREDENTIAL;
const apiVersion = "2024-10-21";
const deployment = "deepseek-v3-0324";

const client = new AzureOpenAI({
    endpoint,
    apiKey,
    apiVersion,
    deployment,
});
Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.
Install the Azure OpenAI library with the following command:
dotnet add package Azure.AI.OpenAI --prerelease
You can use the package to consume the model. The following example shows how to create a client to consume chat completions:
AzureOpenAIClient client = new(
new Uri("https://<resource>.services.ai.azure.com"),
new ApiKeyCredential(Environment.GetEnvironmentVariable("AZURE_INFERENCE_CREDENTIAL"))
);
Add the package to your project:
<dependency>
<groupId>com.azure</groupId>
<artifactId>azure-ai-openai</artifactId>
<version>1.0.0-beta.16</version>
</dependency>
Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:
OpenAIClient client = new OpenAIClientBuilder()
.credential(new AzureKeyCredential("{key}"))
.endpoint("https://<resource>.services.ai.azure.com")
.buildClient();
Use the reference section to explore the API design and which parameters are available. For example, the reference section for Chat completions details how to use the route /chat/completions to generate predictions based on chat-formatted instructions:
Request
POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
api-key: <api-key>
Content-Type: application/json
Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.
response = client.chat.completions.create(
    model="deepseek-v3-0324",  # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ]
)
print(response.model_dump_json(indent=2))
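Streaming works the same way with this client; a minimal sketch, reusing the client and deployment from above:
# stream=True yields chunks as the model produces them.
stream = client.chat.completions.create(
    model="deepseek-v3-0324",  # Replace with your model deployment name.
    messages=[
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")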
var messages = [
{ role: "system", content: "You are a helpful assistant" },
{ role: "user", content: "Explain Riemann's conjecture in 1 paragraph" },
];
const response = await client.chat.completions.create({ messages, model: "deepseek-v3-0324" });
console.log(response.choices[0].message.content)
ChatClient chatClient = client.GetChatClient("deepseek-v3-0324"); // Your model deployment name.

ChatCompletion response = chatClient.CompleteChat(
[
new SystemChatMessage("You are a helpful assistant."),
new UserChatMessage("Explain Riemann's conjecture in 1 paragraph"),
]);
Console.WriteLine($"{response.Role}: {response.Content[0].Text}");
List<ChatRequestMessage> chatMessages = new ArrayList<>();
chatMessages.add(new ChatRequestSystemMessage("You are a helpful assistant"));
chatMessages.add(new ChatRequestUserMessage("Explain Riemann's conjecture in 1 paragraph"));
ChatCompletions chatCompletions = client.getChatCompletions("deepseek-v3-0324",
new ChatCompletionsOptions(chatMessages));
System.out.printf("Model ID=%s is created at %s.%n", chatCompletions.getId(), chatCompletions.getCreatedAt());
for (ChatChoice choice : chatCompletions.getChoices()) {
ChatResponseMessage message = choice.getMessage();
System.out.printf("Index: %d, Chat Role: %s.%n", choice.getIndex(), message.getRole());
System.out.println("Message:");
System.out.println(message.getContent());
}
Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.
Request
POST https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions?api-version=2024-10-21
api-key: <api-key>
Content-Type: application/json
{
"messages": [
{
"role": "system",
"content": "You are a helpful assistant"
},
{
"role": "user",
"content": "Explain Riemann's conjecture in 1 paragraph"
}
]
}
Here, deepseek-v3-0324 is the name of a model deployment in the Azure AI Foundry resource.
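As with the models inference endpoint, you can call this route directly; a minimal sketch with the Python requests library, where the deployment name travels in the URL instead of the request body:
import os
import requests

# Deployment-scoped route: the deployment name is part of the URL, so no
# "model" field is needed in the body.
url = "https://<resource>.services.ai.azure.com/openai/deployments/deepseek-v3-0324/chat/completions"
response = requests.post(
    url,
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={
        "messages": [
            {"role": "system", "content": "You are a helpful assistant"},
            {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"},
        ]
    },
)
print(response.json()["choices"][0]["message"]["content"])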
Next steps