Use Foundry Models

Once you have deployed a model in Azure AI Foundry, you can consume its capabilities via the Azure AI Foundry APIs. There are two different endpoints and APIs you can use to work with deployed models in Azure AI Foundry Models.

Models inference endpoint

The models inference endpoint (usually of the form https://<resource-name>.services.ai.azure.com/models) allows customers to use a single endpoint, with the same authentication and schema, to generate inference for any model deployed in the resource. This endpoint follows the Azure AI Model Inference API, which all models in Foundry Models support. It supports the following modalities (a short embeddings sketch follows the list):

  • Text embeddings
  • Image embeddings
  • Chat completions
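
All three modalities share the same endpoint and authentication; only the client type and operation change. As a minimal sketch, text embeddings can be generated as follows (the deployment name text-embedding-3-small is a hypothetical placeholder):

import os
from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

# Same endpoint and key as chat completions; only the client type differs.
client = EmbeddingsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)

response = client.embed(
    input=["A single endpoint serves all models deployed in the resource"],
    model="text-embedding-3-small",  # hypothetical deployment name
)
print(len(response.data[0].embedding))  # dimensionality of the returned vector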

Routing

The inference endpoint routes requests to a given deployment by matching the model parameter inside of the request to the name of the deployment. This means that a deployment works as an alias of a given model under certain configurations. This flexibility allows you to deploy the same model multiple times in the service, under different configurations if needed.

An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model: the request is routed to that deployment by indicating its name in the model parameter of the request payload.
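
At the REST level, it is the model field in the request body that selects the deployment. A minimal sketch of such a call (the api-version value is an assumption; check the API reference for current versions):

import os
import requests

# The 'model' field in the JSON body is matched against deployment names.
response = requests.post(
    "https://<resource>.services.ai.azure.com/models/chat/completions",
    params={"api-version": "2024-05-01-preview"},  # assumed; verify in the API reference
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={
        "model": "Meta-llama-3.2-8b-instruct",
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(response.json()["choices"][0]["message"]["content"])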

For example, if you create a deployment named Mistral-large, then that deployment can be invoked as follows.

Install the package azure-ai-inference using your package manager, like pip:

pip install azure-ai-inference

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_CREDENTIAL"]),
)
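
If you prefer keyless authentication, the same client can be created with a Microsoft Entra ID credential instead of a key. A minimal sketch, assuming your identity has been granted access to the resource:

from azure.identity import DefaultAzureCredential
from azure.ai.inference import ChatCompletionsClient

# DefaultAzureCredential picks up your signed-in identity (CLI, managed identity, etc.).
client = ChatCompletionsClient(
    endpoint="https://<resource>.services.ai.azure.com/models",
    credential=DefaultAzureCredential(),
    credential_scopes=["https://cognitiveservices.azure.com/.default"],
)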

Explore the samples and read the API reference documentation to get started. The following example calls the chat completions operation, routing the request to the Mistral-large deployment through the model parameter:

from azure.ai.inference.models import SystemMessage, UserMessage

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large",  # Name of the deployment to route the request to
)

print(response.choices[0].message.content)
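
The same operation can stream the response as it is generated. A minimal sketch (some updates may arrive with an empty choices list, hence the guard):

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Explain Riemann's conjecture in 1 paragraph"),
    ],
    model="mistral-large",
    stream=True,  # yields incremental updates instead of a single final message
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")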

Tip

Deployment routing isn't case sensitive.

Azure OpenAI inference endpoint

Azure AI Foundry also supports the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference. Non-OpenAI models can also be used through it for compatible functionality.

Azure OpenAI endpoints (usually of the form https://<resource-name>.openai.azure.com) work at the deployment level, and each deployment has its own URL associated with it. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the Azure OpenAI API.

An illustration showing how each Azure OpenAI deployment has its own, single URL.

Each deployment has a URL that is the concatenation of the Azure OpenAI base URL and the route /deployments/<model-deployment-name>.
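
For example, for a hypothetical resource named contoso and a deployment named deepseek-v3-0324, the full chat completions URL would look like the sketch below (the /openai path prefix and the api-version value should be confirmed in the Azure OpenAI API reference):

import os
import requests

# The deployment name is part of the URL, not the request body.
url = (
    "https://contoso.openai.azure.com"
    "/openai/deployments/deepseek-v3-0324/chat/completions"
)
response = requests.post(
    url,
    params={"api-version": "2024-10-21"},
    headers={"api-key": os.environ["AZURE_INFERENCE_CREDENTIAL"]},
    json={"messages": [{"role": "user", "content": "Hello"}]},
)
print(response.json()["choices"][0]["message"]["content"])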

Install the package openai using your package manager, like pip:

pip install openai --upgrade

Then, you can use the package to consume the model. The following example shows how to create a client to consume chat completions:

import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    api_key=os.getenv("AZURE_INFERENCE_CREDENTIAL"),
    api_version="2024-10-21",
)

response = client.chat.completions.create(
    model="deepseek-v3-0324",  # Replace with your model deployment name.
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Riemann's conjecture in 1 paragraph"}
    ]
)

print(response.model_dump_json(indent=2))
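
Keyless authentication with Microsoft Entra ID works here too. A minimal sketch using a token provider instead of an API key (it assumes your identity has been granted access to the resource):

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Request tokens for the Azure Cognitive Services scope instead of sending a key.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://<resource>.services.ai.azure.com",
    azure_ad_token_provider=token_provider,
    api_version="2024-10-21",
)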

Next steps