Azure AI Foundry Models gives you access to flagship models in Azure AI Foundry, which you consume as APIs without hosting them on your own infrastructure.
A selection of models is offered directly by Microsoft as Models Sold Directly by Azure, bringing the most powerful options to developers building AI applications. We also broaden the breadth of available models by partnering with key players in the industry and offering Models from Partners and Community.
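Because these models are consumed as APIs, a single inference client can target any of the deployments listed in the tables that follow. The snippet below is a minimal sketch using the azure-ai-inference Python package; the endpoint URL, environment variable, and model name are placeholders you would replace with values from your own Azure AI Foundry resource and deployment.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint for an Azure AI Foundry resource; replace with your own.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what Azure AI Foundry Models offers."),
    ],
    model="gpt-4o-mini",  # use the name of a model deployment in your resource
)

print(response.choices[0].message.content)
```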
Models Sold Directly by Azure
Models Sold Directly by Azure is a selection of flagship models offered directly by Microsoft. These models don't require integration with Azure Marketplace.
Azure OpenAI
Azure OpenAI in Azure AI Foundry Models offers a diverse set of models with different capabilities and price points. For more details, see Azure OpenAI Model availability. These models include:
- State-of-the-art models designed to tackle reasoning and problem-solving tasks with increased focus and capability
- Models that can understand and generate natural language and code
- Models that can transcribe and translate speech to text
Model | Type | Tier | Capabilities |
---|---|---|---|
o3-mini | chat-completion | Global standard | - Input: text and image (200,000 tokens) - Output: text (100,000 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
o1 | chat-completion | Global standard | - Input: text and image (200,000 tokens) - Output: text (100,000 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
o1-preview | chat-completion | Global standard, Standard | - Input: text (128,000 tokens) - Output: (32,768 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
o1-mini | chat-completion | Global standard, Standard | - Input: text (128,000 tokens) - Output: (65,536 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: No - Response formats: Text |
gpt-4o-realtime-preview | real-time | Global standard | - Input: control, text, and audio (131,072 tokens) - Output: text and audio (16,384 tokens) - Languages: en - Tool calling: Yes - Response formats: Text, JSON |
gpt-4o | chat-completion | Global standard, Standard, Batch, Provisioned, Global provisioned, Data Zone | - Input: text and image (131,072 tokens) - Output: text (16,384 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
gpt-4o-mini | chat-completion | Global standard, Standard, Batch, Provisioned, Global provisioned, Data Zone | - Input: text, image, and audio (131,072 tokens) - Output: (16,384 tokens) - Languages: en , it , af , es , de , fr , id , ru , pl , uk , el , lv , zh , ar , tr , ja , sw , cy , ko , is , bn , ur , ne , th , pa , mr , and te . - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
text-embedding-3-large | embeddings | Global standard, Standard, Provisioned, Global provisioned | - Input: text (8,191 tokens) - Output: Vector (3,072 dim.) - Languages: en |
text-embedding-3-small | embeddings | Global standard, Standard, Provisioned, Global provisioned | - Input: text (8,191 tokens) - Output: Vector (1,536 dim.) - Languages: en |
See this model collection in Azure AI Foundry portal.
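Azure OpenAI models can also be called with the official openai Python package pointed at your Azure endpoint. The sketch below is illustrative only: the endpoint, API version, environment variable, and deployment name (here gpt-4o-mini) are placeholders for values from your own resource and deployment.

```python
import os

from openai import AzureOpenAI

# Placeholder endpoint and API version; substitute the values for your resource.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # the deployment name you chose for the model
    messages=[{"role": "user", "content": "Write a haiku about the model catalog."}],
)

print(completion.choices[0].message.content)
```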
DeepSeek
The DeepSeek family of models includes DeepSeek-R1, which is trained with a step-by-step reasoning process and excels at language, scientific reasoning, and coding tasks.
Model | Type | Tier | Capabilities |
---|---|---|---|
DeepSeek-R1-0528 | chat-completion | Global standard | - Input: text (163,840 tokens) - Output: text (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text |
DeepSeek-V3-0324 | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (131,072 tokens) - Languages: en and zh - Tool calling: Yes - Response formats: Text, JSON |
DeepSeek-R1 | chat-completion (with reasoning content) | Global standard | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text |
DeepSeek-V3 (Legacy) | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (131,072 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text, JSON |
For a tutorial on DeepSeek-R1, see Tutorial: Get started with DeepSeek-R1 reasoning model in Azure AI Foundry Models.
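As a quick preview of what the tutorial covers, the sketch below sends a prompt to DeepSeek-R1 through the azure-ai-inference package. Reasoning models usually return their chain of thought inside the response text (typically wrapped in `<think>` tags), so the example splits that out when present; the endpoint, environment variable, and key are placeholders for your own resource.

```python
import os
import re

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint; use the Foundry Models endpoint of your own resource.
client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

response = client.complete(
    messages=[UserMessage(content="How many R's are in the word 'strawberry'?")],
    model="DeepSeek-R1",
    max_tokens=2048,  # leave room for the reasoning trace plus the final answer
)

content = response.choices[0].message.content
match = re.match(r"<think>(.*?)</think>(.*)", content, re.DOTALL)
if match:
    print("Reasoning:", match.group(1).strip())
    print("Answer:", match.group(2).strip())
else:
    print(content)
```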
See this model collection in Azure AI Foundry portal.
Microsoft
Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. Some Microsoft models are offered as Models from Partners and Community. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.
Model | Type | Tier | Capabilities |
---|---|---|---|
MAI-DS-R1 | chat-completion (with reasoning content) | Global standard | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text |
Mistral AI
Mistral AI offers two categories of models: premium models, including Mistral Large and Mistral Small, and open models, including Mistral Nemo. Some Mistral models are offered as Models from Partners and Community.
Model | Type | Tier | Capabilities |
---|---|---|---|
Codestral-2501 | chat-completion | Global standard | - Input: text (262,144 tokens) - Output: text (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
See this model collection in Azure AI Foundry portal.
Meta
Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta Llama 4 is part of Models Sold Directly by Azure, while the rest of the Llama family is offered as Models from Partners and Community.
Model | Type | Tier | Capabilities |
---|---|---|---|
Llama-4-Maverick-17B-128E-Instruct-FP8 | chat-completion | Global standard | - Input: text and images (1M tokens) - Output: text (1M tokens) - Languages: ar , en , fr , de , hi , id , it , pt , es , tl , th , and vi - Tool calling: No* - Response formats: Text |
Llama-3.3-70B-Instruct | chat-completion | Global standard | - Input: text (128,000 tokens) - Output: text (8,192 tokens) - Languages: en , de , fr , it , pt , hi , es , and th - Tool calling: No* - Response formats: Text |
See this model collection in Azure AI Foundry portal.
xAI
xAI's Grok 3 and Grok 3 Mini models are designed to excel in various enterprise domains. Grok 3, a non-reasoning model pre-trained by the Colossus datacenter, is tailored for business use cases such as data extraction, coding, and text summarization, with exceptional instruction-following capabilities. It supports a 131,072 token context window, allowing it to handle extensive inputs while maintaining coherence and depth, and is particularly adept at drawing connections across domains and languages. On the other hand, Grok 3 Mini is a lightweight reasoning model trained to tackle agentic, coding, mathematical, and deep science problems with test-time compute. It also supports a 131,072 token context window for understanding codebases and enterprise documents, and excels at using tools to solve complex logical problems in novel environments, offering raw reasoning traces for user inspection with adjustable thinking budgets.
Model | Type | Tier | Capabilities |
---|---|---|---|
grok-3 | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text |
grok-3-mini | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: text (131,072 tokens) - Languages: en - Tool calling: Yes - Response formats: Text |
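Both Grok models report tool-calling support, so they can be asked to invoke functions you define in your application. The sketch below registers a single hypothetical get_weather function through the azure-ai-inference package and checks whether the model chose to call it; the endpoint, environment variable, and function definition are placeholders, not part of any real API surface.

```python
import json
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ChatCompletionsToolDefinition,
    FunctionDefinition,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

# A hypothetical tool the model may decide to call.
weather_tool = ChatCompletionsToolDefinition(
    function=FunctionDefinition(
        name="get_weather",
        description="Get the current weather for a city.",
        parameters={
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    )
)

response = client.complete(
    messages=[UserMessage(content="What's the weather in Oslo right now?")],
    model="grok-3",
    tools=[weather_tool],
)

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print("Model requested tool:", call.function.name, json.loads(call.function.arguments))
else:
    print(message.content)
```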
Models from Partners and Community
Models from Partners and Community available for deployment with pay-as-you-go billing (for example, Cohere models) are offered by the model provider but hosted in Microsoft-managed Azure infrastructure and accessed via API in Azure AI Foundry. Model providers define the license terms and set the price for use of their models, while Azure AI Foundry manages the hosting infrastructure.
Models from Partners and Community are offered through Azure Marketplace and require additional configuration to enable.
AI21 Labs
The Jamba family models are AI21's production-grade, Mamba-based large language models (LLMs) that use AI21's hybrid Mamba-Transformer architecture. They're instruction-tuned versions of AI21's hybrid structured state space model (SSM) transformer Jamba model, built for reliable commercial use with respect to quality and performance.
Model | Type | Tier | Capabilities |
---|---|---|---|
AI21-Jamba-1.5-Mini | chat-completion | Global standard | - Input: text (262,144 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , pt , de , ar , and he - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
AI21-Jamba-1.5-Large | chat-completion | Global standard | - Input: text (262,144 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , pt , de , ar , and he - Tool calling: Yes - Response formats: Text, JSON, structured outputs |
See this model collection in Azure AI Foundry portal.
Cohere
The Cohere family of models includes various models optimized for different use cases, including chat completions and embeddings. Cohere models are optimized for various use cases that include reasoning, summarization, and question answering.
Model | Type | Tier | Capabilities |
---|---|---|---|
Cohere-command-r-plus-08-2024 | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , it , de , pt-br , ja , ko , zh-cn , and ar - Tool calling: Yes - Response formats: Text, JSON |
Cohere-command-r-08-2024 | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , it , de , pt-br , ja , ko , zh-cn , and ar - Tool calling: Yes - Response formats: Text, JSON |
Cohere-command-r-plus (deprecated) | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , it , de , pt-br , ja , ko , zh-cn , and ar - Tool calling: Yes - Response formats: Text, JSON |
Cohere-command-r (deprecated) | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , fr , es , it , de , pt-br , ja , ko , zh-cn , and ar - Tool calling: Yes - Response formats: Text, JSON |
Cohere-embed-v3-english | embeddings, image-embeddings | Global standard | - Input: text (512 tokens) - Output: Vector (1,024 dim.) - Languages: en |
Cohere-embed-v3-multilingual | embeddings, image-embeddings | Global standard | - Input: text (512 tokens) - Output: Vector (1,024 dim.) - Languages: en , fr , es , it , de , pt-br , ja , ko , zh-cn , and ar |
See this model collection in Azure AI Foundry portal.
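The Cohere embedding models return dense vectors you can feed into a vector index. The sketch below uses the azure-ai-inference EmbeddingsClient against Cohere-embed-v3-multilingual from the table above; the endpoint and environment variable are placeholders for your own resource values.

```python
import os

from azure.ai.inference import EmbeddingsClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint; point this at your Foundry Models endpoint.
client = EmbeddingsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

response = client.embed(
    input=["Bonjour le monde", "Hello world"],
    model="Cohere-embed-v3-multilingual",
)

for item in response.data:
    # Each result carries a 1,024-dimension vector, per the table above.
    print(item.index, len(item.embedding))
```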
Core42
Core42 includes autoregressive bilingual LLMs for Arabic and English with state-of-the-art capabilities in Arabic.
Model | Type | Tier | Capabilities |
---|---|---|---|
jais-30b-chat | chat-completion | Global standard | - Input: text (8,192 tokens) - Output: (4,096 tokens) - Languages: en and ar - Tool calling: Yes - Response formats: Text, JSON |
See this model collection in Azure AI Foundry portal.
Meta
Meta Llama models and tools are a collection of pretrained and fine-tuned generative AI text and image reasoning models. Meta models range in scale to include:
- Small language models (SLMs) like 1B and 3B Base and Instruct models for on-device and edge inferencing
- Mid-size large language models (LLMs) like 7B, 8B, and 70B Base and Instruct models
- High-performance models like Meta Llama 3.1-405B Instruct for synthetic data generation and distillation use cases.
Model | Type | Tier | Capabilities |
---|---|---|---|
Llama-3.2-11B-Vision-Instruct | chat-completion | Global standard | - Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text |
Llama-3.2-90B-Vision-Instruct | chat-completion | Global standard | - Input: text and image (128,000 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text |
Meta-Llama-3.1-405B-Instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: en , de , fr , it , pt , hi , es , and th - Tool calling: No* - Response formats: Text |
Meta-Llama-3.1-70B-Instruct (deprecated) | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: en , de , fr , it , pt , hi , es , and th - Tool calling: No* - Response formats: Text |
Meta-Llama-3.1-8B-Instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (8,192 tokens) - Languages: en , de , fr , it , pt , hi , es , and th - Tool calling: No* - Response formats: Text |
Meta-Llama-3-70B-Instruct (deprecated) | chat-completion | Global standard | - Input: text (8,192 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text |
Meta-Llama-3-8B-Instruct (deprecated) | chat-completion | Global standard | - Input: text (8,192 tokens) - Output: (8,192 tokens) - Languages: en - Tool calling: No* - Response formats: Text |
See this model collection in Azure AI Foundry portal.
Microsoft
Microsoft models include various model groups such as MAI models, Phi models, healthcare AI models, and more. To see all the available Microsoft models, view the Microsoft model collection in Azure AI Foundry portal.
Model | Type | Tier | Capabilities |
---|---|---|---|
MAI-DS-R1 | chat-completion (with reasoning content) | Global standard | - Input: text (163,840 tokens) - Output: (163,840 tokens) - Languages: en and zh - Tool calling: No - Response formats: Text |
Phi-4-mini-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: ar , zh , cs , da , nl , en , fi , fr , de , he , hu , it , ja , ko , no , pl , pt , ru , es , sv , th , tr , and uk - Tool calling: No - Response formats: Text |
Phi-4-multimodal-instruct | chat-completion | Global standard | - Input: text, images, and audio (131,072 tokens) - Output: (4,096 tokens) - Languages: ar , zh , cs , da , nl , en , fi , fr , de , he , hu , it , ja , ko , no , pl , pt , ru , es , sv , th , tr , and uk - Tool calling: No - Response formats: Text |
Phi-4 | chat-completion | Global standard | - Input: text (16,384 tokens) - Output: (16,384 tokens) - Languages: en , ar , bn , cs , da , de , el , es , fa , fi , fr , gu , ha , he , hi , hu , id , it , ja , jv , kn , ko , ml , mr , nl , no , or , pa , pl , ps , pt , ro , ru , sv , sw , ta , te , th , tl , tr , uk , ur , vi , yo , and zh - Tool calling: No - Response formats: Text |
Phi-3.5-mini-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , ar , zh , cs , da , nl , fi , fr , de , he , hu , it , ja , ko , no , pl , pt , ru , es , sv , th , tr , and uk - Tool calling: No - Response formats: Text |
Phi-3.5-vision-instruct | chat-completion | Global standard | - Input: text and image (131,072 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3.5-MoE-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: en , ar , zh , cs , da , nl , fi , fr , de , he , hu , it , ja , ko , no , pl , pt , ru , es , sv , th , tr , and uk - Tool calling: No - Response formats: Text |
Phi-3-mini-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3-mini-4k-instruct | chat-completion | Global standard | - Input: text (4,096 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3-small-8k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3-medium-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3-medium-4k-instruct | chat-completion | Global standard | - Input: text (4,096 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
Phi-3-small-128k-instruct | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en - Tool calling: No - Response formats: Text |
See the Microsoft model collection in Azure AI Foundry portal.
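Phi-4-multimodal-instruct accepts mixed text and image input in a single chat turn. The sketch below, again with the azure-ai-inference package, passes an image by URL alongside a question; the endpoint, environment variable, and image URL are placeholders for your own values.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import (
    ImageContentItem,
    ImageUrl,
    TextContentItem,
    UserMessage,
)
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential(os.environ["AZURE_AI_API_KEY"]),
)

response = client.complete(
    messages=[
        UserMessage(
            content=[
                TextContentItem(text="Describe the chart in this image."),
                ImageContentItem(image_url=ImageUrl(url="https://example.com/chart.png")),
            ]
        )
    ],
    model="Phi-4-multimodal-instruct",
)

print(response.choices[0].message.content)
```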
Mistral AI
Mistral AI offers two categories of models: premium models, including Mistral Large and Mistral Small, and open models, including Mistral Nemo.
Model | Type | Tier | Capabilities |
---|---|---|---|
Mistral-small-2503 | chat-completion | Global standard | - Input: text (32,768 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON |
Mistral-Large-2411 | chat-completion | Global standard | - Input: text (128,000 tokens) - Output: text (4,096 tokens) - Languages: en , fr , de , es , it , zh , ja , ko , pt , nl , and pl - Tool calling: Yes - Response formats: Text, JSON |
Ministral-3B | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON |
Mistral-Nemo | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: text (4,096 tokens) - Languages: en , fr , de , es , it , zh , ja , ko , pt , nl , and pl - Tool calling: Yes - Response formats: Text, JSON |
Mistral-large-2407 (deprecated) | chat-completion | Global standard | - Input: text (131,072 tokens) - Output: (4,096 tokens) - Languages: en , fr , de , es , it , zh , ja , ko , pt , nl , and pl - Tool calling: Yes - Response formats: Text, JSON |
Mistral-small (deprecated) | chat-completion | Global standard | - Input: text (32,768 tokens) - Output: text (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON |
Mistral-large (deprecated) | chat-completion | Global standard | - Input: text (32,768 tokens) - Output: (4,096 tokens) - Languages: fr, de, es, it, and en - Tool calling: Yes - Response formats: Text, JSON |
See this model collection in Azure AI Foundry portal.
NTT Data
tsuzumi is an autoregressive, language-optimized transformer. The tuned versions use supervised fine-tuning (SFT). tsuzumi handles both Japanese and English with high efficiency.
Model | Type | Tier | Capabilities |
---|---|---|---|
tsuzumi-7b | chat-completion | Global standard | - Input: text (8,192 tokens) - Output: text (8,192 tokens) - Languages: en and jp - Tool calling: No - Response formats: Text |
Open and protected models
The Azure AI model catalog offers a larger selection of models from a wider range of providers. Unlike Azure AI Foundry Models, where models are provided as APIs, these models might require you to host them on your own infrastructure, including creating an AI hub and project and providing the underlying compute quota to host the models.
These models can be open access or IP protected. In either case, you must deploy them to managed compute offerings in Azure AI Foundry.
Next steps
- Get started today and deploy your first model in Azure AI Foundry Models