Vertex AI Model Optimizer

This guide shows you how to use the Vertex AI Model Optimizer, a dynamic endpoint that simplifies model selection by automatically choosing the best Gemini model for your needs.

For more information on Model Optimizer pricing, see Pricing.

Benefits

Model Optimizer lets you:

  • Simplify model selection: You don't need to choose a specific model for each application.
  • Optimize cost and quality: You can balance performance and budget according to your preferences.
  • Integrate seamlessly: Model Optimizer works with existing Gemini APIs and SDKs.
  • Track usage: You can monitor usage and identify potential cost savings.
  • Handle text-based tasks efficiently: Text-based tasks don't require manual endpoint selection.

Supported models

The Model Optimizer routes requests to the following models:

  • Gemini 2.0 Flash (GA)
  • Gemini 2.5 Pro (Preview)

Language support

Model Optimizer supports all languages that the Gemini models support. For details, see Gemini Language support.

Supported use cases

The Model Optimizer supports text-only use cases, including:

  • Coding, including function calling and code execution
  • Summarization
  • Single and multi-turn chat
  • Question answering

For limitations and how to handle them, see Handle unsupported features.

Getting started

To get started with Model Optimizer, see our quickstart Colab notebook.

Using Vertex AI Model Optimizer

Python

Install

pip install --upgrade google-genai

To learn more, see the SDK reference documentation.

Set environment variables to use the Gen AI SDK with Vertex AI:

# Replace the `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` values
# with appropriate values for your project.
export GOOGLE_CLOUD_PROJECT=GOOGLE_CLOUD_PROJECT
export GOOGLE_CLOUD_LOCATION=global
export GOOGLE_GENAI_USE_VERTEXAI=True
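
Before creating a client, it can help to confirm that the three variables above are set. The following is a minimal sketch using only the Python standard library; the helper name `missing_vertex_env` is hypothetical and is not part of the Gen AI SDK.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = (
    "GOOGLE_CLOUD_PROJECT",
    "GOOGLE_CLOUD_LOCATION",
    "GOOGLE_GENAI_USE_VERTEXAI",
)


def missing_vertex_env(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the current process environment; pass a dict
    to check a different mapping. (Hypothetical helper, not an SDK call.)
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_vertex_env()` with no arguments checks the current shell environment and returns an empty list when all three variables are present.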

from google import genai
from google.genai.types import (
    FeatureSelectionPreference,
    GenerateContentConfig,
    HttpOptions,
    ModelSelectionConfig,
)

client = genai.Client(http_options=HttpOptions(api_version="v1beta1"))
response = client.models.generate_content(
    model="model-optimizer-exp-04-09",
    contents="How does AI work?",
    config=GenerateContentConfig(
        model_selection_config=ModelSelectionConfig(
            # Options: PRIORITIZE_QUALITY, BALANCED, PRIORITIZE_COST
            feature_selection_preference=FeatureSelectionPreference.BALANCED
        ),
    ),
)
print(response.text)
# Example response:
# Okay, let's break down how AI works. It's a broad field, so I'll focus on the ...
#
# Here's a simplified overview:
# ...
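
The sample above lists three preference options: PRIORITIZE_QUALITY, BALANCED, and PRIORITIZE_COST. If the preference comes from user input or a configuration file, a small validation step can catch typos before a request is sent. This is a sketch only; `normalize_preference` is a hypothetical helper, and the normalization rules (trimming, upper-casing, treating hyphens and spaces as underscores) are assumptions.

```python
# Preference names as listed in the sample's "Options" comment.
PREFERENCES = ("PRIORITIZE_QUALITY", "BALANCED", "PRIORITIZE_COST")


def normalize_preference(value):
    """Map a loosely formatted string to one of the documented names.

    Raises ValueError for anything that is not a known preference.
    (Hypothetical helper, not part of the Gen AI SDK.)
    """
    name = value.strip().upper().replace("-", "_").replace(" ", "_")
    if name not in PREFERENCES:
        raise ValueError(
            f"unknown preference {value!r}; expected one of {PREFERENCES}"
        )
    return name
```

The returned name can then be used to look up the matching member, for example `getattr(FeatureSelectionPreference, normalize_preference("balanced"))`.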

Handle unsupported features

Model Optimizer supports only text input and output. A request might nevertheless include modalities or tools that aren't supported. The following sections describe how Model Optimizer handles these unsupported features.

Multimodal requests

Requests whose prompts include multimodal data, such as video, images, or audio, return an INVALID_ARGUMENT error.
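
To avoid a round trip that ends in INVALID_ARGUMENT, a client can screen prompts for media before sending them. The sketch below assumes the request contents are plain dictionaries, each with a "parts" list, and that media parts are marked by the keys `inline_data`, `file_data`, `inlineData`, or `fileData`. Both the shape and the helper name `find_media_parts` are assumptions for illustration; adapt them to however your application builds requests.

```python
# Keys assumed to mark a media payload inside a part.
MEDIA_PART_KEYS = ("inline_data", "file_data", "inlineData", "fileData")


def find_media_parts(contents):
    """Return (content_index, part_index) pairs for parts that carry media.

    `contents` is assumed to be a list of dicts, each with a "parts" list
    of dicts. Text-only parts are ignored. (Hypothetical helper.)
    """
    offenders = []
    for content_index, content in enumerate(contents):
        for part_index, part in enumerate(content.get("parts", [])):
            if any(key in MEDIA_PART_KEYS for key in part):
                offenders.append((content_index, part_index))
    return offenders
```

An empty result means no media parts were found under these assumptions; a non-empty result suggests sending the request to a specific multimodal Gemini model instead.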

Unsupported tools

Among tool types, Model Optimizer supports only function declarations. If a request contains other tool types, including google_maps, google_search, enterprise_web_search, retrieval, or browse, Model Optimizer returns an INVALID_ARGUMENT error.
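
A similar pre-flight check can flag the tool types named above. This sketch assumes tools are represented as a list of dictionaries keyed by tool type, and it flags only the five types this section names; `find_unsupported_tools` is a hypothetical helper, and the key `function_declarations` in the usage example is likewise an assumption.

```python
# Tool types this section names as unsupported.
UNSUPPORTED_TOOL_TYPES = (
    "google_maps",
    "google_search",
    "enterprise_web_search",
    "retrieval",
    "browse",
)


def find_unsupported_tools(tools):
    """Return the unsupported tool types present in `tools`, in first-seen order.

    `tools` is assumed to be a list of dicts keyed by tool type.
    Each type is reported once. (Hypothetical helper.)
    """
    found = []
    for tool in tools:
        for key in tool:
            if key in UNSUPPORTED_TOOL_TYPES and key not in found:
                found.append(key)
    return found
```

For example, `find_unsupported_tools([{"function_declarations": []}, {"google_search": {}}])` reports `google_search` and leaves the function declarations alone.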

Send feedback

To send feedback about your experience with Model Optimizer, fill out our feedback survey.

If you have questions, technical issues, or feedback about Model Optimizer, contact model-optimizer-support@google.com.

Customer discussion group

To connect directly with the development team, you can join the Vertex AI Model Optimizer Listening Group, where you can learn about the product and help us understand how to make the features work better for you. The group's activities include:

  • Virtual workshops to learn more about the features.
  • Feedback surveys to share your needs and priorities.
  • 1:1 sessions with Google Cloud employees as we explore new features.

Activities are offered about once every 6-8 weeks. You can take part in as many or as few as you'd like, and you can opt out entirely at any time. To join the group, complete the Vertex AI Model Optimizer discussion group sign-up form.