Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Send requests to the model: Learn how to send non-streaming, streaming, and function calling requests to the Gemini for Google Cloud API.
Concepts
Express mode: A streamlined way to access Vertex AI features using an API key for rapid prototyping and development.
Streaming request: An API request that returns the response in chunks as it's generated, reducing perceived latency for users.
Non-streaming request: An API request that returns the complete response in a single chunk after all processing is finished.
Function calling: A feature that lets the model request a call to an external tool or function during a conversation to obtain more information.
Install and initialize the Google Gen AI SDK for express mode
The Google Gen AI SDK lets you use Google generative AI models to build AI-powered applications. When using Vertex AI in express mode, you install and initialize the google-genai package to authenticate with your generated API key.
Install the SDK
To install the Google Gen AI SDK for express mode, run the following commands. If you're using Colab, ignore any dependency conflicts and restart the runtime after installation.
```
# Developer TODO: If you're using Colab, uncomment the following lines:
# from google.colab import auth
# auth.authenticate_user()

!pip install google-genai
!pip install --force-reinstall -qq "numpy<2.0"
```
Initialize the client
Configure the API key for express mode and environment variables. For details on getting an API key, see Vertex AI in express mode overview.
```python
from google import genai
from google.genai import types

# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"

client = genai.Client(vertexai=True, api_key=API_KEY)
```
Send a request to the Gemini for Google Cloud API
You can send either streaming or non-streaming requests to the Gemini for Google Cloud API. The following table compares these two methods.
| Request type | Description | Pros | Cons | Use case |
| --- | --- | --- | --- | --- |
| Streaming | Returns the response in chunks as it is being processed. | Reduces perceived latency; ideal for interactive applications. | Requires handling multiple response chunks. | Chatbots, real-time content generation. |
| Non-streaming | Returns the entire response in a single chunk after processing is complete. | Simpler to handle a single response object. | Higher perceived latency for long responses. | Offline processing, summarizing long documents. |
Send a streaming request
To send a streaming request, call generate_content_stream and print the response in chunks as they arrive.
```python
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    for chunk in client.models.generate_content_stream(
        model="gemini-2.0-flash-001",
        contents="Explain bubble sort to me",
        config=config,
    ):
        print(chunk.text)

generate()
```
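If you also need the complete response text after streaming, for example to log it, you can join the chunks as they arrive. The helper below is a sketch and not part of the SDK; it assumes each chunk exposes a `text` attribute that may be `None`, as with the chunks yielded by generate_content_stream above.

```python
# Sketch: join streamed chunks into the complete response text.
# Assumes each chunk has a `.text` attribute that may be None.
def collect_stream(chunks):
    return "".join(chunk.text for chunk in chunks if chunk.text)
```

For example, `full_text = collect_stream(client.models.generate_content_stream(...))` prints nothing but returns the concatenated text of every chunk.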
Send a non-streaming request
The following code sample defines a function that sends a non-streaming request to the gemini-2.0-flash-001 model. It shows you how to configure basic request parameters and safety settings.
```python
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents="Explain bubble sort to me",
        config=config,
    )
    print(response.text)

generate()
```
Send a function calling request
The following code sample declares a function and passes it as a tool to the model. When the model determines that the function should be called, it returns a function call part in the response. You can then invoke the function with the provided arguments and pass the result back to the model to continue the conversation.
```python
function_response_parts = [
    {
        'function_response': {
            'name': 'get_current_weather',
            'response': {
                'name': 'get_current_weather',
                'content': {'weather': 'super nice'},
            },
        },
    },
]

manual_function_calling_contents = [
    {'role': 'user', 'parts': [{'text': 'What is the weather in Boston?'}]},
    {
        'role': 'model',
        'parts': [
            {
                'function_call': {
                    'name': 'get_current_weather',
                    'args': {'___location': 'Boston'},
                }
            }
        ],
    },
    {'role': 'user', 'parts': function_response_parts},
]

function_declarations = [
    {
        'name': 'get_current_weather',
        'description': 'Get the current weather in a city',
        'parameters': {
            'type': 'OBJECT',
            'properties': {
                '___location': {
                    'type': 'STRING',
                    'description': 'The ___location to get the weather for',
                },
                'unit': {
                    'type': 'STRING',
                    'enum': ['C', 'F'],
                },
            },
        },
    }
]

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=manual_function_calling_contents,
    config=dict(tools=[{'function_declarations': function_declarations}]),
)
print(response.text)
```
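In a real application, the hard-coded function response above would come from actually running the declared function. The sketch below shows one way to do that: `get_current_weather` is a hypothetical stub (a real app would call a weather service), and the dict-shaped `function_call` argument mirrors the structure used in the sample above.

```python
# Stub implementation of the tool declared in function_declarations; a real
# app would call a weather service here.
def get_current_weather(___location, unit='C'):
    return {'weather': 'super nice'}

def dispatch_function_call(function_call):
    """Look up and invoke the local function named in a model function_call part."""
    handlers = {'get_current_weather': get_current_weather}
    handler = handlers[function_call['name']]
    return handler(**function_call['args'])

# The returned dict becomes the 'content' of the function_response part that
# you send back to the model in the next turn.
result = dispatch_function_call(
    {'name': 'get_current_weather', 'args': {'___location': 'Boston'}}
)
```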
Clean up
This tutorial does not create any Google Cloud resources, so no cleanup is needed to avoid charges.
Last updated 2025-08-05 UTC.