Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Send requests to the model: Learn how to send non-streaming, streaming, and function calling requests to the Gemini for Google Cloud API.
Concepts
Express mode: A streamlined way to access Vertex AI features using an API key for rapid prototyping and development.
Streaming request: An API request that returns the response in chunks as it's generated, reducing perceived latency for users.
Non-streaming request: An API request that returns the complete response in a single chunk after all processing is finished.
Function calling: A feature that lets the model request a call to an external tool or function during a conversation to obtain more information.
Install and initialize the Google Gen AI SDK for express mode
The Google Gen AI SDK lets you use Google generative AI models to build AI-powered applications. When using Vertex AI in express mode, you install and initialize the google-genai package to authenticate with your generated API key.
Install the SDK
To install the Google Gen AI SDK for express mode, run the following commands. If you're using Colab, ignore any dependency conflicts and restart the runtime after installation.
```
# Developer TODO: If you're using Colab, uncomment the following lines:
# from google.colab import auth
# auth.authenticate_user()

!pip install google-genai
!pip install --force-reinstall -qq "numpy<2.0"
```
Initialize the client
Configure the API key for express mode and environment variables. For details on getting an API key, see Vertex AI in express mode overview.
```python
from google import genai
from google.genai import types

# Developer TODO: Replace YOUR_API_KEY with your API key.
API_KEY = "YOUR_API_KEY"

client = genai.Client(vertexai=True, api_key=API_KEY)
```
Send a request to the Gemini for Google Cloud API
You can send either streaming or non-streaming requests to the Gemini for Google Cloud API. The following table compares these two methods.
| Request type | Description | Pros | Cons | Use case |
| --- | --- | --- | --- | --- |
| Streaming | Returns the response in chunks as it is being processed. | Reduces perceived latency; ideal for interactive applications. | Requires handling multiple response chunks. | Chatbots, real-time content generation. |
| Non-streaming | Returns the entire response in a single chunk after processing is complete. | Simpler to handle a single response object. | Higher perceived latency for long responses. | Offline processing, summarizing long documents. |
Send a streaming request
To send a streaming request, call generate_content_stream and print the response in chunks as they arrive.
```python
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    for chunk in client.models.generate_content_stream(
        model="gemini-2.0-flash-001",
        contents="Explain bubble sort to me",
        config=config,
    ):
        print(chunk.text)

generate()
```
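If you also need the complete response text after streaming, for example to log it, you can join the chunks as they arrive. The helper below is a sketch and not part of the SDK; it assumes each chunk exposes a `text` attribute that may be `None`, as with the chunks yielded by generate_content_stream above.

```python
# Sketch: join streamed chunks into the complete response text.
# Assumes each chunk has a `.text` attribute that may be None.
def collect_stream(chunks):
    return "".join(chunk.text for chunk in chunks if chunk.text)
```

For example, `full_text = collect_stream(client.models.generate_content_stream(...))` prints nothing but returns the concatenated text of every chunk.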
Send a non-streaming request
The following code sample defines a function that sends a non-streaming request to the gemini-2.0-flash-001 model. It shows you how to configure basic request parameters and safety settings.
```python
from google import genai
from google.genai import types

def generate():
    # Developer TODO: Replace YOUR_API_KEY with your API key.
    client = genai.Client(vertexai=True, api_key="YOUR_API_KEY")
    config = types.GenerateContentConfig(
        temperature=0,
        top_p=0.95,
        top_k=20,
        candidate_count=1,
        seed=5,
        max_output_tokens=100,
        stop_sequences=["STOP!"],
        presence_penalty=0.0,
        frequency_penalty=0.0,
        safety_settings=[
            types.SafetySetting(
                category="HARM_CATEGORY_HATE_SPEECH",
                threshold="BLOCK_ONLY_HIGH",
            )
        ],
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents="Explain bubble sort to me",
        config=config,
    )
    print(response.text)

generate()
```
Send a function calling request
The following code sample declares a function and passes it as a tool to the model. When the model determines that the function should be called, it returns a function call part in the response. You can then invoke the function with the provided arguments and pass the result back to the model to continue the conversation.
```python
function_response_parts = [
    {
        'function_response': {
            'name': 'get_current_weather',
            'response': {
                'name': 'get_current_weather',
                'content': {'weather': 'super nice'},
            },
        },
    },
]

manual_function_calling_contents = [
    {'role': 'user', 'parts': [{'text': 'What is the weather in Boston?'}]},
    {
        'role': 'model',
        'parts': [
            {
                'function_call': {
                    'name': 'get_current_weather',
                    'args': {'___location': 'Boston'},
                }
            }
        ],
    },
    {'role': 'user', 'parts': function_response_parts},
]

function_declarations = [
    {
        'name': 'get_current_weather',
        'description': 'Get the current weather in a city',
        'parameters': {
            'type': 'OBJECT',
            'properties': {
                '___location': {
                    'type': 'STRING',
                    'description': 'The ___location to get the weather for',
                },
                'unit': {
                    'type': 'STRING',
                    'enum': ['C', 'F'],
                },
            },
        },
    }
]

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=manual_function_calling_contents,
    config=dict(tools=[{'function_declarations': function_declarations}]),
)
print(response.text)
```
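In a real application, the hard-coded function response above would come from actually running the declared function. The sketch below shows one way to do that: `get_current_weather` is a hypothetical stub (a real app would call a weather service), and the dict-shaped `function_call` argument mirrors the structure used in the sample above.

```python
# Stub implementation of the tool declared in function_declarations; a real
# app would call a weather service here.
def get_current_weather(___location, unit='C'):
    return {'weather': 'super nice'}

def dispatch_function_call(function_call):
    """Look up and invoke the local function named in a model function_call part."""
    handlers = {'get_current_weather': get_current_weather}
    handler = handlers[function_call['name']]
    return handler(**function_call['args'])

# The returned dict becomes the 'content' of the function_response part that
# you send back to the model in the next turn.
result = dispatch_function_call(
    {'name': 'get_current_weather', 'args': {'___location': 'Boston'}}
)
```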
Clean up
This tutorial does not create any Google Cloud resources, so no cleanup is needed to avoid charges.
Last updated 2025-08-05 UTC.