Tutorial: Vertex AI API in express mode

This guide shows you how to use the Vertex AI API in express mode to quickly try out core generative AI features, covering the following topics:

Concepts

  • Express mode: A streamlined way to access Vertex AI features using an API key for rapid prototyping and development.
  • Streaming request: An API request that returns the response in chunks as it's generated, reducing perceived latency for users.
  • Non-streaming request: An API request that returns the complete response in a single chunk after all processing is finished.
  • Function calling: A feature that lets the model request a call to an external tool or function during a conversation to obtain more information.

Send a request to the Gemini for Google Cloud API

You can send either streaming or non-streaming requests to the Gemini for Google Cloud API. The following table compares these two methods.

Request Type Description Pros Cons Use Case
Streaming Returns the response in chunks as it is being processed. Reduces perceived latency; ideal for interactive applications. Requires handling multiple response chunks. Chatbots, real-time content generation.
Non-streaming Returns the entire response in a single chunk after processing is complete. Simpler to handle a single response object. Higher perceived latency for long responses. Offline processing, summarizing long documents.

Clean up

This tutorial does not create any Google Cloud resources, so no clean up is needed to avoid charges.

What's next