Model Garden offers open, partner, and custom models that you can self-deploy and serve on Vertex AI. These models differ from the model-as-a-service (MaaS) offerings, which are serverless and require no manual deployment.
When you self-deploy models, you deploy them securely within your Google Cloud project and VPC network.
Self-deploy open models
Open models provide pretrained capabilities for various AI tasks, including Gemma models that excel in multimodal processing. An open model is freely available, you're free to publish its outputs, and you can use it anywhere as long as you adhere to its licensing terms. Vertex AI offers both open (also known as open weight) and open source models.
When you use an open model with Vertex AI, Vertex AI provides the serving infrastructure. You can also use open models with other frameworks, such as PyTorch or JAX.
Open weight models
Many open models are considered open weight large language models (LLMs). A model's weights are the numerical values stored in the model's neural network architecture that represent the patterns and relationships learned from the model's training data. Open weight models release these pretrained parameters, or weights, which provides more transparency than models that aren't open weight. You can use an open weight model for inference and tuning, even though details such as the original dataset, model architecture, and training code aren't provided.
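To illustrate, released weights are enough to run inference locally. The following is a minimal sketch that uses the Hugging Face transformers library with the open weight google/gemma-2b checkpoint as an example; any open weight model follows the same pattern:

from transformers import AutoModelForCausalLM, AutoTokenizer

# The published weights and tokenizer are downloaded from the model hub;
# the training data and training code are not part of the release.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b")

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))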
Open source models
Open models differ from open source AI models. While open models often expose the weights and the core numerical representation of learned patterns, they don't necessarily provide the full source code or training details. Providing weights offers a level of AI model transparency, allowing you to understand the model's capabilities without needing to build it yourself.
Self-deployed partner models
Model Garden helps you purchase and manage model licenses from partners who offer proprietary models as a self-deploy option. After you purchase access to a model from Cloud Marketplace, you can deploy it on on-demand hardware or use your Compute Engine reservations and committed use discounts to meet your budget requirements. You are charged for model usage and for the Vertex AI infrastructure that you use.
To request usage of a self-deployed partner model, find the relevant model in the Model Garden console, click Contact sales, and then complete the form, which initiates contact with a Google Cloud sales representative.
For more information about deploying and using partner models, see Deploy a partner model and make prediction requests.
Considerations
Consider the following limitations when using self-deployed partner models:
- Unlike with open models, you cannot export weights.
- If you have VPC Service Controls set up for your project, you can't upload models, which prevents you from deploying partner models.
- For endpoints, only the shared public endpoint type is supported.
Deploy models with custom weights
Deploying models with custom weights is a Preview offering. You can fine-tune models that are based on a predefined set of base models and then deploy your customized models on Vertex AI Model Garden. To deploy a custom model, you import its custom weights by uploading your model artifacts to a Cloud Storage bucket in your project, which is a one-click experience in Vertex AI.
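For example, one way to stage your artifacts is with the google-cloud-storage client library. The following is a minimal sketch; the bucket name my-custom-weights and the directory my-finetuned-llama are hypothetical placeholders:

from pathlib import Path
from google.cloud import storage

client = storage.Client(project="PROJECT_ID")
bucket = client.bucket("my-custom-weights")  # hypothetical bucket name

# Upload every file in the local model directory, preserving its layout.
model_dir = Path("./my-finetuned-llama")
for path in model_dir.iterdir():
    if path.is_file():
        bucket.blob(f"my-finetuned-llama/{path.name}").upload_from_filename(str(path))
        print(f"Uploaded {path.name}")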
Supported models
The public preview of Deploy models with custom weights supports the following base models:
Model name | Version
---|---
Llama |
Gemma |
Qwen |
Deepseek |
Mistral and Mixtral |
Phi-4 |
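Before importing, it can help to confirm which base architecture your checkpoint declares. The following is a minimal sketch that reads the model_type field from the model's config.json, a common Hugging Face convention; the family names listed are illustrative assumptions, not an official mapping:

import json

# Illustrative model_type values only; the table above is the authoritative list.
SUPPORTED_FAMILIES = {"llama", "gemma", "qwen2", "mistral", "mixtral", "phi3"}

# Hypothetical local path to your fine-tuned model directory.
with open("./my-finetuned-llama/config.json") as f:
    config = json.load(f)

model_type = config.get("model_type", "unknown")
print(f"model_type: {model_type}")
print("Looks supported" if model_type in SUPPORTED_FAMILIES else "Check the table above")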
Limitations
Custom weights deployment doesn't support importing quantized models.
Model files
You must supply the model files in the Hugging Face weights format. For more information on the Hugging Face weights format, see Use Hugging Face Models.
If the required files aren't provided, the model deployment might fail.
This table lists the types of model files, which depend on the model's architecture:
Model file content | File type
---|---
Model configuration | config.json
Model weights | *.safetensors or *.bin
Weights index | *.safetensors.index.json or *.bin.index.json
Tokenizer file(s) | tokenizer.json, tokenizer_config.json, or tokenizer.model
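Before uploading your artifacts, you can sanity-check that a local model directory contains these files. The following is a minimal sketch; the directory name is a hypothetical placeholder, and the checks mirror common Hugging Face file names rather than an official validator:

from pathlib import Path

def check_model_dir(model_dir: str) -> None:
    d = Path(model_dir)
    checks = {
        "model configuration": (d / "config.json").exists(),
        "model weights": any(d.glob("*.safetensors")) or any(d.glob("*.bin")),
        "tokenizer": any((d / name).exists() for name in ("tokenizer.json", "tokenizer.model")),
    }
    for name, ok in checks.items():
        print(f"{name}: {'found' if ok else 'MISSING'}")

check_model_dir("./my-finetuned-llama")  # hypothetical local path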
Locations
You can deploy custom models in all regions where Model Garden services are available.
Prerequisites
This section describes what you need before you can deploy your custom model.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Verify that billing is enabled for your Google Cloud project.
- Enable the Vertex AI API.
- In the Google Cloud console, activate Cloud Shell.
At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.
This tutorial assumes that you are using Cloud Shell to interact with Google Cloud. If you want to use a different shell instead of Cloud Shell, then perform the following additional configuration:
- Install the Google Cloud CLI.
- If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:
gcloud init
Deploy the custom model
This section demonstrates how to deploy your custom model.
If you're using the command-line interface (CLI), Python, or curl, replace the following variables with your own values for the code samples to work:
- REGION: Your region. For example, us-central1.
- MODEL_GCS: The Cloud Storage URI of your model. For example, gs://custom-weights-fishfooding/meta-llama/Llama-3.2-1B-Instruct.
- PROJECT_ID: Your project ID.
- MODEL_ID: Your model ID.
- MACHINE_TYPE: Your machine type. For example, g2-standard-12.
- ACCELERATOR_TYPE: Your accelerator type. For example, NVIDIA_L4.
- ACCELERATOR_COUNT: Your accelerator count.
- PROMPT: Your text prompt.
Console
The following steps show you how to use the Google Cloud console to deploy your model with custom weights.
- In the Google Cloud console, go to the Model Garden page.
- Click Deploy model with custom weights. The Deploy a model with custom weights on Vertex AI pane appears.
- In the Model source section, do the following:
  - Click Browse, choose the bucket where your model is stored, and then click Select.
  - Optional: Enter your model's name in the Model name field.
- In the Deployment settings section, do the following:
  - From the Region field, select your region, and then click OK.
  - In the Machine Spec field, select the machine specification that's used to deploy your model.
  - Optional: The Endpoint name field shows a default name for your model's endpoint. You can enter a different endpoint name in the field.
- Click Deploy model with custom weights.
gcloud CLI
This command demonstrates how to deploy the model to a specific region.
gcloud ai model-garden models deploy --model=${MODEL_GCS} --region ${REGION}
This command demonstrates how to deploy the model to a specific region with its machine type, accelerator type, and accelerator count. If you want to select a specific machine configuration, then you must set all three fields.
gcloud ai model-garden models deploy --model=${MODEL_GCS} --machine-type=${MACHINE_TYPE} --accelerator-type=${ACCELERATOR_TYPE} --accelerator-count=${ACCELERATOR_COUNT} --region ${REGION}
Python
import vertexai
from vertexai.preview import model_garden

# Replace the uppercase placeholders with your values.
vertexai.init(project="PROJECT_ID", ___location="REGION")

custom_model = model_garden.CustomModel(
    gcs_uri="MODEL_GCS",
)
endpoint = custom_model.deploy(
    machine_type="MACHINE_TYPE",
    accelerator_type="ACCELERATOR_TYPE",
    accelerator_count=ACCELERATOR_COUNT,  # an integer, for example 1
    model_display_name="custom-model",
    endpoint_display_name="custom-model-endpoint",
)

endpoint.predict(instances=[{"prompt": "PROMPT"}], use_dedicated_endpoint=True)
Alternatively, you can call the custom_model.deploy() method without any parameters to use a default machine configuration.
import vertexai
from vertexai.preview import model_garden

# Replace the uppercase placeholders with your values.
vertexai.init(project="PROJECT_ID", ___location="REGION")

custom_model = model_garden.CustomModel(
    gcs_uri="MODEL_GCS",
)
endpoint = custom_model.deploy()

endpoint.predict(instances=[{"prompt": "PROMPT"}], use_dedicated_endpoint=True)
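If you want to send requests later from a separate script, you can look the endpoint up again. The following is a minimal sketch that assumes the deployment created a standard Vertex AI endpoint; ENDPOINT_ID is a hypothetical placeholder that you can find in the Google Cloud console or with gcloud ai endpoints list:

from google.cloud import aiplatform

aiplatform.init(project="PROJECT_ID", ___location="REGION")

# ENDPOINT_ID is hypothetical; look it up in the console or the gcloud CLI.
endpoint = aiplatform.Endpoint(
    "projects/PROJECT_ID/locations/REGION/endpoints/ENDPOINT_ID"
)
response = endpoint.predict(instances=[{"prompt": "PROMPT"}], use_dedicated_endpoint=True)
print(response.predictions)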
curl
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
-d '{
"custom_model": {
"gcs_uri": "'"${MODEL_GCS}"'"
},
"destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
"model_config": {
"model_user_id": "'"${MODEL_ID}"'",
},
}'
Alternatively, you can use the API to explicitly set the machine type.
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${REGION}-aiplatform.googleapis.com/v1beta1/projects/${PROJECT_ID}/locations/${REGION}:deploy" \
-d '{
"custom_model": {
"gcs_uri": "'"${MODEL_GCS}"'"
},
"destination": "projects/'"${PROJECT_ID}"'/locations/'"${REGION}"'",
"model_config": {
"model_user_id": "'"${MODEL_ID}"'",
},
"deploy_config": {
"dedicated_resources": {
"machine_spec": {
"machine_type": "'"${MACHINE_TYPE}"'",
"accelerator_type": "'"${ACCELERATOR_TYPE}"'",
"accelerator_count": '"${ACCELERATOR_COUNT}"'
},
"min_replica_count": 1
}
}
}'
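The :deploy call returns a long-running operation rather than a finished deployment. The following is a minimal sketch of polling that operation from Python, using the standard Vertex AI operations.get pattern; OPERATION_NAME is a hypothetical placeholder taken from the name field of the deploy response:

import time

import google.auth
from google.auth.transport.requests import AuthorizedSession

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)

# OPERATION_NAME comes from the "name" field of the :deploy response, for
# example projects/PROJECT_ID/locations/REGION/operations/OPERATION_ID.
# Replace REGION in the URL with your region as well.
operation_name = "OPERATION_NAME"
url = f"https://REGION-aiplatform.googleapis.com/v1beta1/{operation_name}"

while True:
    operation = session.get(url).json()
    if operation.get("done"):
        print("Deployment finished:", operation.get("response") or operation.get("error"))
        break
    time.sleep(30)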
Learn more about self-deployed models in Vertex AI
- For more information about Model Garden, see Overview of Model Garden.
- For more information about deploying models, see Use models in Model Garden.
- Use Gemma open models
- Use Llama open models
- Use Hugging Face open models