Speech to text REST API

2025-05-25

Speech to text REST API is used for fast transcription, batch transcription, and custom speech.

Important

Speech to text REST API version 2024-11-15 is the latest version that's generally available.

Speech to text REST API version 2024-05-15-preview will be retired on a date to be announced.
Speech to text REST API v3.0, v3.1, v3.2, 3.2-preview.1, and 3.2-preview.2 will be retired on March 31st, 2026.

For more information about upgrading, see the Speech to text REST API v3.0 to v3.1, v3.1 to v3.2, and v3.2 to 2024-11-15 migration guides.

For details, see the Speech to text REST API 2024-11-15 reference documentation.

Use Speech to text REST API for:

Fast transcription: Transcribe audio files synchronously, with results returned much faster than real time. Use the fast transcription API (/speechtotext/transcriptions:transcribe) when you need the transcript of an audio recording as quickly as possible with predictable latency, such as for quick audio or video transcription or video translation.
Batch transcription: Transcribe audio files as a batch from multiple URLs or an Azure container. Use the batch transcription API (/speechtotext/transcriptions:submit) when you need to transcribe a large amount of audio in storage, such as a large number of files or a long audio file.
Custom speech: Upload your own data, test and train a custom model, compare accuracy between models, and deploy a model to a custom endpoint. Copy models to other subscriptions if you want colleagues to have access to a model that you built, or if you want to deploy a model to more than one region.
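As a rough illustration of how the two operation paths above map to request URLs, here is a minimal Python sketch. The paths and the 2024-11-15 version string come from this page. The regional host pattern `<region>.api.cognitive.microsoft.com` and the helper name `build_url` are assumptions made for illustration; a resource with a custom domain would use a different host.

```python
from urllib.parse import urlencode

API_VERSION = "2024-11-15"  # latest generally available version named on this page

# Operation paths quoted in the list above.
FAST_PATH = "/speechtotext/transcriptions:transcribe"
BATCH_PATH = "/speechtotext/transcriptions:submit"


def build_url(region: str, path: str, api_version: str = API_VERSION) -> str:
    """Return the full request URL for one operation (hypothetical helper).

    Assumption: the Speech resource is reached through the regional host
    <region>.api.cognitive.microsoft.com.
    """
    host = f"https://{region}.api.cognitive.microsoft.com"
    query = urlencode({"api-version": api_version})
    return f"{host}{path}?{query}"


if __name__ == "__main__":
    print(build_url("eastus", FAST_PATH))
    print(build_url("eastus", BATCH_PATH))
```

Keeping the API version in one constant makes a later migration to a newer version a one-line change.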

Speech to text REST API includes features such as:

Request logs for each endpoint.
Request the manifest of the models that you create, to set up on-premises containers.
Upload data from Azure storage accounts by using a shared access signature (SAS) URI.
Bring your own storage. Use your own storage accounts for logs, transcription files, and other data.
Some operations support webhook notifications. You can register your webhooks where notifications are sent.

Fast transcription

The following operation groups are applicable for fast transcription.

Operation group	Description
Transcriptions	Use Transcriptions - Transcribe to transcribe audio files. When you use fast transcription, you send a single file per request. See Create a transcription for examples of how to create a transcription from a single audio file.
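Because fast transcription takes a single file per request, the request body can be pictured as one multipart form with the audio and a small JSON definition. The sketch below builds such a body with the Python standard library only and does not send it. The form field names `audio` and `definition`, the `locales` key, and the function name are assumptions not stated on this page; check them against the reference documentation before use.

```python
import json
import uuid


def build_fast_transcription_body(audio_name: str, audio_bytes: bytes,
                                  locales: list[str]) -> tuple[bytes, str]:
    """Return (body, content_type) for a one-file multipart/form-data request.

    Assumptions: the "definition" and "audio" field names and the "locales"
    key are illustrative and not confirmed by this page.
    """
    boundary = uuid.uuid4().hex
    definition = json.dumps({"locales": locales})
    crlf = b"\r\n"
    body = crlf.join([
        # Part 1: the JSON definition that describes the transcription.
        f"--{boundary}".encode(),
        b'Content-Disposition: form-data; name="definition"',
        b"Content-Type: application/json",
        b"",
        definition.encode("utf-8"),
        # Part 2: the single audio file for this request.
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{audio_name}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio_bytes,
        # Closing boundary.
        f"--{boundary}--".encode(),
        b"",
    ])
    return body, f"multipart/form-data; boundary={boundary}"


if __name__ == "__main__":
    body, content_type = build_fast_transcription_body("sample.wav", b"RIFF", ["en-US"])
    print(content_type)
    print(len(body), "bytes")
```

The returned content type must be sent as the `Content-Type` header so that the server can find the boundary.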

Batch transcription

The following operation groups are applicable for batch transcription.

Operation group	Description
Models	Use base models or custom models to transcribe audio files. You can use models with custom speech and batch transcription. For example, you can use a model trained with a specific dataset to transcribe audio files. See Train a model and custom speech model lifecycle for examples of how to train and manage custom speech models.
Transcriptions	Use Transcriptions - Submit to transcribe a large amount of audio in storage. When you use batch transcription, you send multiple files per request or point to an Azure Blob Storage container with the audio files to transcribe. See Create a transcription for examples of how to create a transcription from multiple audio files.
Web hooks	Use web hooks to receive notifications about creation, processing, completion, and deletion events. You can use web hooks with custom speech and batch transcription. Web hooks apply to datasets, endpoints, evaluations, models, and transcriptions.
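The two ways of supplying audio to a batch transcription described above (a list of file URLs, or one container) can be sketched as a JSON request body. The property names `contentUrls`, `contentContainerUrl`, `locale`, and `displayName`, as well as the function name, are assumptions not stated on this page; treat this as a sketch and confirm the schema in the reference documentation.

```python
import json
from typing import Optional


def build_batch_body(display_name: str, locale: str,
                     content_urls: Optional[list[str]] = None,
                     container_url: Optional[str] = None) -> str:
    """Return a JSON body for a batch submission (hypothetical helper).

    Exactly one audio source is accepted here: a list of file URLs
    or a single container URL (for example, a SAS URI).
    """
    if bool(content_urls) == bool(container_url):
        raise ValueError("Provide either content_urls or container_url, but not both")
    body: dict = {"displayName": display_name, "locale": locale}
    if content_urls:
        body["contentUrls"] = list(content_urls)  # assumed property name
    else:
        body["contentContainerUrl"] = container_url  # assumed property name
    return json.dumps(body)


if __name__ == "__main__":
    print(build_batch_body(
        "My batch job", "en-US",
        content_urls=["https://example.com/a.wav", "https://example.com/b.wav"],
    ))
```

Rejecting a request that names both sources, or neither, surfaces a configuration mistake before any audio is submitted.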

Custom speech

The following operation groups are applicable for custom speech.

Operation group	Description
Datasets	Use datasets to train and test custom speech models. For example, you can compare the performance of a custom speech model trained with a specific dataset to the performance of a base model or a custom speech model trained with a different dataset. See Upload training and testing datasets for examples of how to upload datasets.
Endpoints	Deploy custom speech models to endpoints. You must deploy a custom endpoint to use a custom speech model. See Deploy a model for examples of how to manage deployment endpoints.
Evaluations	Use evaluations to compare the performance of different models. For example, you can compare the performance of a custom speech model trained with a specific dataset to the performance of a base model or a custom speech model trained with a different dataset. See Test recognition quality and Test accuracy for examples of how to test and evaluate custom speech models.
Models	Use base models or custom models to transcribe audio files. You can use models with custom speech and batch transcription. For example, you can use a model trained with a specific dataset to transcribe audio files. See Train a model and custom speech model lifecycle for examples of how to train and manage custom speech models.
Web hooks	Use web hooks to receive notifications about creation, processing, completion, and deletion events. You can use web hooks with custom speech and batch transcription. Web hooks apply to datasets, endpoints, evaluations, models, and transcriptions.
