Transcribe an Audio File

Overview

By the end of this tutorial, you’ll be able to:

Transcribe an audio file.
Enable Speaker Diarization to detect speakers in an audio file.

Here’s the full sample code for what you’ll build in this tutorial:

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()

# You can use a local filepath:
# audio_file = "./example.mp3"

# Or use a publicly-accessible URL:
audio_file = (
    "https://assembly.ai/sports_injuries.mp3"
)

config = aai.TranscriptionConfig(speaker_labels=True)

transcript = transcriber.transcribe(audio_file, config)

if transcript.status == aai.TranscriptStatus.error:
    print(f"Transcription failed: {transcript.error}")
    exit(1)

print(transcript.text)

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")

Before you begin

To complete this tutorial, you need:

Python, TypeScript, Go, Java, .NET, or Ruby installed.
A free AssemblyAI account.

Step 1: Install the SDK

Install the package via pip:

pip install assemblyai

Step 2: Configure the SDK

In this step, you’ll create an SDK client and configure it to use your API key.

Browse to Account, and then click the text under Your API key to copy it.

Create a new Transcriber and configure it to use your API key. Replace YOUR_API_KEY with your copied API key.

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

transcriber = aai.Transcriber()
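
Hardcoding the key is fine for a quick test, but you may prefer to keep it out of your source code. The following is a minimal sketch, not part of the SDK: it reads the key from an environment variable and fails early if the variable is missing. The helper name load_api_key and the variable name ASSEMBLYAI_API_KEY are assumptions made for this example.

```python
import os


def load_api_key(env=None, var_name="ASSEMBLYAI_API_KEY"):
    """Return the API key stored in an environment variable.

    Both the helper and the variable name are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Usage with the SDK from this tutorial (commented out so the sketch
# runs without the package installed or the variable set):
# import assemblyai as aai
# aai.settings.api_key = load_api_key()
# transcriber = aai.Transcriber()
```

With this approach, you set the variable in your shell before running the script instead of editing the code.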

Step 3: Submit audio for transcription

In this step, you’ll submit the audio file for transcription and wait until it completes. The time it takes to process an audio file depends on its duration and the enabled models. Most transcriptions complete within 45 seconds.

Specify a URL to the audio you want to transcribe. The URL needs to be accessible from AssemblyAI’s servers. For a list of supported formats, see FAQ.

audio_file = "https://assembly.ai/sports_injuries.mp3"

Local audio files: If you want to use a local file, you can specify a local path instead, for example:

audio_file = "./example.mp3"

YouTube: YouTube URLs are not supported. If you want to transcribe a YouTube video, you need to download the audio first.
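
The rules above (a reachable URL or a local path, but not a YouTube link) can be checked before you submit anything. This is a rough sketch under stated assumptions: check_audio_source is a hypothetical helper, not an SDK function, and the list of YouTube hostnames is illustrative rather than exhaustive. It does not verify that a URL is actually reachable from AssemblyAI’s servers.

```python
from pathlib import Path
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (illustrative, not exhaustive).
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def check_audio_source(audio_file):
    """Return "url" or "local" for a usable source, or raise ValueError."""
    parsed = urlparse(audio_file)
    if parsed.scheme in ("http", "https"):
        # urlparse lowercases the hostname; it is None if the URL has no host.
        if (parsed.hostname or "") in YOUTUBE_HOSTS:
            raise ValueError(
                "YouTube URLs are not supported; download the audio first."
            )
        return "url"
    # Anything that is not an http(s) URL is treated as a local path.
    if not Path(audio_file).is_file():
        raise ValueError(f"No local file found at {audio_file!r}")
    return "local"


print(check_audio_source("https://assembly.ai/sports_injuries.mp3"))  # prints "url"
```

Calling the helper before transcribe() turns a mistyped path or an unsupported link into an immediate, local error.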

To generate the transcript, pass the audio URL to transcribe(). This may take a minute while we process the audio.

transcript = transcriber.transcribe(audio_file)

Select the speech model: You can select the class of models to use to make the cost-performance tradeoffs best suited to your application. See Select the speech model.

If the transcription fails, its status is set to error. To see why it failed, print the value of error.

if transcript.error:
    print(transcript.error)
    exit(1)

Print the complete transcript.

print(transcript.text)

Run the application and wait for it to finish. You’ve successfully transcribed your first audio file. You can see all submitted transcription jobs in the Processing queue.

Step 4: Enable additional AI models

You can extract even more insights from the audio by enabling any of our AI models using transcription options. In this step, you’ll enable the Speaker diarization model to detect who said what.

Create a TranscriptionConfig with speaker_labels set to True, and then pass it as the second argument to transcribe().

config = aai.TranscriptionConfig(speaker_labels=True)
transcript = transcriber.transcribe(audio_file, config)

In addition to the full transcript, you now have access to utterances from each speaker.

for utterance in transcript.utterances:
    print(f"Speaker {utterance.speaker}: {utterance.text}")
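
Once you have utterances, you can aggregate them per speaker. The sketch below counts how many words each speaker said. It relies only on the speaker and text attributes used above. FakeUtterance is a hypothetical stand-in so the example runs without calling the API, and words_per_speaker is an illustrative helper, not part of the SDK.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FakeUtterance:
    """Stand-in for an SDK utterance with only the two attributes used above."""
    speaker: str
    text: str


def words_per_speaker(utterances):
    """Count whitespace-separated words spoken by each speaker."""
    counts = Counter()
    # "or []" is a defensive assumption in case no utterances are available.
    for utterance in utterances or []:
        counts[utterance.speaker] += len(utterance.text.split())
    return dict(counts)


sample = [
    FakeUtterance("A", "Thanks for joining us today."),
    FakeUtterance("B", "Happy to be here."),
    FakeUtterance("A", "Let's get started."),
]
print(words_per_speaker(sample))  # prints {'A': 8, 'B': 4}
```

In your own script, you would pass transcript.utterances in place of the sample list.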

Many of the properties in the transcript object only become available after you enable the corresponding model. For more information, see the models under Speech-to-Text and Audio Intelligence.

Next steps

In this tutorial, you’ve learned how to generate a transcript for an audio file and how to extract speaker information by enabling the Speaker diarization model. Want to learn more?

For more ways to analyze your audio data, explore our Audio Intelligence models.
If you want to transcribe audio in real-time, see Transcribe streaming audio from a microphone.
To search, summarize, and ask questions on your transcripts with LLMs, see LeMUR.

Need some help?

If you get stuck, or have any other questions, we’d love to help you out. Ask our support team in our Discord server.
