This workflow is designed to be cost-effective, slicing the first 60 seconds of audio and running it through Nano ALD, which detects 99 languages, at a cost of $0.002 per transcript for this language detection workflow (not including the total transcription cost).

Performing ALD with this workflow has a few benefits:

  • Cost-effective language detection
  • Ability to detect 99 languages
  • Ability to use Nano as fallback if the language is not supported in Best
  • Ability to enable Audio Intelligence models if the language is supported
  • Ability to use LeMUR with LLM prompts in Spanish for Spanish audio

Before you begin

To complete this tutorial, you need:

The entire source code of this guide can be viewed here.

Step-by-step instructions

Install the Python SDK:

pip install assemblyai

Import the assemblyai package and set your API key:

import assemblyai as aaiaai.settings.api_key = "YOUR_API_KEY"

Create a set with all supported languages for Best. You can find them in our documentation here.

supported_languages_for_best = {    "en",    "es",    "fr",    "de",    "it",    "pt",    "nl",    "hi",    "ja",    "zh",    "fi",    "ko",    "pl",    "ru",    "tr",    "uk",    "vi",}

Define a Transcriber. Note that here we don’t pass in a global TranscriptionConfig, but later apply different ones during the transcribe() call.

transcriber = aai.Transcriber()

Define two helper functions:

  • detect_language() performs language detection on the first 60 seconds of the audio using Nano and returns the language code.
  • transcribe_file() performs the transcription using Best or Nano depending on the identified language.
def detect_language(audio_url):    config = aai.TranscriptionConfig(        audio_end_at=60000,  # first 60 seconds (in milliseconds)        language_detection=True,        speech_model=aai.SpeechModel.nano,    )    transcript = transcriber.transcribe(audio_url, config=config)    return transcript.json_response["language_code"]def transcribe_file(audio_url, language_code):    config = aai.TranscriptionConfig(        language_code=language_code,        speech_model=(            aai.SpeechModel.best            if language_code in supported_languages_for_best            else aai.SpeechModel.nano        ),    )    transcript = transcriber.transcribe(audio_url, config=config)    return transcript

Test the code with different audio files. Apply both helper functions sequentially to each file to first identify the language and then transcribe the file.

audio_urls = [    "https://storage.googleapis.com/aai-web-samples/public_benchmarking_portugese.mp3",    "https://storage.googleapis.com/aai-web-samples/public_benchmarking_spanish.mp3",    "https://storage.googleapis.com/aai-web-samples/slovenian_luka_doncic_interview.mp3",    "https://storage.googleapis.com/aai-web-samples/5_common_sports_injuries.mp3",]for audio_url in audio_urls:    language_code = detect_language(audio_url)    print("Identified language:", language_code)    transcript = transcribe_file(audio_url, language_code)    print("Transcript:", transcript.text[:100], "...")

Output:

Identified language: ptTranscript: e aí Olá pessoal, sejam bem-vindos a mais um vídeo e hoje eu vou ensinar-vos como fazer esta espada  ...Identified language: esTranscript: Precisamente sobre este caso, el diario estadounidense New York Times reveló este sábado un conjunto ...Identified language: slTranscript: Ni lepška, kaj videl tega otroka v mrekoj svojga okolja, da mu je uspil in to v takimi miri, da pač  ...Identified language: enTranscript: Runner's knee runner's knee is a condition characterized by pain behind or around the kneecap. It is ...