Topic Detection - AssemblyAI Docs

Quickstart

Enable Topic Detection by setting iab_categories to true in the transcription config.

import assemblyai as aai

aai.settings.api_key = "YOUR_API_KEY"

# audio_file = "./local_file.mp3"
audio_file = "https://assembly.ai/wildfires.mp3"

config = aai.TranscriptionConfig(iab_categories=True)

transcript = aai.Transcriber().transcribe(audio_file, config)

# Get the parts of the transcript that were tagged with topics
for result in transcript.iab_categories.results:
    print(result.text)
    print(f"Timestamp: {result.timestamp.start} - {result.timestamp.end}")
    for label in result.labels:
        print(f"{label.label} ({label.relevance})")

# Get a summary of all topics in the transcript
for topic, relevance in transcript.iab_categories.summary.items():
    print(f"Audio is {relevance * 100}% relevant to {topic}")

Example output

Smoke from hundreds of wildfires in Canada is triggering air quality alerts throughout the US. Skylines...
Timestamp: 250 - 28920
Home&Garden>IndoorEnvironmentalQuality (0.9881)
NewsAndPolitics>Weather (0.5561)
MedicalHealth>DiseasesAndConditions>LungAndRespiratoryHealth (0.0042)
...
Audio is 100.0% relevant to NewsAndPolitics>Weather
Audio is 93.78% relevant to Home&Garden>IndoorEnvironmentalQuality
...
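
The labels in the example output are hierarchical, with `>` separating a supertopic from its subtopics, and low-relevance labels are often noise. As a minimal sketch, the helpers below split a label into its levels and keep only labels at or above a relevance cutoff. The function names, the `0.5` threshold, and the plain-tuple input are illustrative assumptions, not part of the SDK; with the SDK you could pass `(label.label, label.relevance)` pairs built from `result.labels`.

```python
def split_label(label):
    """Split an IAB label such as 'A>B>C' into its hierarchy levels."""
    return label.split(">")


def relevant_labels(labels, threshold=0.5):
    """Return (label, relevance) pairs at or above the threshold, most relevant first.

    `labels` is any iterable of (label, relevance) pairs. The 0.5 default
    is an arbitrary illustrative cutoff, not a recommended value.
    """
    kept = [(name, score) for name, score in labels if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Values copied from the example output above.
sample = [
    ("Home&Garden>IndoorEnvironmentalQuality", 0.9881),
    ("NewsAndPolitics>Weather", 0.5561),
    ("MedicalHealth>DiseasesAndConditions>LungAndRespiratoryHealth", 0.0042),
]

for name, score in relevant_labels(sample):
    print(" / ".join(split_label(name)), score)
```

A suitable cutoff depends on the audio and the use case, so treat the threshold as something to tune rather than a fixed value.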

API reference

Request

curl https://api.assemblyai.com/v2/transcript \
--header "Authorization: YOUR_API_KEY" \
--header "Content-Type: application/json" \
--data '{
  "audio_url": "YOUR_AUDIO_URL",
  "iab_categories": true
}'

Key	Type	Description
`iab_categories`	boolean	Enable Topic Detection.
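
For readers who prefer Python to curl, the same request can be sketched with the standard library alone. The helper name `build_transcript_request` is hypothetical, and the function only constructs the request, so it can be inspected without a valid API key; the URL, headers, and body mirror the curl command above.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"


def build_transcript_request(api_key, audio_url):
    """Build (but do not send) a POST request that enables Topic Detection."""
    payload = {"audio_url": audio_url, "iab_categories": True}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcript_request("YOUR_API_KEY", "YOUR_AUDIO_URL")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To actually send it, pass the request to urllib.request.urlopen()
# with a real API key and audio URL.
```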

Response

{
  "iab_categories": true,
  "iab_categories_result": {
    "status": "success",
    "results": [...],
    "summary": {...}
  }
}

Key	Type	Description
`iab_categories_result`	object	The result of the Topic Detection model.
`iab_categories_result.status`	string	Either `success`, or `unavailable` in the rare case that the Topic Detection model failed.
`iab_categories_result.results`	array	An array of the Topic Detection results.
`iab_categories_result.results[i].text`	string	The text in the transcript in which the i-th instance of a detected topic occurs.
`iab_categories_result.results[i].labels[j].relevance`	number	How relevant the j-th detected topic is in the i-th instance of a detected topic.
`iab_categories_result.results[i].labels[j].label`	string	The IAB taxonomical label for the j-th label of the i-th instance of a detected topic, where `>` denotes supertopic/subtopic relationship.
`iab_categories_result.results[i].timestamp.start`	number	The starting time, in milliseconds, in the audio file at which the i-th detected topic instance is discussed.
`iab_categories_result.results[i].timestamp.end`	number	The ending time, in milliseconds, in the audio file at which the i-th detected topic instance is discussed.
`iab_categories_result.summary`	object	Summary where each property is a detected topic.
`iab_categories_result.summary.topic`	number	The overall relevance of topic to the entire audio file.

The response also includes the request parameters used to generate the transcript.
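
When working with the raw JSON rather than the SDK, it helps to check `iab_categories_result.status` before reading the results. The sketch below ranks the `summary` entries by relevance and returns an empty list when the status is not `success`. The function name `summarize_topics`, the `top_n` parameter, and the hand-written sample payload (abbreviated, with values taken from the example output) are assumptions for illustration only.

```python
import json


def summarize_topics(response, top_n=3):
    """Return the top_n (topic, relevance) pairs from a transcript response.

    Returns an empty list when the Topic Detection result is missing
    or its status is anything other than "success".
    """
    result = response.get("iab_categories_result") or {}
    if result.get("status") != "success":
        return []
    summary = result.get("summary", {})
    ranked = sorted(summary.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Abbreviated, hand-written sample shaped like the response described above.
raw = """
{
  "iab_categories": true,
  "iab_categories_result": {
    "status": "success",
    "results": [
      {
        "text": "Smoke from hundreds of wildfires in Canada ...",
        "labels": [
          {"relevance": 0.9881, "label": "Home&Garden>IndoorEnvironmentalQuality"},
          {"relevance": 0.5561, "label": "NewsAndPolitics>Weather"}
        ],
        "timestamp": {"start": 250, "end": 28920}
      }
    ],
    "summary": {
      "Home&Garden>IndoorEnvironmentalQuality": 0.9378,
      "NewsAndPolitics>Weather": 1.0
    }
  }
}
"""

response = json.loads(raw)
for topic, relevance in summarize_topics(response):
    print(f"{relevance:.2%} {topic}")
```

Returning an empty list on `unavailable` keeps downstream code simple, at the cost of hiding the failure; callers that need to distinguish "no topics" from "model failed" may prefer to raise an exception instead.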

Frequently asked questions

How does the Topic Detection model handle misspelled or unrecognized words?

Can I use the Topic Detection model to identify entities that aren't part of the IAB Taxonomy?

Why am I not getting any topic predictions for my audio file?

Why am I getting inaccurate or irrelevant topic predictions for my audio file?

Is AssemblyAI associated with IAB?
