The Speech Recognition model enables you to transcribe spoken words into written text and is the foundation of all AssemblyAI products.
Status | Description |
---|---|
processing | The audio file is being processed. |
queued | The audio file is waiting to be processed. |
completed | The transcription has completed successfully. |
error | An error occurred while processing the audio file. |
If the transcription fails, the status is `error`, and the transcript includes an `error` property explaining what went wrong.
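The statuses above can be handled with a small branch. This is a minimal sketch that assumes the transcript has already been fetched and parsed into a plain dictionary with `status`, `text`, and `error` keys; `summarize_transcript` is a hypothetical helper, not part of any SDK.

```python
def summarize_transcript(transcript: dict) -> str:
    """Return a one-line summary for a transcript dict (assumed shape)."""
    status = transcript.get("status")
    if status == "completed":
        return "done: " + (transcript.get("text") or "")
    if status == "error":
        # Failed transcripts carry an `error` property with the reason.
        return "failed: " + transcript.get("error", "unknown error")
    if status in ("queued", "processing"):
        # Not finished yet; the caller would typically check again later.
        return "pending: " + status
    raise ValueError(f"unexpected status: {status!r}")
```

A caller would typically keep polling while the summary starts with `pending` and stop on `done` or `failed`.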
Name | SDK Parameter | Description |
---|---|---|
Best (default) | aai.SpeechModel.best | Use our most accurate and capable models with the best results, recommended for most use cases. |
Nano | aai.SpeechModel.nano | Use our less accurate, but much lower cost models to produce your results. |
To choose a model, set `speech_model` in the transcription config.
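As a minimal sketch, the config can be thought of as a dictionary that is later sent as the JSON request body. The string values `best` and `nano` are assumed to mirror the SDK constants in the table above, and `transcription_config` is a hypothetical helper used only for illustration.

```python
SPEECH_MODELS = ("best", "nano")  # assumed string forms of the SDK constants


def transcription_config(speech_model: str = "best") -> dict:
    """Build a config dict that selects a speech model."""
    if speech_model not in SPEECH_MODELS:
        raise ValueError(f"speech_model must be one of {SPEECH_MODELS}")
    return {"speech_model": speech_model}


# Select the lower-cost model instead of the default.
config = transcription_config("nano")
```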
To disable automatic punctuation and text formatting, set `punctuate` and `format_text` to `False` in the transcription config.
To enable automatic language detection, set `language_detection` to `True` in the transcription config.
Use the `language_code` key to specify the language of the speech in your audio file.
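The two language settings can be combined in one small helper. This is a sketch under the assumption that the config is a plain dictionary using the same key names; `language_options` is a hypothetical function, and falling back to detection when no code is given is a choice made for this example.

```python
from typing import Optional


def language_options(language_code: Optional[str] = None) -> dict:
    """Return language settings for a transcription config dict."""
    if language_code:
        # The language is known up front, so name it explicitly.
        return {"language_code": language_code}
    # Otherwise ask for automatic language detection.
    return {"language_detection": True}
```

For example, `language_options("es")` pins the language, while `language_options()` leaves the choice to detection.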
To customize how words are spelled, pass a dictionary to `set_custom_spelling()` on the transcription config. Each key-value pair specifies a mapping from a word or phrase to a new spelling or format. The key specifies the new spelling or format, and the corresponding value is the word or phrase you want to replace.
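To illustrate the direction of the mapping, the sketch below turns such a dictionary into a list of entries with `from` and `to` keys. The entry shape is an assumption about how the mapping is serialized, and `to_custom_spelling` is a hypothetical helper rather than an SDK function.

```python
def to_custom_spelling(mapping: dict) -> list:
    """Convert {new_spelling: original(s)} into from/to entries."""
    entries = []
    for new_spelling, originals in mapping.items():
        # Accept a single phrase or a list of phrases to replace.
        if isinstance(originals, str):
            originals = [originals]
        entries.append({"from": list(originals), "to": new_spelling})
    return entries


# Keys are the new spellings; values are the words to replace.
entries = to_custom_spelling({"SQL": "sequel", "Kubernetes": ["kubernetties", "cooper netties"]})
```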
The value in the `to` key is case-sensitive, but the value in the `from` key isn't. Additionally, the `to` key must only contain one word, while the `from` key can contain multiple words.

To improve recognition of specific words or phrases, include them in the `word_boost` parameter in the transcription config.
You can also control how much weight to apply to each keyword or phrase. Include `boost_param` in the transcription config with a value of `low`, `default`, or `high`.
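A minimal sketch of combining both settings, assuming the config is a plain dictionary with the same key names. `word_boost_options` is a hypothetical helper, and the validation is added only for this example.

```python
BOOST_LEVELS = ("low", "default", "high")


def word_boost_options(terms, boost_param: str = "default") -> dict:
    """Build the word boost part of a transcription config dict."""
    if boost_param not in BOOST_LEVELS:
        raise ValueError(f"boost_param must be one of {BOOST_LEVELS}")
    return {"word_boost": list(terms), "boost_param": boost_param}


# Boost two domain terms with the highest weight.
options = word_boost_options(["aws", "azure"], boost_param="high")
```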
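Write boosted terms in their spoken form, for example `iphone seven` instead of `iphone 7`.

To transcribe multichannel audio, set `multichannel` to `true` in your transcription config.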
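Once `multichannel` is enabled, each utterance in the response carries channel information. The sketch below groups utterance text by channel. It assumes the response has been parsed into a dictionary whose `utterances` entries have `channel` and `text` keys; both helpers are hypothetical, and the sample shape is an assumption.

```python
from collections import defaultdict


def multichannel_options() -> dict:
    """Config fragment that turns on multichannel transcription."""
    return {"multichannel": True}


def text_by_channel(transcript: dict) -> dict:
    """Group utterance text by channel, preserving turn order."""
    grouped = defaultdict(list)
    for utterance in transcript.get("utterances", []):
        grouped[utterance["channel"]].append(utterance["text"])
    return dict(grouped)
```

Keeping the utterances in their original order within each channel preserves the turn-by-turn structure of the conversation.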
The response includes an `audio_channels` property with the number of different channels, and an additional `utterances` property containing a list of turn-by-turn utterances. Each utterance contains channel information, starting at 1. Additionally, each word in the `words` array contains the channel identifier.

To limit the number of characters in each caption, use the `chars_per_caption` parameter.
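As a sketch of where `chars_per_caption` might go, the helper below builds a caption export URL with the value as a query parameter. The endpoint path, the `srt` and `vtt` formats, and the `caption_url` helper are assumptions made for this example, so check them against the API reference before relying on them.

```python
from typing import Optional
from urllib.parse import urlencode

API_BASE = "https://api.assemblyai.com/v2"  # assumed base URL


def caption_url(transcript_id: str, fmt: str = "srt",
                chars_per_caption: Optional[int] = None) -> str:
    """Build a caption export URL for a completed transcript."""
    if fmt not in ("srt", "vtt"):
        raise ValueError("fmt must be 'srt' or 'vtt'")
    url = f"{API_BASE}/transcript/{transcript_id}/{fmt}"
    if chars_per_caption is not None:
        if chars_per_caption <= 0:
            raise ValueError("chars_per_caption must be positive")
        # Pass the caption length limit as a query parameter.
        url += "?" + urlencode({"chars_per_caption": chars_per_caption})
    return url
```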
To include filler words in the transcript, set `disfluencies` to `true` in the transcription config.
To filter profanity, set `filter_profanity` to `true` in your transcription config. Any profanity in the returned text will be replaced with asterisks.
To transcribe only part of an audio file, set the `audio_start_from` and the `audio_end_at` parameters in your transcription config.
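A minimal sketch of building these two settings from offsets in seconds. It assumes both parameters are expressed in milliseconds and that the config is a plain dictionary; `trim_options` is a hypothetical helper.

```python
def trim_options(start_seconds: float, end_seconds: float) -> dict:
    """Return start/end offsets for a config dict, in milliseconds (assumed unit)."""
    if start_seconds < 0 or end_seconds <= start_seconds:
        raise ValueError("expected 0 <= start_seconds < end_seconds")
    return {
        "audio_start_from": int(round(start_seconds * 1000)),
        "audio_end_at": int(round(end_seconds * 1000)),
    }


# Transcribe only the segment from 5 s to 15.5 s.
options = trim_options(5, 15.5)
```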
To transcribe only files that contain at least a given proportion of speech, set the `speech_threshold` parameter. You can pass any value between 0 and 1.
If the percentage of speech in the audio file is below the provided threshold, the value of `text` is `None` and the response contains an `error` message.
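The threshold check and the below-threshold case can be sketched as follows. Both helpers are hypothetical and assume the config and the response are plain dictionaries with the key names used above.

```python
def speech_threshold_options(threshold: float) -> dict:
    """Config fragment requiring a minimum proportion of speech."""
    if not 0 <= threshold <= 1:
        raise ValueError("speech_threshold must be between 0 and 1")
    return {"speech_threshold": threshold}


def text_or_reason(transcript: dict) -> str:
    """Return the transcript text, or the error message if there is none."""
    text = transcript.get("text")
    if text is None:
        # Below the threshold, `text` is None and `error` explains why.
        return "no transcript: " + transcript.get("error", "unknown reason")
    return text
```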
How can I make certain words more likely to be transcribed?
Use the `word_boost` parameter. Any term included has its likelihood of being transcribed boosted.
Can I customize how words are spelled by the model?
Why am I receiving a 400 Bad Request error when making an API request?