Need more help? Check out our Knowledge Base for additional questions and answers.
What file types are supported by the AssemblyAI API? Are there recommended formats?
The AssemblyAI API supports most common audio and video file formats. We recommend that you submit your audio in its native format without additional transcoding or file conversion. Transcoding or converting it to another format can sometimes result in a loss of quality, especially if you’re converting compressed formats like `.mp3`. The AssemblyAI API converts all files to 16 kHz uncompressed audio as part of our transcription pipeline.

Note that when you upload a video to our API, the audio will be extracted from it and processed independently, so the list of supported video formats isn’t exhaustive. If you need support for a format that isn’t listed below, please contact our team at support@assemblyai.com.

Supported audio file types | Supported video file types |
---|---|
.3ga | .webm |
.8svx | .mts, .m2ts, .ts |
.aac | .mov |
.ac3 | .mp2 |
.aif | .mp4, .m4p (with DRM), .m4v |
.aiff | .mxf |
.alac | |
.amr | |
.ape | |
.au | |
.dss | |
.flac | |
.flv | |
.m4a | |
.m4b | |
.m4p | |
.m4r | |
.mp3 | |
.mpga | |
.ogg, .oga, .mogg | |
.opus | |
.qcp | |
.tta | |
.voc | |
.wav | |
.wma | |
.wv | |
What are the API limits on file size or file duration?
Currently, the main limits are:
- Maximum file size for the `/v2/transcript` endpoint: 5 GB
- Maximum duration: 10 hours
- Maximum file size for the `/v2/upload` endpoint: 2.2 GB
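
As a rough illustration, the limits above could be checked locally before a file is submitted. This is only a sketch: the function name `check_limits` and its parameters are hypothetical, and it assumes the quoted sizes are decimal gigabytes, which the FAQ does not state.

```python
# Limits quoted in the FAQ, interpreted as decimal gigabytes (an assumption).
MAX_TRANSCRIPT_BYTES = 5_000_000_000   # /v2/transcript: 5 GB
MAX_UPLOAD_BYTES = 2_200_000_000       # /v2/upload: 2.2 GB
MAX_DURATION_SECONDS = 10 * 60 * 60    # 10 hours


def check_limits(size_bytes, duration_seconds, via_upload=True):
    """Return a list of limit violations (empty if the file is within limits).

    `via_upload` selects which size limit applies: the /v2/upload limit for
    local files, or the /v2/transcript limit for files submitted by URL.
    """
    problems = []
    size_limit = MAX_UPLOAD_BYTES if via_upload else MAX_TRANSCRIPT_BYTES
    if size_bytes > size_limit:
        problems.append(f"file is {size_bytes} bytes, limit is {size_limit}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration is {duration_seconds} s, limit is {MAX_DURATION_SECONDS} s"
        )
    return problems


# A 3 GB file is too large for /v2/upload but fits the /v2/transcript limit.
print(check_limits(3_000_000_000, 3600, via_upload=True))
print(check_limits(3_000_000_000, 3600, via_upload=False))
```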
How long does transcription take?
The vast majority of files will complete in under 45 seconds, with a Real-Time Factor (RTF) as low as 0.008x. To put this into perspective:
- 1h3min (75MB) meeting → 35 seconds
- 3h15min (191MB) podcast → 133 seconds
- 8h21min (464MB) video course → 300 seconds
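
For reference, RTF is processing time divided by audio duration. The short sketch below recomputes it for the three examples above; the helper name `real_time_factor` is just for illustration.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent transcribing / duration of the audio."""
    return processing_seconds / audio_seconds


# (label, audio duration in seconds, processing time in seconds), from the list above
examples = [
    ("meeting", 1 * 3600 + 3 * 60, 35),
    ("podcast", 3 * 3600 + 15 * 60, 133),
    ("video course", 8 * 3600 + 21 * 60, 300),
]

for label, audio_s, processing_s in examples:
    print(f"{label}: RTF = {real_time_factor(processing_s, audio_s):.4f}x")
```

Each of these examples works out to roughly 0.01x, meaning transcription takes about 1% of the audio's duration.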
Can I get timestamps for individual words? How do timestamps work?
The response for a completed request includes `start` and `end` keys. These timestamp values indicate when a given word, phrase, or sentence starts and ends. They are:
- Measured in milliseconds
- Accurate to within about 400 milliseconds
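
Because the values are in milliseconds, they usually need converting before display. The sketch below shows one way to do that. The `words` list and its `text` field are a hypothetical excerpt made up for illustration; only the `start` and `end` keys come from the description above.

```python
def format_ms(ms):
    """Convert a millisecond offset to an HH:MM:SS.mmm string."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical excerpt of a completed response (field layout is an assumption).
words = [
    {"text": "Hello", "start": 250, "end": 640},
    {"text": "world", "start": 700, "end": 1180},
]

for word in words:
    print(f'{format_ms(word["start"])} - {format_ms(word["end"])}  {word["text"]}')
```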
How do Custom Vocabulary and Custom Spelling features work?
Custom Vocabulary
- Allows submission of words/phrases to boost prediction likelihood
- Helps with under-represented terms in training data
Custom Spelling
- Controls word spelling/formatting in transcript text
- Works like find-and-replace functionality
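
To make the find-and-replace comparison concrete, here is a small local sketch of that idea. It is not the API's implementation, and the function name `apply_custom_spelling` and the mapping format are hypothetical; it only shows how a spelling map can rewrite matching words in transcript text.

```python
import re


def apply_custom_spelling(text, spelling_map):
    """Replace whole-word, case-insensitive matches of each key with its value."""
    for source, target in spelling_map.items():
        pattern = r"\b" + re.escape(source) + r"\b"
        # A function replacement keeps `target` literal (no backslash escapes).
        text = re.sub(pattern, lambda _m, t=target: t, text, flags=re.IGNORECASE)
    return text


transcript = "we use assembly ai and sequel daily"
spelling_map = {"assembly ai": "AssemblyAI", "sequel": "SQL"}
print(apply_custom_spelling(transcript, spelling_map))
```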
How long are files stored and can they be deleted?
File Storage:
- Files are encrypted in transit
- Files are deleted immediately after transcription completes
- Uploaded but untranscribed files are deleted after 24 hours
- Upload URLs become invalid after deletion
Transcript Storage:
- Transcripts are stored encrypted at rest
- Transcripts can be deleted permanently via an API request
- All transcripts can be listed with a GET request
Can completed transcripts be deleted?
Completed transcripts are stored in our database, encrypted at rest, so that we can serve them to you and your application.

To permanently delete a transcript from our database once you’ve retrieved it, you can make a `DELETE` request to the API.
Can I get a list of all transcripts I have created?
You can retrieve a list of all transcripts that you have created by making a `GET` request to the API.
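
As a minimal sketch, the `GET` (list) and `DELETE` requests mentioned in this FAQ could be built with Python's standard library as shown below. The host name, the `/v2/transcript/{id}` path for deletion, and the `Authorization` header are assumptions not stated in this FAQ, so check the API reference before relying on them. The requests are only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://api.assemblyai.com"  # assumed host, not stated in this FAQ


def build_list_request(api_key):
    """Build (but do not send) a GET request that lists transcripts."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript",
        headers={"Authorization": api_key},  # assumed auth header
        method="GET",
    )


def build_delete_request(api_key, transcript_id):
    """Build (but do not send) a DELETE request for one transcript."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript/{transcript_id}",  # assumed path layout
        headers={"Authorization": api_key},
        method="DELETE",
    )


request = build_delete_request("YOUR_API_KEY", "abc123")
print(request.get_method(), request.full_url)
```

Passing either request object to `urllib.request.urlopen` would send it.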
What's the difference between Speech-to-Text tiers?
Best Tier
- Most robust and accurate offering
- Houses most powerful models
- Broadest range of capabilities
- Ideal for accuracy-critical use cases
Nano Tier
- Fast, lightweight offering
- Supports 99 languages
- Cost-effective price point
- Best for extensive language needs
Do you offer discounts and how can I get support?
Discounts
- Volume discounts available for large-scale usage
- Contact support@assemblyai.com for eligibility
Support
- JSON responses include an `error` key with a descriptive message when a request fails
- Email support@assemblyai.com for assistance
- Include transcript IDs and a detailed issue description when contacting support
How can I get more information about an error? How do I contact support?
Any time you make a request to the API, you should receive a JSON response. If you don’t receive the expected output, the JSON contains an `error` key with a message value describing the error.

You can also reach out to our support team any time by sending an email to support@assemblyai.com. When reaching out, please include a detailed description of any issues you’re experiencing as well as transcript IDs for affected requests, if possible.
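
A minimal sketch of checking a response body for the `error` key described above. The helper name `extract_error` and the sample payloads (including their `id` and `status` fields) are hypothetical.

```python
import json


def extract_error(response_body):
    """Return the message under the `error` key, or None if it is absent."""
    payload = json.loads(response_body)
    if isinstance(payload, dict):
        return payload.get("error")
    return None


# Hypothetical response bodies, for illustration only.
failed = '{"error": "Example error message"}'
succeeded = '{"id": "abc123", "status": "queued"}'

print(extract_error(failed))
print(extract_error(succeeded))
```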
How do custom speech recognition models compare with general models?
Custom models are rarely more accurate than the best general models due to several factors:
- General models are trained on massive datasets (600,000+ hours of speech data)
- Training data includes diverse audio types:
  - Broadcast TV recordings
  - Phone calls
  - Zoom meetings
  - Videos
  - Various accents and speakers