FAQ
Commonly asked questions
Need more help?
Check out our Knowledge Base for additional questions and answers.
The AssemblyAI API supports most common audio and video file formats. We recommend that you submit your audio in its native format without additional transcoding or file conversion. Transcoding or converting it to another format can sometimes result in a loss of quality, especially if you’re converting compressed formats like .mp3. The AssemblyAI API converts all files to 16 kHz uncompressed audio as part of our transcription pipeline.
Note that when you upload a video to our API, the audio will be extracted from it and processed independently, so the list of supported video formats isn’t exhaustive. If you need support for a format that isn’t listed below, please contact our team at support@assemblyai.com.
| Supported audio file types | Supported video file types |
|---|---|
| .3ga | .webm |
| .8svx | .mts, .m2ts, .ts |
| .aac | .mov |
| .ac3 | .mp2 |
| .aif | .mp4, .m4p (with DRM), .m4v |
| .aiff | .mxf |
| .alac | |
| .amr | |
| .ape | |
| .au | |
| .dss | |
| .flac | |
| .flv | |
| .m4a | |
| .m4b | |
| .m4p | |
| .m4r | |
| .mp3 | |
| .mpga | |
| .ogg, .oga, .mogg | |
| .opus | |
| .qcp | |
| .tta | |
| .voc | |
| .wav | |
| .wma | |
| .wv | |
Currently, there are two main limitations, file size and duration:
- Maximum file size for the `/v2/transcript` endpoint: 5 GB
- Maximum duration: 10 hours
- Maximum file size for the `/v2/upload` endpoint: 2.2 GB
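To see how the two endpoints relate in practice, here is a minimal Python sketch using the `requests` library. It uploads a local file to `/v2/upload`, then submits the returned URL to `/v2/transcript` and polls for the result. The API key and file name are placeholders, and the response fields used (`upload_url`, `id`, `status`) follow the public API reference.

```python
import time
import requests

API_KEY = "YOUR_API_KEY"  # placeholder: your AssemblyAI API key
BASE_URL = "https://api.assemblyai.com/v2"
HEADERS = {"authorization": API_KEY}

# Upload a local file (must be under the 2.2 GB /v2/upload limit).
with open("meeting.mp3", "rb") as f:  # placeholder file name
    upload_response = requests.post(f"{BASE_URL}/upload", headers=HEADERS, data=f)
upload_url = upload_response.json()["upload_url"]

# Submit the uploaded audio for transcription via /v2/transcript.
transcript_response = requests.post(
    f"{BASE_URL}/transcript",
    headers=HEADERS,
    json={"audio_url": upload_url},
)
transcript_id = transcript_response.json()["id"]

# Poll until the transcript is completed (or reports an error).
while True:
    result = requests.get(f"{BASE_URL}/transcript/{transcript_id}", headers=HEADERS).json()
    if result["status"] in ("completed", "error"):
        break
    time.sleep(3)

print(result["status"])
```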
The vast majority of files will complete in under 45 seconds, with a Real-Time Factor (RTF) as low as 0.008x.
To put this into perspective:
- 1h3min (75MB) meeting → 35 seconds
- 3h15min (191MB) podcast → 133 seconds
- 8h21min (464MB) video course → 300 seconds
Files submitted for Streaming Speech-to-Text receive a response within a few hundred milliseconds.
The response for a completed request includes `start` and `end` keys. These timestamp values indicate when a given word, phrase, or sentence starts and ends. They are:
- Measured in milliseconds
- Accurate to within about 400 milliseconds
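For example, here is a short sketch of reading those timestamps from a completed transcript. The `words` list and its `text`/`start`/`end` fields reflect the public API reference rather than this FAQ, so treat the exact response shape as indicative; the API key and transcript ID are placeholders.

```python
import requests

API_KEY = "YOUR_API_KEY"              # placeholder
TRANSCRIPT_ID = "YOUR_TRANSCRIPT_ID"  # placeholder: ID of a completed transcript
HEADERS = {"authorization": API_KEY}

transcript = requests.get(
    f"https://api.assemblyai.com/v2/transcript/{TRANSCRIPT_ID}", headers=HEADERS
).json()

# Each entry in "words" carries start/end timestamps, measured in milliseconds.
for word in transcript.get("words", []):
    print(f'{word["text"]}: {word["start"] / 1000:.2f}s - {word["end"] / 1000:.2f}s')
```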
Custom Vocabulary
- Allows submission of words/phrases to boost prediction likelihood
- Helps with under-represented terms in training data
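A hedged sketch of enabling Custom Vocabulary when creating a transcript is shown below. The `word_boost` and `boost_param` field names come from the public API reference rather than this FAQ; the API key and audio URL are placeholders.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
HEADERS = {"authorization": API_KEY}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio URL
        # Words/phrases whose prediction likelihood should be boosted.
        "word_boost": ["AssemblyAI", "Real-Time Factor"],
        "boost_param": "high",  # how strongly to boost: low | default | high
    },
)
print(response.json()["id"])
```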
Custom Spelling
- Controls word spelling/formatting in transcript text
- Works like find-and-replace functionality
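Similarly, here is a sketch of Custom Spelling, again with field names (`custom_spelling`, `from`, `to`) taken from the public API reference and placeholders for the key and audio URL.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
HEADERS = {"authorization": API_KEY}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio URL
        # Find-and-replace style rules: any spelling in "from" is rewritten as "to".
        "custom_spelling": [
            {"from": ["assembly ai", "assemblyai"], "to": "AssemblyAI"},
            {"from": ["cloud nine"], "to": "Cloud9"},
        ],
    },
)
print(response.json()["id"])
```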
File Storage:
- Files are encrypted in transit
- Deleted immediately after transcription completion
- Uploaded but untranscribed files are deleted after 24 hours
- Upload URLs become invalid after deletion
Transcript Management:
- Transcripts are stored encrypted at rest
- Can be deleted permanently via API request
- List all transcripts with a GET request
Completed transcripts are stored in our database, encrypted at rest, so that we can serve them to you and your application.
To permanently delete a transcript from our database once you’ve retrieved it, you can make a `DELETE` request to the API.
You can retrieve a list of all transcripts that you have created by making a `GET` request to the API.
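Here is a minimal sketch of both operations against the `/v2/transcript` endpoint, assuming a placeholder API key and transcript ID; the `transcripts` field in the listing response follows the public API reference.

```python
import requests

API_KEY = "YOUR_API_KEY"              # placeholder
TRANSCRIPT_ID = "YOUR_TRANSCRIPT_ID"  # placeholder
HEADERS = {"authorization": API_KEY}
BASE_URL = "https://api.assemblyai.com/v2/transcript"

# Permanently delete a single transcript from AssemblyAI's database.
requests.delete(f"{BASE_URL}/{TRANSCRIPT_ID}", headers=HEADERS)

# List the transcripts you have created (the response is paginated).
listing = requests.get(BASE_URL, headers=HEADERS).json()
for transcript in listing.get("transcripts", []):
    print(transcript["id"], transcript["status"])
```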
Best Tier
- Most robust and accurate offering
- Houses most powerful models
- Broadest range of capabilities
- Ideal for accuracy-critical use cases
Nano Tier
- Fast, lightweight offering
- Supports 99 languages
- Cost-effective price point
- Best for extensive language needs
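If it helps, here is a hedged sketch of selecting a tier when creating a transcript. The `speech_model` field and its `"best"`/`"nano"` values come from the public API reference rather than this FAQ; the key and audio URL are placeholders.

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
HEADERS = {"authorization": API_KEY}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={
        "audio_url": "https://example.com/audio.mp3",  # placeholder audio URL
        "speech_model": "nano",  # "best" for maximum accuracy, "nano" for broad language coverage
    },
)
print(response.json()["id"])
```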
Discounts
- Volume discounts available for large-scale usage
- Contact support@assemblyai.com for eligibility
Support
- JSON responses include an `error` key with descriptive messages
- Email support@assemblyai.com for assistance
- Include transcript IDs and a detailed issue description when contacting support
Any time you make a request to the API, you should receive a JSON response. If you don’t receive the expected output, the JSON contains an `error` key with a message value describing the error.
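For example, here is a small sketch of checking for that `error` key after creating a transcript (the API key and audio URL are placeholders):

```python
import requests

API_KEY = "YOUR_API_KEY"  # placeholder
HEADERS = {"authorization": API_KEY}

response = requests.post(
    "https://api.assemblyai.com/v2/transcript",
    headers=HEADERS,
    json={"audio_url": "https://example.com/audio.mp3"},  # placeholder audio URL
)
body = response.json()

# Failed requests carry an "error" key with a human-readable message.
if "error" in body:
    print("Request failed:", body["error"])
else:
    print("Transcript created:", body["id"])
```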
You can also reach out to our support team at any time by sending an email to support@assemblyai.com. When reaching out, please include a detailed description of any issues you’re experiencing, as well as transcript IDs for affected requests, if possible.
Custom models are rarely more accurate than the best general models due to several factors:
- General models are trained on massive datasets (600,000+ hours of speech data)
- Training data includes diverse audio types:
- Broadcast TV recordings
- Phone calls
- Zoom meetings
- Videos
- Various accents and speakers
Custom models are mainly beneficial for audio with unique characteristics that general models have not encountered, though such cases are rare given how comprehensively general models are trained.
Learn more about this topic on the AssemblyAI blog.