Need more help? Check out our Knowledge Base for additional questions and answers.
Currently, there are three main limitations:
  • Maximum file size for /v2/transcript endpoint: 5GB
  • Maximum duration: 10 hours
  • Maximum file size for /v2/upload endpoint: 2.2GB
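A pre-flight check against these limits can be sketched as follows. This is only an illustration: the function and constant names are hypothetical, and it assumes "GB" means 10^9 bytes, which the list above does not specify.

```python
# Limits taken from the list above; decimal gigabytes are an assumption.
MAX_TRANSCRIPT_BYTES = 5_000_000_000   # 5GB for /v2/transcript
MAX_UPLOAD_BYTES = 2_200_000_000       # 2.2GB for /v2/upload
MAX_DURATION_SECONDS = 10 * 60 * 60    # 10 hours


def check_limits(size_bytes, duration_seconds, via_upload=False):
    """Return a list of limit violations (empty if the file fits)."""
    problems = []
    max_bytes = MAX_UPLOAD_BYTES if via_upload else MAX_TRANSCRIPT_BYTES
    if size_bytes > max_bytes:
        problems.append(f"size {size_bytes} bytes exceeds {max_bytes} bytes")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"duration {duration_seconds}s exceeds {MAX_DURATION_SECONDS}s"
        )
    return problems


if __name__ == "__main__":
    # A 3GB, one-hour file is too large for /v2/upload but fine for /v2/transcript.
    print(check_limits(3_000_000_000, 3600, via_upload=True))
    print(check_limits(3_000_000_000, 3600, via_upload=False))
```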
The vast majority of files complete in under 45 seconds, with a Real-Time Factor (RTF) as low as 0.008x. To put this into perspective:
  • 1h3min (75MB) meeting → 35 seconds
  • 3h15min (191MB) podcast → 133 seconds
  • 8h21min (464MB) video course → 300 seconds
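To relate these examples to the RTF figure, the sketch below computes RTF as processing time divided by audio duration, which is the usual definition and is assumed here. The helper name is hypothetical; the durations and processing times are the ones listed above.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / length of the audio."""
    return processing_seconds / audio_seconds


# (processing time in seconds, audio duration in seconds) from the examples above
examples = {
    "meeting": (35, 1 * 3600 + 3 * 60),
    "podcast": (133, 3 * 3600 + 15 * 60),
    "video course": (300, 8 * 3600 + 21 * 60),
}

if __name__ == "__main__":
    for name, (processing, duration) in examples.items():
        print(f"{name}: RTF = {real_time_factor(processing, duration):.4f}x")
```

Under this definition the three examples come out at roughly 0.009x to 0.011x, in line with the "as low as 0.008x" figure.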
Files submitted for Streaming Speech-to-Text receive a response within a few hundred milliseconds.
The response for a completed request includes start and end keys. These timestamp values indicate when a given word, phrase, or sentence starts and ends. They are:
  • Measured in milliseconds
  • Accurate to within about 400 milliseconds
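Because the values are plain millisecond offsets, converting them for display is simple arithmetic. The following sketch uses made-up word entries; only the start and end keys come from the description above, and the "text" key and helper name are assumptions for illustration.

```python
def format_ms(ms):
    """Format a millisecond offset as H:MM:SS.mmm."""
    seconds, millis = divmod(ms, 1000)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}.{millis:03d}"


# Hypothetical entries shaped like timestamped words in a response.
words = [
    {"text": "hello", "start": 1250, "end": 1620},
    {"text": "world", "start": 1700, "end": 2140},
]

if __name__ == "__main__":
    for word in words:
        print(f'{format_ms(word["start"])} - {format_ms(word["end"])}  {word["text"]}')
```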
Custom Vocabulary
  • Allows submission of words/phrases to boost prediction likelihood
  • Helps with under-represented terms in training data
Custom Spelling
  • Controls word spelling/formatting in transcript text
  • Works like find-and-replace functionality
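The find-and-replace behavior of Custom Spelling can be pictured with a local sketch like the one below. This is not the service's implementation: the function name, the rule format, and the choice of case-insensitive whole-word matching are all assumptions made for illustration.

```python
import re


def apply_custom_spelling(text, rules):
    """Replace each source spelling with its target spelling.

    rules maps a target spelling to the list of spellings it should replace.
    Matching is whole-word and case-insensitive (an assumption of this sketch).
    """
    for target, sources in rules.items():
        for source in sources:
            pattern = r"\b" + re.escape(source) + r"\b"
            # A function replacement avoids backslash escapes in the target.
            text = re.sub(pattern, lambda _m, t=target: t, text, flags=re.IGNORECASE)
    return text


if __name__ == "__main__":
    rules = {"SQL": ["sequel"], "Gettleman": ["gettleman"]}
    print(apply_custom_spelling("we use sequel and gettleman daily", rules))
```

Custom Vocabulary differs in kind: it influences what the model predicts, whereas a replacement like this only changes how recognized words are written.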
File Storage:
  • Files are encrypted in transit
  • Deleted immediately after transcription completion
  • Uploaded but untranscribed files are deleted after 24 hours
  • Upload URLs become invalid after deletion
Transcript Management:
Completed transcripts are stored in our database, encrypted at rest, so that we can serve them to you and your application. To permanently delete a transcript from our database once you've retrieved it, you can make a DELETE request to the API.
You can retrieve a list of all transcripts that you have created by making a GET request to the API.
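The two requests described above can be sketched with the standard library as follows. Several details are assumptions to verify against the API reference: the base URL, the path with the transcript ID appended to /v2/transcript, and passing the API key in an Authorization header. The function names are hypothetical, and the requests are only constructed here, not sent.

```python
import urllib.request

BASE_URL = "https://api.assemblyai.com"  # assumed base URL


def build_delete_request(transcript_id, api_key):
    """Build (but do not send) a DELETE request for one transcript."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript/{transcript_id}",  # assumed path shape
        method="DELETE",
        headers={"Authorization": api_key},  # assumed auth header
    )


def build_list_request(api_key):
    """Build (but do not send) a GET request listing your transcripts."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/transcript",
        method="GET",
        headers={"Authorization": api_key},  # assumed auth header
    )


if __name__ == "__main__":
    request = build_delete_request("YOUR_TRANSCRIPT_ID", "YOUR_API_KEY")
    print(request.get_method(), request.full_url)
```

To actually send either request, pass it to `urllib.request.urlopen` and read the JSON body from the response.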
Best Tier
  • Most robust and accurate offering
  • Houses most powerful models
  • Broadest range of capabilities
  • Ideal for accuracy-critical use cases
Nano Tier
  • Fast, lightweight offering
  • Supports 99 languages
  • Cost-effective price point
  • Best for extensive language needs
Discounts
Support
  • JSON responses include error key with descriptive messages
  • Email support@assemblyai.com for assistance
  • Include transcript IDs and detailed issue description when contacting support
Any time you make a request to the API, you should receive a JSON response. If you don't receive the expected output, the JSON contains an error key with a message value describing the error. You can also reach out to our support team at any time by sending an email to support@assemblyai.com. When reaching out, please include a detailed description of any issues you're experiencing, as well as transcript IDs for affected requests, if possible.
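Checking for the error key might look like the sketch below. The helper name and the sample response bodies are hypothetical; only the presence of an error key with a descriptive message comes from the text above.

```python
import json


def extract_error(body):
    """Return the error message from a JSON response body, or None if absent."""
    payload = json.loads(body)
    if isinstance(payload, dict):
        return payload.get("error")
    return None


if __name__ == "__main__":
    # Hypothetical response bodies, for illustration only.
    failed = '{"error": "example error message"}'
    succeeded = '{"id": "abc123", "status": "completed"}'
    print(extract_error(failed))
    print(extract_error(succeeded))
```

Logging the message together with the transcript ID makes it easier to include both when contacting support.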
Custom models are rarely more accurate than the best general models due to several factors:
  • General models are trained on massive datasets (600,000+ hours of speech data)
  • Training data includes diverse audio types:
    • Broadcast TV recordings
    • Phone calls
    • Zoom meetings
    • Videos
    • Various accents and speakers
Custom models are mainly beneficial for audio with unique characteristics unseen by general models, though these cases are rare due to the comprehensive training of general models. Learn more about this topic on the AssemblyAI blog.