To apply LLMs to speech, you first need to transcribe the audio to text, which is what the AssemblyAI integration for LangChain helps you with.
To use the integration, set the `ASSEMBLYAI_API_KEY` environment variable with your API key. You can get a free AssemblyAI API key from the AssemblyAI dashboard.
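For example, you can set the key from Python itself; the value below is a placeholder, and in practice you'd usually export the variable in your shell or load it from a `.env` file instead:

```python
import os

# Make the API key available to the AssemblyAI integration.
# Placeholder value; normally exported in the shell or loaded from a .env file.
os.environ["ASSEMBLYAI_API_KEY"] = "YOUR_ASSEMBLYAI_API_KEY"
```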
Next, import the `AssemblyAIAudioTranscriptLoader` from `langchain.document_loaders`.
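Putting that together, the import looks like this:

```python
from langchain.document_loaders import AssemblyAIAudioTranscriptLoader
```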
Pass the audio file to the `file_path` argument of the `AssemblyAIAudioTranscriptLoader` and call the `.load()` method to get the transcript as LangChain documents. The `.load()` method returns an array of documents, but by default, there's only one document in the array with the full transcript.
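A minimal sketch of this flow; the audio URL is a placeholder:

```python
loader = AssemblyAIAudioTranscriptLoader(
    file_path="https://example.com/audio.mp3",  # placeholder; a URL or local file path
)

# Transcribes the audio and loads the result as LangChain documents.
docs = loader.load()

print(len(docs))  # 1 — a single document with the full transcript by default
```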
The transcribed text is available in the `page_content` attribute.
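Continuing the sketch above:

```python
# The full transcription text of the first (and only) document.
print(docs[0].page_content)
```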
The `metadata` attribute contains the full JSON response with more meta information.
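For instance (the exact fields depend on the API response):

```python
# A dict with the full AssemblyAI response, e.g. the transcript id,
# audio URL, and status, alongside the text itself.
print(docs[0].metadata)
```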
You can specify the `transcript_format` argument to load the transcript in different formats. Depending on the format, `.load()` returns either one or more documents (see the sketch after the list below). These are the different `TranscriptFormat` options:
- `TEXT`: One document with the transcription text
- `SENTENCES`: Multiple documents, splits the transcription by each sentence
- `PARAGRAPHS`: Multiple documents, splits the transcription by each paragraph
- `SUBTITLES_SRT`: One document with the transcript exported in SRT subtitles format
- `SUBTITLES_VTT`: One document with the transcript exported in VTT subtitles format
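For example, to split the transcript by sentence; the `TranscriptFormat` import path here is an assumption, as is the placeholder URL:

```python
from langchain.document_loaders.assemblyai import TranscriptFormat  # assumed path

loader = AssemblyAIAudioTranscriptLoader(
    file_path="https://example.com/audio.mp3",  # placeholder URL
    transcript_format=TranscriptFormat.SENTENCES,
)

# One document per sentence of the transcript.
docs = loader.load()
```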
You can pass a `config` argument to use different transcript features and audio intelligence models. Here's an example of using the `config` argument to enable speaker labels, auto chapters, and entity detection.
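The sketch below uses the `assemblyai` Python SDK's `TranscriptionConfig`; treating that as the config type the loader expects is an assumption, as is the placeholder URL:

```python
import assemblyai as aai

config = aai.TranscriptionConfig(
    speaker_labels=True,    # label which speaker said which words
    auto_chapters=True,     # generate chapter summaries over the audio
    entity_detection=True,  # detect named entities in the transcript
)

loader = AssemblyAIAudioTranscriptLoader(
    file_path="https://example.com/audio.mp3",  # placeholder URL
    config=config,
)
docs = loader.load()
```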
If you don't have the `ASSEMBLYAI_API_KEY` set as an environment variable, you can also pass it as the `api_key` argument.
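For example, with placeholder values:

```python
# Pass the API key directly instead of relying on the environment variable.
loader = AssemblyAIAudioTranscriptLoader(
    file_path="https://example.com/audio.mp3",  # placeholder URL
    api_key="YOUR_ASSEMBLYAI_API_KEY",          # placeholder key
)
```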