Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone in Python.
Streaming Speech-to-Text is only available for English. See Supported languages.
Before you begin
To complete this tutorial, you need:
- Python installed.
- An AssemblyAI account with a credit card set up.
Step 1: Install dependencies
PortAudio is a cross-platform library for streaming audio. The Python SDK uses PortAudio to stream audio from your microphone.
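Once the dependencies are installed, a quick import check can confirm they are visible to your Python interpreter. The sketch below assumes the SDK is importable as `assemblyai` and that microphone access relies on the `pyaudio` package (Python bindings for PortAudio). Both module names are assumptions, and `missing_modules` is a hypothetical helper, so adjust them to match your installation.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names are assumptions: `assemblyai` for the SDK and `pyaudio`
    # for the PortAudio bindings used for microphone access.
    missing = missing_modules(["assemblyai", "pyaudio"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies found.")
```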
Step 2: Configure the API key
In this step, you’ll create an SDK client and configure it to use your API key.
Browse to Account, and then click Copy API key under Copy your API key.
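To keep the copied key out of your source code, one option is to read it from an environment variable. This is a minimal sketch: the variable name `ASSEMBLYAI_API_KEY` and the helper `load_api_key` are hypothetical choices for illustration, not names required by the SDK.

```python
import os


def load_api_key(env_var="ASSEMBLYAI_API_KEY", environ=None):
    """Read the API key from an environment variable, failing loudly if unset.

    `env_var` is a hypothetical variable name; `environ` can be any mapping
    and defaults to the process environment.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key.")
    return key
```

The returned string is the value you would then hand to the SDK's API key setting (in the Python SDK this is typically `aai.settings.api_key`).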
Step 3: Create a transcriber
In this step, you’ll set up a real-time transcriber object and callback functions that handle the different events.
Create another function to handle transcripts. The real-time transcriber returns two types of transcripts: RealtimeFinalTranscript and RealtimePartialTranscript.
- Partial transcripts are returned as the audio is being streamed to AssemblyAI.
- Final transcripts are returned after a moment of silence.
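The difference between the two transcript types can be sketched without the SDK. In the example below, `PartialTranscript` and `FinalTranscript` are stand-in classes written for illustration only (they are not the SDK's types), and the sketch assumes each transcript exposes its text through a `text` attribute. A partial transcript ends with a carriage return so the next update overwrites it, while a final transcript ends with a newline so it stays on screen.

```python
from dataclasses import dataclass


@dataclass
class PartialTranscript:
    """Stand-in for RealtimePartialTranscript (illustration only)."""
    text: str


@dataclass
class FinalTranscript:
    """Stand-in for RealtimeFinalTranscript (illustration only)."""
    text: str


def render(transcript):
    """Return the string to write to the terminal for one transcript event."""
    if not transcript.text:
        return ""  # Nothing recognized yet; skip empty updates.
    if isinstance(transcript, FinalTranscript):
        return transcript.text + "\n"  # Final: keep the line and move on.
    return transcript.text + "\r"  # Partial: the next update overwrites it.


if __name__ == "__main__":
    events = [
        PartialTranscript("hello"),
        PartialTranscript("hello wor"),
        FinalTranscript("Hello world."),
    ]
    for event in events:
        print(render(event), end="")
```

A callback that handles real transcripts could follow the same shape, checking the transcript's type to decide how to display it.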
Create a new RealtimeTranscriber using the function you created.
The sample_rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network.
We recommend the following sample rates:
- Minimum quality: 8_000 (8 kHz)
- Medium quality: 16_000 (16 kHz)
- Maximum quality: 48_000 (48 kHz)
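To make the network cost of each sample rate concrete, the sketch below computes the raw data rate for uncompressed audio. It assumes single-channel, 16-bit PCM (2 bytes per sample) and ignores any protocol or encoding overhead; `pcm_bytes_per_second` is a hypothetical helper.

```python
def pcm_bytes_per_second(sample_rate, bytes_per_sample=2, channels=1):
    """Uncompressed PCM data rate in bytes per second.

    Defaults assume 16-bit (2-byte) samples on a single channel.
    """
    return sample_rate * bytes_per_sample * channels


for rate in (8_000, 16_000, 48_000):
    kb_per_s = pcm_bytes_per_second(rate) / 1000
    print(f"{rate} Hz -> {kb_per_s:.0f} kB/s")
```

Under these assumptions, moving from 8 kHz to 48 kHz multiplies the data sent over the network by six.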
Step 4: Connect the transcriber
Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
The on_open function you created earlier will be called when the connection has been established.
Step 5: Record audio from microphone
In this step, you’ll configure your Python app to record audio from your microphone. You’ll use a helper class from the Python SDK that makes this easier.
Open a microphone stream. The sample_rate needs to be the same value as the one you passed to RealtimeTranscriber.
MicrophoneStream formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:
- Single channel
- 16-bit signed integer PCM or mu-law encoding
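If your audio comes from another source as floating-point samples, it has to be converted before streaming. The sketch below shows one way to produce single-channel, 16-bit signed integer PCM using only the standard library. The helper names are hypothetical, and the little-endian byte order is an assumption not stated above.

```python
import struct


def downmix_to_mono(left, right):
    """Average two channels into one so the result is single channel."""
    return [(l + r) / 2 for l, r in zip(left, right)]


def floats_to_pcm16(samples):
    """Pack float samples in [-1.0, 1.0] as 16-bit signed integer PCM bytes.

    Values outside the range are clipped. Little-endian byte order is an
    assumption made for this sketch.
    """
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


if __name__ == "__main__":
    mono = downmix_to_mono([1.0, 0.0, -1.0], [1.0, 0.0, -1.0])
    chunk = floats_to_pcm16(mono)
    print(len(chunk), "bytes for", len(mono), "samples")
```

Each sample occupies two bytes, so a chunk of N mono samples is 2N bytes long.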
Step 6: Close the connection
Finally, close the connection when you’re done to disconnect the transcriber.
The on_close function you created earlier will be called when the connection has been closed.
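One way to make sure the connection is always closed, even if streaming is interrupted by an error or Ctrl+C, is to wrap the streaming call in try/finally. This is a sketch, not the SDK's prescribed pattern: `run_session` is a hypothetical helper, and it assumes the transcriber object exposes `connect()`, `stream()`, and `close()` methods matching the steps in this tutorial.

```python
def run_session(transcriber, audio_stream):
    """Connect, stream audio, and always close the connection afterwards.

    Assumes `transcriber` provides connect(), stream(audio), and close().
    """
    transcriber.connect()
    try:
        # Runs until the audio stream ends or an exception interrupts it.
        transcriber.stream(audio_stream)
    finally:
        # Runs on normal completion, on errors, and on KeyboardInterrupt.
        transcriber.close()
```

Because `close()` sits in the `finally` block, the transcriber is disconnected no matter how streaming ends.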