Overview
By the end of this tutorial, you’ll be able to transcribe audio from your microphone in TypeScript.
Streaming Speech-to-Text is only available for English. See Supported languages.
Before you begin
To complete this tutorial, you need:
- Node.js installed. You can check whether it is installed with node -v.
- TypeScript installed. You can check whether it is installed with tsc -v.
- An AssemblyAI account with a credit card set up.
Step 1: Install the SDK
Run npm init to create an NPM package, and then install the AssemblyAI package via NPM by running npm install assemblyai.
Step 2: Configure the API key
In this step, you’ll create an SDK client and configure it to use your API key.
1. Browse to Account, and then click the text under Your API key to copy it.
2. Configure the SDK to use your API key. Create a file called main.ts and add the client setup code to it, replacing YOUR_API_KEY with your copied API key.
Step 3: Create a streaming service
1. Create a new streaming service from the AssemblyAI client. If you don’t set a sample rate, it defaults to 16 kHz.
The sample_rate is the number of audio samples per second, measured in hertz (Hz). Higher sample rates result in higher quality audio, which may lead to better transcripts, but also more data being sent over the network.
We recommend the following sample rates:
- Minimum quality: 8_000 (8 kHz)
- Medium quality: 16_000 (16 kHz)
- Maximum quality: 48_000 (48 kHz)
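To make the network trade-off concrete, the sketch below estimates how much raw audio each recommended sample rate produces per second. It assumes uncompressed, single-channel, 16-bit (2-byte) PCM; the function name bytesPerSecond is illustrative and not part of the SDK.

```typescript
// Estimate the raw data rate of uncompressed PCM audio.
// Assumption: 16-bit samples (2 bytes) and a single channel by default.
function bytesPerSecond(
  sampleRate: number,
  bytesPerSample: number = 2,
  channels: number = 1,
): number {
  return sampleRate * bytesPerSample * channels;
}

// Print the data rate for each sample rate recommended above.
const recommendedRates: number[] = [8_000, 16_000, 48_000];
for (const rate of recommendedRates) {
  const kilobytes = bytesPerSecond(rate) / 1000;
  console.log(`${rate} Hz -> ${kilobytes} kB/s`);
}
```

Under these assumptions, moving from 16 kHz to 48 kHz triples the amount of audio data sent per second.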
2. Create functions to handle events from the real-time service.
3. Create another function to handle transcripts. The real-time transcriber returns two types of transcripts: partial and final.
- Partial transcripts are returned as the audio is being streamed to AssemblyAI.
- Final transcripts are returned when the service detects a pause in speech.
You can configure the silence threshold for automatic utterance detection and programmatically force the end of an utterance to immediately get a Final transcript.
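As a rough illustration of handling both transcript types in a single function, the sketch below formats a transcript differently depending on whether it is partial or final. The TranscriptLike type and its message_type and text fields are assumptions made for this example; check the SDK's own transcript type for the actual shape.

```typescript
// Hypothetical shape of a transcript message, assumed for this sketch.
type TranscriptLike = {
  message_type: "PartialTranscript" | "FinalTranscript";
  text: string;
};

// Returns a display line, or null when there is nothing to show yet.
function formatTranscript(transcript: TranscriptLike): string | null {
  if (transcript.text.trim() === "") {
    return null; // skip empty transcripts, e.g. during silence
  }
  return transcript.message_type === "FinalTranscript"
    ? `Final: ${transcript.text}`
    : `Partial: ${transcript.text}`;
}

const exampleLine = formatTranscript({
  message_type: "PartialTranscript",
  text: "hello wor",
});
if (exampleLine !== null) {
  console.log(exampleLine);
}
```

A function like this could be called from whichever transcript callback you register, so that partial results are shown as they arrive and final results replace them once a pause is detected.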
You can also use the on("transcript.partial") and on("transcript.final") callbacks to handle partial and final transcripts separately.
Step 4: Connect the streaming service
Streaming Speech-to-Text uses WebSockets to stream audio to AssemblyAI. This requires first establishing a connection to the API.
Step 5: Record audio from microphone
In this step, you’ll use SoX, a cross-platform audio library, to record audio from your microphone.
1. Install SoX on your machine.
2.
3. In the on("open") callback, create a new microphone stream. The sampleRate needs to be the same value as the real-time service settings.
The SoxRecording formats the audio data for you. If you want to stream data from elsewhere, make sure that your audio data is in the following format:
- Single channel
- 16-bit signed integer PCM or mu-law encoding
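If your audio comes from a source that produces floating-point samples, it needs converting before it matches the format above. The following is a minimal sketch, assuming interleaved stereo input with samples in the range [-1, 1]; the helper names stereoToMono and floatTo16BitPcm are illustrative and not part of the SDK.

```typescript
// Average interleaved stereo samples (L, R, L, R, ...) down to a single channel.
function stereoToMono(interleaved: Float32Array): Float32Array {
  const mono = new Float32Array(Math.floor(interleaved.length / 2));
  for (let i = 0; i < mono.length; i++) {
    mono[i] = (interleaved[2 * i] + interleaved[2 * i + 1]) / 2;
  }
  return mono;
}

// Convert floating-point samples in [-1, 1] to 16-bit signed integer PCM.
function floatTo16BitPcm(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input cannot overflow the 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = clamped < 0
      ? Math.round(clamped * 0x8000)
      : Math.round(clamped * 0x7fff);
  }
  return pcm;
}

const stereoInput = new Float32Array([1, 0, -1, -1]);
const pcmOutput = floatTo16BitPcm(stereoToMono(stereoInput));
console.log(Array.from(pcmOutput));
```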
4. Pipe the recording stream to the real-time stream to send the audio for transcription.
If you don’t use streams, you can also send buffers of audio data using transcriber.sendAudio(buffer).
Step 6: Disconnect the real-time service
When you are done, disconnect the transcriber to close the connection.
Run tsc main.ts to compile the file to JavaScript, and then run node main.js to run it.
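For the buffer-based alternative to piping mentioned in Step 5, one possible approach is to split PCM audio into fixed-duration chunks and pass each one to sendAudio. In the sketch below, AudioSender is a hypothetical minimal interface standing in for the transcriber, and the 100 ms chunk duration is an arbitrary choice for illustration rather than a documented requirement.

```typescript
// Hypothetical minimal interface; only the sendAudio method named in the text is assumed.
interface AudioSender {
  sendAudio(chunk: Uint8Array): void;
}

// Split 16-bit mono PCM bytes into chunks of roughly `chunkMs` milliseconds.
function chunkPcm(pcm: Uint8Array, sampleRate: number, chunkMs: number): Uint8Array[] {
  const bytesPerSample = 2; // 16-bit samples
  const chunkBytes = Math.floor((sampleRate * chunkMs) / 1000) * bytesPerSample;
  if (chunkBytes <= 0) {
    throw new Error("chunkMs is too small for this sample rate");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray clamps at the end, so the last chunk may be shorter.
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// Send every chunk in order and return how many chunks were sent.
function sendInChunks(
  sender: AudioSender,
  pcm: Uint8Array,
  sampleRate: number,
  chunkMs: number = 100,
): number {
  const chunks = chunkPcm(pcm, sampleRate, chunkMs);
  for (const chunk of chunks) {
    sender.sendAudio(chunk);
  }
  return chunks.length;
}
```

The sample rate passed to a helper like this should match the one configured for the streaming service, for the same reason the microphone stream's sampleRate must.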