This workflow will enable you to have speaker labels with the speaker’s name in your transcripts:

Before:Speaker A: G'day, bud.Speaker B: How are you? Very good.After:Ben: G'day, bud.Bryce: How are you? Very good.

Before you begin

To complete this tutorial, you need:

For the entire source code of this guide, see Speaker Identification.

Step-by-step instructions

Install the Python SDK:

pip install assemblyai

Import the assemblyai and re packages, and set your API key:

import assemblyai as aaiimport reaai.settings.api_key = "YOUR_API_KEY" 

Define a Transcriber, a TranscriptionConfig with speaker_labels set to True. Then, create a transcript.

transcriber = aai.Transcriber()config = aai.TranscriptionConfig(speaker_labels=True)audio_url = "https://www.listennotes.com/e/p/accd617c94a24787b2e0800f264b7a5e/"transcript = transcriber.transcribe(audio_url, config)

Process the transcript with speaker labels:

text_with_speaker_labels = ""for utt in transcript.utterances:    text_with_speaker_labels += f"Speaker {utt.speaker}:\n{utt.text}\n"

Count the unique speakers, then create a LemurQuestion for each speaker. Lastly, ask LeMUR the questions, specifying text_with_speaker_labels as the input_text.

unique_speakers = set(utterance.speaker for utterance in transcript.utterances)questions = []for speaker in unique_speakers:    questions.append(        aai.LemurQuestion(            question=f"Who is speaker {speaker}?",            answer_format="<First Name> <Last Name (if applicable)>"        )    )result = aai.Lemur().question(    questions,    input_text=text_with_speaker_labels,    context="Your task is to infer the speaker's name from the speaker-labelled transcript")

Map the speaker alphabets to their names from LeMUR:

speaker_mapping = {}for qa_response in result.response:    pattern = r"Who is speaker (\w)\?"    match = re.search(pattern, qa_response.question)    if match and match.group(1) not in speaker_mapping.keys():        speaker_mapping.update({match.group(1): qa_response.answer})

Print the transcript with Speaker names:

for utterance in transcript.utterances:    speaker_name = speaker_mapping[utterance.speaker]    print(f"{speaker_name}: {utterance.text}")

Output:

Ben Kingsley: G'day, folks. Ben Kingsley here in this throwback Tuesday bonus episode, ...Bryce: All right, folks, you're on the property couch, where each week, Ben and I give you the insider's guide to property investing. Hi, mate.Ben Kingsley: G'day, bud.Bryce: How are you? Very good. Hey, we should do a little sound check here, Ben...