DEEP DIVES & TECHNICAL GUIDES • APRIL 15, 2024

A Deep Dive into Attendee's Transcription Engine

From real-time closed captions to high-accuracy third-party models, understand the powerful options at your fingertips for meeting transcription.

Accurate meeting transcription is the backbone of modern AI-powered features like automated summaries, action item detection, and searchable conversation archives. But not all transcription methods are created equal. Depending on your needs for quality, latency, and cost, the right choice can make all the difference. Attendee is designed with this flexibility in mind, offering two distinct paths to get your meeting transcripts.

In this guide, we’ll explore Attendee’s dual transcription engines: high-accuracy **Third-party-based Transcription** and ultra-fast **Closed Caption-based Transcription**. We'll cover how each works, compare them side-by-side, and show you how to choose and configure the best option for your application.

The Two Paths to Transcription

Attendee gives you a choice between two fundamentally different ways of generating transcripts, each with its own strengths.

1. Third-party-based Transcription

This is the high-fidelity option. It works by capturing a separate audio stream for each participant in the meeting. When a participant pauses, that audio segment is sent to a specialized AI transcription provider like Deepgram or OpenAI. Because every segment comes from a single participant's stream, the speaker is known from the stream itself rather than inferred from mixed audio, and the dedicated models generally deliver the higher accuracy of the two methods.
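To make the pause-based segmentation concrete, here is a minimal sketch of how per-participant audio could be cut into segments. It is an illustration only, not Attendee's actual implementation: the `UtteranceBuffer` class, the `is_silent` flag, and the 600 ms pause threshold are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class UtteranceBuffer:
    """Collects audio frames per participant and cuts a segment on a pause.

    Hypothetical sketch: class name, fields, and threshold are assumptions.
    """
    pause_ms: int = 600
    _frames: dict = field(default_factory=dict, init=False)   # participant -> list of frames
    _silence: dict = field(default_factory=dict, init=False)  # participant -> trailing silence in ms

    def push(self, participant: str, frame: bytes, frame_ms: int, is_silent: bool):
        """Add one frame; return (participant, audio) when a pause closes a segment."""
        frames = self._frames.setdefault(participant, [])
        if is_silent:
            if not frames:
                return None  # no speech buffered yet, so ignore the silence
            self._silence[participant] = self._silence.get(participant, 0) + frame_ms
            if self._silence[participant] >= self.pause_ms:
                audio = b"".join(frames)
                self._frames[participant] = []
                self._silence[participant] = 0
                return participant, audio
            return None
        # Speech resets the silence counter and extends the current segment.
        self._silence[participant] = 0
        frames.append(frame)
        return None
```

Each returned segment already belongs to exactly one participant, which is why the speaker label can travel with the audio to the provider instead of being guessed afterwards.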

2. Closed Caption-based Transcription

This method is built for speed and efficiency. Instead of processing raw audio, the Attendee bot captures the built-in closed captions generated natively by the meeting platform (like Zoom or Google Meet). This approach offers the lower latency of the two methods because the bot is simply relaying text as soon as it appears on screen.

Side-by-Side Comparison

Choosing the right method depends on your product's specific requirements. Here's a quick breakdown of the key differences:

| Feature | Third-party-based Transcription | Closed Caption-based Transcription |
| --- | --- | --- |
| Source | Per-participant raw audio streams | Built-in platform closed captions |
| Transcription Quality | High (e.g., Deepgram, OpenAI) | Standard (Platform-dependent) |
| Latency | Higher (due to audio processing) | Lower (Near real-time) |
| Cost | Incurs provider costs | Free |
| Speaker Identification | Perfect, based on audio stream | Perfect, based on platform data |
| Word-level Timestamps | Yes (most providers) | No |
| Setup | Requires third-party API key | No setup required |
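
If you want to encode this comparison in your own application logic, a small helper along these lines may be enough. The function name and the returned labels are hypothetical and exist only for this sketch; they are not values the Attendee API accepts.

```python
def choose_transcription_method(needs_word_timestamps: bool,
                                has_provider_api_key: bool,
                                latency_is_critical: bool) -> str:
    """Return "third_party" or "closed_captions" based on the comparison above.

    The labels are illustrative only, not Attendee API values.
    """
    if needs_word_timestamps:
        # Only the third-party path provides word-level timestamps.
        if not has_provider_api_key:
            raise ValueError("word-level timestamps need a third-party provider API key")
        return "third_party"
    if latency_is_critical or not has_provider_api_key:
        # Closed captions need no setup and arrive in near real time.
        return "closed_captions"
    return "third_party"
```

The ordering reflects one possible set of priorities: hard feature requirements first, then latency, then quality. Reorder the checks if cost matters more to your product than accuracy.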

Putting It Into Practice: Configuration

Configuring your preferred transcription method is straightforward and done in two places: the Attendee dashboard for your credentials, and the API call itself for per-meeting settings.

Step 1: Add Credentials in the Dashboard

If you're using a third-party provider, you first need to add your API key. Navigate to the Settings → Credentials page in your Attendee dashboard and enter the key for your chosen service (e.g., Deepgram, OpenAI, Gladia, or AssemblyAI).

Step 2: Configure Transcription in the API Call

You specify your transcription settings within the transcription_settings object when you create a bot. If you don't provide this object, Attendee will default to closed-caption-based transcription.

Here’s an example of launching a bot to use Deepgram's "nova-2" model for English transcription:

API Request: POST /api/v1/bots
{
  "meeting_url": "https://zoom.us/j/...",
  "bot_name": "My Transcribing Bot",
  "transcription_settings": {
    "provider": "deepgram",
    "deepgram_settings": {
      "language": "en-US",
      "model": "nova-2"
    }
  }
}
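
For readers calling the API from Python, the request above could be assembled roughly like this using only the standard library. Treat it as a sketch under stated assumptions: the `BASE_URL` placeholder, the `Token` authorization scheme, and the `build_create_bot_request` helper are not taken from this article, so check your dashboard and the API reference for the real values.

```python
import json
import urllib.request
from typing import Optional

# Assumption: placeholder host; replace with your actual Attendee base URL.
BASE_URL = "https://attendee.example.com"


def build_create_bot_request(api_key: str, meeting_url: str, bot_name: str,
                             transcription_settings: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/bots request."""
    payload = {"meeting_url": meeting_url, "bot_name": bot_name}
    # Leaving transcription_settings out relies on the default described above.
    if transcription_settings is not None:
        payload["transcription_settings"] = transcription_settings
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/bots",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Assumption: the header scheme may differ in your deployment.
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building it separately keeps the payload easy to inspect and test before any network call is made.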

Step 3: Receive Real-Time Transcripts with Webhooks

Regardless of the method you choose, you can receive real-time updates by subscribing to the transcript.update webhook. Each time a new utterance is finalized, Attendee will send a payload to your endpoint.

Webhook Payload: transcript.update
{
  "idempotency_key": "evt_...",
  "bot_id": "bot_3hfP0PXEsNinIZmh",
  "trigger": "transcript.update",
  "data": {
    "speaker_name": "Noah Duncan",
    "speaker_uuid": "16778240",
    "timestamp_ms": 1079,
    "duration_ms": 7710,
    "transcription": {
      "transcript": "You can totally record this, buddy. Go for it, man.",
      "words": [ ... ]
    }
  }
}
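
On the receiving side, a handler for this payload might look like the sketch below. The `TranscriptCollector` class is hypothetical, and it assumes that a redelivered event carries the same `idempotency_key`, which the field name suggests but this article does not state.

```python
import json


class TranscriptCollector:
    """Accumulates transcript.update payloads, skipping duplicate deliveries."""

    def __init__(self):
        self._seen = set()  # idempotency keys already processed
        self.lines = []     # formatted transcript lines, in arrival order

    def handle(self, raw_body: bytes) -> bool:
        """Process one webhook body; return True if a new utterance was stored."""
        event = json.loads(raw_body)
        if event.get("trigger") != "transcript.update":
            return False
        key = event["idempotency_key"]
        if key in self._seen:
            return False  # assumed to be a repeat delivery of the same event
        self._seen.add(key)
        data = event["data"]
        seconds = data["timestamp_ms"] / 1000
        text = data["transcription"]["transcript"]
        self.lines.append(f'[{seconds:.1f}s] {data["speaker_name"]}: {text}')
        return True
```

Call `handle` from whatever web framework serves your webhook endpoint, passing the raw request body. The in-memory set is fine for a demo; a production service would more likely keep the seen keys in a shared store.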

Choosing the Right Third-Party Provider

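If you opt for the higher quality of third-party transcription, Attendee supports several leading providers, including Deepgram, OpenAI, Gladia, and AssemblyAI. Each has its own models, language options, and pricing, so weigh them against your accuracy and budget requirements before committing to one.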

Integrate Smarter Transcription Today

With Attendee's flexible transcription engine, you have the power to choose the perfect balance of quality, speed, and cost for your application. Dive into our documentation for detailed API references and provider-specific options.
