A Deep Dive into Attendee's Transcription Engine

Accurate meeting transcription is the backbone of modern AI-powered features like automated summaries, action item detection, and searchable conversation archives. But transcription methods differ in quality, latency, and cost, and the right choice depends on which of those matters most to your product. Attendee is designed with this flexibility in mind, offering two distinct paths to get your meeting transcripts.

In this guide, we’ll explore Attendee’s dual transcription engines: high-accuracy **Third-party-based Transcription** and ultra-fast **Closed Caption-based Transcription**. We'll cover how each works, compare them side-by-side, and show you how to choose and configure the best option for your application.

The Two Paths to Transcription

Attendee gives you a choice between two fundamentally different ways of generating transcripts, each with its own strengths.

1. Third-party-based Transcription

This is the high-fidelity option. It works by capturing a separate audio stream for each participant in the meeting. When a participant pauses, that audio segment is sent to a specialized AI transcription provider like Deepgram or OpenAI. Because each stream belongs to exactly one participant, speaker identification comes directly from the stream rather than from guessing who is talking in mixed audio, and accuracy is generally the higher of the two methods.

Quality: Generally very high, depending on the chosen provider.
Features: Supports advanced features like word-level timestamps and automatic language detection.
Cost: Incurs costs from the third-party provider, billed to the API key you provide.
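
To make the "send a segment when the participant pauses" idea concrete, here is a minimal sketch of pause-based segmentation for a single participant's stream. It is purely illustrative: this guide does not describe Attendee's internal logic, and the names `Segment` and `split_on_pauses`, the 20 ms frame size, the 400 ms pause threshold, and the assumption that each frame already carries a speech/no-speech flag are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    """A span of speech from one participant (hypothetical structure)."""
    participant_id: str
    start_ms: int
    end_ms: int


def split_on_pauses(
    participant_id: str,
    frames: List[Tuple[int, bool]],
    frame_ms: int = 20,       # assumed frame length
    min_pause_ms: int = 400,  # assumed pause threshold
) -> List[Segment]:
    """Group voiced frames into segments, closing one after a long enough pause.

    `frames` holds (timestamp_ms, is_speech) pairs for ONE participant,
    in chronological order.
    """
    segments: List[Segment] = []
    start: Optional[int] = None        # start of the currently open segment
    last_voiced: Optional[int] = None  # timestamp of the latest voiced frame

    for ts, is_speech in frames:
        if is_speech:
            if start is None:
                start = ts
            last_voiced = ts
        elif start is not None and ts - last_voiced >= min_pause_ms:
            # The pause is long enough: close the segment at the end of the
            # last voiced frame. This is the unit that would be sent off
            # for transcription.
            segments.append(Segment(participant_id, start, last_voiced + frame_ms))
            start = None

    # Flush a segment that is still open when the stream ends.
    if start is not None:
        segments.append(Segment(participant_id, start, last_voiced + frame_ms))
    return segments
```

Because segmentation runs per participant, every segment already carries its speaker, which is why this method needs no separate diarization step.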

2. Closed Caption-based Transcription

This method is built for speed and efficiency. Instead of processing raw audio, the Attendee bot captures the built-in closed captions generated natively by the meeting platform (like Zoom or Google Meet). This approach has the lower latency of the two methods because the bot simply relays text as soon as it appears on screen.

Latency: Near-instantaneous, perfect for real-time display.
Cost: Completely free, as it uses the platform's existing functionality.
Quality: Generally lower quality than premium third-party models.

Side-by-Side Comparison

Choosing the right method depends on your product's specific requirements. Here's a quick breakdown of the key differences:

Feature	Third-party-based Transcription	Closed Caption-based Transcription
Source	Per-participant raw audio streams	Built-in platform closed captions
Transcription Quality	High (e.g., Deepgram, OpenAI)	Standard (Platform-dependent)
Latency	Higher (due to audio processing)	Lower (Near real-time)
Cost	Incurs provider costs	Free
Speaker Identification	Taken directly from per-participant audio streams	Taken directly from platform caption data
Word-level Timestamps	Yes (most providers)	No
Setup	Requires third-party API key	No setup required
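
The comparison above can be turned into a small decision helper. This is only a sketch of one reasonable policy, not an Attendee API: the function name, its parameters, and the return labels `"third_party"` and `"closed_captions"` are hypothetical.

```python
def choose_transcription_method(
    needs_word_timestamps: bool,
    has_provider_api_key: bool,
    prioritize_low_latency: bool,
) -> str:
    """Pick a transcription method from the trade-offs in the comparison table.

    Returns a hypothetical label: "third_party" or "closed_captions".
    """
    if needs_word_timestamps:
        # Per the table, only third-party transcription offers word-level
        # timestamps, and it requires a provider API key.
        if not has_provider_api_key:
            raise ValueError("Word-level timestamps require a third-party API key")
        return "third_party"

    if prioritize_low_latency or not has_provider_api_key:
        # Closed captions are free, need no setup, and have lower latency.
        return "closed_captions"

    # Otherwise favor the higher transcription quality.
    return "third_party"
```

For example, a live-captioning overlay would pass `prioritize_low_latency=True` and get closed captions, while a post-meeting summarizer with a provider key configured would get third-party transcription.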

Putting It Into Practice: Configuration

Configuring your preferred transcription method is straightforward and done in two places: the Attendee dashboard for your credentials, and the API call itself for per-meeting settings.

Step 1: Add Credentials in the Dashboard

If you're using a third-party provider, you first need to add your API key. Navigate to the Settings → Credentials page in your Attendee dashboard and enter the key for your chosen service (e.g., Deepgram, OpenAI, Gladia, or AssemblyAI).

Step 2: Configure Transcription in the API Call

You specify your transcription settings within the transcription_settings object when you create a bot. If you don't provide this object, Attendee will default to closed-caption-based transcription.

Here’s an example of launching a bot that uses Deepgram's "nova-2" model for English transcription:

API Request: POST /api/v1/bots

{
  "meeting_url": "https://zoom.us/j/...",
  "bot_name": "My Transcribing Bot",
  "transcription_settings": {
    "provider": "deepgram",
    "deepgram_settings": {
      "language": "en-US",
      "model": "nova-2"
    }
  }
}
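
The same request can be built from Python. The sketch below is illustrative only: the base URL `https://app.attendee.dev`, the `Authorization: Token ...` header scheme, and the helper names are assumptions not stated in this guide, so check them against the API reference. It constructs the request without sending it, and omits transcription_settings entirely when none is given, which per the text above falls back to closed captions.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumption: replace with the base URL of your Attendee instance.
ATTENDEE_BASE_URL = "https://app.attendee.dev"


def build_bot_payload(
    meeting_url: str,
    bot_name: str,
    transcription_settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build the JSON body for POST /api/v1/bots."""
    payload: Dict[str, Any] = {"meeting_url": meeting_url, "bot_name": bot_name}
    # Leave the key out entirely when no settings are given, so the
    # default transcription method applies.
    if transcription_settings is not None:
        payload["transcription_settings"] = transcription_settings
    return payload


def build_create_bot_request(
    api_key: str,
    payload: Dict[str, Any],
    base_url: str = ATTENDEE_BASE_URL,
) -> urllib.request.Request:
    """Wrap the payload in an HTTP request object (nothing is sent here)."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/bots",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually launch the bot, pass the returned object to `urllib.request.urlopen` and parse the JSON response.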

Step 3: Receive Real-Time Transcripts with Webhooks

Regardless of the method you choose, you can receive real-time updates by subscribing to the transcript.update webhook. Each time a new utterance is finalized, Attendee will send a payload to your endpoint.

Webhook Payload: transcript.update

{
  "idempotency_key": "evt_...",
  "bot_id": "bot_3hfP0PXEsNinIZmh",
  "trigger": "transcript.update",
  "data": {
    "speaker_name": "Noah Duncan",
    "speaker_uuid": "16778240",
    "timestamp_ms": 1079,
    "duration_ms": 7710,
    "transcription": {
      "transcript": "You can totally record this, buddy. Go for it, man.",
      "words": [ ... ]
    }
  }
}
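
On the receiving side, a handler for this payload might look like the sketch below. It assumes the field names shown in the sample payload and that idempotency_key can be used to drop repeated deliveries, which its name suggests but this guide does not spell out. The function name and the in-memory set are hypothetical; a real service would use a persistent store and verify the request's authenticity first.

```python
import json
from typing import Optional, Set


def handle_transcript_update(raw_body: str, seen_keys: Set[str]) -> Optional[str]:
    """Return a display line for a new utterance, or None if there is nothing to show.

    `seen_keys` holds idempotency keys that were already processed
    (assumed deduplication strategy).
    """
    event = json.loads(raw_body)

    # Ignore any other webhook triggers delivered to the same endpoint.
    if event.get("trigger") != "transcript.update":
        return None

    # Skip deliveries that were already handled.
    key = event["idempotency_key"]
    if key in seen_keys:
        return None
    seen_keys.add(key)

    data = event["data"]
    text = data["transcription"]["transcript"]
    return f'[{data["timestamp_ms"]} ms] {data["speaker_name"]}: {text}'
```

Keeping the handler a pure function of the request body and a key store makes it easy to plug into any web framework and to test without a running server.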

Choosing the Right Third-Party Provider

If you opt for the higher quality of third-party transcription, Attendee supports several leading providers. Here’s a quick guide to help you choose:

Deepgram: A great all-rounder known for its speed, good quality, and competitive pricing. It supports transcribing multilingual speech within the same audio. Offers free credits for new users.
Gladia: Similar to Deepgram with excellent quality and broad language support, though often more expensive. Also supports multilingual speech. Offers a free monthly transcription allowance.
AssemblyAI: Known for its highly accurate word-level timestamps and competitive pricing. A solid choice if timestamp precision is a priority. Offers free credits for new users.
OpenAI (Whisper): The most affordable option, but often less accurate than specialized providers and lacks word-level timestamps. It can, however, handle multilingual speech well.

Integrate Smarter Transcription Today

With Attendee's flexible transcription engine, you have the power to choose the perfect balance of quality, speed, and cost for your application. Dive into our documentation for detailed API references and provider-specific options.

View Transcription Documentation