Accurate meeting transcription is the backbone of modern AI-powered features like automated summaries, action item detection, and searchable conversation archives. But not all transcription methods are created equal. Depending on your needs for quality, latency, and cost, the right choice can make all the difference. Attendee is designed with this flexibility in mind, offering two distinct paths to get your meeting transcripts.
In this guide, we’ll explore Attendee’s dual transcription engines: high-accuracy **Third-party-based Transcription** and ultra-fast **Closed Caption-based Transcription**. We'll cover how each works, compare them side-by-side, and show you how to choose and configure the best option for your application.
The Two Paths to Transcription
Attendee gives you a choice between two fundamentally different ways of generating transcripts, each with its own strengths.
1. Third-party-based Transcription
This is the high-fidelity option. Attendee captures a separate audio stream for each participant in the meeting. When a participant pauses, the audio segment for that utterance is sent to a specialized AI transcription provider such as Deepgram or OpenAI. Because each stream belongs to exactly one participant, speaker identification (diarization) is exact, and accuracy is generally the higher of the two methods.
- Quality: Generally very high, depending on the chosen provider.
- Features: Supports advanced features like word-level timestamps and automatic language detection.
- Cost: Incurs costs from the third-party provider, billed to the API key you provide.
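To make the pause-triggered flow concrete, here is a minimal Python sketch of splitting one participant's stream into utterances. It is illustrative only and not Attendee's actual implementation: the name `split_on_pauses`, the 20 ms frame size, and the 600 ms pause threshold are assumptions, and it presumes a voice-activity detector has already labeled each frame as speech or silence.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    start_ms: int  # when the utterance starts
    end_ms: int    # end of the last speech frame in the utterance


def split_on_pauses(
    frame_is_speech: Iterable[bool],
    frame_ms: int = 20,       # assumed frame size
    min_pause_ms: int = 600,  # assumed silence needed to close an utterance
) -> List[Segment]:
    """Group speech frames into utterances, closing one after a long enough pause."""
    segments: List[Segment] = []
    start = None          # start time of the open utterance, or None if idle
    last_speech_end = 0   # end time of the most recent speech frame
    silence_ms = 0        # silence accumulated since the last speech frame

    for i, is_speech in enumerate(frame_is_speech):
        t = i * frame_ms
        if is_speech:
            if start is None:
                start = t
            last_speech_end = t + frame_ms
            silence_ms = 0
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= min_pause_ms:
                segments.append(Segment(start, last_speech_end))
                start = None
                silence_ms = 0

    # Flush an utterance that was still open when the stream ended.
    if start is not None:
        segments.append(Segment(start, last_speech_end))
    return segments
```

In a pipeline like the one described, the audio for each `Segment` would be what gets sent to the transcription provider, tagged with the participant whose stream it came from.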
2. Closed Caption-based Transcription
This method is built for speed and efficiency. Instead of processing raw audio, the Attendee bot captures the built-in closed captions generated natively by the meeting platform (like Zoom or Google Meet). This approach offers the lowest possible latency because the bot is simply relaying text as soon as it appears on screen.
- Latency: Near-instantaneous, perfect for real-time display.
- Cost: Completely free, as it uses the platform's existing functionality.
- Quality: Generally lower quality than premium third-party models.
Side-by-Side Comparison
Choosing the right method depends on your product's specific requirements. Here's a quick breakdown of the key differences:
Feature | Third-party-based Transcription | Closed Caption-based Transcription |
---|---|---|
Source | Per-participant raw audio streams | Built-in platform closed captions |
Transcription Quality | High (e.g., Deepgram, OpenAI) | Standard (Platform-dependent) |
Latency | Higher (due to audio processing) | Lower (Near real-time) |
Cost | Incurs provider costs | Free |
Speaker Identification | Perfect, based on audio stream | Perfect, based on platform data |
Word-level Timestamps | Yes (most providers) | No |
Setup | Requires third-party API key | No setup required |
Putting It Into Practice: Configuration
Configuring your preferred transcription method is straightforward and done in two places: the Attendee dashboard for your credentials, and the API call itself for per-meeting settings.
Step 1: Add Credentials in the Dashboard
If you're using a third-party provider, you first need to add your API key. Navigate to the Settings → Credentials page in your Attendee dashboard and enter the key for your chosen service (e.g., Deepgram, OpenAI, Gladia, or AssemblyAI).
Step 2: Configure Transcription in the API Call
You specify your transcription settings within the transcription_settings object when you create a bot. If you don't provide this object, Attendee will default to closed-caption-based transcription.
Here’s an example of launching a bot to use Deepgram's "nova-2" model for English transcription:
{
  "meeting_url": "https://zoom.us/j/...",
  "bot_name": "My Transcribing Bot",
  "transcription_settings": {
    "provider": "deepgram",
    "deepgram_settings": {
      "language": "en-US",
      "model": "nova-2"
    }
  }
}
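As a rough sketch of sending this request body from Python using only the standard library: the base URL, the `/bots` path, and the `Token` authorization scheme below are assumptions that should be checked against the Attendee API reference, and `build_create_bot_request` is a hypothetical helper name. Leaving out `transcription_settings` omits the object entirely, which, per the text above, falls back to closed-caption-based transcription.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Assumed base URL; confirm it in the Attendee API reference.
API_BASE = "https://app.attendee.dev/api/v1"


def build_create_bot_request(
    api_key: str,
    meeting_url: str,
    bot_name: str,
    transcription_settings: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that creates a bot."""
    payload: Dict[str, Any] = {"meeting_url": meeting_url, "bot_name": bot_name}
    if transcription_settings is not None:
        payload["transcription_settings"] = transcription_settings

    return urllib.request.Request(
        f"{API_BASE}/bots",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The same settings as the JSON example above.
DEEPGRAM_NOVA_2 = {
    "provider": "deepgram",
    "deepgram_settings": {"language": "en-US", "model": "nova-2"},
}
```

Passing the returned request to `urllib.request.urlopen` would actually send it. Keeping construction separate from sending makes the payload easy to inspect or unit-test first.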
Step 3: Receive Real-Time Transcripts with Webhooks
Regardless of the method you choose, you can receive real-time updates by subscribing to the transcript.update webhook. Each time a new utterance is finalized, Attendee will send a payload to your endpoint.
{
  "idempotency_key": "evt_...",
  "bot_id": "bot_3hfP0PXEsNinIZmh",
  "trigger": "transcript.update",
  "data": {
    "speaker_name": "Noah Duncan",
    "speaker_uuid": "16778240",
    "timestamp_ms": 1079,
    "duration_ms": 7710,
    "transcription": {
      "transcript": "You can totally record this, buddy. Go for it, man.",
      "words": [ ... ]
    }
  }
}
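A receiving endpoint might process this payload along the following lines. This is a minimal sketch rather than an official handler: `handle_webhook` is a hypothetical name, the in-memory set stands in for whatever persistent store a real service would use, and it assumes the idempotency_key stays the same when an event is delivered more than once. Only fields shown in the payload above are read.

```python
import json
from typing import Optional, Set

# In-memory record of processed events; a real service would persist this.
_seen_keys: Set[str] = set()


def handle_webhook(raw_body: bytes) -> Optional[str]:
    """Return a display line for a new transcript update, or None to ignore it."""
    event = json.loads(raw_body)

    # Ignore webhook types this handler does not care about.
    if event.get("trigger") != "transcript.update":
        return None

    # Skip events already processed (assumes the key is stable across retries).
    key = event.get("idempotency_key")
    if key is not None:
        if key in _seen_keys:
            return None
        _seen_keys.add(key)

    data = event["data"]
    text = data["transcription"]["transcript"]
    start_s = data["timestamp_ms"] / 1000
    return f"[{start_s:.1f}s] {data['speaker_name']}: {text}"
```

The function takes the raw request body so it can be dropped into any web framework's POST handler. It returns `None` both for duplicates and for unrelated triggers, so the caller can acknowledge the delivery without acting on it.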
Choosing the Right Third-Party Provider
If you opt for the higher quality of third-party transcription, Attendee supports several leading providers. Here’s a quick guide to help you choose:
- Deepgram: A great all-rounder known for its speed, good quality, and competitive pricing. It supports transcribing multilingual speech within the same audio. Offers free credits for new users.
- Gladia: Similar to Deepgram with excellent quality and broad language support, though often more expensive. Also supports multilingual speech. Offers a free monthly transcription allowance.
- AssemblyAI: Known for its highly accurate word-level timestamps and competitive pricing. A solid choice if timestamp precision is a priority. Offers free credits for new users.
- OpenAI (Whisper): The most affordable option, but often less accurate than specialized providers and lacks word-level timestamps. It can, however, handle multilingual speech well.
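The trade-offs in this list can be encoded as a small lookup for shortlisting providers. The sketch below is a reading aid, not an official capability matrix: `PROVIDERS` and `shortlist` are hypothetical names, `None` marks a trait the list does not state, and the word-timestamp entries for Deepgram and Gladia are inferred from the comparison table's "most providers" rather than stated in the list, so verify them against each provider's documentation.

```python
from typing import Dict, List, Optional

# True/False: stated or inferred as noted. None: not stated in the list above.
PROVIDERS: Dict[str, Dict[str, Optional[bool]]] = {
    "Deepgram":   {"word_timestamps": True,  "multilingual": True},   # timestamps inferred
    "Gladia":     {"word_timestamps": True,  "multilingual": True},   # timestamps inferred
    "AssemblyAI": {"word_timestamps": True,  "multilingual": None},
    "OpenAI":     {"word_timestamps": False, "multilingual": True},
}


def shortlist(
    need_word_timestamps: bool = False,
    need_multilingual: bool = False,
) -> List[str]:
    """Return providers whose required traits are confirmed (True), in listed order."""
    result: List[str] = []
    for name, traits in PROVIDERS.items():
        if need_word_timestamps and traits["word_timestamps"] is not True:
            continue
        if need_multilingual and traits["multilingual"] is not True:
            continue
        result.append(name)
    return result
```

A provider with an unknown (`None`) trait is left off the shortlist when that trait is required. That means "not confirmed here" rather than "unsupported".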
Integrate Smarter Transcription Today
With Attendee's flexible transcription engine, you have the power to choose the perfect balance of quality, speed, and cost for your application. Dive into our documentation for detailed API references and provider-specific options.