How does the narration stay in sync with the animation?

The narration and the animation are generated from the same storyboard, so each spoken sentence is mapped to a specific animation beat. When the LLM writes the Manim Scene for a beat, it also writes the script line for that beat. The renderer measures how long each beat runs, and the text-to-speech audio for that line is timed to start when the beat begins. If the speech runs longer than the visual, the beat is stretched with self.wait() until the line finishes. This shared-storyboard approach is far more reliable than recording a voiceover separately and trying to line it up in an editor afterward.
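The timing rule above can be sketched as a small scheduling function. This is a minimal sketch that assumes the renderer already knows each beat's animation run time and each TTS clip's length, both in seconds. Beat and schedule are hypothetical names for illustration, not QuantumSketch's actual API.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    narration: str         # script line spoken during this beat
    visual_seconds: float  # measured run time of the beat's animations
    speech_seconds: float  # length of the TTS clip for the narration line


def schedule(beats):
    """Return (start_time, extra_wait) for each beat.

    Each narration clip starts when its beat starts. If the clip is longer
    than the animation, the beat gets an extra self.wait() of the difference.
    """
    timeline = []
    t = 0.0
    for beat in beats:
        extra_wait = max(0.0, beat.speech_seconds - beat.visual_seconds)
        timeline.append((t, extra_wait))
        t += beat.visual_seconds + extra_wait  # next beat starts after the wait
    return timeline


beats = [
    Beat("Start with the secant line.", visual_seconds=3.0, speech_seconds=4.5),
    Beat("Now we shrink h toward zero.", visual_seconds=5.0, speech_seconds=2.0),
]
print(schedule(beats))  # [(0.0, 1.5), (4.5, 0.0)]
```

In this example the first beat is stretched by 1.5 s so its line can finish. The second beat's line is shorter than its animation, so it needs no extra wait.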

Can I use my own voice instead of a synthetic one?

Most AI math-video tools default to a synthetic TTS voice because it requires zero recording effort and re-renders instantly when you change the script. Many also support choosing from multiple voices or languages. If you need your own voice, the common workflow is to export the silent animation and the generated script, then record narration over it in any editor. QuantumSketch focuses on the fully automated path so a finished, narrated MP4 comes out in one step.
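If you record your own voice, reading each line against the beat timing makes the take easier to line up. Here is a minimal sketch, assuming you have the generated script lines and each beat's final duration in seconds. The source does not say whether QuantumSketch exports timings, so that input is an assumption. The sketch writes the lines as an SRT subtitle file, which many video editors can import as a guide track. cue_sheet and srt_timestamp are hypothetical helpers.

```python
def srt_timestamp(seconds):
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def cue_sheet(lines, beat_seconds):
    """Build SRT text with one cue per beat, back to back on the timeline."""
    if len(lines) != len(beat_seconds):
        raise ValueError("need exactly one duration per script line")
    blocks = []
    t = 0.0
    for i, (text, dur) in enumerate(zip(lines, beat_seconds), start=1):
        blocks.append(f"{i}\n{srt_timestamp(t)} --> {srt_timestamp(t + dur)}\n{text}\n")
        t += dur
    return "\n".join(blocks)  # SRT cues are separated by a blank line
```

Saving the result as, for example, narration.srt next to the silent MP4 gives you an on-screen prompt for each line while you record.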

How AI Narrates Math Videos (TTS) | QuantumSketch

AI narrates a math video by writing a script from your prompt, converting it to speech with text-to-speech (TTS), and aligning that audio to the animation's timing. The narration and visuals come from one shared storyboard, so they stay in sync automatically.

The narration pipeline

Prompt → storyboard (beats) → per-beat script + Manim code → render → TTS → align → merge

The crucial design choice: script and animation are generated together, beat by beat. Beat 3 ("now we shrink h toward zero") has both a Manim Scene and its narration line. They were never separate, so they never drift.
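One way to keep script and animation inseparable is to have the LLM return both halves of each beat in a single structured object. Any beat that lacks either half is then rejected. The sketch below assumes a hypothetical JSON format with narration and manim_code fields per beat. The field names and parse_storyboard are illustrative only, not QuantumSketch's actual storyboard format.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    index: int
    narration: str   # the line the TTS voice will speak
    manim_code: str  # source of the Manim Scene for the same beat


def parse_storyboard(llm_output: str) -> list[Beat]:
    """Parse a storyboard in which every beat carries its code and its line.

    Assumed (hypothetical) shape:
    {"beats": [{"narration": "...", "manim_code": "..."}, ...]}
    """
    data = json.loads(llm_output)
    beats = []
    for i, raw in enumerate(data["beats"], start=1):
        narration = raw.get("narration", "").strip()
        code = raw.get("manim_code", "").strip()
        if not narration or not code:
            # Reject half-beats rather than render video or audio on its own
            raise ValueError(f"beat {i} is missing narration or code")
        beats.append(Beat(i, narration, code))
    return beats
```

Because each Beat object holds both strings, downstream stages (render, TTS, align) iterate over the same list. The two halves cannot fall out of step.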

Why a shared storyboard beats post-hoc voiceover

The old way: animate, then record a voiceover, then fight to line them up in a video editor. Every re-edit breaks the sync.

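The AI way: each beat knows its own duration, and the TTS audio for that beat is placed at the beat's start. If the speech runs longer than the visual, the renderer inserts a self.wait() so the animation holds until the line finishes. If it runs shorter, the narration pauses and the next line waits for its own beat to begin. FFmpeg then merges the narration track with the rendered video.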
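The placement step can be sketched with Python's standard wave module. Each TTS clip is written at its beat's start and padded with silence to the beat's full length, which yields one narration track as long as the video. This assumes 16-bit PCM WAV clips that share a sample rate and channel count. build_narration_track and the file names are hypothetical.

```python
import wave


def build_narration_track(clip_paths, beat_seconds, out_path):
    """Write one WAV track in which clip i starts exactly when beat i starts.

    Assumes every clip is 16-bit PCM WAV with the same channel count and
    sample rate, and that each beat is at least as long as its clip (the
    renderer's extra self.wait() guarantees that).
    """
    if len(clip_paths) != len(beat_seconds) or not clip_paths:
        raise ValueError("need one beat duration per clip, and at least one clip")
    fmt = None
    with wave.open(out_path, "wb") as out:
        for path, seconds in zip(clip_paths, beat_seconds):
            with wave.open(path, "rb") as clip:
                clip_fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
                n_frames = clip.getnframes()
                frames = clip.readframes(n_frames)
            if fmt is None:
                fmt = clip_fmt
                out.setnchannels(fmt[0])
                out.setsampwidth(fmt[1])
                out.setframerate(fmt[2])
            elif clip_fmt != fmt:
                raise ValueError(f"{path} has a different audio format")
            channels, width, rate = fmt
            # Speech first, then silence until the next beat begins.
            # Zero bytes are silence for 16-bit signed PCM (not for 8-bit WAV).
            out.writeframes(frames)
            pad_frames = max(0, round(seconds * rate) - n_frames)
            out.writeframes(b"\x00" * pad_frames * width * channels)
```

The finished track could then be merged onto the silent render with a command such as `ffmpeg -i scene.mp4 -i narration.wav -map 0:v -map 1:a -c:v copy -c:a aac out.mp4`. That command copies the video stream unchanged and encodes the narration as AAC.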

What makes narration sound good

| Factor | What to do |
|---|---|
| Pacing | One idea per beat; don't cram |
| Pronunciation | Spell tricky terms phonetically in the prompt |
| Tone | Ask for "calm, explanatory, like 3Blue1Brown" |
| Length | Keep each beat's line to 1–2 sentences |
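The Pacing and Length guidelines can be checked automatically before any audio is generated. This is a rough sketch. It splits a beat's line into sentences at ".", "!" or "?" followed by whitespace, so abbreviations like "e.g." will be miscounted. It also estimates speaking time from word count. The 2.5 words-per-second rate (about 150 words per minute) is an assumed figure, not one from the source, and lint_line is a hypothetical helper.

```python
import re

WORDS_PER_SECOND = 2.5  # assumption: roughly 150 words per minute of calm narration


def estimated_seconds(line: str) -> float:
    """Very rough speaking-time estimate based only on word count."""
    return len(line.split()) / WORDS_PER_SECOND


def lint_line(line: str, max_sentences: int = 2) -> list[str]:
    """Return a list of pacing problems for one beat's narration line."""
    problems = []
    # Heuristic sentence split: end punctuation followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", line.strip()) if s]
    if len(sentences) > max_sentences:
        problems.append(f"{len(sentences)} sentences; keep it to {max_sentences}")
    return problems
```

A line that fails the check is a candidate for splitting into two beats rather than speeding up the voice.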

Your prompt controls the script

Because the LLM writes the narration, a clear prompt produces clear narration. "Explain the central limit theorem like I'm 15, then show the histogram converging" yields a friendlier script than a bare equation dump. Learn the technique in Writing Prompts for AI Math Animations.

Get a narrated video in one step

→ quantumsketch.app turns your prompt into a fully narrated MP4 — animation, script, and TTS handled for you. See the full flow in Manim Without Code.

Written by Shihab Shahriar Antor · Shahriar Labs
