How AI Narrates Math Videos (TTS) | QuantumSketch
AI narrates a math video by writing a script from your prompt, converting it to speech with text-to-speech (TTS), and aligning that audio to the animation's timing. The narration and visuals come from one shared storyboard, so they stay in sync automatically.
The narration pipeline
Prompt → storyboard (beats) → per-beat script + Manim code → render → TTS → align → merge
The crucial design choice: script and animation are generated together, beat by beat. Beat 3 ("now we shrink h toward zero") has both a Manim Scene and its narration line. They were never separate, so they never drift.
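One way to picture a shared storyboard is a list of beat records that each carry both the narration line and the animation code. The sketch below is illustrative only: the `Beat` class, its field names, and the Manim snippets stored as strings are assumptions made for this example, not QuantumSketch's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Beat:
    """One storyboard beat: a narration line plus the animation that goes with it."""
    index: int
    narration: str   # the line the TTS voice will speak
    scene_code: str  # Manim source for this beat, kept as plain text here


# Hypothetical storyboard for a derivative explainer.
storyboard = [
    Beat(1, "Start with the slope between two points.", "self.play(Create(secant))"),
    Beat(2, "Call the horizontal gap h.", "self.play(Write(h_label))"),
    Beat(3, "Now we shrink h toward zero.", "self.play(h.animate.set_value(0.01))"),
]


def narration_script(beats):
    """Collect the spoken lines in beat order."""
    return [b.narration for b in beats]


def scene_body(beats):
    """Collect the animation lines in the same beat order."""
    return "\n".join(b.scene_code for b in beats)


print(narration_script(storyboard)[2])
print(scene_body(storyboard))
```

Because both outputs are derived from the same list, reordering or deleting a beat changes the script and the animation together.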
Why shared-storyboard beats post-hoc voiceover
The old way: animate, then record a voiceover, then fight to line them up in a video editor. Every re-edit breaks the sync.
The AI way: each beat knows its own duration. TTS audio for that beat is placed at the beat's start. If the speech runs longer than the visual, the renderer inserts a self.wait() to hold the frame until the line finishes; if it runs shorter, the next beat's audio simply waits until that beat begins. FFmpeg then mixes the audio track onto the video.
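The timing rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's real implementation: `BeatTiming`, `schedule`, and `ffmpeg_mix_args` are hypothetical names, durations are assumed to be measured already, and the FFmpeg command is one plausible way to mix clips using the standard `adelay` and `amix` filters.

```python
from dataclasses import dataclass


@dataclass
class BeatTiming:
    visual_s: float  # length of the rendered animation for this beat
    speech_s: float  # length of the TTS clip for this beat


def schedule(beats):
    """Return (audio_start_s, extra_wait_s) for each beat, in order."""
    plan = []
    cursor = 0.0
    for b in beats:
        # Speech longer than the visual: pad the beat (the self.wait() case).
        wait_s = max(0.0, b.speech_s - b.visual_s)
        plan.append((cursor, wait_s))
        # Speech shorter than the visual: the next clip still starts only
        # when the next beat begins, so the gap is simply silence.
        cursor += b.visual_s + wait_s
    return plan


def ffmpeg_mix_args(video, clips, starts_s, out):
    """Build an ffmpeg argument list that delays each clip to its start time."""
    args = ["ffmpeg", "-y", "-i", video]
    for clip in clips:
        args += ["-i", clip]
    parts, labels = [], []
    for i, start in enumerate(starts_s):
        ms = int(round(start * 1000))
        # Input 0 is the video, so audio clips are inputs 1..N.
        parts.append(f"[{i + 1}:a]adelay={ms}|{ms}[a{i}]")
        labels.append(f"[a{i}]")
    parts.append(f"{''.join(labels)}amix=inputs={len(labels)}[mix]")
    args += ["-filter_complex", ";".join(parts),
             "-map", "0:v", "-map", "[mix]", "-c:v", "copy", out]
    return args


timings = [BeatTiming(4.0, 3.0), BeatTiming(2.0, 3.5), BeatTiming(5.0, 5.0)]
plan = schedule(timings)
print(plan)
```

In this example the second beat's speech outlasts its visual by 1.5 seconds, so that beat is padded and the third clip starts at 7.5 seconds. Note that `amix` rescales input volumes by default, so a real pipeline might adjust levels afterward.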
What makes narration sound good
| Factor | What to do |
|---|---|
| Pacing | One idea per beat; don't cram |
| Pronunciation | Spell tricky terms phonetically in the prompt |
| Tone | Ask for "calm, explanatory, like 3Blue1Brown" |
| Length | Keep each beat's line to 1–2 sentences |
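The length guideline is easy to check mechanically if you have the generated script as text. The helper below is a rough sketch: `sentence_count` and `overlong_beats` are hypothetical names, and the sentence splitter is deliberately naive (it would miscount abbreviations such as "e.g.").

```python
import re


def sentence_count(line):
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", line.strip())
    return len([p for p in parts if p])


def overlong_beats(lines, max_sentences=2):
    """Return the 1-based beat numbers whose narration exceeds the limit."""
    return [
        i for i, line in enumerate(lines, start=1)
        if sentence_count(line) > max_sentences
    ]


script = [
    "Now we shrink h toward zero.",
    "The secant tilts. It approaches the tangent. The slope settles. That limit is the derivative.",
]
print(overlong_beats(script))
```

Here the second beat holds four sentences, so it is flagged as a candidate for splitting into two beats.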
Your prompt controls the script
Because the LLM writes the narration, a clear prompt produces clear narration. "Explain the central limit theorem like I'm 15, then show the histogram converging" yields a friendlier script than a bare equation dump. Learn the technique in Writing Prompts for AI Math Animations.
Get a narrated video in one step
→ quantumsketch.app turns your prompt into a fully narrated MP4, with animation, script, and TTS handled for you. See the full flow in Manim Without Code.
Written by Shihab Shahriar Antor · Shahriar Labs
FAQ
Q. How does the narration stay in sync with the animation?
The narration and the animation are generated from the same storyboard, so each spoken sentence is mapped to a specific animation beat. When the LLM writes the Manim Scene, it also writes the script line for that beat. The renderer measures how long each beat runs, and the text-to-speech audio for that line is timed to start when the beat begins, with a self.wait() pause stretching the beat if the speech is longer than the visual. This shared-storyboard approach is far more reliable than recording a voiceover separately and trying to line it up in an editor afterward.
Q. Can I use my own voice instead of a synthetic one?
Most AI math-video tools default to a synthetic TTS voice because it requires zero recording effort and re-renders instantly when you change the script. Many also support choosing from multiple voices or languages. If you need your own voice, the common workflow is to export the silent animation and the generated script, then record narration over it in any editor. QuantumSketch focuses on the fully automated path so a finished, narrated MP4 comes out in one step.