How LLMs Write Manim Code | QuantumSketch
LLMs write Manim code by turning your prompt into a storyboard of beats, then emitting a Manim Scene class for each. Crucially, the code is run, not just generated. Execution makes the output real and deterministic.
The pipeline
Prompt → LLM storyboard → Manim code per beat → execute/render → TTS → FFmpeg merge
- Plan. The model breaks the concept into 4 to 6 narrative beats.
- Write code. For each beat it picks the right Mobjects (`Axes`, `MathTex`) and animations (`Transform`, `FadeIn`).
- Execute. The Python runs in a sandbox with Manim, LaTeX, and FFmpeg installed, producing video chunks.
- Narrate. TTS voices the per-beat script; see How AI Narrates Math Videos.
- Merge. FFmpeg stitches chunks + audio into one MP4.
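To make the steps above concrete, here is a minimal sketch of what one beat and its render and merge commands might look like. It is not QuantumSketch's actual code: `BEAT_SOURCE`, `Beat`, `manim_command`, `concat_list`, and `merge_command` are hypothetical names, and it assumes a single narration track muxed over the concatenated chunks. The sketch only builds the command lines, so it runs without Manim or FFmpeg installed.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List

# Hypothetical example of the Manim source a model might emit for one beat.
BEAT_SOURCE = '''\
from manim import Scene, Axes, MathTex, Create, FadeIn, Transform, UP

class DerivativeBeat(Scene):
    def construct(self):
        axes = Axes(x_range=[-3, 3, 1], y_range=[0, 9, 3])
        label = MathTex(r"f(x) = x^2").to_edge(UP)
        self.play(Create(axes), FadeIn(label))
        target = MathTex(r"f'(x) = 2x").to_edge(UP)
        self.play(Transform(label, target))
        self.wait()
'''


@dataclass
class Beat:
    scene_name: str  # Scene class defined in `code`
    narration: str   # script passed to the TTS step
    code: str        # generated Manim source for this beat


def manim_command(script: Path, scene_name: str) -> List[str]:
    """Manim CLI call that renders one scene at low quality (-ql)."""
    return ["manim", "-ql", str(script), scene_name]


def concat_list(chunks: List[Path]) -> str:
    """Contents of the list file read by FFmpeg's concat demuxer."""
    return "".join(f"file '{chunk.as_posix()}'\n" for chunk in chunks)


def merge_command(list_file: Path, audio: Path, output: Path) -> List[str]:
    """FFmpeg call: concatenate the chunks, then mux in the narration (assumed single track)."""
    return [
        "ffmpeg", "-y",
        "-f", "concat", "-safe", "0", "-i", str(list_file),
        "-i", str(audio),
        "-map", "0:v:0", "-map", "1:a:0",  # video from the chunks, audio from the narration
        "-c:v", "copy", "-c:a", "aac",
        str(output),
    ]


beat = Beat("DerivativeBeat", "The slope of x squared is two x.", BEAT_SOURCE)
print(manim_command(Path("beat_01.py"), beat.scene_name))
print(concat_list([Path("chunks/beat_01.mp4"), Path("chunks/beat_02.mp4")]), end="")
```

A real pipeline would write each beat's `code` to disk, run the Manim command in the sandbox, and only then build the concat list from the chunks that rendered.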
Why "execute, don't guess" matters
Asking a model to describe a video gives you hallucinated frames. Asking it to write code that runs gives you proof: if the MP4 renders, the code worked. A bad API call fails the render immediately, so the pipeline can repair and retry.
| Approach | Reliability |
|---|---|
| Model describes video | Low (hallucination) |
| Model writes + runs Manim | High (verified by render) |
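The repair-and-retry idea can be sketched as a small loop. This is an illustration under stated assumptions, not the pipeline's real implementation: `run_manim` and `render_with_repair` are hypothetical helpers, `generate` stands in for the LLM call, and the loop assumes a failed render shows up as a non-zero exit code from the Manim CLI.

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional, Tuple


def run_manim(code: str, scene_name: str, work_dir: Path) -> Tuple[bool, str]:
    """Write generated source to disk and render it with the Manim CLI."""
    work_dir.mkdir(parents=True, exist_ok=True)
    script = work_dir / "beat.py"
    script.write_text(code, encoding="utf-8")
    proc = subprocess.run(
        ["manim", "-ql", script.name, scene_name],
        cwd=work_dir, capture_output=True, text=True,
    )
    # Assumption: non-zero exit means the render failed; keep the log for the repair prompt.
    return proc.returncode == 0, proc.stdout + proc.stderr


def render_with_repair(
    generate: Callable[[Optional[str]], str],
    render: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Regenerate until a render succeeds; return (working code, attempts used)."""
    error_log: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        code = generate(error_log)  # first call gets None, later calls get the failure log
        ok, log = render(code)
        if ok:
            return code, attempt
        error_log = log
    raise RuntimeError(f"render still failing after {max_attempts} attempts:\n{error_log}")


# Stubs standing in for the LLM and the sandbox, so the loop can be tried without Manim.
def fake_generate(error_log: Optional[str]) -> str:
    return "broken" if error_log is None else "fixed"


def fake_render(code: str) -> Tuple[bool, str]:
    return code == "fixed", "" if code == "fixed" else "NameError: name 'Axis' is not defined"


print(render_with_repair(fake_generate, fake_render))  # ('fixed', 2)
```

In real use, `render` would wrap `run_manim` for a given scene name and working directory. Passing the renderer in as a callable keeps the loop testable without a Manim install.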
What the model is good and bad at
- Good: picking the right Manim primitives, sequencing beats, writing valid `MathTex`.
- Needs iteration: layout and pacing, refined via the prompt, not luck.
See it in action
QuantumSketch runs this exact loop. The core is open source: read Inside the manim-coding-skill.
Written by Shihab Shahriar Antor · Shahriar Labs
FAQ
Q. How does an AI turn a text prompt into a Manim animation?
It works in stages. First the language model reads your prompt and plans a storyboard, breaking the concept into a handful of narrative beats. Then, for each beat, it writes a Manim Scene class using the right Mobjects and animations (Axes, MathTex, Transform, FadeIn). That generated Python is executed in a sandbox with Manim, LaTeX, and FFmpeg installed, which renders each beat to a video chunk. A text-to-speech step voices the per-beat script, and FFmpeg merges everything into a final MP4. The key is that the code is actually run, so the output is real, deterministic Manim, not a guess at what the video should look like.
Q. What stops the AI from generating Manim code that doesn't run?
Execution and iteration. Because the generated code is run inside a real Manim environment, a syntax error or bad API call surfaces immediately as a failed render rather than silently shipping. Good pipelines catch that error and regenerate or repair the code until it runs and produces frames. This execute-and-verify loop is why a Manim-generating tool is more reliable than asking a model to describe a video: the rendered MP4 is proof the code worked. The remaining failure mode is layout, not correctness, and that's refined through the prompt.