How to Visualize Gradient Descent | QuantumSketch
Visualize gradient descent as a ball rolling downhill on a loss surface, taking steps proportional to the slope until it settles in a minimum. Steep slope → big step; flat bottom → tiny steps → it stops.
The core idea
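Gradient descent minimizes a loss function by repeatedly stepping downhill:
$$\theta_{t+1} = \theta_t - \eta\,\nabla L(\theta_t)$$
Here θ is the parameter being tuned, ∇L is the slope (gradient), and η is the learning rate (step size). The gradient points uphill, so subtracting η times it moves θ toward lower loss.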
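A minimal plain-Python sketch of the update rule, assuming the example loss L(θ) = θ² (so ∇L = 2θ). The function names, the start θ = 2.0, and η = 0.25 are illustrative choices, not QuantumSketch code.

```python
def loss(theta: float) -> float:
    """Example loss L(theta) = theta**2 (an assumption for illustration)."""
    return theta ** 2


def grad(theta: float) -> float:
    """Analytic gradient (slope) of L(theta) = theta**2."""
    return 2 * theta


def gradient_descent(theta0: float, lr: float, steps: int) -> list[float]:
    """Apply theta <- theta - lr * grad(theta) `steps` times; return every iterate."""
    path = [theta0]
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)  # step against the slope
        path.append(theta)
    return path


if __name__ == "__main__":
    for t, theta in enumerate(gradient_descent(theta0=2.0, lr=0.25, steps=5)):
        print(f"step {t}: theta={theta:.4f}  L={loss(theta):.4f}")
```

With η = 0.25 every step is θ ← θ − 0.25·2θ = 0.5θ, so the iterates halve each time (2, 1, 0.5, 0.25, …) and the loss falls toward the minimum at θ = 0.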
The animation, beat by beat
- Draw the loss curve: a parabola for one parameter, or a bowl (a 3D surface) for two.
- Drop the ball at a random start.
- Show the slope as a tangent arrow at the ball.
- Step downhill by −η·slope; repeat. Steps shrink as the slope flattens.
- Settle in the minimum.
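The beats above can be sketched as plain per-step data, assuming the same example loss L(θ) = θ²: for each step, the ball's position, the tangent slope, and the step −η·slope. The Frame class and helper names are hypothetical; in a Manim scene, values like these might drive a ValueTracker and an always_redraw tangent.

```python
from dataclasses import dataclass

Point = tuple[float, float]


@dataclass
class Frame:
    """Hypothetical per-step data for one beat of the animation."""
    theta: float   # ball's x-position
    height: float  # ball's y-position, L(theta)
    slope: float   # L'(theta), shown as the tangent arrow
    step: float    # -lr * slope, the horizontal move to the next frame


def frames(theta0: float, lr: float, n: int) -> list[Frame]:
    """Record ball position, tangent slope, and step for n descent steps on L = theta**2."""
    out = []
    theta = theta0
    for _ in range(n):
        slope = 2 * theta        # derivative of theta**2
        step = -lr * slope       # move downhill, proportional to the slope
        out.append(Frame(theta, theta ** 2, slope, step))
        theta += step
    return out


def tangent_segment(f: Frame, half_width: float = 0.5) -> tuple[Point, Point]:
    """Endpoints of the tangent line y = L(t0) + L'(t0) * (x - t0) around the ball."""
    x0, x1 = f.theta - half_width, f.theta + half_width
    return (x0, f.height + f.slope * (x0 - f.theta)), (x1, f.height + f.slope * (x1 - f.theta))


if __name__ == "__main__":
    for f in frames(theta0=2.0, lr=0.25, n=4):
        print(f, tangent_segment(f))
```

Starting at θ = 2 with η = 0.25, the steps have sizes 1, 0.5, 0.25, 0.125: each is half the previous one because the slope halves as the ball nears the bottom.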
The learning-rate lesson
| Learning rate | Behavior |
|---|---|
| Too small | Slow crawl |
| Good | Smooth, fast convergence |
| Too large | Overshoots; oscillates or diverges |
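For the example loss L(θ) = θ², the update simplifies to θ ← (1 − 2η)θ, so the table's rows can be read off that single factor: 0 < η < 0.5 shrinks θ without crossing the minimum, η = 0.5 lands on it in one step, 0.5 < η < 1 overshoots and oscillates while still converging, η = 1 bounces between ±θ₀ forever, and η > 1 grows |θ| every step. This sketch checks that numerically. The thresholds hold only for this particular loss (they depend on its curvature), and the rates 0.01, 0.3, 0.9, and 1.1 are illustrative picks.

```python
from typing import Optional


def behavior(lr: float) -> str:
    """Classify a learning rate for L(theta) = theta**2, where one step is theta <- (1 - 2*lr) * theta."""
    if lr <= 0:
        raise ValueError("learning rate must be positive")
    factor = 1 - 2 * lr  # per-step multiplier on theta
    if abs(factor) > 1:
        return "diverges"                     # |theta| grows every step
    if abs(factor) == 1:
        return "oscillates forever"           # bounces between +theta0 and -theta0
    if factor < 0:
        return "oscillates while converging"  # crosses the minimum each step, but shrinks
    return "converges smoothly"               # approaches 0 from one side


def steps_to_converge(lr: float, theta0: float = 2.0, tol: float = 1e-3,
                      max_steps: int = 10_000) -> Optional[int]:
    """Run the actual updates; return the first step where |theta| < tol, else None."""
    theta = theta0
    for step in range(max_steps + 1):
        if abs(theta) < tol:
            return step
        theta -= lr * 2 * theta  # theta <- theta - lr * dL/dtheta
        if abs(theta) > 1e12:    # clearly blowing up: stop early
            return None
    return None


if __name__ == "__main__":
    # Illustrative rates for this loss: small, good, overshooting, too large.
    for lr in (0.01, 0.3, 0.9, 1.1):
        print(f"lr={lr}: {behavior(lr)}; steps to |theta| < 1e-3: {steps_to_converge(lr)}")
```

On this loss, η = 0.01 and η = 0.3 both converge smoothly; the small one simply needs many more steps. η = 0.9 shows the overshoot-and-bounce pattern, and η = 1.1 climbs the walls.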
Animating all three on the same curve makes the trade-off easy to see. The same intuition carries over to training neural networks, which minimize their loss with gradient descent and its variants. See Visualize a Neural Network.
Manim building blocks
Use axes.plot for the loss curve, a Dot for the ball, always_redraw for the tangent arrow, and a ValueTracker to step the parameter. For 3D, use a Surface inside a ThreeDScene.
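For the 3D version, the ball moves over two parameters instead of one. This stdlib-only sketch assumes a hypothetical elongated bowl L(x, y) = x² + 3y² and precomputes the (x, y, L) points the ball visits; in a scene, those points could be mapped onto the Surface through the axes' c2p method. The bowl, the start point (2, 1), and η = 0.1 are illustrative.

```python
Point3 = tuple[float, float, float]


def bowl(x: float, y: float) -> float:
    """Hypothetical elongated bowl: steeper along y than along x."""
    return x ** 2 + 3 * y ** 2


def bowl_grad(x: float, y: float) -> tuple[float, float]:
    """Partial derivatives (dL/dx, dL/dy) of the bowl."""
    return 2 * x, 6 * y


def descent_path_3d(x0: float, y0: float, lr: float, steps: int) -> list[Point3]:
    """(x, y, L(x, y)) points visited by gradient descent, ready to place on a surface plot."""
    x, y = x0, y0
    path = [(x, y, bowl(x, y))]
    for _ in range(steps):
        gx, gy = bowl_grad(x, y)
        x, y = x - lr * gx, y - lr * gy  # same update rule, applied to each parameter
        path.append((x, y, bowl(x, y)))
    return path


if __name__ == "__main__":
    for x, y, z in descent_path_3d(x0=2.0, y0=1.0, lr=0.1, steps=8):
        print(f"x={x:+.3f}  y={y:+.3f}  L={z:.3f}")
```

Because the bowl is steeper along y, that coordinate shrinks faster (by a factor of 0.4 per step versus 0.8 for x), so the path first drops off the steep wall and then slides along the shallower valley toward the minimum at the origin.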
The prompt
"Show gradient descent as a ball on the loss curve L(θ)=θ², stepping downhill by −η·slope, comparing small, good, and too-large learning rates."
→ Render it at quantumsketch.app. Related: Animate the Derivative.
Written by Shihab Shahriar Antor · Shahriar Labs
FAQ
Q. What's the simplest way to picture gradient descent?
Picture a ball rolling downhill on a curved surface. The surface is the loss function (height is the error for a given set of parameters), and gradient descent rolls the ball toward the lowest point. At each step the algorithm measures the slope (the gradient) and moves downhill by that slope times the learning rate. Steep slopes give big steps; near the bottom the slope flattens and steps shrink, so the ball settles into a minimum. Animating the ball's path on a 2D parabola or a 3D bowl makes both the steps and the role of the learning rate visible.
Q. How do I show what the learning rate does in gradient descent?
Animate the same descent at different learning rates side by side. With a small rate, the ball takes tiny, careful steps and converges slowly but smoothly. With a good rate, it descends efficiently. With too large a rate, it overshoots the minimum and bounces back and forth or even diverges up the walls. Showing these three behaviors on the same loss curve makes the learning-rate trade-off obvious. Describe it as a prompt and QuantumSketch renders all three as a narrated Manim comparison.