What's the simplest way to picture gradient descent?

Picture a ball rolling downhill on a curved surface. The surface is the loss function, where height is the error for a given set of parameters, and gradient descent rolls the ball toward the lowest point. At each step the algorithm measures the slope (the gradient) and moves downhill by that slope times the learning rate. Steep slopes give big steps; near the bottom the slope flattens and the steps shrink, so the ball settles into a minimum. Animating the ball's path on a parabola or a 3D bowl makes both the steps and the role of the learning rate visible.

How do I show what the learning rate does in gradient descent?

Animate the same descent at different learning rates side by side. With a small rate, the ball takes tiny careful steps and converges slowly but smoothly. With a good rate, it descends efficiently. With too large a rate, it overshoots the minimum and bounces back and forth or even diverges up the walls. Showing these three behaviors on the same loss curve makes the learning-rate trade-off obvious. Describe it as a prompt and QuantumSketch renders all three as a narrated Manim comparison.

How to Visualize Gradient Descent | QuantumSketch

Visualize gradient descent as a ball rolling downhill on a loss surface, taking steps proportional to the slope until it settles in a minimum. Steep slope → big step; flat bottom → tiny steps → it stops.

The core idea

Gradient descent minimizes a loss function by repeatedly stepping downhill:

$\theta \leftarrow \theta - \eta \nabla L(\theta)$

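Here ∇L(θ) is the gradient (the slope, in one dimension) and η is the learning rate, which scales the step size. The gradient points in the direction of steepest increase, so subtracting it moves θ toward lower loss, provided η is small enough.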
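As a minimal sketch of this update rule, the loop below applies it to the loss L(θ) = θ², whose gradient is 2θ. The function names, the starting point of 3.0, and the learning rate of 0.1 are illustrative assumptions, not values taken from QuantumSketch.

```python
def loss(theta: float) -> float:
    """Example loss L(theta) = theta^2, with its minimum at theta = 0."""
    return theta ** 2


def grad(theta: float) -> float:
    """Gradient (slope) of the example loss: dL/dtheta = 2 * theta."""
    return 2.0 * theta


def gradient_descent(theta0: float, eta: float, steps: int) -> list:
    """Apply theta <- theta - eta * grad(theta) and record every position."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        theta = theta - eta * grad(theta)
        path.append(theta)
    return path


# Assumed demo values: start at theta = 3.0 with learning rate 0.1.
path = gradient_descent(theta0=3.0, eta=0.1, steps=5)
for k, theta in enumerate(path):
    print(f"step {k}: theta = {theta:.4f}, loss = {loss(theta):.4f}")
```

With these values each update multiplies θ by 0.8, so the loss falls on every step and the changes in θ get smaller as the slope flattens.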

The animation, beat by beat

1. Draw the loss curve: a parabola for one parameter, or a bowl-shaped surface in 3D for two parameters.
2. Drop the ball at a random starting point.
3. Show the slope as a tangent arrow at the ball.
4. Step downhill by −η·slope; repeat. Steps shrink as the slope flattens.
5. Settle in the minimum.
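The beats above can be sketched as plain data before any drawing happens. The example below computes, for each frame, the ball position, the slope, the endpoints of a short tangent segment, and the step about to be taken, all on L(θ) = θ². The `descent_frames` helper, its dictionary keys, and the `half_width` parameter are hypothetical names chosen for illustration; a renderer such as Manim would consume values like these, but this is not QuantumSketch's actual pipeline.

```python
from typing import Any, Dict, List


def loss(theta: float) -> float:
    return theta ** 2


def slope(theta: float) -> float:
    return 2.0 * theta


def descent_frames(theta0: float, eta: float, steps: int,
                   half_width: float = 0.5) -> List[Dict[str, Any]]:
    """Return one record per animation frame of a 1D descent."""
    frames = []
    theta = theta0
    for _ in range(steps):
        m = slope(theta)
        y = loss(theta)
        # Tangent line through the ball, drawn half_width to each side.
        tangent = ((theta - half_width, y - m * half_width),
                   (theta + half_width, y + m * half_width))
        step = -eta * m  # signed move along the theta axis
        frames.append({"ball": (theta, y), "slope": m,
                       "tangent": tangent, "step": step})
        theta += step
    return frames


# Assumed demo values: start at theta = 3.0 with learning rate 0.2.
frames = descent_frames(theta0=3.0, eta=0.2, steps=6)
for i, f in enumerate(frames):
    x, y = f["ball"]
    print(f"frame {i}: ball=({x:.3f}, {y:.3f}) "
          f"slope={f['slope']:.3f} step={f['step']:.3f}")
```

Keeping the numbers separate from the drawing code makes it easy to check that the step sizes really do shrink as the slope flattens before committing to an animation.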

The learning-rate lesson

| Learning rate | Behavior |
|---|---|
| Too small | Slow crawl |
| Good | Smooth, fast convergence |
| Too large | Overshoots, oscillates or diverges |

Animating all three on the same curve makes the trade-off easy to remember, and the same intuition underlies how most neural networks are trained. See Visualize a Neural Network.
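The three behaviors can be checked numerically before animating them. For L(θ) = θ² the update simplifies to θ ← (1 − 2η)θ, so the iterates shrink toward zero only when 0 < η < 1, and they flip sign on every step once η > 0.5. The sketch below runs three learning rates from the same start; the specific rates (0.01, 0.3, 1.05), the step count, and the `classify` helper with its tolerance are assumptions picked for illustration.

```python
def run(eta: float, theta0: float = 3.0, steps: int = 20) -> list:
    """Gradient descent on L(theta) = theta^2, returning all iterates."""
    theta = theta0
    path = [theta]
    for _ in range(steps):
        theta -= eta * 2.0 * theta  # gradient of theta^2 is 2 * theta
        path.append(theta)
    return path


def classify(path: list, tol: float = 1e-3) -> str:
    """Rough label for a run, based on its start and end distance from 0."""
    if abs(path[-1]) > abs(path[0]):
        return "diverges"
    if abs(path[-1]) < tol:
        return "converged"
    return "still descending"


# Assumed example rates for the three rows of the comparison.
rates = {"too small": 0.01, "good": 0.3, "too large": 1.05}
for label, eta in rates.items():
    path = run(eta)
    print(f"{label:>9} (eta={eta}): final theta = {path[-1]:.6g} "
          f"-> {classify(path)}")
```

Plotting each `path` against the step index on shared axes gives the side-by-side comparison described above.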

Manim building blocks

Use axes.plot for the loss curve, a Dot for the ball, always_redraw for the tangent arrow, and a ValueTracker to step the parameter. For 3D, use Surface inside a ThreeDScene.

The prompt

"Show gradient descent as a ball on the loss curve L(θ)=θ², stepping downhill by −η·slope, comparing small, good, and too-large learning rates."

→ Render it at quantumsketch.app. Related: Animate the Derivative.

Written by Shihab Shahriar Antor · Shahriar Labs
