Opening the Black Box: Visualizing Neural Networks in 3D with Micrograd
When we learn about Deep Learning today, we often start at a high level of abstraction. We write torch.nn.Linear(784, 128), and magically, a matrix of weights is created. We call .backward(), and gradients flow.
But for many of us, the math remains abstract. We know that it works, but we don't always intuit how it works.
A few years ago, Andrej Karpathy released micrograd, a tiny Autograd engine that implements backpropagation on a scalar level. It stripped away the complex matrix optimizations of PyTorch to show the raw, beating heart of a neural network: simple addition, multiplication, and the chain rule.
I wanted to take that concept a step further. I didn't just want to calculate the gradients; I wanted to see them flow.
I built Micrograd 3D, a fully interactive, in-browser visualizer that renders the computational graph of neural networks—from a single neuron up to a miniature GPT-2—in three-dimensional space.
Why Visualization Matters
In standard Deep Learning frameworks, a "Layer" is a black box. In Micrograd 3D, a layer is an explosion of connectivity.
By visualizing the scalar graph, we turn abstract concepts into tangible physics:
- Forward Pass: Watch data propagate from inputs to output.
- Backward Pass: See the gradients (visualized as node size or color intensity) flow backward. You can literally see the "vanishing gradient" problem happen in real time if your network is too deep or relies on saturating activations like tanh.
- Topology: Understand how complex architectures like Transformers are actually just massive, tangled webs of simple arithmetic operations.
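To see why gradients vanish in deep stacks, note that each tanh layer multiplies the upstream gradient by the local derivative 1 - tanh(x)^2, which is always at most 1. Here is an illustrative calculation (not code from the app) showing the gradient shrinking as depth grows:

```typescript
// Chain-rule product through a stack of tanh activations.
// Each layer contributes a local derivative 1 - tanh(x)^2 <= 1,
// so the product shrinks multiplicatively with depth.
function tanhChainGradient(x: number, depth: number): number {
  let value = x;
  let grad = 1;
  for (let i = 0; i < depth; i++) {
    value = Math.tanh(value);
    grad *= 1 - value * value; // multiply in the local derivative
  }
  return grad;
}
```

Evaluating this at the same input for depth 1 versus depth 20 shows the surviving gradient dropping by orders of magnitude, which is exactly the effect the visualizer makes visible as shrinking node intensity.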
Under the Hood: The Tech Stack
Building a tool that can render thousands of nodes at 60 FPS in the browser required a careful choice of technologies.
1. The Engine (TypeScript)
I ported the original Python logic to TypeScript. This gives us strict typing for the Value objects that make up the graph. I implemented a complete suite of operations (add, mul, tanh, relu, exp) and even built higher-level abstractions like MLP, RNNCell, and MultiHeadAttention from scratch.
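The core of such an engine fits in a few dozen lines. Below is a minimal sketch of a scalar Value class in the spirit of micrograd; the method names follow the post, but the internals here are illustrative rather than the project's actual implementation:

```typescript
// Minimal scalar autograd node: holds a value, its gradient, and a closure
// that knows how to push gradient back into its parents.
class Value {
  grad = 0;
  private _backward: () => void = () => {};
  constructor(public data: number, public prev: Value[] = [], public label = '') {}

  add(other: Value, label = ''): Value {
    const out = new Value(this.data + other.data, [this, other], label);
    out._backward = () => {
      // d(out)/d(this) = d(out)/d(other) = 1: gradient passes through unchanged
      this.grad += out.grad;
      other.grad += out.grad;
    };
    return out;
  }

  mul(other: Value, label = ''): Value {
    const out = new Value(this.data * other.data, [this, other], label);
    out._backward = () => {
      // product rule: each input's gradient is scaled by the other's value
      this.grad += other.data * out.grad;
      other.grad += this.data * out.grad;
    };
    return out;
  }

  backward(): void {
    // Topologically sort the graph so each node's _backward runs only
    // after all of its consumers have contributed gradient.
    const topo: Value[] = [];
    const visited = new Set<Value>();
    const build = (v: Value) => {
      if (visited.has(v)) return;
      visited.add(v);
      v.prev.forEach(build);
      topo.push(v);
    };
    build(this);
    this.grad = 1; // seed: d(out)/d(out) = 1
    for (const v of topo.reverse()) v._backward();
  }
}
```

Everything else in the engine, from MLP up to MultiHeadAttention, is composed out of nodes like these.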
2. The Renderer (Three.js & React)
Rendering thousands of individual DOM elements for a dense, animated graph would grind a browser to a halt. I used React-Force-Graph backed by Three.js.
To optimize performance, I utilized Instanced Meshes and Shared Materials. Instead of creating a unique geometry for every neuron, the app reuses textures and materials, allowing us to visualize heavier models like the Transformer Block without turning your laptop into a space heater.
3. The "Glass Box" Architecture
The hardest part of visualizing autograd is keeping the graph stable. If you re-create objects every frame, the force-directed physics simulation resets, and the graph "explodes" visually.
I implemented an Object Pooling strategy in the engine. Nodes are cached based on deterministic IDs (e.g., neuron_0_weight_1). This means that as you train the model and values update, the structure remains stable, allowing you to track specific weights over time.
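The pooling idea can be sketched in a few lines: the cache key is stable across training steps, so the node object (and the physics position attached to it) survives while only its data changes. The names below are illustrative, not the app's actual internals:

```typescript
// Pooled graph node, keyed by a deterministic ID like "neuron_0_weight_1".
interface GraphNode {
  id: string;
  data: number;
  grad: number;
}

const pool = new Map<string, GraphNode>();

function getNode(id: string, data: number, grad: number): GraphNode {
  let node = pool.get(id);
  if (!node) {
    node = { id, data, grad };
    pool.set(id, node); // first sighting: allocate exactly once
  } else {
    node.data = data;   // later training steps: mutate in place,
    node.grad = grad;   // preserving object identity for the simulation
  }
  return node;
}
```

Because the force simulation keys off object identity, mutating in place means a training step updates the numbers without re-seeding the layout.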
From 1+1 to GPT-2
The project includes a progression of demos to build intuition:
- The Basics: We start with x + y. You see three nodes. You see how the gradient of + distributes equally to inputs.
- The Neuron: We introduce weights, biases, and activation functions (ReLU or Tanh).
- The MLP: We connect neurons together. You can see the connectivity explode: every neuron in one layer wires to every neuron in the next.
- Language Models:
- Bigram: A simple lookup table.
- RNN: A network with a "memory" (hidden state) that loops back on itself.
- Transformers: The boss level. I implemented Self-Attention using scalar operations. You can actually see the Query, Key, and Value vectors interacting in the graph. It looks like a hairball, but it's our hairball.
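To make "self-attention out of scalar operations" concrete, here is a sketch of a single attention head over a sequence of 1-dimensional tokens, written in the same plain-arithmetic style the graph is built from. The weights wq, wk, wv and the inputs are illustrative numbers, not the app's actual parameters:

```typescript
// Single-head scalar self-attention: project each token to query/key/value,
// score every query against every key, softmax the scores, and mix the values.
function attention(xs: number[], wq: number, wk: number, wv: number): number[] {
  const q = xs.map(x => x * wq);
  const k = xs.map(x => x * wk);
  const v = xs.map(x => x * wv);
  return q.map(qi => {
    const scores = k.map(kj => qi * kj);
    // numerically stable softmax over this query's scores
    const max = Math.max(...scores);
    const exps = scores.map(s => Math.exp(s - max));
    const sum = exps.reduce((acc, e) => acc + e, 0);
    const weights = exps.map(e => e / sum);
    // output token = attention-weighted sum of the values
    return weights.reduce((acc, w, j) => acc + w * v[j], 0);
  });
}
```

Every multiply, exp, and add in this function becomes its own node in the 3D graph, which is why even a tiny Transformer block renders as a dense web.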
The Playground
One of my favorite features is the Custom Script Editor. You aren't limited to the pre-built models. You can write JavaScript directly in the browser to define your own computational graph:
```javascript
// Live coding in the browser
const a = new Value(2.0, [], 'a');  // leaf node with data 2.0
const b = new Value(-3.0, [], 'b'); // leaf node with data -3.0
const c = a.mul(b, 'c');            // c = a * b = -6.0
const d = c.tanh();                 // squash through tanh
d.backward();                       // gradients flow back to a and b
```
As you type, the 3D graph updates instantly. It's the ultimate sandbox for understanding the chain rule.
Final Thoughts
Building Micrograd 3D taught me that complexity is often just simplicity stacked high. A Transformer model, which seems like alien technology, is really just thousands of additions and multiplications arranged in a very specific pattern.
I hope this tool helps students and engineers build the same intuition.