One off-chip memory access costs roughly as much energy as 1,000 multiply-accumulates. That single fact shapes every chip, kernel, and serving system in production. These notes trace the connections, from transistors to tokens.
sections
- cornered chips — custom silicon for workloads GPUs can’t touch
- hardware — where all 92 billion transistors actually live
- inference — your $30K GPU spends most of its time waiting
- systems — the 10-50x gap between a naive kernel and an expert one
- semiconductors — when leading-edge silicon costs more per transistor, not less
- context — every AI chip bet has been made before
start here
- systolic arrays — the computational primitive behind every AI accelerator
- breaking down blackwell — NVIDIA’s B200, from transistors to inference cost
- DIY TPU v1 — reverse-engineering Google’s first AI chip from scratch
tools
- interactive visualizations — charts, calculators, and die maps
- about