One memory access costs roughly as much as 1,000 multiply-accumulates. That single fact shapes every chip, kernel, and serving system in production. These notes trace the connections, from transistors to tokens.

sections

  • cornered chips — custom silicon for workloads GPUs can’t touch
  • hardware — where all 92 billion transistors actually live
  • inference — your $30K GPU spends most of its time waiting
  • systems — the 10-50x gap between a naive kernel and an expert one
  • semiconductors — when leading-edge silicon costs more per transistor, not less
  • context — every AI chip bet has been made before

start here

  1. systolic arrays — the computational primitive behind every AI accelerator
  2. breaking down blackwell — NVIDIA’s B200, from transistors to inference cost
  3. DIY TPU v1 — reverse-engineering Google’s first AI chip from scratch

tools