Hardware

Every AI chip ever shipped — tensor cores, MXUs, matrix cores, whatever the marketing name — is fundamentally a systolic array surrounded by a memory hierarchy. The interesting questions are never about peak FLOPS. They’re about how big you make the array, how you feed it data, and what you’re willing to give up in programmability to keep it busy.
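For readers who have not seen one in motion, here is a minimal sketch of an output-stationary systolic array in plain Python. It is a toy model, not a description of any vendor's design: the function name `systolic_matmul`, the register layout, and the output-stationary dataflow are all choices made for this illustration. Each processing element (PE) holds one accumulator, rows of A are skewed and pushed in from the left, columns of B are skewed and pushed in from the top, and every PE does one multiply-accumulate per cycle.

```python
def systolic_matmul(A, B):
    """Simulate C = A @ B on an M x N grid of PEs (output-stationary, illustrative)."""
    M, K = len(A), len(A[0])
    N = len(B[0])
    acc = [[0] * N for _ in range(M)]    # one accumulator per PE
    a_reg = [[0] * N for _ in range(M)]  # A operands, flowing left to right
    b_reg = [[0] * N for _ in range(M)]  # B operands, flowing top to bottom
    cycles = M + N + K - 2               # fill + compute + drain of the skewed wavefront

    for t in range(cycles):
        # Shift A operands one PE to the right, then inject at the left edge.
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i                    # row i is delayed by i cycles
            a_reg[i][0] = A[i][k] if 0 <= k < K else 0
        # Shift B operands one PE down, then inject at the top edge.
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j                    # column j is delayed by j cycles
            b_reg[0][j] = B[k][j] if 0 <= k < K else 0
        # Every PE performs one multiply-accumulate on its current operands.
        for i in range(M):
            for j in range(N):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc, cycles


if __name__ == "__main__":
    C, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, cycles)
```

The point of the sketch is that each operand is read from memory once and then reused as it passes from neighbor to neighbor, which is why array size and feeding strategy matter more than the arithmetic itself.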

This section covers the architecture of AI accelerators from the transistor level up: GPU die economics, the real cost of the compute-vs-memory wall, and why the claim that 96% of an H100’s transistors “aren’t doing math” is both correct and useless. Understanding hardware at this level matters because the constraints are physical limits rather than engineering choices: moving data from off-chip memory costs on the order of 1000x more energy than the arithmetic performed on it, HBM is typically the single most expensive component on a high-end AI chip, and yield curves determine which architectures are even manufacturable. These are the boundaries every system designer works within.
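One common way to put numbers on the compute-vs-memory wall is the roofline model: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below is a rough illustration only. The `PEAK_FLOPS` and `MEM_BW` values are made-up round numbers, not the specs of any real chip, and the intensity formula assumes each matrix is read or written exactly once with 2-byte elements.

```python
def attainable_flops(peak_flops, mem_bw, intensity):
    """Roofline: throughput is capped by compute or by bandwidth * intensity."""
    return min(peak_flops, mem_bw * intensity)


def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an (m x k) @ (k x n) matmul, assuming ideal reuse."""
    flops = 2 * m * n * k                                # one multiply + one add per MAC
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


# Hypothetical accelerator, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 1.0e15   # 1000 TFLOP/s (assumed)
MEM_BW = 3.0e12       # 3 TB/s (assumed)

if __name__ == "__main__":
    shapes = {
        "batch-1 matrix-vector": (1, 8192, 8192),
        "large square matmul": (4096, 4096, 4096),
    }
    for name, (m, n, k) in shapes.items():
        ai = matmul_intensity(m, n, k)
        perf = attainable_flops(PEAK_FLOPS, MEM_BW, ai)
        bound = "compute-bound" if perf == PEAK_FLOPS else "memory-bound"
        print(f"{name}: {ai:.1f} FLOP/byte, {perf / 1e12:.1f} TFLOP/s, {bound}")
```

Under these assumed numbers the matrix-vector case sits at about 1 FLOP per byte and is limited entirely by bandwidth, while the large square matmul clears the ridge point and can, in principle, reach peak compute.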

The Blackwell breakdown walks through where all 208 billion transistors live and stress-tests NVIDIA’s bandwidth claims against real workloads. The systolic arrays piece covers the computational primitive that the entire industry converged on. The supply chain analysis connects chip architecture to the packaging and memory sourcing that determine whether you can buy the thing.

Breaking Down Blackwell — NVIDIA B200 architecture breakdown and specs
AI Hardware Deep Dive — broad survey of compute platforms
Systolic Arrays — the computational primitive behind every AI accelerator
Roune: AI Chip Design — key ideas from the best doc on chip design
Blackwell B200 Supply Chain — CoWoS yields, HBM3e sourcing
TSMC N2 Economics — GAA transistors, cost curve, who benefits
DIY TPU v1 — reverse-engineering Google’s first AI chip from scratch

See also: Cornered Chips — four domain-specific chip architecture proposals (ARIA, VDX-1, PhysDiffuse-1, ATLAS) for workloads that GPUs handle poorly.

Alan's PKB
