<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
    <channel>
      <title>Alan&#039;s PKB</title>
      <link>https://caballo-compute.pages.dev</link>
      <description>Last 10 notes on Alan&#039;s PKB</description>
      <generator>Quartz -- quartz.jzhao.xyz</generator>
      <item>
    <title>Semiconductors</title>
    <link>https://caballo-compute.pages.dev/semiconductors/</link>
    <guid>https://caballo-compute.pages.dev/semiconductors/</guid>
    <description><![CDATA[ The semiconductor industry runs on a cost curve that used to be predictable and no longer is. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Systems</title>
    <link>https://caballo-compute.pages.dev/systems/</link>
    <guid>https://caballo-compute.pages.dev/systems/</guid>
    <description><![CDATA[ Knowing what a GPU can do in theory is very different from making it do that thing in practice. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Hardware</title>
    <link>https://caballo-compute.pages.dev/hardware/</link>
    <guid>https://caballo-compute.pages.dev/hardware/</guid>
    <description><![CDATA[ Every AI chip ever shipped — tensor cores, MXUs, matrix cores, whatever the marketing name — is fundamentally a systolic array surrounded by a memory hierarchy. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Alan&#039;s PKB</title>
    <link>https://caballo-compute.pages.dev/</link>
    <guid>https://caballo-compute.pages.dev/</guid>
    <description><![CDATA[ One memory access costs 1,000 multiply-accumulates. That single fact shapes every chip, kernel, and serving system in production. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Inference</title>
    <link>https://caballo-compute.pages.dev/inference/</link>
    <guid>https://caballo-compute.pages.dev/inference/</guid>
    <description><![CDATA[ The dirty secret of LLM inference is that your $30,000 GPU spends most of its time waiting for memory, not doing math. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Context</title>
    <link>https://caballo-compute.pages.dev/context/</link>
    <guid>https://caballo-compute.pages.dev/context/</guid>
    <description><![CDATA[ The AI hardware boom feels unprecedented, but almost none of the underlying dynamics are new. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>Cornered Chips</title>
    <link>https://caballo-compute.pages.dev/cornered-chips/</link>
    <guid>https://caballo-compute.pages.dev/cornered-chips/</guid>
    <description><![CDATA[ As AI workloads fragment from “train a big transformer” into specialized inference regimes, the hardware must fragment too. GPUs remain the default. ]]></description>
    <pubDate>Sun, 03 May 2026 17:40:30 GMT</pubDate>
  </item><item>
    <title>JEPA-R: A Latent Prediction Chip for Robotics</title>
    <link>https://caballo-compute.pages.dev/cornered-chips/jepa-robotics-chip</link>
    <guid>https://caballo-compute.pages.dev/cornered-chips/jepa-robotics-chip</guid>
    <description><![CDATA[ Why robot world models should predict in latent space at 50 Hz on a 15 W edge ASIC (JEPA-R), render pixels only on demand via a datacenter diffusion chip (VDX-1), and how a 2,500–20,000× compute asymmetry between prediction and rendering justifies two chips instead of one. ]]></description>
    <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
  </item><item>
    <title>SpecDecode-1: A Speculative Decoding ASIC</title>
    <link>https://caballo-compute.pages.dev/cornered-chips/speculative-decode-chip</link>
    <guid>https://caballo-compute.pages.dev/cornered-chips/speculative-decode-chip</guid>
    <description><![CDATA[ Every inference chip treats speculative decoding as a software optimization on general-purpose hardware. SpecDecode-1 treats it as the primary design target — dedicated silicon where the draft-verify-accept loop is the fundamental operation, not an afterthought. A detailed architecture proposal with five purpose-built subsystems: a Groq-style SRAM-only draft accelerator, a tree-attention-native verifier engine, a multi-gigabyte hardware-paged KV-cache pool, an FSM-based token tree manager, and HBM3e for weight streaming. ]]></description>
    <pubDate>Sun, 03 May 2026 00:00:00 GMT</pubDate>
  </item><item>
    <title>Nonlinear Silicon</title>
    <link>https://caballo-compute.pages.dev/cornered-chips/nonlinear-silicon</link>
    <guid>https://caballo-compute.pages.dev/cornered-chips/nonlinear-silicon</guid>
    <description><![CDATA[ Kuramoto oscillators, dynamical systems, and the post-digital AI chip — why synchronization may replace matrix multiplication ]]></description>
    <pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate>
  </item>
    </channel>
</rss>