Nonlinear Silicon

Modern AI hardware is a cargo cult built around matrix multiplication. For sixty years, the silicon industry optimized one primitive — the multiply-accumulate — and the machine learning field designed its architectures to match. Convolutions are matrix multiplies. Attention is matrix multiplies. MLPs are matrix multiplies. When NVIDIA shipped CUDA in 2007 and the deep learning community discovered that GPUs could parallelize backpropagation, the marriage was consummated: neural network architectures would be designed around GPU primitives, not the other way around. The entire trillion-dollar AI hardware stack — HBM, tensor cores, NVLink, InfiniBand, 1000W TDP envelopes — exists to move numbers into a systolic array, multiply them, and move the results out. The computation itself, the actual floating-point multiply, costs 3.7 pJ. Moving the operand from DRAM costs 640 pJ. The industry spends 170x more energy on logistics than on math.

There is another way. The brain runs an estimated 10^14 to 10^15 synaptic operations per second on 20 watts, roughly 5-50 TOPS/W, which at the upper end is 10-20x better than an H100. It does not multiply matrices. It synchronizes oscillations, propagates spikes, settles energy landscapes, and exploits the continuous dynamics of ion channels and dendritic trees. Physics does the computation for free; the only cost is setting up the initial conditions.

This article examines the thesis that the next disruption in AI hardware will come not from better digital accelerators but from nonlinear analog systems — oscillator networks, dynamical solvers, thermodynamic samplers — where computation is encoded in the physical dynamics of the substrate itself.

The Physics Case: Why Continuous Dynamics Win

The efficiency argument for analog compute rests on three compounding factors.

Factor 1: The Landauer gap. The thermodynamic minimum energy to erase one bit of information is kT ln(2) ~ 3 zeptojoules (3 x 10^-21 J) at room temperature. Current digital CMOS operates many orders of magnitude above this floor. A single 7nm FinFET switching event dissipates ~10 attojoules (10^-17 J), about 3 x 10^3 times the Landauer limit; a full digital MAC at the system level costs 5-20 picojoules on a GPU and 1-5 pJ on a dedicated digital ASIC, which is 10^8 to 10^10 times the limit. Analog multiply-accumulate in resistive crossbar arrays (IBM, 2023) achieves sub-10 femtojoules per MAC: at least 100x below a digital MAC (500-2000x below the GPU figure) and about 3 x 10^6 above Landauer. The gap between analog and Landauer is large; the gap between digital and analog is already 100x or more and growing at each node, because digital switching energy scales poorly below 5nm while analog signal processing scales with voltage squared.
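The ratios in this paragraph are easy to check. A minimal sketch, assuming T = 300 K for "room temperature" and using only the energy figures quoted above (the dictionary labels are illustrative, not from any datasheet):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_limit(temp_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature, in joules."""
    return K_B * temp_kelvin * math.log(2)

floor_300k = landauer_limit(300.0)

# Energy figures quoted in the text, in joules.
quoted = {
    "7nm FinFET switching event": 10e-18,
    "analog crossbar MAC (upper bound)": 10e-15,
    "digital ASIC MAC (low end)": 1e-12,
    "GPU MAC (low end)": 5e-12,
    "GPU MAC (high end)": 20e-12,
}

print(f"Landauer limit at 300 K: {floor_300k:.3e} J")
for name, joules in quoted.items():
    ratio = joules / floor_300k
    print(f"{name:36s} {joules:.1e} J  = {ratio:.1e} x Landauer")

analog_ratio = quoted["analog crossbar MAC (upper bound)"] / floor_300k
gpu_vs_analog = (
    quoted["GPU MAC (low end)"] / quoted["analog crossbar MAC (upper bound)"],
    quoted["GPU MAC (high end)"] / quoted["analog crossbar MAC (upper bound)"],
)
print(f"GPU MAC vs analog MAC: {gpu_vs_analog[0]:.0f}x to {gpu_vs_analog[1]:.0f}x")
```

Printing every figure as a multiple of the same floor puts device-level and system-level numbers on one scale, which makes it clear which comparisons are like-for-like.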

Factor 2: Local compute eliminates data movement. The 170x energy ratio between DRAM access and arithmetic exists because digital architectures separate memory and compute. In-memory analog compute — performing multiplication as current through a resistive element whose conductance encodes a weight — eliminates the data movement entirely. The weight never leaves the device. IBM’s phase-change memory crossbar demonstrated sub-10 fJ/MAC precisely because no bus, no cache hierarchy, no SRAM exists between the weight and the multiply. The data movement savings alone are 10-100x.

Factor 3: Thermodynamic sampling replaces iterative algorithms. Generative models (diffusion, VAEs, Boltzmann machines) require sampling from complex distributions. Digital hardware approximates this with pseudo-random number generators and iterative MCMC chains — thousands of sequential multiply-add steps per sample, the same iterative cost that drives the VDX-1 and PhysDiffuse-1 chip proposals. A physical system at thermal equilibrium samples from its Boltzmann distribution natively, in the time it takes to reach equilibrium (nanoseconds to microseconds for electronic systems). The speedup for sampling-heavy workloads is 10-1000x, depending on the distribution complexity and the digital algorithm’s mixing time.

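The compound claim: analog compute (10x over digital per MAC) x local compute (10-100x from eliminating data movement) x thermodynamic sampling (10-1000x for generative/optimization tasks) = 1,000x to 1,000,000x theoretical energy efficiency advantage. The product is an upper bound: the factors overlap (the crossbar's per-MAC figure already reflects weights that never leave the device), and the sampling factor applies only to sampling-heavy workloads. No one has demonstrated the full stack. But the individual factors are measured, not projected.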
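Multiplying the three quoted ranges end to end is a one-liner, shown here as a sketch so the bounds can be re-derived if any factor is revised. It assumes the factors are independent, which is the optimistic case:

```python
# Quoted per-factor ranges (low, high) from the paragraph above.
factors = {
    "analog MAC vs digital MAC": (10, 10),
    "eliminated data movement": (10, 100),
    "thermodynamic sampling": (10, 1000),
}

low = high = 1
for low_x, high_x in factors.values():
    low *= low_x
    high *= high_x

print(f"compound advantage: {low:,}x to {high:,}x")
```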

Energy per Operation: Landauer Floor to GPU

Figure: energy per operation on a logarithmic axis from 10⁻²¹ J (zJ) to 10⁻⁹ J (nJ), with bars grouped as analog/physical, digital, and thermodynamic limit. Values shown:

Technology	Description	Energy per operation
Landauer limit	kT ln(2), 300 K	3 zJ (3×10⁻²¹ J)
Optical neural network	sub-photon multiply	0.24 aJ (2.4×10⁻¹⁹ J)
Superconducting AQFP	adiabatic logic	40 aJ (4×10⁻¹⁷ J)
Brain synapse	biological	~10 fJ
Analog CIM	IBM crossbar	<10 fJ/MAC
Digital ASIC	custom inference	1-5 pJ/MAC
GPU MAC	H100 tensor core	5-20 pJ/MAC
DRAM read	32-bit access	640 pJ

The analog gap: IBM's resistive crossbar achieves <10 fJ/MAC vs. 5-20 pJ on GPU, a 500-2000x reduction. Even against dedicated digital ASICs (1-5 pJ), analog is 100-500x more efficient per operation. The compounding factor: analog eliminates the 640 pJ DRAM read entirely because weights live in the device.

Sources: Landauer (1961), IBM analog AI (Nature 2023), AQFP (Yokohama, 2019), McMahon sub-photon optical neural network (Nature Communications 2022). All values at room temperature except AQFP (4K cryogenic).

Kuramoto Oscillators as Compute Primitive

The Kuramoto model describes N coupled oscillators, each with a natural frequency ω_i and phase θ_i:

dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i)

K is the coupling strength. When K exceeds a critical threshold K_c, the oscillators spontaneously synchronize — their phases lock into coherent patterns despite having different natural frequencies. This phase transition from disorder to order is the computational event.
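The transition can be reproduced numerically. The sketch below Euler-integrates the equation above for two coupling strengths and reports the time-averaged order parameter r = |(1/N) Σ_j exp(iθ_j)|. The population size, Gaussian frequency distribution, step size, and seed are all illustrative assumptions; for unit-variance Gaussian frequencies the large-N critical coupling is about 1.6.

```python
import cmath
import math
import random

def simulate_kuramoto(n, coupling, steps=3000, dt=0.01, seed=0):
    """Euler-integrate the mean-field Kuramoto model; return time-averaged r."""
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 1.0) for _ in range(n)]               # natural frequencies
    theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]   # random initial phases
    r_samples = []
    for step in range(steps):
        # Complex order parameter: r * exp(i*psi) = (1/N) * sum_j exp(i*theta_j)
        z = sum(cmath.exp(1j * t) for t in theta) / n
        r, psi = abs(z), cmath.phase(z)
        # Identity: (K/N) * sum_j sin(theta_j - theta_i) == K * r * sin(psi - theta_i)
        theta = [t + dt * (w + coupling * r * math.sin(psi - t))
                 for t, w in zip(theta, omega)]
        if step >= steps // 2:  # discard the transient, average the rest
            r_samples.append(r)
    return sum(r_samples) / len(r_samples)

r_low = simulate_kuramoto(n=200, coupling=0.5)   # below the critical coupling
r_high = simulate_kuramoto(n=200, coupling=4.0)  # well above it

print(f"K = 0.5: r = {r_low:.2f} (incoherent)")
print(f"K = 4.0: r = {r_high:.2f} (synchronized)")
```

Rewriting the coupling sum through the order parameter keeps each step O(N) instead of O(N²); that shortcut only holds for uniform all-to-all coupling, not for the learned per-pair couplings discussed next.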

Synchronization is classification. Map input features to natural frequencies. Map learned weights to coupling strengths. Let the system evolve. The steady-state phase pattern — which oscillators synchronize with which — encodes the output class. Different inputs produce different synchronization patterns. The system performs inference by relaxing to an attractor, not by multiplying matrices.

Connection to Hopfield networks. A Hopfield network stores memories as energy minima of a system with binary spins (+1/−1) and symmetric coupling weights W_ij. Replace binary spins with continuous phases θ_i and the Hopfield energy with the Kuramoto energy function: E = −(K/2N) Σ_i,j cos(θ_i − θ_j). (For identical natural frequencies the Kuramoto dynamics are gradient descent on this E; it is an energy, not the order parameter r itself.) The Kuramoto model is a continuous, dynamical generalization of the Hopfield network. Phases replace spins. Synchronization replaces energy minimization. The attractor landscape is richer: a single Kuramoto network can in principle encode more patterns than a Hopfield network of the same size because the continuous phase variable carries more information per node than a binary spin.
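The energy E and the order parameter r are linked by a simple identity: since Σ_i,j cos(θ_i − θ_j) = |Σ_j exp(iθ_j)|² = N² r², it follows that E = −(K N / 2) r². Minimizing E is therefore the same as maximizing synchrony. A short numerical check, with the phases, N, and K chosen arbitrarily for illustration:

```python
import cmath
import math
import random

def pairwise_energy(theta, coupling):
    """E = -(K / 2N) * sum over all i, j of cos(theta_i - theta_j)."""
    n = len(theta)
    total = sum(math.cos(a - b) for a in theta for b in theta)
    return -coupling / (2 * n) * total

def order_parameter(theta):
    """r = |(1/N) * sum_j exp(i * theta_j)|."""
    return abs(sum(cmath.exp(1j * t) for t in theta) / len(theta))

rng = random.Random(1)
N, K = 50, 2.0
theta = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(N)]

e_pairs = pairwise_energy(theta, K)
e_from_r = -K * N / 2 * order_parameter(theta) ** 2
print(f"pairwise sum: {e_pairs:.6f}   from r: {e_from_r:.6f}")

# A fully synchronized state (all phases equal) reaches the minimum -K*N/2.
e_sync = pairwise_energy([0.3] * N, K)
print(f"synchronized state energy: {e_sync:.6f}")
```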

AKOrN (ICLR 2025, oral). Miyato, Lowe, Geiger, and Welling demonstrated that Kuramoto dynamics can be embedded inside standard deep networks by replacing threshold activations with oscillatory phase updates. AKOrN showed improvements across unsupervised object discovery, adversarial robustness, calibrated uncertainty quantification, and reasoning tasks — competitive with conventional architectures — while using oscillatory synchronization as the nonlinear computation. The key insight: the Kuramoto coupling update rule is differentiable, so the network trains end-to-end with standard backpropagation. The oscillatory layers add expressivity by allowing the network to represent temporal/phase relationships that pointwise activations cannot.

Intel COCOA. Intel Labs fabricated a coupled-oscillator chip on 22nm FinFET (COCOA — Coupled Oscillator Convolution Accelerator). Ring oscillator pairs encode weights as frequency differences; convolution reduces to measuring phase alignment after a synchronization period. The chip demonstrated image convolution at sub-milliwatt power — orders of magnitude below equivalent digital implementations.

Figure: Kuramoto synchronization with N=10 oscillators, from random phases to phase-locked clusters as K increases past K_c. Left panel, disordered (K < K_c): each oscillator runs at its own frequency; order parameter r ≈ 0.1. Right panel, synchronized (K > K_c): two phase-locked clusters emerge; order parameter r ≈ 0.85, and the pattern is read out as the class.

Caption: dθ_i/dt = ω_i + (K/N) Σ_j sin(θ_j − θ_i) is the Kuramoto model. Inputs map to natural frequencies ω_i. Weights map to couplings K_ij. The steady-state phase pattern IS the classification output. No matrix multiply required.

Hardware Implementations: What Exists Today

Oscillator-based computing is not theoretical. Six physical substrates have demonstrated working prototypes.

CMOS ring oscillator arrays. Moy, Ahmed, Chiu et al. (Nature Electronics, 2022) fabricated a 1,968-node coupled oscillator chip in standard CMOS. Each node is a ring oscillator whose frequency is voltage-tunable; coupling is implemented through shared current mirrors. The chip solved combinatorial optimization problems (Max-Cut, graph coloring) by mapping problem structure to coupling topology and letting the oscillators find the ground state through synchronization dynamics. Power consumption: tens of milliwatts for a problem size that would require watts on a digital FPGA. The critical advantage: CMOS compatibility. This is not exotic fabrication — it uses the same process, same tools, same foundries as every digital chip.

Spin-torque nano-oscillators (STNOs). Nanoscale magnetic tunnel junctions driven by spin-polarized current produce GHz-frequency oscillations whose phase and frequency depend on applied current and magnetic field. Romera et al. (Nature, 2018) demonstrated four coupled STNOs performing vowel recognition through synchronization patterns. Power: microwatts per oscillator. Speed: nanosecond-scale synchronization at GHz dynamics. The physics is intrinsically nonlinear (the Landau-Lifshitz-Gilbert equation governing magnetization dynamics maps directly to Kuramoto coupling). The limitation: scaling beyond small arrays requires solving fabrication uniformity — each STNO’s natural frequency must be controlled to within 1% for reliable computation.

VO2 phase-change oscillators. Vanadium dioxide undergoes an insulator-to-metal transition at ~68 °C, creating a natural relaxation oscillator when biased in the transition region. Parihar et al. (Scientific Reports, 2017) demonstrated coupled VO2 oscillators solving graph coloring. Switching energy: sub-picojoule. CMOS-compatible fabrication (VO2 can be deposited on standard silicon substrates via sputtering). The transition temperature is tunable through doping, enabling frequency control. Challenges: cycle-to-cycle variability of ~5-10% in transition voltage, and the need for thermal management to maintain the oscillators near the phase-change boundary.

Superconducting AQFP (Adiabatic Quantum Flux Parametron). Yokohama National University demonstrated adiabatic superconducting logic at 40 attojoules per operation, 100 to 1,000x below room-temperature CMOS. Coupled Josephson junction oscillators naturally implement the Kuramoto model at microwave frequencies (1-10 GHz). The energy advantage at the device is large: 40 aJ is roughly 10^4 above the room-temperature Landauer limit (and roughly 10^6 above the limit at 4K, where kT is 75x smaller). The constraint is cryogenics: the system requires liquid helium cooling, and removing one watt of heat at 4K costs at least ~74 W at the wall by the Carnot bound, (300 − 4)/4, with practical cryocoolers typically needing several hundred watts. For a large-scale AI accelerator, the cryogenic overhead may be amortized if the chip displaces enough digital silicon.

Optical parametric oscillators (OPOs). Coherent Ising machines (CIMs) use networks of degenerate OPOs in a fiber-ring cavity. Each OPO pulse encodes a spin as its optical phase (0 or pi). McMahon et al. (Science, 2016) demonstrated a fully programmable 100-spin CIM; NTT and collaborators (Science Advances, 2021) scaled this to 100,000 spins — by far the largest oscillator-based computation to date. A separate result from McMahon’s group (Nature Communications, 2022) demonstrated sub-single-photon optical multiply at 2.4 x 10^-19 J per operation, only ~100x above Landauer. Speed: round-trip time of the fiber cavity determines the iteration rate (~MHz). CIMs have outperformed D-Wave’s quantum annealer and simulated annealing on certain Max-Cut instances. Limitation: the fiber cavity is meters long; integration into a chip-scale device requires photonic integrated circuits (PICs) which are still maturing.

Soliton micro-combs. Xu et al. (Nature, 2021) demonstrated an optical neural network using soliton microcombs — self-reinforcing pulse trains in a microring resonator — achieving 11 TOPS (trillions of operations per second) for convolutional image processing at sub-milliwatt optical power. The micro-comb provides massively parallel wavelength channels (tens to hundreds of channels from a single resonator), each carrying an independent multiply-accumulate in the optical domain. The speed is limited by the electronic DAC/ADC interfaces, not the optics.

Beyond Transformers: Architectures That Exploit Nonlinear Hardware

The transformer is a digital artifact. Its architecture — linear projections, softmax attention, feedforward MLPs — was designed for GPU-efficient parallelism. Several emerging architectures are structurally better matched to nonlinear analog hardware.

State-space models (Mamba/S4). SSMs are discretized ordinary differential equations: dx/dt = Ax + Bu, y = Cx + Du. Digital hardware discretizes time into steps and computes the update rule iteratively, the approach taken by ATLAS’s dedicated scan units for world model inference. Analog hardware can solve the continuous ODE natively: an RC circuit with tunable time constants implements the state equation in real time, with the signal propagation being the computation. The discretization error that digital SSMs must manage (choice of step size, zero-order hold vs. bilinear transform) does not arise, although component tolerance and noise take its place as error sources. An analog SSM chip would process continuous sensor streams with no sampling delay and no discretization loss.
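To make the digital side of this trade-off concrete, the sketch below discretizes a scalar state equation dx/dt = a·x + b·u (an RC low-pass with τ = 1, so a = −1 and b = 1) for a unit step input and compares forward Euler against zero-order hold. The scalar system, step sizes, and function names are assumptions chosen for illustration.

```python
import math

def euler_step(x, u, a, b, dt):
    """Forward Euler: x[k+1] = x[k] + dt * (a*x[k] + b*u[k])."""
    return x + dt * (a * x + b * u)

def zoh_step(x, u, a, b, dt):
    """Zero-order hold: exact when u is constant across the step."""
    a_d = math.exp(a * dt)
    b_d = (a_d - 1.0) / a * b
    return a_d * x + b_d * u

def max_error(step_fn, a, b, dt, t_end=5.0):
    """Worst-case deviation from the exact step response over [0, t_end]."""
    x, worst = 0.0, 0.0
    for k in range(1, round(t_end / dt) + 1):
        x = step_fn(x, 1.0, a, b, dt)
        exact = (-b / a) * (1.0 - math.exp(a * k * dt))
        worst = max(worst, abs(x - exact))
    return worst

a, b = -1.0, 1.0  # RC low-pass with tau = 1 (illustrative values)
err_euler_coarse = max_error(euler_step, a, b, dt=0.5)
err_euler_fine = max_error(euler_step, a, b, dt=0.05)
err_zoh_coarse = max_error(zoh_step, a, b, dt=0.5)

print(f"Euler, dt=0.5 : max error {err_euler_coarse:.4f}")
print(f"Euler, dt=0.05: max error {err_euler_fine:.4f}")
print(f"ZOH,   dt=0.5 : max error {err_zoh_coarse:.2e}")
```

Zero-order hold is exact here only because the input is constant between samples; for a continuously varying sensor signal it too introduces error, which is the case a continuous-time circuit sidesteps.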

Liquid neural networks. Developed at MIT by Hasani et al., liquid neural networks use ODEs with input-dependent, tunable time constants: dx/dt = -[1/tau(x,I)] * x + f(x,I,theta). The time constants tau control how fast each neuron responds. In analog hardware, tau maps directly to an RC time constant — literally a resistor and capacitor whose values are set by the weights. A liquid neural network IS an analog circuit. The architecture was designed for continuous-time control (autonomous vehicles, robotics); it maps 1:1 onto analog oscillator hardware.
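The role of the input-dependent time constant can be seen in a one-neuron sketch. The specific forms used here, tau(I) = tau0 / (1 + |I|) and f = I, are hypothetical simplifications chosen for illustration, not the parameterization from Hasani et al. With them, the neuron relaxes toward tau(I)·I with time constant tau(I), so a stronger input makes it respond faster.

```python
import math

def tau_of_input(i_in, tau0=1.0):
    """Hypothetical input-dependent time constant: stronger drive, faster neuron."""
    return tau0 / (1.0 + abs(i_in))

def settle_time(i_in, dt=1e-4, t_max=10.0):
    """Euler-integrate dx/dt = -x / tau(I) + I from x = 0 for a positive
    constant input, and return the time at which x first reaches
    (1 - 1/e) of its steady state tau(I) * I."""
    tau = tau_of_input(i_in)
    target = (1.0 - math.exp(-1.0)) * tau * i_in
    x, t = 0.0, 0.0
    while t < t_max:
        x += dt * (-x / tau + i_in)
        t += dt
        if x >= target:
            return t
    return float("nan")

t_weak = settle_time(0.5)    # tau = 1 / 1.5
t_strong = settle_time(4.0)  # tau = 1 / 5

print(f"weak input   (I=0.5): settles in {t_weak:.3f} s, tau = {tau_of_input(0.5):.3f} s")
print(f"strong input (I=4.0): settles in {t_strong:.3f} s, tau = {tau_of_input(4.0):.3f} s")
```

In an analog implementation the same behavior would come from a resistance or capacitance that varies with the input, rather than from a numerical integrator.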

Kolmogorov-Arnold Networks (KANs). KANs replace the fixed activations and linear weights of MLPs with learnable nonlinear functions on edges. In a standard MLP, the nonlinearity (ReLU, GELU) is fixed and the weights are linear. In a KAN, each edge applies a learned spline or basis function. In analog hardware, the transfer function of a device (the I-V curve of a transistor, the conductance-voltage curve of a memristor, the frequency-coupling curve of an oscillator) IS a nonlinear function. KAN edges map directly to device physics. Instead of linearizing a device and compensating for nonlinearity (the digital approach), a KAN-aware analog chip exploits device nonlinearity as a feature.

Energy-based models (EBMs). Hopfield networks, Boltzmann machines, and modern energy-based models define an energy function E(x) and perform inference by minimizing it. Digital hardware runs gradient descent iteratively. A physical system with the same energy landscape relaxes to the minimum naturally — current flows downhill, oscillators find phase-locked states, spins align with local fields. The inference time is the physical relaxation time (nanoseconds to microseconds), not the number of gradient steps.
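A small Hopfield network shows what "inference by relaxation" means in the simplest case. The sketch stores two 8-bit patterns with a Hebbian rule, corrupts one bit, and lets asynchronous updates roll the state downhill. The patterns and network size are arbitrary illustrative choices; a physical system would perform the same descent through its dynamics rather than in a loop.

```python
def train_hebbian(patterns):
    """Hebbian weights W_ij = (1/N) * sum_p p_i * p_j, with zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def energy(w, s):
    """Hopfield energy E = -1/2 * sum_ij W_ij * s_i * s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def relax(w, state, max_sweeps=10):
    """Asynchronous updates: align each spin with its local field until stable."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            field = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if field >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train_hebbian([p1, p2])

noisy = list(p1)
noisy[2] = -noisy[2]  # corrupt one bit of the first stored pattern
recovered = relax(weights, noisy)

print("noisy    :", noisy, "E =", energy(weights, noisy))
print("recovered:", recovered, "E =", energy(weights, recovered))
```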

Reservoir computing. Any complex dynamical system — a bucket of water, a piece of silicon with random defects, a network of oscillators — can serve as a feature extractor if it has fading memory and nonlinear response. The input is injected into the system; the system’s high-dimensional dynamical state is read out; a simple linear layer maps the readout to the target. The reservoir itself is never trained — only the readout layer. This means the analog hardware does not need to support backpropagation at all. Any sufficiently complex physics works. Oscillator networks are natural reservoirs: their high-dimensional phase space, nonlinear coupling, and sensitivity to initial conditions produce rich, input-dependent dynamics.

Equilibrium propagation. Scellier and Bengio (2017) proved that a physical system at equilibrium can compute gradients for training by comparing its free-phase state with a weakly-clamped state. The same hardware that performs inference also performs training: no separate backward pass, no backpropagation circuit, no gradient tape. The difference between the two phases in each weight's local energy derivative, divided by the clamping strength, IS the gradient in the weak-clamping limit. This eliminates the biggest barrier to analog training: the need for a separate, high-precision backward pass. Equilibrium propagation has been demonstrated in resistive networks (Kendall et al., 2020) and analog oscillator arrays.
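The two-phase procedure can be checked on a toy system with one state variable and one weight. The energy E(s) = s²/2 − w·x·s and cost C(s) = (s − y)²/2 below are assumptions picked so the answer is known in closed form: the free equilibrium is s = w·x, and the true gradient of the cost with respect to w is (w·x − y)·x.

```python
def settle(w, x, y, beta, lr=0.1, steps=2000):
    """Relax the state s by gradient descent on F = E + beta * C.
    E(s) = 0.5*s**2 - w*x*s   (toy energy, an assumption of this sketch)
    C(s) = 0.5*(s - y)**2     (squared-error cost)"""
    s = 0.0
    for _ in range(steps):
        dF_ds = (s - w * x) + beta * (s - y)
        s -= lr * dF_ds
    return s

def eqprop_gradient(w, x, y, beta=0.01):
    """Estimate dC/dw from two equilibria, using only the local quantity dE/dw."""
    s_free = settle(w, x, y, beta=0.0)      # free phase
    s_nudged = settle(w, x, y, beta=beta)   # weakly clamped phase
    dE_dw_free = -x * s_free
    dE_dw_nudged = -x * s_nudged
    return (dE_dw_nudged - dE_dw_free) / beta

w, x, y = 0.5, 2.0, 3.0
estimate = eqprop_gradient(w, x, y)
true_grad = (w * x - y) * x

print(f"equilibrium-propagation estimate: {estimate:.4f}")
print(f"analytic gradient               : {true_grad:.4f}")
```

The estimate carries a bias of order beta (here a factor of 1/(1 + beta)), which is why the clamping has to be weak; on real hardware that requirement competes with noise, since a smaller nudge produces a smaller signal to measure.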

The Unconventional AI Thesis

The most capitalized bet on this paradigm is Unconventional AI, founded by Naveen Rao with a $475M seed round at a $4.5B valuation. Rao’s track record is the signal: he founded Nervana Systems (sold to Intel for ~$408M in 2016; Intel later discontinued the Nervana NNP line in favor of the separately acquired Habana Labs’ Gaudi), then founded MosaicML (sold to Databricks for $1.3B in 2023, became DBRX and Mosaic Inference). Two exits totaling ~$1.7B, both in AI hardware/infrastructure.

The research core reportedly includes Peter McMahon (Cornell, the CIM architect behind the programmable 100-spin coherent Ising machine and the sub-photon energy-per-multiply result) and Sara Achour (Stanford, pioneer of analog program compilation; her work on automated synthesis of analog circuits from high-level specifications is the closest thing to an “analog CUDA” that exists). [Note: McMahon and Achour involvement is based on industry reports; they were not named in the company’s public funding announcement.]

The thesis, distilled: neural co-evolution. Do not design a model and then build hardware for it (the GPU path). Do not build hardware and then force models onto it (the neuromorphic path). Co-design both iteratively. Start with a physics substrate. Discover what computations it does cheaply. Design a model architecture that exploits those computations. Optimize the substrate for that architecture. Repeat.

The specific bet: time as a first-class citizen. In digital hardware, time is an enemy — clock skew, setup/hold violations, synchronization overhead. In oscillator hardware, time IS computation. The temporal dynamics of the system — how oscillator phases evolve, when synchronization occurs, how long transients last — carry the information. A system that uses dynamics rather than fighting them can process continuous signals (audio, video, sensor streams) with zero discretization overhead and energy proportional to the information content rather than the sample rate.

No technical details of Unconventional AI’s architecture have been disclosed. But the team composition and the research pedigree point toward an optical or mixed optical-electronic oscillator system with co-designed model architectures that look closer to liquid neural networks or Kuramoto-inspired layers than to transformers.

The Competitive Landscape

Figure: Nonlinear / Post-Digital AI Hardware Landscape. Companies positioned by total funding (vertical axis, $10M to $500M+) and technology maturity (horizontal axis: Research, Prototype, Taped Out, Revenue). Labels shown:

Company	Funding	Approach	Notes
Unconventional AI	$475M seed, $4.5B valuation	Oscillator / dynamics	Rao + McMahon + Achour
Lightmatter	$400M+ raised, $4.4B valuation	Photonic interconnects	Passage and Envise chips
Normal Computing	$50M	Thermodynamic SPU	Taped-out ASIC; s-transform sampling
Extropic	$14M	Thermodynamic	Boltzmann machines at kT; tunneling-based sampling
Mythic	not shown	Analog CIM, pivoted	Licensing as memBrain IP; flash-based MAC arrays
Rain AI	~$30M	Neuromorphic	Underfunded vs. ambition; digital-analog hybrid

The gap: No dominant oscillator-computing startup has reached revenue. Unconventional AI is the first to combine serious funding ($475M) with a dynamics-first thesis. Lightmatter leads in photonic maturity but focuses on interconnects, not compute. Normal Computing and Extropic target thermodynamic sampling — complementary but distinct from oscillator networks.

Six companies are placing distinct bets on post-digital AI hardware.

Unconventional AI ($475M, dynamics-first). The largest capitalized bet on nonlinear compute. Team combines Rao’s hardware commercialization experience with McMahon’s sub-photon optical computing and Achour’s analog compilation. No product disclosed. The valuation implies investors expect a platform, not a point solution.

Lightmatter ($400M+, $4.4B valuation). The furthest along in photonic AI hardware. Passage is a photonic interconnect chiplet; Envise is a photonic compute chip using Mach-Zehnder interferometer (MZI) meshes for matrix multiplication. Lightmatter’s current focus is interconnects rather than nonlinear compute — they replace NVLink with photons, not MACs with oscillators. But the optical platform could pivot toward OPO-based nonlinear compute if the market demands it.

Normal Computing ($50M, thermodynamic SPU). Builds a stochastic processing unit (SPU) that performs probabilistic inference using physical noise as a computational resource rather than fighting it. Taped out an ASIC. The approach: encode probability distributions in analog voltages and let thermal noise perform Monte Carlo sampling natively. Targets diffusion models and Bayesian inference. Complementary to oscillator computing — Normal attacks sampling, oscillators attack optimization and classification.

Extropic ($14M, thermodynamic computing). Founded by Guillaume Verdon (ex-Google X, TensorFlow Quantum). Building Boltzmann machines that operate at the thermal noise floor — using tunneling junctions and superconducting circuits to sample from complex distributions at kT energy per sample. The physics is sound (a physical Boltzmann machine at thermal equilibrium samples from its Boltzmann distribution by definition) but the engineering challenge is extreme: maintaining coherent thermal computation at scale.

Mythic (analog CIM, pivoted). Originally built flash-based analog compute-in-memory chips that stored neural network weights as charge on floating gates and performed multiply-accumulate by measuring drain current. Achieved ~25 TOPS at 3W (8.3 TOPS/W) for edge inference — competitive with digital ASICs. Failed to find product-market fit and pivoted to licensing the core technology as “memBrain” IP. The Mythic story is instructive: analog CIM works in the lab but lacks the software ecosystem to compete with NVIDIA at scale.

Rain AI (neuromorphic, ~$30M). Targets neuromorphic edge inference with a digital-analog hybrid architecture. Underfunded relative to its ambition. The neuromorphic path (spiking neural networks, event-driven computation) is scientifically promising but commercially unproven — Intel’s Loihi and IBM’s TrueNorth have not achieved significant market traction after a decade of investment.

The conspicuous gap. No startup focuses primarily on oscillator-based computation for mainstream AI workloads. Unconventional AI is the closest but has not disclosed an architecture. The academic results (1,968-node CMOS arrays, 100K-spin optical CIM, AKOrN at ICLR) have not been commercialized. The window is open.

Architecture Proposal: OscNet-1

A concrete chip design for oscillator-based AI inference at the edge.

Design Philosophy

OscNet-1 does not attempt to replace the GPU for transformer training. It targets a specific regime: low-power, always-on pattern recognition and classification at the edge — sensor fusion, anomaly detection, associative memory, continuous signal processing. The regime where analog’s energy advantage compounds most aggressively because (a) power budgets are milliwatts, not kilowatts, (b) batch size is always 1, (c) input data is already analog (sensors, microphones, accelerometers), and (d) latency requirements are microseconds.

Core Specifications

Parameter	Specification
Oscillator array	4,096 CMOS ring oscillators in 64x64 grid
Coupling	Programmable memristive crossbar (TiO2 RRAM)
Topology	Hierarchical: all-to-all within 8x8 tiles, sparse inter-tile
Process	GlobalFoundries 22nm FD-SOI
Die area	~10 mm^2
Power	<100 mW (oscillator core ~30 mW, periphery ~70 mW)
Inference latency	1-10 microseconds (synchronization settling time)
Precision	4-6 effective bits per coupling weight
Training	On-chip equilibrium propagation (no backprop circuit)
Digital periphery	12-bit ADC per tile (x64), SPI/I2C interface, RISC-V control core
Applications	Always-on wake-word, vibration anomaly detection, radar gesture recognition, EEG classification

Oscillator Core

4,096 voltage-controlled ring oscillators, each consisting of 5 inverter stages with a varactor-tuned frequency range of 100 MHz - 1 GHz. The natural frequency of each oscillator is set by a DAC-controlled bias voltage, mapping input features to frequencies. At 22nm FD-SOI, each oscillator occupies ~20 um^2 including the varactor; the full 4,096-oscillator array fits in 0.08 mm^2.

The oscillators are organized into 64 tiles of 64 oscillators each (8x8 grid of 8x8 tiles). Within a tile, coupling is all-to-all: each oscillator pair is connected through a memristive element in a crossbar array. The crossbar for one tile is 64x64 = 4,096 memristors. Between tiles, coupling is sparse: each tile connects to its 8 neighbors through a configurable 64x64 sparse crossbar with ~10% density (~410 connections per inter-tile link, ~3,280 inter-tile connections per tile).

Total memristive elements: 64 tiles × 4,096 intra-tile + sparse inter-tile ≈ 262,144 intra + ~105,000 inter ≈ 367,000 programmable weights. Each memristor stores a conductance value encoding the coupling strength K_ij between oscillators i and j. The Kuramoto coupling term sin(θ_j − θ_i) is implemented physically: the current flowing between two oscillators through the memristor is inherently a function of their phase difference, modulated by the memristor’s conductance.
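The weight budget can be recomputed from the tile geometry. The sketch below counts inter-tile links two ways: with wraparound at the array edge (every tile has exactly 8 neighbors, which reproduces the ~105,000 figure) and without it (edge and corner tiles have fewer neighbors). Whether the array wraps is not specified above, so both cases are shown as assumptions.

```python
TILES_PER_SIDE = 8          # 8x8 grid of tiles
OSC_PER_TILE = 64           # 8x8 oscillators per tile
INTER_TILE_DENSITY = 0.10   # ~10% of a 64x64 inter-tile crossbar is populated

intra_per_tile = OSC_PER_TILE * OSC_PER_TILE                 # 4,096
intra_total = TILES_PER_SIDE ** 2 * intra_per_tile           # 262,144
per_link = round(INTER_TILE_DENSITY * OSC_PER_TILE ** 2)     # ~410

def count_links(side, wraparound):
    """Count undirected links between 8-connected neighboring tiles."""
    links = set()
    for r in range(side):
        for c in range(side):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    rr, cc = r + dr, c + dc
                    if wraparound:
                        rr, cc = rr % side, cc % side
                    elif not (0 <= rr < side and 0 <= cc < side):
                        continue
                    links.add(frozenset({(r, c), (rr, cc)}))
    return len(links)

for wrap in (True, False):
    n_links = count_links(TILES_PER_SIDE, wrap)
    inter_total = n_links * per_link
    label = "with wraparound" if wrap else "without wraparound"
    print(f"{label:20s}: {n_links} links, {inter_total:,} inter-tile weights, "
          f"{intra_total + inter_total:,} total")
```

Each link is counted once even though both tiles it joins list it among their ~3,280 connections, which is why the inter-tile total is half of 64 x 3,280.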

Mixed-Signal Architecture

Analog core. The oscillator array and memristive crossbar operate entirely in the analog domain. No clocking, no digitization during inference. The system evolves continuously from input injection to synchronization.

Digital periphery. Each 8x8 tile has a 12-bit SAR ADC that digitizes the phase of each oscillator at the output (measuring zero-crossing times relative to a reference oscillator). 64 ADCs total, running at 100 MHz sample rate, provide the digital readout. A small RISC-V core (RV32IMC, ~0.1 mm^2) manages input loading, output readout, tile configuration, and communication with the host system.

Input interface. Analog sensor inputs connect directly to oscillator bias voltages through a programmable gain amplifier — no ADC required on the input path for analog sensors. For digital inputs, a 64-channel 8-bit DAC converts digital feature vectors to oscillator frequencies.

Training via equilibrium propagation. During training, the chip alternates between free phase (input applied, system relaxes to equilibrium, output phases recorded) and clamped phase (desired output weakly imposed on a subset of oscillators, system relaxes again). The weight update for memristor (i,j) is proportional to the difference in correlation between oscillators i and j across the two phases. This requires only local information — each memristor’s update depends only on the phases of its two connected oscillators in the free and clamped states. No global backpropagation. No gradient tape. The memristive crossbar stores the weights AND computes the gradients through the same physical dynamics.

Estimated Performance

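Energy per inference. The oscillator core draws ~30 mW. At 10-microsecond inference latency, core energy per inference is ~300 nanojoules. A comparable digital classifier (4,096-node, 367K weights, INT8) on an ARM Cortex-M7 at 400 MHz consumes ~50 microjoules per inference, roughly 170x more energy. Charging the full <100 mW chip (core plus periphery) to each inference raises the figure to ~1 microjoule, which is still ~50x below the microcontroller.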

Throughput. At 10 microseconds per inference, OscNet-1 delivers 100,000 classifications per second. For always-on applications with 16 kHz sample rate (audio) or 100 Hz sensor rate, this is 6-1,000x overprovisioned, allowing aggressive duty-cycling to reduce average power below 1 mW.
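The energy and throughput estimates above reduce to a few multiplications. The sketch below recomputes them and adds an idealized duty-cycling model in which the chip draws its full active power during each 10-microsecond inference and nothing in between. Zero idle power and zero wake-up cost are assumptions of the sketch, not properties of the proposed design.

```python
CORE_POWER_W = 30e-3     # oscillator core
CHIP_POWER_W = 100e-3    # core plus periphery (upper bound from the spec table)
LATENCY_S = 10e-6        # worst-case settling time per inference
MCU_ENERGY_J = 50e-6     # quoted Cortex-M7 energy per inference

core_energy = CORE_POWER_W * LATENCY_S
chip_energy = CHIP_POWER_W * LATENCY_S
max_rate_hz = 1.0 / LATENCY_S

def average_power(inference_rate_hz, active_power_w=CHIP_POWER_W,
                  latency_s=LATENCY_S):
    """Average power if the chip is fully off between inferences (idealized)."""
    duty = min(1.0, inference_rate_hz * latency_s)
    return duty * active_power_w

print(f"core energy per inference: {core_energy * 1e9:.0f} nJ "
      f"({MCU_ENERGY_J / core_energy:.0f}x below the MCU)")
print(f"chip energy per inference: {chip_energy * 1e9:.0f} nJ "
      f"({MCU_ENERGY_J / chip_energy:.0f}x below the MCU)")
print(f"peak throughput: {max_rate_hz:,.0f} inferences/s")
for rate in (100, 1_000, 16_000):
    print(f"{rate:>6,} inferences/s -> average {average_power(rate) * 1e3:.2f} mW")
```

Under this model the sub-milliwatt average holds up to about 1,000 inferences per second at the full 100 mW active power; running one inference per 16 kHz audio sample would not qualify, while one inference per audio frame would.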

Accuracy. The 4-6 effective bit precision per weight limits the achievable accuracy on complex tasks. Benchmarks from AKOrN (ICLR 2025) suggest that oscillator-based networks can match digital accuracy on CIFAR-100 scale tasks. For the target applications (wake-word detection, anomaly classification), 4-6 bit precision is sufficient — quantization-aware training at INT4 achieves within 1-2% of FP32 accuracy on these tasks.

Risks and Open Problems

No analog CUDA. This is the single largest barrier. NVIDIA’s dominance rests as much on CUDA/cuDNN/TensorRT as on the silicon itself. Oscillator hardware has no programming model, no compiler, no debugger, no profiler. Sara Achour’s work on analog compilation (automatically synthesizing analog circuit configurations from high-level specifications) is a start, but it targets analog signal processing, not oscillator-based neural networks. The first team to ship a usable SDK for oscillator compute will own the category.

Precision limited to 4-6 effective bits per device. Memristive conductance values drift over time (1-5% over 10^6 seconds), vary device-to-device (sigma/mu of 3-8%), and exhibit cycle-to-cycle write noise (1-3% coefficient of variation). Digital systems achieve 32-bit precision effortlessly. The practical ceiling for analog oscillator compute is 4-6 bits without calibration, 6-8 bits with periodic recalibration. This is sufficient for inference on quantized models but inadequate for training large networks without hybrid digital-analog schemes.

Drift and variability require per-chip calibration. Every OscNet-1 chip will behave slightly differently due to process variation in both the oscillators (frequency spread of 5-10% at fixed bias) and the memristors (conductance spread of 10-20% at fixed write voltage). Per-chip calibration — measuring the actual device characteristics and compensating in the weight mapping — adds manufacturing cost and limits interchangeability. The brain has the same property (no two brains are identical); the question is whether the market accepts analog-chip individuality.

Scaling beyond 10K oscillators is undemonstrated. The largest CMOS oscillator array to date has 1,968 nodes. Optical CIMs have reached 100K spins but in a fiber cavity, not on-chip. Scaling a CMOS oscillator array to 100K+ nodes raises synchronization challenges: the time for global synchronization scales as O(N/K) for mean-field coupling and O(diameter/K) for local coupling. For a 100K-node locally-coupled network, synchronization time may exceed 100 microseconds, eroding the latency advantage. Hierarchical coupling topologies (OscNet-1’s tile-based approach) are the likely solution but remain unvalidated at scale.

The cargo cult can go both ways. Just as the digital community built neural networks around GPU primitives, analog enthusiasts risk building analog hardware around problems it solves well while ignoring the workloads that matter commercially. The transformer — 90%+ of AI inference revenue — is fundamentally a matrix-multiply architecture. An oscillator chip that is 1,000x more efficient at Ising optimization but cannot run a transformer will not capture meaningful market share. The path to commercial relevance requires either (a) demonstrating that oscillator-native architectures (liquid nets, Kuramoto layers, reservoir computing) achieve competitive accuracy on commercially valuable tasks, or (b) building hybrid chips where oscillator cores handle the nonlinear components (attention softmax, sampling, optimization subroutines) while digital cores handle the linear algebra.

The Thesis

The 60-year marriage between digital logic and artificial intelligence was never ordained by physics. It was a historical accident: von Neumann architectures existed, matrix multiplication mapped onto them, and the CUDA ecosystem locked in the dependency. The brain does not multiply matrices. It synchronizes oscillations. Physics does computation for free — the only cost is setting up the boundary conditions.

The thermodynamic floor is 10^5x or more below where digital silicon operates today. Analog in-memory hardware has demonstrated a 100x or greater energy advantage per operation. Coupled oscillator arrays have solved optimization problems, performed convolution, classified patterns, and scaled to 100K nodes in optical implementations. Emerging architectures (SSMs, liquid neural networks, KANs, energy-based models) are structurally better matched to continuous-time analog substrates than to clocked digital arrays. Equilibrium propagation eliminates the need for digital backpropagation circuits.

The market is wide open. No dominant oscillator-computing company exists. The $475M bet at Unconventional AI is the first serious capital commitment to the thesis. The physics works. The math works. The question is whether the engineering — fabrication yield, programming models, precision management, system integration — can be solved before the digital incumbents close the efficiency gap from above.

The next chip that matters will not have a clock.

Additional Reading

Artificial Kuramoto Oscillatory Neurons (AKOrN) — Miyato et al., ICLR 2025 Oral
Intel COCOA: Coupled CMOS Oscillator Array — Nikonov et al.
Deep Physical Neural Networks (Physics-Aware Training) — Wright et al., Nature 2022
Neural Ordinary Differential Equations — Chen et al., NeurIPS 2018
Liquid Time-Constant Networks — Hasani et al., AAAI 2021
Mamba: Selective State Spaces — Gu & Dao 2023
Consistency Models — Song et al. 2023
Unconventional AI $475M Seed Round — TechCrunch, Dec 2025
Coupled Oscillators for Computing: Review — Csaba & Porod, Applied Physics Reviews 2020
Vowel Recognition with Four Coupled STNOs — Romera et al., Nature 2018
1,968-node Coupled Ring Oscillator Circuit — Moy, Ahmed, Chiu et al., Nature Electronics 2022
100-spin Coherent Ising Machine — McMahon et al., Science 2016
Sub-photon Optical Neural Network — McMahon et al., Nature Communications 2022
11 TOPS Photonic Convolutional Accelerator — Xu et al., Nature 2021
VO2 Oscillators for Graph Coloring — Parihar et al., Scientific Reports 2017
