Alan's PKB
Tag: inference
8 items with this tag.
May 03, 2026
SpecDecode-1: A Speculative Decoding ASIC
hardware
inference
speculative-decoding
chip-design
ASIC
KV-cache
tree-attention
May 01, 2026
The Case for an Agentic Inference Chip
hardware
inference
agentic-ai
chip-design
memory-architecture
speculative-decoding
ARIA
KV-cache
Apr 13, 2026
Interactive Visualizations
tools
interactive
GPU
B200
memory
KV
roofline
inference
systolic
Apr 12, 2026
Breaking Down Blackwell
nvidia
blackwell
b200
systolic-arrays
inference
gpu-architecture
peak-flops
the
where
5th
power
stress-testing
decode
NVL72
what
project
interesting
Apr 11, 2026
InferBench
inference
benchmarking
asic
architecture
research
why
workload
LLM
diffusion
MoE
vision
systolic
SIMT
In-Memory
dataflow
reconfigurable
worked
NVIDIA
groq
google
comparison
key
proposed
useful
cost
energy
flexibility
validation
calibration
publication
Apr 11, 2026
Inference Optimization Stack
inference
optimization
quantization
cuda
blackwell
moe
kv-cache
synthesis
research
1
2
3
TrtLLMGen
thunder
4
5
6
7
baseline
with
8
9
10
Apr 11, 2026
SpectralQuant KV Cache
kv-cache
quantization
inference
attention
compression
spectral-methods
transformer-internals
research
executive
1
2
participation
the
what
3
4
task
statistical
distribution
5
why
connection
6
KIVI
loki
KV-CoRE
RoPE
random
Rate-Distortion
7
memory
throughput
calibration
compatibility
8
tested
layer
Training-Time
dynamic
interaction
theoretical
9
Apr 11, 2026
TrtLLMGen MoE Kernels
nvidia
tensorrt-llm
flashinfer
moe
cuda
blackwell
sm100
inference
open-source
mlperf
research
1
the
where
2
why
3
4
what
NVIDIA
5
MLPerf
InferenceX
6
7
Short-Term
Medium-Term
8
9