TPUs are Google’s custom accelerator chips, built around systolic arrays and pipelining to deliver fast matrix multiplication with high energy efficiency.
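To make the systolic idea concrete, here is a minimal NumPy sketch of an output-stationary systolic matmul: each cell owns one output element, operands are skewed so the right pairs meet each cycle, and each loop iteration models one clock tick. This illustrates the dataflow only, not Google’s actual hardware design.

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate C = A @ B on an output-stationary systolic array."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m))
    # Inputs are skewed so A[i, s] (flowing right) and B[s, j] (flowing
    # down) meet at cell (i, j) on cycle t = i + j + s.
    for t in range(n + m + k - 2):            # one iteration = one cycle
        for i in range(n):
            for j in range(m):
                s = t - i - j
                if 0 <= s < k:                # operands have reached this cell
                    C[i, j] += A[i, s] * B[s, j]
    return C

A, B = np.random.rand(4, 3), np.random.rand(3, 5)
assert np.allclose(systolic_matmul(A, B), A @ B)
```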
They rely on ahead-of-time compilation (via XLA) and large compiler-managed on-chip scratchpad memories instead of traditional hardware caches, reducing expensive off-chip memory accesses.
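In JAX this ahead-of-time path is explicit: a function is traced, lowered, and compiled by XLA before it ever runs. A small sketch:

```python
import jax
import jax.numpy as jnp

def predict(w, x):
    return jnp.tanh(x @ w)

w, x = jnp.ones((128, 128)), jnp.ones((8, 128))

# Trace and lower to StableHLO, then let XLA compile the whole function as
# one fused program; intermediates can stay in on-chip memory instead of HBM.
compiled = jax.jit(predict).lower(w, x).compile()
print(compiled(w, x).shape)  # (8, 128)
```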
A TPUv4 chip contains two TensorCores, each with 128×128 matrix multiply units (MXUs) and vector units, fed by on-package high-bandwidth memory (HBM) stacks.
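Those numbers imply the chip’s peak throughput. A back-of-envelope check, assuming the four-MXUs-per-TensorCore count and ~1.05 GHz clock from Google’s published TPUv4 material (neither figure appears in the summary above):

```python
# Back-of-envelope peak throughput for one TPUv4 chip.
tensorcores = 2
mxus_per_core = 4          # assumed, per the TPUv4 paper
macs_per_mxu = 128 * 128   # one 128x128 systolic array
flops_per_mac = 2          # multiply + accumulate
clock_hz = 1.05e9          # assumed clock rate

peak = tensorcores * mxus_per_core * macs_per_mxu * flops_per_mac * clock_hz
print(f"{peak / 1e12:.0f} TFLOP/s (bf16)")  # ~275 TFLOP/s
```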
TPUs scale from single chips to trays, racks, and pods using high-speed inter-chip interconnects (ICI) and optical circuit switching (OCS) for flexible, high-bandwidth communication.
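From the software side, a slice of interconnected chips simply appears as a device mesh. A hedged JAX sketch (assuming 8 attached devices; axis names and shapes are illustrative):

```python
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P
from jax.experimental import mesh_utils

# Arrange the attached chips (e.g. an ICI-connected slice) as a logical mesh.
devices = mesh_utils.create_device_mesh((2, 4))  # assumes 8 devices
mesh = Mesh(devices, axis_names=("data", "model"))

# Shard the weights over the 'model' axis; XLA inserts the ICI collectives
# (all-gathers / reduce-scatters) automatically.
w = jax.device_put(jnp.ones((8192, 8192)), NamedSharding(mesh, P(None, "model")))
x = jax.device_put(jnp.ones((128, 8192)), NamedSharding(mesh, P("data", None)))

@jax.jit
def layer(x, w):
    return x @ w  # compiled once, partitioned across the mesh by XLA

print(layer(x, w).sharding)
```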
Multi-pod configurations use data-center networks (DCN) for inter-pod communication, enabling training of large models like PaLM with minimal code changes via XLA.
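JAX exposes this hierarchy too: `jax.experimental.mesh_utils.create_hybrid_device_mesh` builds a mesh whose inner axes map to fast ICI within a slice and whose outer axes cross the slower DCN. A sketch, assuming a hypothetical job of two slices with four chips each:

```python
from jax.sharding import Mesh
from jax.experimental import mesh_utils

# Keep model parallelism on ICI inside each slice; put only the more
# communication-tolerant data parallelism across the DCN hop between slices.
ici_shape = (1, 4)   # per-slice mesh axes (data, model) -- assumed sizes
dcn_shape = (2, 1)   # replicate the slice mesh across 2 slices over DCN
devices = mesh_utils.create_hybrid_device_mesh(ici_shape, dcn_shape)
mesh = Mesh(devices, axis_names=("data", "model"))  # final shape (2, 4)
# The same jit-compiled model runs unchanged; XLA routes collectives over
# ICI within a slice and over DCN between slices.
```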