
Anatomy — a blueprint series

The GPU,
in nine scales.

From a single electrical switch to a gigawatt data-center campus — the same machine, drawn at nine magnifications. Plain-language specs, metric and imperial, a real-world size you already know, costs in 2026 dollars, and the public company closest to every tier.

One machine, nine enclosures · left to right, small to large

switch · chip · memory · module · board · server · rack · hall · campus

Scale ladder · Nine tiers · ~9 orders of magnitude

№	Object	Linear scale	Power	Ticker(s)
0	FinFET transistor	~50 nm	—	ASML*
I	GPU die	27 × 30 mm	—	NVDA · TSM
II	HBM3e stack	11 × 11 mm	—	MU
III	SXM5 module	78 × 105 mm	700–1,000 W	NVDA
IV	HGX baseboard	544 × 406 mm	6–10 kW	NVDA · ALAB
V	DGX chassis (8U)	356 × 482 × 897 mm	10 kW	SMCI · DELL
VI	NVL72 rack	2 × 0.6 × 1.2 m	120 kW	VRT · ANET
VII	Data hall	60 × 210 m	10–100 MW	EQIX · DLR
VIII	Hyperscaler campus	hundreds of acres	1–10 GW	CEG · VST · TLN

* Tier 0 has no direct public proxy — the closest trade is the lithography tool vendor (ASML, not on our 37).

Plate № 0 · Tier 0 of VIII

Legend · Numbers correspond to markers in fig.

Fin · Vertical silicon ridge (~5 nm wide): the tiny wall of silicon that the current flows along
Gate · Metal electrode wrapping the fin: the switch — a voltage here lets current through, no voltage blocks it
Source · n-doped contact (electron reservoir): where electrons start
Drain · n-doped contact (electron exit): where electrons end up when the switch is on
Substrate · Bulk silicon wafer: the slab of silicon everything else is etched into

Transistors per H100 die · ~80 billion

If each transistor were a grain of sand, one die would hold enough sand to fill a half-ton pickup truck.

Fig. 0 — The transistor · schematic · not to scale

A single FinFET switch, drawn as if you were holding it up to the light under an electron microscope — the vertical silicon fin, the gate wrapped around it, and the source and drain that let current pass when the gate says so.

The transistor

The atom of computing — a switch so small you need an electron microscope to see it.

Feature size: ~5 nm / ~0.0000002 in
Cell pitch: ~50 nm (a typical virus, at ~100 nm, is about twice as wide)
Density: ~98 million transistors per mm²
Process: TSMC 4N (NVIDIA-custom 5-nm-class process)

A FinFET is a tiny electrical switch — the "fin" is a vertical silicon ridge; the switch opens or closes roughly two billion times a second. Every calculation a GPU does boils down to billions of these switches flipping in coordinated patterns. Every larger tier on this page is just an ever-larger arrangement of this one trick.

Used for

the smallest electrical switch in a computer
every arithmetic operation a GPU performs

Approx cost, 2026

—

Not priced individually. The cost lives at the die level above.

— no direct public ticker

Plate № I · Tier 1 of VIII

Legend · Numbers correspond to markers in fig.

SM grid · 132 Streaming Multiprocessors: the compute tiles — each runs thousands of math operations in parallel
L2 cache + crossbar · 50 MB on-die SRAM, central data router: shared fast memory and the switchboard that moves data between SMs
HBM PHY · 6 × memory-interface strips: where the die talks to the stacks of RAM bolted next to it
NVLink PHY · high-speed chip-to-chip link: how one GPU talks to seven others in a server, direct at wire speed
Silicon substrate · TSMC 4N / 4NP wafer: the raw slab of silicon that everything is etched onto

Fig. I — The die · schematic · not to scale

The H100 die with its logic blocks drawn roughly to internal proportion — streaming multiprocessor grid, memory controllers on the edges, and the central crossbar that routes data between them.

The die

Eighty billion switches laid out on a single square of silicon.

Size: 27 × 30 mm / 1.06 × 1.18 in
Area: 814 mm² (about the size of a postage stamp)
Process: TSMC 4N (H100) · 4NP (B200)
Transistors: 80 B (H100) · 208 B (B200, 2 dies fused)
SMs (compute clusters): 132 active · 16,896 parallel cores

One chip — the thing NVIDIA actually designs and TSMC actually makes. Every mention of "an H100" or "a Blackwell" refers to this piece of silicon. The chip is organized into 132 compute clusters called SMs (Streaming Multiprocessors), each running thousands of calculations in parallel. Everything on later plates exists to keep this one chip fed with power and data.
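As a quick cross-check, the density quoted on Plate 0 and the core count above both fall out of one division and one multiplication. A minimal Python sketch, using figures from this page plus one assumption: 128 FP32 cores per SM, which is NVIDIA's published figure for the Hopper generation but is not stated here.

```python
# Cross-check of the die figures quoted on this page.
TRANSISTORS = 80e9      # H100 transistor count (spec list above)
DIE_AREA_MM2 = 814      # H100 die area in mm^2 (spec list above)
ACTIVE_SMS = 132        # active Streaming Multiprocessors (spec list above)
CORES_PER_SM = 128      # ASSUMPTION: FP32 cores per Hopper SM, not stated on this page

# Transistors per square millimetre, expressed in millions.
density_millions_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

# Total parallel cores across all active SMs.
parallel_cores = ACTIVE_SMS * CORES_PER_SM

print(f"{density_millions_per_mm2:.0f} million transistors per mm^2")
print(f"{parallel_cores:,} parallel cores")
```

Both results line up with the rounded figures in the spec lists, which suggests the density is a whole-die average rather than a peak logic density.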

Used for

the unit NVIDIA sells as one GPU
the compute engine of every AI model today

Approx cost, 2026

Silicon cost: $3k–5k · Sold as a finished module: $25k–40k

Die-level manufacturing cost vs. market price of a complete H100.

NVDA · designs the chip
TSM · fabricates the silicon

Alternates

AMD · Instinct MI300X / MI350
Blackwell's only GPU rival at scale
— · Google TPU v6 (Trillium)
custom silicon, internal to GCP
— · AWS Trainium 2
custom silicon, internal to AWS
INTC · Gaudi 3
distant third; shrinking roadmap

Plate № II · Tier 2 of VIII

Legend · Numbers correspond to markers in fig.

DRAM die × 12 · Stacked memory layers · 32 banks per die (HBM3e, 12-Hi): twelve ultra-thin memory chips, each a grid of memory banks, stacked like a deck of cards
TSV column · Through-silicon via — copper pillar: vertical wires punched straight through every die to move data between layers
Microbumps · Solder micro-balls, ~25 µm pitch: the tiny balls that bond each die to the one above it
Buffer die · Logic die — routes to GPU: the layer at the bottom that talks to the GPU on behalf of the whole stack
Interposer substrate · Silicon base, hosts stack + GPU: the shared plate of silicon the stack sits on, next to the GPU die

Fig. II — The HBM stack · schematic · not to scale

The HBM stack seen from the side — twelve DRAM dies stacked like pages of a book, bonded by microscopic through-silicon vias, resting on a buffer die that speaks to the GPU next door.

The HBM stack

The most expensive memory on Earth, stacked twelve layers high.

What it is: HBM3e — High-Bandwidth Memory, 3rd-gen enhanced
Stack: 12 DRAM chips on top of each other = 36 GB
Footprint: ~11 × 11 × 0.7 mm / 0.43 × 0.43 × 0.03 in
Speed: ~1.2 TB/s per stack (a fast home SSD: ~7 GB/s)
Per B200: 8 stacks (8-high, 24 GB each) · 192 GB memory · 8 TB/s total; eight of the 12-high 36 GB stacks shown here would total 288 GB

A GPU is only as fast as the memory feeding it. Regular computer memory (the RAM sticks you'd put in a desktop) is too slow and too far from the chip. HBM solves this by stacking 12 memory chips directly on top of a buffer die, wired together through microscopic copper-filled vertical holes called TSVs (Through-Silicon Vias). The stack sits millimeters from the GPU die, the shortest possible path. Memory bandwidth, not raw math speed, is what limits how fast most AI models actually run.
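To see why bandwidth sets the pace, consider a model generating text one token at a time: in the simplest case every weight is read from HBM once per token, so bandwidth divided by model size gives a rough ceiling on tokens per second. The sketch below is an illustration under stated assumptions, not a benchmark: it assumes a hypothetical 70-billion-parameter model stored at one byte per weight, a single request at a time, and the 8 TB/s B200 figure above. Real systems batch requests and reuse data, so actual throughput differs.

```python
def max_tokens_per_second(bandwidth_bytes_per_s: float,
                          n_params: float,
                          bytes_per_param: float) -> float:
    """Bandwidth-bound ceiling: each token needs one full pass over the weights."""
    model_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_s / model_bytes

B200_BANDWIDTH = 8e12        # 8 TB/s, from the spec list above
HYPOTHETICAL_PARAMS = 70e9   # ASSUMPTION: a 70B-parameter model
BYTES_PER_PARAM = 1          # ASSUMPTION: 8-bit weights

ceiling = max_tokens_per_second(B200_BANDWIDTH, HYPOTHETICAL_PARAMS, BYTES_PER_PARAM)
print(f"at most ~{ceiling:.0f} tokens/s for a single request")
```

Doubling the bytes per weight halves the ceiling, which is one reason lower-precision formats matter so much for inference.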

Used for

holding the AI model's weights while the GPU runs
passing data to and from the compute cores at full speed

Approx cost, 2026

~$350–400 per stack · roughly $10 per GB

HBM is sold out through 2026; prices are rising around 20% year over year.
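The per-gigabyte figure can be recovered from the per-stack price, and the same numbers give a rough memory bill per GPU. A small Python sketch, assuming for illustration that all eight memory sites on a GPU are priced like the 36 GB stack described on this plate:

```python
STACK_GB = 36                   # one 12-high HBM3e stack (spec list above)
STACK_PRICE_USD = (350, 400)    # low and high estimate per stack
STACKS_PER_GPU = 8              # ASSUMPTION: eight sites, all priced like this stack

# Price per gigabyte at the low and high end of the range.
price_per_gb = tuple(price / STACK_GB for price in STACK_PRICE_USD)

# Rough HBM bill for one GPU at the low and high end of the range.
hbm_bill_per_gpu = tuple(price * STACKS_PER_GPU for price in STACK_PRICE_USD)

print(f"${price_per_gb[0]:.2f} to ${price_per_gb[1]:.2f} per GB")
print(f"${hbm_bill_per_gpu[0]:,} to ${hbm_bill_per_gpu[1]:,} of HBM per GPU")
```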

MU · Micron — HBM3e maker, gaining share vs. SK hynix

Alternates

— · SK hynix (000660.KS)
incumbent; ~50% HBM share, NVIDIA-preferred
— · Samsung (005930.KS)
third supplier; qualifying on HBM3e

Plate № III · Tier 3 of VIII

Legend · Numbers correspond to markers in fig.

Heat spreader · Copper IHS: the metal lid — pulls heat off the chip
HBM × 6 · 12-die high-bandwidth memory stacks: the GPU's ultra-fast RAM, built by Micron / SK hynix / Samsung
GPU die · ~80 B transistors · TSMC 4N silicon: the actual chip doing the math
Substrate · Organic package: the green fiber board that routes signals out of the die
SXM5 PCB · Mezzanine board: what plugs into the server's baseboard

Fig. III — The module · schematic · not to scale

An SXM5 module viewed from above: the GPU die centered, six to eight HBM stacks clustered tight around it, and the mezzanine connector that mates it to the baseboard below.

The module

One GPU, bundled — die, memory, power stage, all on one board.

Form factor: SXM5 — NVIDIA's mezzanine socket standard
Size: ~78 × 105 mm / ~3.1 × 4.1 in
Power draw: 700 W (H100) · 1,000 W (B200) — a space heater is ~1,500 W
Memory sites: 6 HBM stacks (H100) · 8 (B200)

The brick of every AI cluster. When engineers say "an H100" or "a B200," they mean this whole module — the chip, the memory, and the power delivery all on one small board. Consumer graphics cards like the RTX 5090 do the same job at a smaller scale: same architecture family, smaller die, slower memory, slotted into a gaming PC instead of a datacenter server.

Used for

gaming & creative workstations (consumer cards)
local AI model inference for hobbyists
every AI datacenter cluster (datacenter cards)

Approx cost, 2026

Datacenter: $25k–40k · Gaming (RTX 5090): $2,000–3,900

The RTX 5090's official price is $1,999; retail prices ran 75%+ above that during the 2026 memory shortage.

NVDA · designs and sells the module

Alternates

AMD · Instinct MI300X (OAM form factor)
open-standard mezzanine; ~192 GB HBM3
— · Google TPU v6 board
liquid-cooled, pod-only, not sold
— · AWS Trainium 2 board
AWS-only, not sold retail

Plate № IV · Tier 4 of VIII

Legend · Numbers correspond to markers in fig.

SXM5 module × 8 · Blackwell B200 GPU modules: the eight GPU bricks from the previous plate, bolted to one board
NVSwitch × 4 · NVLink fabric ASICs · 900 GB/s per GPU (H100 generation): the switchboard chips that let every GPU talk to every other at full speed
NVLink traces · Multi-lane differential pairs, 200 Gb/s each: the amber wires you see pulsing — packets moving between GPUs
PCB · Multi-layer high-frequency substrate: the giant circuit board everything is soldered to
Power & signal bus · Mezzanine headers to the chassis below: how 10 kilowatts of power and the outside network get in

Fig. IV — The baseboard · schematic · not to scale

The HGX baseboard seen flat on a bench — eight SXM5 modules in a 2×4 grid, the four NVSwitch fabric chips between them, and the high-speed traces that let every GPU talk to every other at 900 GB/s.

The baseboard

Eight GPUs wired together so tightly they act like one giant chip.

What it is: HGX — NVIDIA's 8-GPU reference board
Population: 8 × SXM5 modules + 4 × NVSwitch chips
Internal fabric: NVLink, 900 GB/s per GPU on H100 (all-to-all through the NVSwitch chips)
Size: ~544 × 406 mm / ~21.4 × 16 in
Power draw: ~6.5 kW (H100) · ~10.4 kW (B200)

The motherboard for AI. Eight modules share one ultra-fast internal network called NVLink, so they act as a single 8-GPU brain rather than eight separate ones. This is what "eight H100s" means in practice. The signal-boosting retimer chips that keep signals clean on those multi-gigahertz traces are their own little market — that's where tickers like ALAB and CRDO fit.
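The board-level power figures above are larger than eight modules alone would draw. A short Python sketch of that gap, using only the wattages quoted on this page; attributing the remainder to the NVSwitch chips, retimers and power conversion is an assumption, since the page does not break it down.

```python
MODULE_WATTS = {"H100": 700, "B200": 1000}     # per SXM module (module plate)
BOARD_WATTS = {"H100": 6500, "B200": 10400}    # per HGX baseboard (approximate, spec list above)
GPUS_PER_BOARD = 8

def board_overhead(gpu: str) -> tuple:
    """Return (watts not drawn by the GPU modules, that amount as a share of board power)."""
    gpu_watts = GPUS_PER_BOARD * MODULE_WATTS[gpu]
    overhead = BOARD_WATTS[gpu] - gpu_watts
    return overhead, overhead / BOARD_WATTS[gpu]

for name in MODULE_WATTS:
    watts, share = board_overhead(name)
    print(f"{name}: {watts} W outside the GPU modules ({share:.0%} of the board)")
```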

Used for

the core of every NVIDIA-reference AI server
tightly-coupled multi-GPU training jobs

Approx cost, 2026

~$200k–300k per complete baseboard

Silicon dominates the bill; the rest is PCB, connectors, and thermal interface.

NVDA · reference design & assembly
ALAB · Astera Labs — signal retimers
CRDO · Credo — active electrical cables (AECs)

Alternates

AMD · Instinct Universal Baseboard (UBB)
8× MI300X open-standard analogue
— · Meta / Microsoft custom boards
internal MTIA (Meta) and Maia (Microsoft) platforms

Plate № V · Tier 5 of VIII

Legend · Numbers correspond to markers in fig.

HGX baseboard · 8 × H100 SXM5 · the previous plate: the 8-GPU board from plate IV, resting in the top of the box
CPU tray · 2 × Intel Xeon Platinum · 2 TB DDR5: two conventional CPUs and a small warehouse of regular RAM — they feed the GPUs
ConnectX-7 NICs × 8 · 400 Gb/s InfiniBand / Ethernet: network cards — one per GPU — for talking to the other servers in the cluster
Power supplies × 6 · 3,300 W redundant PSUs · 208 V AC input: six beefy power bricks; any one can fail without taking the server down
Fan wall · High-static-pressure axial fans: the roar you hear in a data center — the fans pulling air across everything inside

Fig. V — The chassis · schematic · not to scale

The DGX chassis drawn front-on — the HGX baseboard across the top, dual Xeons and terabytes of system RAM in the middle, six power supplies along the bottom, and the 400 Gb/s network cards bristling out the back.

The chassis

Eight GPUs, two CPUs, a rack-mount box that weighs as much as a motorcycle.

Form factor: 8U rack-mount (8 rack units tall)
Size: 356 × 482 × 897 mm / 14 × 19 × 35.3 in
Weight: 130.5 kg / 287.6 lb
Power draw: 10.2 kW max (6 × 3,300 W power supplies)
CPU + RAM: 2 × Xeon Platinum · 2 TB system RAM
Network: 8 × 400 Gb/s NICs — for talking to other servers

One AI server — the unit you actually buy, ship, bolt into a rack, and plug in. DGX is NVIDIA's complete-server product; HGX is the same 8-GPU baseboard sold inside servers built by Dell, Supermicro, and HPE. Enterprises run a handful for production inference; universities and research labs run a few dozen; cloud providers rent them by the hour ($30–90/hour for eight GPUs).

Used for

enterprise AI inference (production)
university & biotech research clusters
cloud on-demand rental (the "hourly GPU" you rent)

Approx cost, 2026

$350k–550k per box

DGX H100 list price ≈ $500k; HGX builds from partners land $350–500k.
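Putting the hourly rental range next to the purchase range gives a rough break-even point. A minimal Python sketch, assuming the box would be rented around the clock and ignoring power, hosting, staff and resale value, all of which shift the answer in practice:

```python
HOURS_PER_YEAR = 24 * 365

def breakeven_years(purchase_usd: float, rental_usd_per_hour: float) -> float:
    """Years of continuous rental that cost as much as buying the box outright."""
    return purchase_usd / rental_usd_per_hour / HOURS_PER_YEAR

# Extremes of the ranges quoted on this plate.
fastest_payback = breakeven_years(350_000, 90)   # cheapest box vs. priciest rental
slowest_payback = breakeven_years(550_000, 30)   # priciest box vs. cheapest rental

print(f"break-even after {fastest_payback:.1f} to {slowest_payback:.1f} years of 24/7 rental")
```

Under these simplifications the answer runs from under half a year to about two years, which is why utilization is the deciding variable in the rent-or-buy choice.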

SMCI · Supermicro — the reference builder
DELL · PowerEdge XE9680 line
HPE · HPE Cray AI servers

Alternates

LNVGY · Lenovo ThinkSystem SR685a V3
AMD MI300X + NVIDIA HGX builder
— · Quanta, Wiwynn, Foxconn ODMs
Taiwan ODMs; white-label for hyperscalers

Plate № VI · Tier 6 of VIII

Legend · Numbers correspond to markers in fig.

Compute tray × 18 · 2 × GB200 superchip per tray · 4 GPUs · 2 Grace CPUs: each amber-glowing slice holds four GPUs and two CPUs
NVSwitch tray × 9 · NVLink fabric trays · 130 TB/s aggregate: the middle slices are the fabric — they wire every GPU to every other
Coolant manifolds · Direct-liquid-cooling supply & return · ~40 °C out: the amber pipes; warm water leaves here, gets cooled on the roof, comes back cold
Top of rack · Spine-switch uplinks to other NVL72 racks: where this rack plugs into the rest of the data hall
Busbar & PDU · ~50 V DC busbar · 120 kW sustained: the power rail — enough electricity for a block of houses, carried in one bar

Fig. VI — The rack · schematic · not to scale

The NVL72 rack drawn in profile — eighteen compute trays and nine switch trays stacked in an upright cabinet, liquid-cooling manifolds running top-to-bottom on both sides, hot coolant leaving through the top.

The rack

Seventy-two GPUs plumbed together, in a liquid-cooled box the size of a refrigerator.

Product: GB200 NVL72 — 72-GPU rack, NVIDIA's flagship AI unit
Population: 72 Blackwell GPUs + 36 Grace CPUs
Layout: 18 × 1U compute trays + 9 switch trays
Internal fabric: 130 TB/s NVLink — all 72 GPUs act as one
Power: 120 kW — 100% direct-liquid-cooled (no fans)
Size: ~2 × 0.6 × 1.2 m / ~79 × 24 × 47 in
Weight: ~1,360 kg / 3,000 lb
Performance: 1.4 exaFLOPS (low-precision FP4) — more than a billion billion math ops/sec

Frontier AI training happens here now. One NVL72 rack is a supercomputer by any pre-2020 definition — hyperscalers order them in blocks of thousands. The rack contains no fans at all; every watt of heat leaves through liquid manifolds on its sides to rooftop chillers. Vertiv's liquid-cooling business lives in this tier.
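The rack's headline numbers can be rebuilt from its tray layout. A small Python sketch using the figures on this plate plus one assumption: 1.8 TB/s of NVLink bandwidth per GPU, which is NVIDIA's published per-GPU figure for Blackwell but is not stated on this page.

```python
COMPUTE_TRAYS = 18
GPUS_PER_TRAY = 4
CPUS_PER_TRAY = 2
RACK_POWER_KW = 120
NVLINK_TB_PER_S_PER_GPU = 1.8   # ASSUMPTION: per-GPU NVLink bandwidth, not stated on this page

gpus = COMPUTE_TRAYS * GPUS_PER_TRAY                    # GPUs in the rack
cpus = COMPUTE_TRAYS * CPUS_PER_TRAY                    # Grace CPUs in the rack
aggregate_nvlink_tb_per_s = gpus * NVLINK_TB_PER_S_PER_GPU
kw_per_gpu = RACK_POWER_KW / gpus                       # rack power shared across GPUs

print(f"{gpus} GPUs, {cpus} CPUs")
print(f"~{aggregate_nvlink_tb_per_s:.0f} TB/s of NVLink across the rack")
print(f"~{kw_per_gpu:.2f} kW of rack power per GPU")
```

The per-GPU power share comes out well above a module's own draw because it also carries the CPUs, switch trays and power conversion.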

Used for

frontier-model training (GPT-5, Grok-4-class)
hyperscaler production inference at scale

Approx cost, 2026

$3.0M–3.4M per rack (GB200) · $6M+ (GB300)

HSBC estimate; next-generation VR200/VR300 racks are roadmapped at $5M–$8.8M each.

VRT · Vertiv — liquid cooling & CDUs
ANET · Arista — scale-out switching between racks
NVDA · reference rack design

Alternates

— · Google TPU v7 Ironwood pod
~9,000 accelerators, custom optical fabric
AMD · MI350 / MI400 rack platforms
AMD's answer to NVL72, 2026+ ramp
— · AWS Trainium 2 UltraCluster
Amazon-internal rack; AMZN capex story

Plate № VII · Tier 7 of VIII

Legend · Numbers correspond to markers in fig.

Rack rows · ~192 NVL72 racks · ~13,800 GPUs: every amber sliver is one rack from the previous plate, lined up in rows
In-row CDUs · Coolant distribution units · ~4 MW each: the fridge-sized boxes that pump chilled water into every rack's manifolds
Power gallery · Medium-voltage transformers + UPS: where utility power steps down to something the servers can use
Spine switches · Rack-to-rack InfiniBand / Ethernet fabric: the top-of-hall switches that let racks talk to each other at scale
Rooftop chillers · Evaporative cooling · 4,000+ tons: where the heat from every GPU finally leaves the building as warm air or steam

One reference AI data hall · 60 × 210 m · 135,600 sq ft

~192 NVL72 racks · 13,824 Blackwell GPUs · 23 MW critical IT load

Twenty-three megawatts is roughly the steady electricity draw of eighteen thousand American homes — poured into one building, for one training run.
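The hall figures chain together from the rack plate. A short Python sketch of that arithmetic, using the 120 kW rack power quoted earlier and this plate's rule of thumb of about 800 US homes per megawatt; it counts IT load only and leaves out cooling and other facility overhead.

```python
RACKS = 192             # NVL72 racks in the reference hall
GPUS_PER_RACK = 72
RACK_POWER_KW = 120     # from the rack plate
HOMES_PER_MW = 800      # this page's rule of thumb for steady household draw

gpus = RACKS * GPUS_PER_RACK                  # GPUs in the hall
it_load_mw = RACKS * RACK_POWER_KW / 1000     # critical IT load in megawatts
homes = it_load_mw * HOMES_PER_MW             # equivalent steady household draw

print(f"{gpus:,} GPUs drawing {it_load_mw:.2f} MW, roughly {homes:,.0f} homes")
```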

Fig. VII — The data hall · schematic · not to scale

The data hall seen from above — rows of racks on a raised floor, coolant distribution units along the aisles, transformers and switchgear in the power gallery, rooftop chillers venting heat to the sky.

The data hall

A warehouse of racks with its own substation and coolant plant.

Floor area: 50,000 – 1,000,000+ sq ft
Reference hall: 60 × 210 m / 197 × 689 ft = 12,600 m² / 135,600 sq ft
Critical load: 10–100 MW (1 MW ≈ 800 US homes' steady draw)
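Power density: 1,000–2,000 watts per sq ft in liquid-cooled AI rack rows (the reference hall, at 23 MW over 135,600 sq ft, averages ~170 watts per sq ft across its whole floor)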
Cooling: In-row CDUs (Coolant Distribution Units), rear-door heat exchangers, rooftop chillers

The hall is the unit of buildout — one foundation, one cooling loop, one electrical bus, hundreds to thousands of racks all inside. A single training cluster for the next generation of frontier models now spans tens of thousands of GPUs across multiple halls on one campus. The colo (colocation) REITs — Equinix, Digital Realty — build and lease this kind of space to hyperscalers.

Used for

training runs for frontier AI models
colocation real estate leased to hyperscalers & enterprises

Approx cost, 2026

$10–12M per MW (standard) · $20M+ per MW (AI-ready liquid)

At those rates, a 100 MW facility costs roughly $1.0B – $1.2B to build as a standard hall, or $2B+ as an AI-ready liquid-cooled one, all-in.

EQIX · Equinix — neutral colocation REIT
DLR · Digital Realty — hyperscale REIT
VRT · Vertiv — cooling & power equipment

Plate № VIII · Tier 8 of VIII

Legend · Numbers correspond to markers in fig.

Data halls × 8 · 125 MW each · ~1,000 NVL72 racks per hall: each rectangle is one of the halls from plate VII — eight of them on one site
On-site substation · 345 kV → 34.5 kV step-down · 1 GW capacity: where high-voltage power from the grid becomes usable campus power
Cooling ponds · Open-loop evaporative ponds · ~1 M gal/day: the heat leaves as water vapor here — a surprising amount of water
Backup generators · Diesel + battery UPS · 96+ hours runtime: emergency power for when the grid hiccups
Grid tie-in · Long-dated PPA · dedicated 345 kV line: the umbilical cord back to the regional electric grid

One gigawatt AI campus · ~1.2 km² · ~300 acres

1 GW IT load (≈ 750,000 homes) · ~600,000 Blackwell-class GPUs · ~1 M gallons of water / day

Stargate, announced in 2025 at $500 billion, is ten of these — ten gigawatts of AI capacity by 2029. The bottleneck is no longer silicon; it's the substation, the transmission right-of-way, and the twenty-year power contract that makes the whole thing possible.

Fig. VIII — The campus · schematic · not to scale

The campus seen as a site plan — multiple data halls arrayed on a grid, the on-site substation linking to the regional grid, cooling ponds and transmission right-of-way visible at the edges.

The campus

A small city's worth of electricity, poured into one math problem.

Footprint: hundreds of acres · multi-building
Power: 1 – 10 gigawatts (1 GW ≈ 750,000 US homes' steady draw)
Current projects: Meta Prometheus · Microsoft Mount Pleasant · Stargate (5 sites)
Stargate target: 10 GW aggregate by 2029 — equivalent to ten nuclear reactors

A gigawatt campus uses about as much electricity as 750,000 American homes put together. The bottleneck at this scale is no longer silicon — it's substations, transmission rights-of-way, and long-dated PPAs (Power Purchase Agreements, which are multi-year contracts to buy electricity from a specific generator). This is the end of the AI trade chain, and the reason power utilities made the watchlist: CEG (nuclear), VST (merchant generation), TLN (24/7 baseload).
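The campus headline figures follow from the hall count and the rack plate. A minimal Python sketch, assuming for illustration that the entire IT load goes to 120 kW NVL72 racks, which a real campus would not do exactly:

```python
HALLS = 8
HALL_MW = 125            # IT load per hall, from the campus legend
RACK_POWER_KW = 120      # from the rack plate
GPUS_PER_RACK = 72
HOMES_PER_GW = 750_000   # this plate's rule of thumb

campus_mw = HALLS * HALL_MW                         # total IT load in megawatts
campus_racks = campus_mw * 1000 / RACK_POWER_KW     # ASSUMPTION: every watt feeds an NVL72 rack
campus_gpus = campus_racks * GPUS_PER_RACK
homes = campus_mw / 1000 * HOMES_PER_GW

print(f"{campus_mw} MW, ~{campus_racks:,.0f} racks, ~{campus_gpus:,.0f} GPUs, ~{homes:,.0f} homes")
```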

Used for

training the next generation of frontier models ($500M–$1B+ per run)
long-horizon hyperscaler inference capacity

Approx cost, 2026

$45B – $55B per gigawatt of capacity

At $45B – $55B per gigawatt, Stargate's 10 GW goal implies $450B – $550B in infrastructure spend over five years.
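Scaling the per-gigawatt cost to Stargate's target is a single multiplication. A tiny Python sketch, assuming the per-gigawatt range holds flat at ten times the scale:

```python
COST_PER_GW_USD_BILLIONS = (45, 55)   # low and high estimate per gigawatt (above)
STARGATE_GW = 10                      # Stargate's aggregate target

# Implied total spend in billions of dollars, low and high.
implied_spend_billions = tuple(cost * STARGATE_GW for cost in COST_PER_GW_USD_BILLIONS)

print(f"${implied_spend_billions[0]}B to ${implied_spend_billions[1]}B for {STARGATE_GW} GW")
```

The resulting range brackets the $500 billion figure Stargate was announced at.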

CEG · Constellation — nuclear PPAs
VST · Vistra — merchant generation
TLN · Talen Energy — 24/7 baseload

Nine orders of magnitude, drawn the same way twice: silicon switches, stacked — and the electricity to run them.

Colophon

Illustrations are schematic — engraved interpretations, not engineering drawings to the millimeter. Die aspect ratios, module mechanical dimensions, and memory package footprints use publicly available reference points and partner documentation; exact figures behind PCI-SIG membership are approximated.

Costs are 2026 estimates from public market reporting; they move. Nothing here is investment advice.

Sources: TSMC · SK hynix · Micron · NVIDIA · DGX docs · DataCenterFrontier · HSBC supply-chain notes · Phosphor Icons
