← WattsB4BotsAnatomy № I · The GPU

Anatomy · Plate I · The GPU

Anatomy — a blueprint series

The GPU,
in nine scales.

From a single electrical switch to a gigawatt data-center campus — the same machine, drawn at nine magnifications. Plain-language specs, metric and imperial, a real-world size you already know, costs in 2026 dollars, and the public company closest to every tier.

One machine, nine enclosures
  1. switch
  2. chip
  3. memory
  4. module
  5. board
  6. server
  7. rack
  8. hall
  9. campus
Scale ladderNine tiers · ~9 orders of magnitude
ObjectLinear scalePowerTicker(s)
0FinFET transistor~50 nmASML*
IGPU die27 × 30 mmNVDA · TSM
IIHBM3e stack11 × 11 mmMU
IIISXM5 module78 × 105 mm700–1,000 WNVDA
IVHGX baseboard544 × 406 mm6–10 kWNVDA · ALAB
VDGX chassis (8U)356 × 482 × 897 mm10 kWSMCI · DELL
VINVL72 rack2 × 0.6 × 1.2 m120 kWVRT · ANET
VIIData hall60 × 210 m10–100 MWEQIX · DLR
VIIIHyperscaler campushundreds of acres1–10 GWCEG · VST · TLN

* Tier 0 has no direct public proxy — the closest trade is the lithography tool vendor (ASML, not on our 37).

Plate № 0Tier 0 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. Fin· Vertical silicon ridge (~5 nm wide)the tiny wall of silicon that the current flows along
  2. Gate· Metal electrode wrapping the finthe switch — a voltage here lets current through, no voltage blocks it
  3. Source· n-doped contact (electron reservoir)where electrons start
  4. Drain· n-doped contact (electron exit)where electrons end up when the switch is open
  5. Substrate· Bulk silicon waferthe slab of silicon everything else is etched into
Transistors per H100 dieFills in real time

0

Counting…

If each transistor were a grain of sand, one die would hold enough sand to fill a half-ton pickup truck.

Fig. 0The transistorschematic · not to scale

A single FinFET switch, drawn as if you were holding it up to the light under an electron microscope — the vertical silicon fin, the gate wrapped around it, and the source and drain that let current pass when the gate says so.

0

The transistor

The atom of computing — a switch so small you need an electron microscope to see it.

Feature size
~5 nm / ~0.000002 in
Cell pitch
~50 nm (a virus is 10× bigger)
Density
~98 million transistors per mm²
Process
TSMC 4N (NVIDIA-custom 5-nm-class fab)

A FinFET is a tiny electrical switch — the "fin" is a vertical silicon ridge; the switch opens or closes roughly two billion times a second. Every calculation a GPU does boils down to billions of these switches flipping in coordinated patterns. Everything at bigger tiers in this page is just an ever-larger arrangement of this one trick.

Used for

  • the smallest electrical switch in a computer
  • every arithmetic operation a GPU performs

Approx cost, 2026

Not priced individually. The cost lives at the die level above.

— no direct public ticker
Plate № ITier 1 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. SM grid· 132 Streaming Multiprocessorsthe compute tiles — each runs thousands of math operations in parallel
  2. L2 cache + crossbar· 50 MB on-die SRAM, central data routershared fast memory and the switchboard that moves data between SMs
  3. HBM PHY· 6 × memory-interface stripswhere the die talks to the stacks of RAM bolted next to it
  4. NVLink PHY· high-speed chip-to-chip linkhow one GPU talks to seven others in a server, direct at wire speed
  5. Silicon substrate· TSMC 4N / 4NP waferthe raw slab of silicon that everything is etched onto
Fig. IThe dieschematic · not to scale

The H100 die with its logic blocks drawn roughly to internal proportion — streaming multiprocessor grid, memory controllers on the edges, and the central crossbar that routes data between them.

I

The die

Eighty billion switches laid out on a single square of silicon.

Size
27 × 30 mm / 1.06 × 1.18 in
Area
814 mm² (about the size of a postage stamp)
Process
TSMC 4N (H100) · 4NP (B200)
Transistors
80 B (H100) · 208 B (B200, 2 dies fused)
SMs (compute clusters)
132 active · 16,896 parallel cores

One chip — the thing NVIDIA actually designs and TSMC actually makes. Every mention of "an H100" or "a Blackwell" refers to this piece of silicon. The chip is organized into 132 compute clusters called SMs (Streaming Multiprocessors), each running thousands of calculations in parallel. Everything on later plates exists to keep this one chip fed with power and data.

Used for

  • the unit NVIDIA sells as one GPU
  • the compute engine of every AI model today

Approx cost, 2026

Silicon cost: $3k–5k · Sold as a finished module: $25k–40k

Die-level manufacturing cost vs. market price of a complete H100.

Alternates

  • AMDInstinct MI300X / MI350
    Blackwell's only GPU rival at scale
  • Google TPU v6 (Trillium)
    custom silicon, internal to GCP
  • AWS Trainium 2
    custom silicon, internal to AWS
  • INTCGaudi 3
    distant third; shrinking roadmap
Plate № IITier 2 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. DRAM die × 12· Stacked memory layers · 32 banks per die (HBM3e, 12-Hi)twelve ultra-thin memory chips, each a grid of memory banks, stacked like a deck of cards
  2. TSV column· Through-silicon via — copper pillarvertical wires punched straight through every die to move data between layers
  3. Microbumps· Solder micro-balls, ~25 µm pitchthe tiny balls that bond each die to the one above it
  4. Buffer die· Logic die — routes to GPUthe layer at the bottom that talks to the GPU on behalf of the whole stack
  5. Interposer substrate· Silicon base, hosts stack + GPUthe shared plate of silicon the stack sits on, next to the GPU die
Fig. IIThe HBM stackschematic · not to scale

The HBM stack seen from the side — twelve DRAM dies stacked like pages of a book, bonded by microscopic through-silicon vias, resting on a buffer die that speaks to the GPU next door.

II

The HBM stack

The most expensive memory on Earth, stacked twelve layers high.

What it is
HBM3e — High-Bandwidth Memory, 3rd-gen enhanced
Stack
12 DRAM chips on top of each other = 36 GB
Footprint
~11 × 11 × 0.7 mm / 0.43 × 0.43 × 0.03 in
Speed
~1.2 TB/s per stack (a fast home SSD: ~7 GB/s)
Per B200
8 stacks · 192 GB memory · 8 TB/s total

A GPU is only as fast as the memory feeding it. Regular computer memory (the RAM sticks you'd put in a desktop) is too slow and too far from the chip. HBM solves this by stacking 12 memory chips directly on top of the buffer, wired together through microscopic vertical holes called TSVs (Through-Silicon Vias). It's sitting millimeters from the GPU die — the shortest possible path. Memory bandwidth, not raw math speed, is what limits how fast most AI models actually run.

Used for

  • holding the AI model's weights while the GPU runs
  • passing data to and from the compute cores at full speed

Approx cost, 2026

~$350–400 per stack · roughly $10 per GB

HBM is sold out through 2026; prices are rising around 20% year over year.

Alternates

  • SK hynix (000660.KS)
    incumbent; ~50% HBM share, NVIDIA-preferred
  • Samsung (005930.KS)
    third supplier; qualifying on HBM3e
Plate № IIITier 3 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. Heat spreader· Copper IHSthe metal lid — pulls heat off the chip
  2. HBM × 6· 12-die high-bandwidth memory stacksthe GPU's ultra-fast RAM, built by Micron / SK hynix / Samsung
  3. GPU die· ~80 B transistors · TSMC 4N siliconthe actual chip doing the math
  4. Substrate· Organic packagethe green fiber board that routes signals out of the die
  5. SXM5 PCB· Mezzanine boardwhat plugs into the server's baseboard
Fig. IIIThe moduleschematic · not to scale

An SXM5 module viewed from above: the GPU die centered, six to eight HBM stacks clustered tight around it, and the mezzanine connector that mates it to the baseboard below.

III

The module

One GPU, bundled — die, memory, power stage, all on one board.

Form factor
SXM5 — NVIDIA's mezzanine socket standard
Size
~78 × 105 mm / ~3.1 × 4.1 in
Power draw
700 W (H100) · 1,000 W (B200) — a space heater is ~1,500 W
Memory sites
6 HBM stacks (H100) · 8 (B200)

The brick of every AI cluster. When engineers say "an H100" or "a B200," they mean this whole module — the chip, the memory, and the power delivery all on one small board. Consumer graphics cards like the RTX 5090 do the same job at a smaller scale: same architecture family, smaller die, slower memory, slotted into a gaming PC instead of a datacenter server.

Used for

  • gaming & creative workstations (consumer cards)
  • local AI model inference for hobbyists
  • every AI datacenter cluster (datacenter cards)

Approx cost, 2026

Datacenter: $25k–40k · Gaming (RTX 5090): $2,000–3,900

The RTX 5090's official price is $1,999; retail prices ran 75%+ above that during the 2026 memory shortage.

Alternates

  • AMDInstinct MI300X (OAM form factor)
    open-standard mezzanine; ~192 GB HBM3
  • Google TPU v6 board
    liquid-cooled, pod-only, not sold
  • AWS Trainium 2 board
    AWS-only, not sold retail
Plate № IVTier 4 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. SXM5 module × 8· Blackwell B200 GPU modulesthe eight GPU bricks from the previous plate, bolted to one board
  2. NVSwitch × 4· NVLink fabric ASICs · 900 GB/s bisectionthe switchboard chips that let every GPU talk to every other at full speed
  3. NVLink traces· Multi-lane differential pairs, 200 Gb/s eachthe amber wires you see pulsing — packets moving between GPUs
  4. PCB· Multi-layer high-frequency substratethe giant circuit board everything is soldered to
  5. Power & signal bus· Mezzanine headers to the chassis belowhow 10 kilowatts of power and the outside network get in
Fig. IVThe baseboardschematic · not to scale

The HGX baseboard seen flat on a bench — eight SXM5 modules in a 2×4 grid, the four NVSwitch fabric chips between them, and the high-speed traces that let every GPU talk to every other at 900 GB/s.

IV

The baseboard

Eight GPUs wired together so tightly they act like one giant chip.

What it is
HGX — NVIDIA's 8-GPU reference board
Population
8 × SXM5 modules + 4 × NVSwitch chips
Internal fabric
900 GB/s NVLink bisection (all-to-all)
Size
~544 × 406 mm / ~21.4 × 16 in
Power draw
~6.5 kW (H100) · ~10.4 kW (B200)

The motherboard for AI. Eight modules share one ultra-fast internal network called NVLink, so they act as a single 8-GPU brain rather than eight separate ones. This is what "eight H100s" means in practice. The signal-boosting retimer chips that keep those multi-gigahertz wires clean at this speed are their own little market — that's where tickers like ALAB and CRDO fit.

Used for

  • the core of every NVIDIA-reference AI server
  • tightly-coupled multi-GPU training jobs

Approx cost, 2026

~$200k–300k per complete baseboard

Silicon dominates the bill; the rest is PCB, connectors, and thermal interface.

Alternates

  • AMDInstinct Universal Baseboard (UBB)
    8× MI300X open-standard analogue
  • Meta / Microsoft custom boards
    internal Ironwood, Maia, MTIA platforms
Plate № VTier 5 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. HGX baseboard· 8 × B200 SXM5 · the previous platethe 8-GPU board from plate IV, resting in the top of the box
  2. CPU tray· 2 × Intel Xeon Platinum · 2 TB DDR5two conventional CPUs and a small warehouse of regular RAM — they feed the GPUs
  3. ConnectX-7 NICs × 8· 400 Gb/s InfiniBand / Ethernetnetwork cards — one per GPU — for talking to the other servers in the cluster
  4. Power supplies × 6· 3,300 W redundant PSUs · 208 V AC inputsix beefy power bricks; any one can fail without taking the server down
  5. Fan wall· High-static-pressure axial fansthe roar you hear in a data center — the fans pulling air across everything inside
Fig. VThe chassisschematic · not to scale

The DGX chassis drawn front-on — the HGX baseboard across the top, dual Xeons and terabytes of system RAM in the middle, six power supplies along the bottom, and the 400 Gb/s network cards bristling out the back.

V

The chassis

Eight GPUs, two CPUs, a rack-mount box that weighs as much as a motorcycle.

Form factor
8U rack-mount (8 rack units tall)
Size
356 × 482 × 897 mm / 14 × 19 × 35.3 in
Weight
130.5 kg / 287.6 lb
Power draw
10.2 kW max (6 × 3,300 W power supplies)
CPU + RAM
2 × Xeon Platinum · 2 TB system RAM
Network
8 × 400 Gb/s NICs — for talking to other servers

One AI server — the unit you actually buy, ship, bolt into a rack, and plug in. DGX is NVIDIA's complete-server product; HGX is the same thing sold through Dell, Supermicro, HPE. Enterprises run a handful for production inference; universities and research labs run a few dozen; cloud providers rent them by the hour ($30–90/hour for eight GPUs).

Used for

  • enterprise AI inference (production)
  • university & biotech research clusters
  • cloud on-demand rental (the "hourly GPU" you rent)

Approx cost, 2026

$350k–550k per box

DGX H100 list price ≈ $500k; HGX builds from partners land $350–500k.

Alternates

  • LNVGYLenovo ThinkSystem SR685a V3
    AMD MI300X + NVIDIA HGX builder
  • Quanta, Wiwynn, Foxconn ODMs
    Taiwan ODMs; white-label for hyperscalers
Plate № VITier 6 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. Compute tray × 18· 2 × GB200 superchip per tray · 4 GPUs · 2 Grace CPUseach amber-glowing slice holds four GPUs and two CPUs
  2. NVSwitch tray × 9· NVLink fabric trays · 130 TB/s bisectionthe middle slices are the fabric — they wire every GPU to every other
  3. Coolant manifolds· Direct-liquid-cooling supply & return · ~40 °C outthe amber pipes; warm water leaves here, gets cooled on the roof, comes back cold
  4. Top of rack· Spine-switch uplinks to other NVL72 rackswhere this rack plugs into the rest of the data hall
  5. Busbar & PDU· ~480 V DC busbar · 120 kW sustainedthe power rail — enough electricity for a block of houses, carried in one bar
Fig. VIThe rackschematic · not to scale

The NVL72 rack drawn in profile — eighteen compute trays and nine switch trays stacked in an upright cabinet, liquid-cooling manifolds running top-to-bottom on both sides, hot coolant leaving through the top.

VI

The rack

Seventy-two GPUs plumbed together, in a liquid-cooled box the size of a refrigerator.

Product
GB200 NVL72 — 72-GPU rack, NVIDIA's flagship AI unit
Population
72 Blackwell GPUs + 36 Grace CPUs
Layout
18 × 1U compute trays + 9 switch trays
Internal fabric
130 TB/s NVLink — all 72 GPUs act as one
Power
120 kW — 100% direct-liquid-cooled (no fans)
Size
~2 × 0.6 × 1.2 m / ~79 × 24 × 47 in
Weight
~1,360 kg / 3,000 lb
Performance
1.4 exaFLOPS — a trillion-trillion math ops/sec

Frontier AI training happens here now. One NVL72 rack is a supercomputer by any pre-2020 definition — hyperscalers order them in blocks of thousands. The rack contains no fans at all; every watt of heat leaves through liquid manifolds on its sides to rooftop chillers. Vertiv's liquid-cooling business lives in this tier.

Used for

  • frontier-model training (GPT-5, Grok-4-class)
  • hyperscaler production inference at scale

Approx cost, 2026

$3.0M–3.4M per rack (GB200) · $6M+ (GB300)

HSBC estimate; next-generation VR200/VR300 racks are roadmapped at $5M–$8.8M each.

Alternates

  • Google TPU v7 Ironwood pod
    ~9,000 accelerators, custom optical fabric
  • AMDMI350 / MI400 rack platforms
    AMD's answer to NVL72, 2026+ ramp
  • AWS Trainium 2 UltraCluster
    Amazon-internal rack; AMZN capex story
Plate № VIITier 7 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. Rack rows· ~192 NVL72 racks · ~13,800 GPUsevery amber sliver is one rack from the previous plate, lined up in rows
  2. In-row CDUs· Coolant distribution units · ~4 MW eachthe fridge-sized boxes that pump chilled water into every rack's manifolds
  3. Power gallery· Medium-voltage transformers + UPSwhere utility power steps down to something the servers can use
  4. Spine switches· Rack-to-rack InfiniBand / Ethernet fabricthe top-of-hall switches that let racks talk to each other at scale
  5. Rooftop chillers· Evaporative cooling · 4,000+ tonswhere the heat from every GPU finally leaves the building as warm air or steam
One reference AI data hall60 × 210 m · 135,600 sq ft

~192

NVL72 racks

13,824

Blackwell GPUs

23 MW

critical IT load

Twenty-three megawatts is roughly the steady electricity draw of eighteen thousand American homes — poured into one building, for one training run.

Fig. VIIThe data hallschematic · not to scale

The data hall seen from above — rows of racks on a raised floor, coolant distribution units along the aisles, transformers and switchgear in the power gallery, rooftop chillers venting heat to the sky.

VII

The data hall

A warehouse of racks with its own substation and coolant plant.

Floor area
50,000 – 1,000,000+ sq ft
Reference hall
60 × 210 m / 197 × 689 ft = 12,600 m² / 135,600 sq ft
Critical load
10–100 MW (1 MW ≈ 800 US homes' steady draw)
Power density
1,000–2,000 watts per sq ft (liquid-cooled AI halls)
Cooling
In-row CDUs (Coolant Distribution Units), rear-door heat exchangers, rooftop chillers

The hall is the unit of buildout — one foundation, one cooling loop, one electrical bus, thousands of racks all inside. A single training cluster for the next generation of frontier models now spans tens of thousands of GPUs across multiple halls on one campus. The colo (colocation) REITs — Equinix, Digital Realty — build and lease this kind of space to hyperscalers.

Used for

  • training runs for frontier AI models
  • colocation real estate leased to hyperscalers & enterprises

Approx cost, 2026

$10–12M per MW (standard) · $20M+ per MW (AI-ready liquid)

A 100 MW facility costs $900M – $1.5B to build, all-in.

Plate № VIIITier 8 · of VIII
Loading plate…
LegendNumbers correspond to markers in fig.
  1. Data halls × 8· 125 MW each · ~1,500 NVL72 racks per halleach rectangle is one of the halls from plate VII — eight of them on one site
  2. On-site substation· 345 kV → 34.5 kV step-down · 1 GW capacitywhere high-voltage power from the grid becomes usable campus power
  3. Cooling ponds· Open-loop evaporative ponds · ~1 M gal/daythe heat leaves as water vapor here — a surprising amount of water
  4. Backup generators· Diesel + battery UPS · 96+ hours runtimeemergency power for when the grid hiccups
  5. Grid tie-in· Long-dated PPA · dedicated 345 kV linethe umbilical cord back to the regional electric grid
One gigawatt AI campus~1.2 km² · ~300 acres

1 GW

IT load · 750,000 homes

~600,000

Blackwell-class GPUs

~1 M

gallons water / day

Stargate, announced in 2025 at $500 billion, is ten of these — ten gigawatts of AI capacity by 2029. The bottleneck is no longer silicon, it's the substation, the transmission right-of-way, and the twenty-year power contract that makes the whole thing possible.

Fig. VIIIThe campusschematic · not to scale

The campus seen as a site plan — multiple data halls arrayed on a grid, the on-site substation linking to the regional grid, cooling ponds and transmission right-of-way visible at the edges.

VIII

The campus

A small city's worth of electricity, poured into one math problem.

Footprint
hundreds of acres · multi-building
Power
1 – 10 gigawatts (1 GW ≈ 750,000 US homes' steady draw)
Current projects
Meta Prometheus · Microsoft Mount Pleasant · Stargate (5 sites)
Stargate target
10 GW aggregate by 2029 — equivalent to ten nuclear reactors

A gigawatt campus uses about as much electricity as 750,000 American homes put together. The bottleneck at this scale is no longer silicon — it's substations, transmission rights-of-way, and long-dated PPAs (Power Purchase Agreements, which are multi-year contracts to buy electricity from a specific generator). This is the end of the AI trade chain, and the reason power utilities made the watchlist: CEG (nuclear), VST (merchant generation), TLN (24/7 baseload).

Used for

  • training the next generation of frontier models ($500M–$1B+ per run)
  • long-horizon hyperscaler inference capacity

Approx cost, 2026

$45B – $55B per gigawatt of capacity

Stargate's 10 GW goal implies $400B+ in infrastructure spend over five years.

Nine orders of magnitude, drawn the same way twice: silicon switches, stacked — and the electricity to run them.

Colophon

Illustrations are schematic — engraved interpretations, not engineering drawings to the millimeter. Die aspect ratios, module mechanical dimensions, and memory package footprints use publicly available reference points and partner documentation; exact figures behind PCI-SIG membership are approximated.

Costs are 2026 estimates from public market reporting; they move. Nothing here is investment advice.

Sources: TSMC · SK hynix · Micron · NVIDIA · DGX docs · DataCenterFrontier · HSBC supply-chain notes · Phosphor Icons

Follow the build

Stick around.

The Anatomy is a tier-by-tier teardown of the AI stack, starting with the GPU. Subscribe to hear when the next tier lands.

One email per new explainer. Nothing else.