AI Server GPU Water Cooling: Why Liquid Cooling Matters for H100, H200, and B200
NVIDIA's H100 and H200 SXM GPUs are rated at up to 700W TDP, and the B200 at up to 1000W per GPU. In 8-GPU server configurations, this creates 5.6–8kW of GPU heat per chassis, more than a conventional 4U air-cooled chassis can remove. Direct liquid cooling (DLC) and custom loop solutions reduce data center cooling costs by 30–40% compared to air cooling at these power densities. Understanding these thermal demands is essential for anyone deploying AI infrastructure at scale.
GPU TDP Comparison: The Numbers Driving Liquid Cooling Adoption
Every generation of NVIDIA data center GPUs pushes power envelopes higher. Here is how the flagship SXM modules compare:
| GPU Model | TDP (SXM) | 8-GPU Chassis Heat | Air Coolable in 4U? |
|---|---|---|---|
| A100 SXM | 400W | 3.2 kW | Marginal (4U only) |
| H100 SXM | 700W | 5.6 kW | No |
| H200 SXM | 700W | 5.6 kW | No |
| B200 SXM | 1000W | 8.0 kW | No |
| B300 (projected) | 1200W | 9.6 kW | No |
Traditional air cooling in a 4U server chassis maxes out at roughly 350–400W per GPU. That was barely sufficient for the A100 and is inadequate for Hopper-generation GPUs and beyond, unless the chassis grows taller and rack density drops accordingly. Once you cross the 400W-per-GPU threshold in a dense deployment, liquid cooling shifts from a luxury to a hard engineering requirement.
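The table's chassis-heat column is simple multiplication, and the "Air Coolable in 4U?" column is a threshold check. The sketch below reproduces both, assuming a 400W per-GPU air ceiling (the top of the 350–400W range quoted above, not a vendor specification) and counting only GPU heat, not CPUs, memory, NICs, or power-supply losses.

```python
# Assumed per-GPU air-cooling ceiling for a 4U chassis (upper end of the
# 350-400 W range in the text; not a vendor specification).
AIR_LIMIT_4U_W = 400

# SXM TDPs from the table above (the B300 figure is a projection).
GPU_TDP_W = {
    "A100 SXM": 400,
    "H100 SXM": 700,
    "H200 SXM": 700,
    "B200 SXM": 1000,
    "B300 (projected)": 1200,
}

def chassis_heat_kw(tdp_w: float, gpus_per_chassis: int = 8) -> float:
    """GPU heat per chassis in kW (GPUs only; other components ignored)."""
    return tdp_w * gpus_per_chassis / 1000

def air_coolable_4u(tdp_w: float, limit_w: float = AIR_LIMIT_4U_W) -> bool:
    """True if per-GPU TDP is at or below the assumed 4U air ceiling.

    A100 lands exactly on the limit, which is why the table calls it marginal.
    """
    return tdp_w <= limit_w

for name, tdp in GPU_TDP_W.items():
    print(f"{name:18s} {tdp:5d} W  {chassis_heat_kw(tdp):4.1f} kW  "
          f"air-coolable in 4U: {air_coolable_4u(tdp)}")
```

Real servers add several kilowatts of non-GPU load on top of these figures, so the chassis numbers are a floor rather than a total.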
Three Liquid Cooling Approaches for AI Servers
Direct Liquid Cooling (DLC)
DLC is the dominant enterprise approach. Cold plates mount directly onto the GPUs, and coolant circulates through manifolds built into the server tray. Warm coolant returns to a Coolant Distribution Unit (CDU), where a heat exchanger rejects the heat to building chilled water or a dry cooler. DLC removes 80–95% of server heat through liquid, leaving only a small residual air load from VRMs and NVMe drives. Major OEMs like Dell, HPE, and Supermicro ship DLC-ready HGX-based platforms.
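Because DLC captures only 80–95% of server heat, the room still has to remove the remainder with air. A minimal sketch of that split, assuming a hypothetical 8kW total for an 8-GPU H100 node (8 × 700W of GPUs plus about 2.4kW for everything else; an illustrative figure, not a measurement):

```python
def split_heat(total_w: float, liquid_fraction: float) -> tuple[float, float]:
    """Return (watts carried by the liquid loop, watts left for room air)."""
    if not 0.0 <= liquid_fraction <= 1.0:
        raise ValueError("liquid_fraction must be between 0 and 1")
    liquid_w = total_w * liquid_fraction
    return liquid_w, total_w - liquid_w

# Hypothetical node total: 8 x 700 W GPUs + ~2.4 kW CPUs, memory, NICs, drives.
SERVER_TOTAL_W = 8000

# Bounds of the 80-95% liquid-capture range quoted in the text.
for fraction in (0.80, 0.95):
    liquid, air = split_heat(SERVER_TOTAL_W, fraction)
    print(f"{fraction:.0%} capture: {liquid:.0f} W to liquid, {air:.0f} W to air")
```

Even at the high end of the range, a rack of such nodes still leaves a few kilowatts for the room air handlers.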
Custom Loop Liquid Cooling
Custom loops use discrete waterblocks, pumps, reservoirs, and radiators — the same fundamental approach used in enthusiast PCs, but scaled for server hardware. This option offers greater flexibility in cold plate design and coolant choice. However, it requires more maintenance and is typically used in smaller deployments, research labs, or situations where OEM DLC solutions are unavailable for a specific GPU configuration.
Immersion Cooling
Single-phase or two-phase immersion submerges entire servers in dielectric fluid. It captures essentially all server heat, including VRMs, DIMMs, and SSDs, and eliminates fans entirely. The tradeoffs are higher upfront cost, specialized maintenance, and demanding facility design requirements. Immersion is gaining traction in hyperscale deployments where rack densities exceed 100kW.
Coolant Distribution Units: The Heart of DLC Infrastructure
A CDU sits between the server rack and the facility cooling plant. It contains a heat exchanger, circulation pumps, filtration, and controls. On the server side, it supplies coolant at 35–45°C; on the facility side, it rejects heat to building water or an outdoor dry cooler. CDUs are rated by capacity — a typical 80kW unit serves one rack of 4–8 DLC-equipped servers. Redundant CDU configurations are standard in production environments to avoid single points of failure.
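Sizing a CDU comes down to dividing its rated capacity by the liquid heat per server and then adding spare units for redundancy. The sketch below uses the 80kW rating from the text; the 10kW of liquid heat per server and the 10% capacity headroom are assumptions chosen for illustration, and `servers_per_cdu` / `cdus_needed` are hypothetical helper names.

```python
import math

def servers_per_cdu(cdu_kw: float, server_liquid_kw: float,
                    headroom: float = 0.10) -> int:
    """Whole servers one CDU can carry while holding `headroom` in reserve."""
    usable_kw = cdu_kw * (1 - headroom)
    return math.floor(usable_kw / server_liquid_kw)

def cdus_needed(rack_liquid_kw: float, cdu_kw: float, spares: int = 1) -> int:
    """CDUs required for the load, plus `spares` redundant units (N+1 by default)."""
    return math.ceil(rack_liquid_kw / cdu_kw) + spares

# Assumed: 10 kW of liquid heat per server, 80 kW CDU rating from the text.
per_cdu = servers_per_cdu(80, 10.0)   # 72 kW usable / 10 kW -> 7 servers
rack_load_kw = per_cdu * 10.0         # 70 kW of liquid heat in the rack
print(per_cdu, cdus_needed(rack_load_kw, 80))  # two CDUs for N+1 redundancy
```

Keeping some headroom avoids running the CDU pumps and heat exchanger at 100% of rating, and the spare unit is what removes the single point of failure mentioned above.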
PUE Impact: The Business Case for Liquid Cooling
Power Usage Effectiveness (PUE) is total facility power divided by IT equipment power. A conventional air-cooled data center runs at PUE 1.4–1.6, meaning an extra 40–60% of the IT load goes to overhead, most of it cooling. Liquid-cooled facilities routinely achieve PUE 1.1–1.2, cutting that overhead by roughly 50–80%. For an AI cluster drawing 200kW of IT load, dropping PUE from 1.6 to 1.1 saves about 100kW of continuous overhead power, which comes to roughly $87,600 per year in electricity at $0.10/kWh (100kW × 8,760 hours × $0.10).
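The PUE arithmetic generalizes easily: at a fixed IT load, overhead power is the IT load times (PUE − 1). This sketch reproduces the 200kW example, assuming the load runs continuously all year at a flat electricity price.

```python
HOURS_PER_YEAR = 8760

def overhead_kw(it_kw: float, pue: float) -> float:
    """Non-IT power (cooling, power distribution, lighting) at a given PUE."""
    return it_kw * (pue - 1)

def annual_savings_usd(it_kw: float, pue_before: float, pue_after: float,
                       usd_per_kwh: float) -> float:
    """Yearly cost saved by lowering PUE, assuming constant 24/7 IT load."""
    saved_kw = overhead_kw(it_kw, pue_before) - overhead_kw(it_kw, pue_after)
    return saved_kw * HOURS_PER_YEAR * usd_per_kwh

saved_kw = overhead_kw(200, 1.6) - overhead_kw(200, 1.1)
savings = annual_savings_usd(200, 1.6, 1.1, 0.10)
print(f"{saved_kw:.0f} kW saved, ${savings:,.0f} per year")
```

Plugging in your own IT load, measured PUE, and local tariff gives a first-order estimate; real clusters rarely run at full load around the clock, so treat the result as an upper bound.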
Thermal Targets for AI GPUs
NVIDIA specifies a maximum junction temperature of 83°C for H100 SXM under sustained load. Exceeding this triggers thermal throttling, reducing training throughput. Well-designed DLC systems hold junction temperatures at 60–70°C, providing a 13–23°C margin that prevents throttling even during peak workloads. For the B200 at 1000W, maintaining similar margins requires roughly 40% more coolant flow (1000W / 700W ≈ 1.43, to keep the same coolant temperature rise) or lower inlet temperatures compared to H100 configurations.
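The "roughly 40% more flow" figure follows from the heat balance Q = ṁ · c_p · ΔT: at a fixed coolant temperature rise, required flow scales linearly with heat. The sketch below assumes plain water (c_p ≈ 4186 J/(kg·K), density ≈ 1 kg/L) and an illustrative 10K rise per GPU; glycol mixes have lower heat capacity and need more flow. It only covers coolant-side heat transport, since junction temperature also depends on the cold plate's thermal resistance.

```python
# Approximate properties of plain water (assumed; glycol mixes differ).
CP_WATER = 4186.0     # J/(kg*K)
DENSITY_WATER = 1.0   # kg/L

def flow_lpm(heat_w: float, delta_t_k: float,
             cp: float = CP_WATER, density: float = DENSITY_WATER) -> float:
    """Litres per minute needed to absorb heat_w with a delta_t_k coolant rise."""
    mass_flow_kg_s = heat_w / (cp * delta_t_k)   # from Q = m_dot * c_p * dT
    return mass_flow_kg_s / density * 60         # kg/s -> L/min

# Illustrative 10 K rise across each GPU cold plate (assumption).
h100 = flow_lpm(700, 10)
b200 = flow_lpm(1000, 10)
print(f"H100: {h100:.2f} L/min, B200: {b200:.2f} L/min, "
      f"+{b200 / h100 - 1:.0%} flow")
```

Run as written, this gives about 1.0 L/min per H100 and 1.43 L/min per B200, a 43% increase; allowing a larger coolant rise or supplying colder inlet water are the alternatives to pushing more flow.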
Enterprise and Server Cooling Solutions
Deploying liquid cooling at the server level demands high-quality cold plates, leak-resistant fittings, and reliable pump systems. FormulaMod offers enterprise-grade cooling components — including server-compatible waterblocks, high-flow fittings, and distribution plates — that bridge the gap between consumer custom loops and full OEM DLC infrastructure. For smaller AI clusters and research labs, these components provide a cost-effective path to liquid cooling without committing to a complete facility retrofit.
Frequently Asked Questions
Why can't AI servers use air cooling?
Modern AI GPUs like the H100, H200, and B200 generate 700–1000W each. Air cooling in rack-mount servers maxes out at approximately 350–400W per GPU in a 4U chassis. With 8 GPUs per node, the total heat load of 5.6–8kW simply cannot be dissipated by fans and heatsinks alone without severe throttling or acoustic levels exceeding data center standards.
What is direct liquid cooling (DLC)?
DLC uses cold plates mounted directly on GPU and CPU dies, connected by manifolds to a Coolant Distribution Unit. Facility water or a secondary coolant loop carries heat away from the server to an external heat rejection system. DLC captures 80–95% of server heat through liquid contact, making it far more efficient than air cooling for high-density compute.
What is the target water cooling temperature for H100 GPUs?
NVIDIA specifies a maximum junction temperature of 83°C for the H100 SXM. A properly designed liquid cooling system should hold junction temperatures between 60 and 70°C under full load, providing a comfortable margin against thermal throttling. Coolant inlet temperatures of 35–45°C are typical for DLC deployments.
How much does AI server liquid cooling cost?
Costs vary by approach. Rear-door heat exchangers add $3,000–5,000 per rack. Full DLC with CDU runs $15,000–30,000 per rack depending on density and redundancy. Immersion cooling ranges from $50,000–100,000 per tank setup. For smaller deployments, custom loop solutions using quality components can start at $2,000–5,000 per server node.
Can custom PC waterblocks work on server GPUs?
Generally no. Server GPUs like the H100 SXM use proprietary mezzanine connectors and different PCB layouts than consumer GPUs. They require purpose-built cold plates designed for the SXM module footprint and mounting system. However, PCIe variants like the H100 PCIe or L40S sometimes share mounting patterns with consumer GPUs, and select custom waterblock manufacturers produce compatible designs for these models.