What GPU waterblocks does FormulaMod carry?

FormulaMod stocks full-cover GPU waterblocks for over 500 graphics card models, covering RTX 5090, 4090, 3090, RX 9070 XT, and H100/H200 GPUs. We stock over 1,600 water cooling products with same-week worldwide shipping from Guangzhou.

How do I choose a waterblock for my GPU?

Match the waterblock to your exact GPU model and PCB variant. Reference-design cards (Founders Edition, reference Sapphire/PowerColor) use universal reference blocks. Non-reference cards from ASUS ROG Strix, MSI Gaming X Trio, Gigabyte AORUS, and EVGA FTW3 require model-specific full-cover blocks that match their unique PCB layouts. Always verify your GPU's exact model number before ordering.

Will a GPU waterblock reduce noise during AI workloads like Ollama or Stable Diffusion?

Yes. Stock GPU air coolers run at 45-65 dBA under sustained AI inference loads. A full-cover waterblock mounted to a 360mm radiator with a quiet fan curve typically reduces noise to under 30 dBA. VRAM temperatures also drop from 90°C+ to 60-70°C, which prevents throttling during long generation runs.

What components do I need for a complete custom water cooling loop?

A complete custom loop requires: (1) a full-cover GPU waterblock, (2) a radiator — 360mm recommended for single-GPU systems, 480mm for dual GPU or CPU+GPU loops, (3) a D5 or DDC pump with reservoir combo, (4) G1/4 threaded fittings — compression fittings for soft tubing or hard fittings for PETG/acrylic hard tubing, (5) tubing, and (6) premixed coolant. FormulaMod sells individual components and complete kits starting at $249 USD.

Does FormulaMod ship internationally?

Yes. FormulaMod ships worldwide from Guangzhou, China. Standard shipping to the US, EU, UK, Canada, and Australia typically takes 7-14 business days. Express DHL/FedEx options are available at checkout for 3-5 business day delivery. All orders include a tracking number.

How long has FormulaMod been operating?

FormulaMod has operated as an independent water-cooling specialist since 2013. Continuous online presence is publicly verifiable via the Internet Archive Wayback Machine. FormulaMod is a U.S. registered trademark (USPTO Reg. No. 6073949) and ships worldwide from our Guangzhou warehouse with full manufacturer warranty.


VRAM Overheating: How to Monitor and Fix GPU Memory Temperatures

By Liang Huang, FormulaMod Technical Team · Published Apr 15, 2026 · Updated Apr 22, 2026


The Temperature You Are Not Watching

When people check their GPU temperature, they look at one number: the GPU core temperature. MSI Afterburner shows it. nvidia-smi shows it. The number stays at 75-85C under load, and they think everything is fine.

It is not fine.

Modern high-end GPUs have a second thermal hotspot that almost no one monitors: the VRAM. On the RTX 3090 and 4090 (GDDR6X) and the RTX 5090 (GDDR7), the memory modules can run significantly hotter than the GPU die, and they have their own thermal throttle point that directly affects your AI workload performance.

The XDA Developers article "Your VRAM is Overheating While You Watch the Wrong Temperature" brought this issue to mainstream attention. This guide goes deeper: what VRAM thermal throttling actually is, how to monitor it, why it specifically affects AI workloads, and how to fix it — from free software changes to the permanent hardware solution.

What Is VRAM Thermal Throttling?

GDDR6X memory (used on the RTX 3090, 3090 Ti, and 4090; the RTX 5090 moved to GDDR7) operates at extremely high data rates, up to 21 Gbps per pin on the 4090. That speed generates substantial heat in each memory module. The memory controller monitors the junction temperature of the VRAM modules and begins reducing memory bandwidth when junction temperature exceeds 92-95C (varies by SKU).

This is not the same as GPU core throttling. You can have a GPU core at 70C (perfectly fine) while your VRAM is at 98C (actively throttling). The two temperatures are not correlated in the way most people expect.

Why VRAM Runs Hotter Than the GPU Die

Physical location: VRAM modules are spread around the GPU die on the PCB. Air coolers focus their cooling mass (heatpipes, vapor chamber) on the GPU die. VRAM modules get secondary cooling through the baseplate or small dedicated thermal pads — far less effective.
Power density: GDDR6X packages are small (roughly 12x14mm each), but each dissipates 2-4W. With 12 modules on a 4090 and 24 on a 3090, that is roughly 24-96W of heat from a surface area that is difficult to cool with passive contact alone.
Backside modules: On some GPU models, VRAM modules are on both sides of the PCB. The modules on the back (away from the cooler) are cooled only through the backplate — often poorly.
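
The heat budget above is a simple multiplication, sketched here for reference. The module counts (12 and 24) and the 2-4W per-module range come from the text; the function name is only illustrative.

```python
def vram_heat_range(modules_min, modules_max, watts_min, watts_max):
    """Return the (low, high) total VRAM heat in watts.

    Low end: fewest modules at the lowest per-module power.
    High end: most modules at the highest per-module power.
    """
    return modules_min * watts_min, modules_max * watts_max


low, high = vram_heat_range(12, 24, 2.0, 4.0)
print(f"Total VRAM heat: {low:.0f}-{high:.0f} W")
```

Even the low end is heat that an air cooler has to remove through thermal pads and a shared baseplate rather than through direct heatpipe contact.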

How VRAM Throttling Affects AI Workloads

VRAM throttling matters more for AI than for gaming because of how AI workloads use memory.

Gaming vs. AI Memory Access Patterns

Access Pattern	Gaming	AI Inference (Ollama, vLLM, ComfyUI)
VRAM utilization	4-12GB (partial)	16-24GB (near-full for large models)
Access frequency	Burst reads per frame, idle between frames	Continuous reads during token generation
Bandwidth sensitivity	Moderate (GPU compute is usually the bottleneck)	High (model weights must be read from VRAM every token)
Impact of throttling	1-3 FPS drop (barely noticeable)	10-30% tokens/sec drop (very noticeable)

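When Ollama generates text, it reads model weights from VRAM on every forward pass. For a 70B model (4-bit quantized), that is roughly 35GB of weight reads per token, which is more than a single RTX 4090's 24GB holds, so in practice the weights are split across two cards or partly offloaded to system RAM. Whatever share sits in VRAM is read in full for every token. The RTX 4090's memory bandwidth is 1008 GB/s at full speed. VRAM throttling can reduce effective bandwidth to 700-800 GB/s, a 20-30% hit that translates almost directly into 20-30% fewer tokens per second.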
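
The bandwidth arithmetic can be sketched as follows. This is a rough upper-bound model that assumes token generation is purely memory-bandwidth-bound (all weights read once per token, no compute or cache effects). The 20 GB weight footprint is a hypothetical value chosen to fit inside 24GB, not a measurement.

```python
def max_tokens_per_sec(bandwidth_gb_s, weights_gb):
    """Upper bound on decode speed if every token reads all weights once."""
    return bandwidth_gb_s / weights_gb


def throttle_loss(full_gb_s, throttled_gb_s):
    """Fraction of bandwidth (and, in this model, throughput) lost."""
    return 1.0 - throttled_gb_s / full_gb_s


WEIGHTS_GB = 20.0   # hypothetical model footprint in VRAM
FULL_BW = 1008.0    # RTX 4090 rated bandwidth, GB/s (from the text)

for throttled_bw in (800.0, 700.0):
    before = max_tokens_per_sec(FULL_BW, WEIGHTS_GB)
    after = max_tokens_per_sec(throttled_bw, WEIGHTS_GB)
    loss = throttle_loss(FULL_BW, throttled_bw)
    print(f"{throttled_bw:.0f} GB/s: {before:.1f} -> {after:.1f} t/s "
          f"({loss:.0%} lower)")
```

Real throughput sits below this ceiling, but the proportional loss from throttling carries over because the bandwidth term dominates.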

Symptoms of VRAM Throttling During AI Workloads

Tokens/sec drops after 10-20 minutes: Performance starts strong (while VRAM is cool) and degrades as temperatures rise. If your first minute of generation is 40 t/s and it drops to 32 t/s after 15 minutes, VRAM throttling is the probable cause.
Stable Diffusion images take progressively longer: The first image in a batch renders fast. Each subsequent image takes slightly longer as VRAM heats up.
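Errors or crashes that appear only at high temperature: Your model fits in VRAM at boot, but after sustained use you start seeing CUDA errors or driver resets. Thermal throttling lowers memory clocks rather than shrinking usable capacity, so a true out-of-memory error more often points to a growing context or fragmentation. Failures that track temperature are still a reason to check the VRAM junction sensor.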
nvidia-smi shows memory clock below rated speed: If you see memory clock dropping from the rated value (e.g., 10501 MHz on RTX 4090) during sustained load, memory is thermally throttling.
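
One way to check for the first symptom is to log tokens/sec at regular intervals and compare the start of the run with the end. The sketch below assumes you already have such a list of readings; the 10% threshold and the three-sample window are illustrative choices, not values from any tool.

```python
from statistics import mean


def throughput_drop(samples, window=3):
    """Fractional drop between the first and last `window` samples."""
    if len(samples) < 2 * window:
        raise ValueError("need at least 2 * window samples")
    start = mean(samples[:window])
    end = mean(samples[-window:])
    return (start - end) / start


def looks_throttled(samples, window=3, threshold=0.10):
    """True if throughput fell by at least `threshold` over the run."""
    return throughput_drop(samples, window) >= threshold


# Hypothetical tokens/sec readings taken every two minutes.
readings = [40.0, 40.0, 39.5, 38.0, 36.0, 34.0, 32.5, 32.0, 32.0]
print(f"drop: {throughput_drop(readings):.0%}, "
      f"throttled: {looks_throttled(readings)}")
```

A drop alone does not prove VRAM throttling (context growth also slows generation), so treat a positive result as a prompt to check the memory junction temperature.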

How to Monitor VRAM Temperature

Windows: HWiNFO64

HWiNFO64 is the most reliable way to read VRAM junction temperature on Windows.

Download and install HWiNFO64 (free for personal use)
Launch in "Sensors Only" mode
Scroll to the GPU section
Look for "GPU Memory Junction Temperature" — this is the VRAM junction temp
Run your AI workload and watch the "Maximum" column to see peak VRAM temperature

Important: Not all GPU models expose this sensor. Most RTX 3090, 4090, and 5090 variants do. Some older or lower-end cards do not. If you do not see this reading, your GPU's firmware may not expose the VRAM thermal sensor.

Windows: GPU-Z

GPU-Z also reads VRAM temperature on supported cards. Open the Sensors tab and look for "Memory Temperature." It updates in real-time and can log to a file for long-term monitoring.

Linux: nvidia-smi (Limited)

On Linux, nvidia-smi reports basic GPU temperature but often does not expose VRAM junction temperature separately. For NVIDIA GPUs on Linux, you have a few options:

nvidia-smi -q -d TEMPERATURE — shows all available temperature sensors, including memory temperature on supported GPUs
nvidia-smi --query-gpu=temperature.memory --format=csv — directly queries memory temperature (returns "N/A" on unsupported GPUs)
nvtop — some versions display VRAM temperature if available from the driver
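
The query above can be wrapped in a small script for logging. This is a minimal sketch: it uses the `temperature.gpu`, `temperature.memory`, and `clocks.mem` query fields of nvidia-smi, and treats any non-numeric value (such as "N/A" on cards that do not expose the memory sensor) as missing. The function names and dictionary keys are illustrative.

```python
import subprocess

QUERY = "temperature.gpu,temperature.memory,clocks.mem"


def parse_line(line):
    """Parse one CSV line from nvidia-smi; non-numeric fields become None."""
    def to_int(field):
        try:
            return int(field.strip())
        except ValueError:
            return None  # e.g. "N/A" when the sensor is not exposed

    core, vram, clock = (to_int(f) for f in line.split(","))
    return {"core_c": core, "vram_c": vram, "mem_clock_mhz": clock}


def read_gpus():
    """Run nvidia-smi once and return one dict per GPU."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_line(line) for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    try:
        for index, gpu in enumerate(read_gpus()):
            print(index, gpu)
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"nvidia-smi unavailable: {exc}")
```

If `vram_c` comes back as `None`, the driver is not reporting memory temperature for that card and the script can only log core temperature and memory clock.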

VRAM monitoring on Linux is less complete than on Windows. This is one of the challenges flagged by the r/LocalLLaMA community ("AI Linux enthusiasts running RTX GPUs: your cards can overheat silently"): standard Linux tools may not expose the critical VRAM sensor.

What Is a Safe VRAM Temperature?

Temperature Range	Status	Action
Under 80C	Excellent — no performance impact	No action needed
80-90C	Acceptable — no throttling but approaching limits	Monitor regularly, ensure cooling is not degrading
90-95C	Warning — memory controller begins throttling	Improve cooling: better thermal pads, case airflow, or water cooling
95-105C	Critical — active throttling, performance reduced	Immediate action: undervolt, add airflow, plan water cooling upgrade
Above 105C	Dangerous — risk of permanent damage	Stop the workload, investigate cooling failure
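
For scripted monitoring, the table can be encoded as a lookup. This sketch assumes each range includes its lower bound (so exactly 90C counts as "warning"), which the table itself leaves ambiguous; the status strings are shortened from the table.

```python
def vram_status(temp_c):
    """Map a VRAM junction temperature in C to the table's status label."""
    if temp_c < 80:
        return "excellent"
    if temp_c < 90:
        return "acceptable"
    if temp_c < 95:
        return "warning"
    if temp_c <= 105:
        return "critical"
    return "dangerous"


for reading in (72, 88, 93, 101, 108):
    print(f"{reading}C -> {vram_status(reading)}")
```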

Fix 1: Thermal Pad Replacement (Air Cooler)

If you want to stay on air cooling, the most effective upgrade is replacing the stock thermal pads between the VRAM modules and the heatsink baseplate.

Why Stock Thermal Pads Are Often Inadequate

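GPU manufacturers choose thermal pads that balance cost, manufacturability, and adequate (not optimal) thermal performance. Stock pads are typically rated at 1-3 W/mK. Aftermarket pads reach 6-12.5 W/mK, at least double the rated conductivity. The temperature gain is smaller than that ratio suggests, because the pad is only one link in the thermal path between the memory package and the heatsink.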
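
To see what the conductivity rating means in degrees, the temperature drop across a pad can be estimated with one-dimensional conduction, ΔT = P · t / (k · A). The sketch below is an idealized calculation: it ignores contact resistance and compression, and the 3W module power, 1.5mm thickness, and 1.5 cm² contact area are illustrative assumptions.

```python
def pad_delta_t(power_w, thickness_mm, conductivity_w_mk, area_cm2):
    """Temperature drop across a thermal pad (1-D conduction, ideal contact)."""
    thickness_m = thickness_mm / 1000.0
    area_m2 = area_cm2 / 10000.0
    resistance_k_per_w = thickness_m / (conductivity_w_mk * area_m2)
    return power_w * resistance_k_per_w


POWER_W = 3.0        # assumed heat from one memory module
THICKNESS_MM = 1.5   # assumed pad thickness
AREA_CM2 = 1.5       # assumed contact area

for k in (3.0, 6.0, 12.5):
    dt = pad_delta_t(POWER_W, THICKNESS_MM, k, AREA_CM2)
    print(f"{k:>4} W/mK pad: about {dt:.1f} C across the pad")
```

Under these assumptions the pad accounts for several degrees at stock conductivity, and a higher-rated pad recovers most of that, which is in the same range as the gains usually reported from pad swaps.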

For used RTX 3090s specifically, thermal pad degradation is a primary reason for poor VRAM temperatures. After 2-3 years of use, pads compress, dry out, and lose contact. Replacing them with fresh high-quality pads can drop VRAM temperatures by 10-15C.

Recommended Thermal Pads

Thermalright Extreme Odyssey 12.5 W/mK — high conductivity, available in 1mm, 1.5mm, and 2mm thicknesses
Bykski 6W thermal pads (B-GRP-1.5-X) — designed for GPU block VRAM contact, good value
Barrow thermal pad kit (GJ-HCE) — complete kit with thermal pads and paste for GPU block installation

Pad Thickness Matters

Using the wrong pad thickness is a common mistake. Too thin: the pad does not make contact, leaving an air gap that acts as a thermal insulator. Too thick: the pad cannot compress far enough and holds the heatsink off the GPU die, which raises core temperatures.

Check your GPU model's teardown guide (available on YouTube and forums) for the correct pad thicknesses. Common values:

VRAM modules (front): 1.0-2.0mm depending on model
VRAM modules (back/backplate): 1.5-3.0mm depending on model
VRM/MOSFET: 1.0-1.5mm

Fix 2: Full-Cover Waterblock (The Permanent Solution)

A full-cover GPU waterblock solves the VRAM temperature problem at its root. Unlike air coolers that focus cooling on the GPU die, a full-cover block makes direct thermal contact with:

The GPU die (via thermal paste through a copper cold plate)
All front-side VRAM modules (via thermal pads against the block's baseplate)
VRM/MOSFET components (via thermal pads)
Back-side VRAM (via thermal pads against an active backplate, on blocks that include one)

Coolant flows directly over the areas where VRAM thermal pads make contact, carrying heat away continuously. This is fundamentally more effective than an air cooler where VRAM heat must conduct through a shared metal baseplate before reaching the fin stack.

Typical VRAM Temperature Improvement

GPU	VRAM Temp (Air, Sustained AI)	VRAM Temp (Water, Sustained AI)	Improvement
RTX 3090 (used, aged thermal pads)	95-110C	55-68C	-35 to -45C
RTX 3090 (fresh thermal pads on air)	85-95C	55-68C	-25 to -30C
RTX 4090	88-100C	58-72C	-25 to -30C
RTX 5090	85-95C	55-70C	-25 to -30C

The used RTX 3090 shows the most dramatic improvement because the starting point is often the worst — aged thermal pads, 2-3 years of heat cycling, and GDDR6X modules that were already marginal on air. A waterblock plus fresh thermal pads is effectively a thermal reset for the entire card. See our used RTX 3090 revival guide for the complete process.

Waterblock Options

RTX 3090 blocks with active backplate cooling (best for VRAM on the back of the PCB):

RTX 4090 blocks:

Bykski ASUS ROG/TUF RTX 4090
Barrow RTX 4090 Founders Edition
Granzon Full Armor RTX 4090 FE (maximum VRAM coverage)

RTX 5090 blocks:

Before and After: Real-World Data

Community-reported data from RTX 4090 builds running sustained AI inference workloads:

Metric	Stock Air Cooler	Air + Replaced Pads	Full-Cover Waterblock
GPU Core (sustained)	83-87C	80-84C	45-55C
VRAM Junction (sustained)	94-102C	82-90C	58-72C
Memory Clock (sustained)	10001-10501 MHz (throttled)	10501 MHz (stable)	10501 MHz (stable)
Tokens/sec (Llama 70B 4-bit)	34-38 t/s (degrades over time)	38-40 t/s (stable)	40-43 t/s (stable)
Noise Level	50-65 dBA	50-65 dBA	25-32 dBA

Key finding: thermal pad replacement on air cooling stabilizes VRAM temperatures and prevents throttling, but does not significantly reduce GPU core temperature or noise. Water cooling addresses all three: VRAM, core, and noise.

Decision Framework

Your Situation	Recommendation	Expected Cost
VRAM above 95C but want to stay on air	Replace thermal pads with 12.5 W/mK pads	$15-30
Used RTX 3090 with unknown pad condition	Waterblock + fresh pads (complete thermal reset)	$80-130 for block + pads
RTX 4090/5090, 24/7 AI workload, noise matters	Full-cover waterblock + custom loop	$350-500 for complete loop
RTX 4060 Ti / 4070, intermittent Ollama use	Monitor VRAM temps, likely fine on air	$0

Start by monitoring. If you do not know your VRAM temperatures, you cannot make an informed decision. Open HWiNFO64 or GPU-Z, run your typical AI workload for 30 minutes, and check the peak VRAM junction temperature. If it is under 90C, you are fine. If it is above 92C, you are losing performance right now.

Linux Users: The Silent Overheating Problem

A specific warning for Linux AI builders: on many Linux distributions, VRAM junction temperature is not exposed through standard monitoring tools. The r/LocalLLaMA community has documented cases where Linux users ran RTX 3090s and 4090s at VRAM temperatures above 100C for weeks without realizing it — because neither nvidia-smi nor nvtop showed the VRAM sensor.

If you run AI workloads on Linux and cannot see VRAM temperature in your monitoring tools, assume it is 15-25C higher than your GPU core temperature under sustained load. For an air-cooled 4090 showing 82C core temperature during inference, VRAM is likely at 97-107C, well into throttling territory.
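
When no VRAM sensor is available, the rule of thumb above can be applied mechanically. This sketch simply adds the 15-25C offset to the core reading and compares the result with the 92C lower throttle bound mentioned earlier in this guide; it is an estimate, not a measurement.

```python
THROTTLE_START_C = 92  # lower bound of the throttle range cited in this guide


def estimate_vram_range(core_temp_c, offset_low=15, offset_high=25):
    """Estimate (low, high) VRAM temperature from core temperature."""
    return core_temp_c + offset_low, core_temp_c + offset_high


def likely_throttling(core_temp_c):
    """True if even the optimistic estimate exceeds the throttle bound."""
    low, _ = estimate_vram_range(core_temp_c)
    return low > THROTTLE_START_C


low, high = estimate_vram_range(82)
print(f"core 82C -> VRAM roughly {low}-{high}C, "
      f"throttling likely: {likely_throttling(82)}")
```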

The safest approach for Linux users is to use nvidia-smi's power limit feature (nvidia-smi -pl 350) to reduce total heat output and thus VRAM temperature. Or install a waterblock — which makes the monitoring gap moot because VRAM temperatures stay well below the throttle point regardless.
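
If you script the power limit, it helps to keep the requested value inside the range the card accepts. The sketch below builds the `nvidia-smi -i <index> -pl <watts>` command and clamps the wattage first; the 150W and 450W bounds are placeholder assumptions, and the real limits for a given card can be read from nvidia-smi's `power.min_limit` and `power.max_limit` query fields.

```python
def clamp_power_limit(requested_w, min_w, max_w):
    """Keep the requested power limit within the card's allowed range."""
    return max(min_w, min(requested_w, max_w))


def power_limit_command(gpu_index, watts):
    """Build the nvidia-smi argument list for setting a power limit."""
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(int(watts))]


# Placeholder bounds; query the real ones for your card before use.
MIN_W, MAX_W = 150, 450

limit = clamp_power_limit(350, MIN_W, MAX_W)
print(" ".join(power_limit_command(0, limit)))
```

The script only prints the command. Setting a power limit normally requires root privileges, and the setting does not persist across reboots unless you reapply it.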

Active Backplate Cooling: The VRAM Solution for RTX 3090

The RTX 3090 has a unique thermal challenge: VRAM modules on both sides of the PCB. The front-side modules contact the waterblock directly. The back-side modules are only cooled through the backplate — which on most blocks is a passive metal plate that relies on convection and radiation to dissipate heat.

Active backplate waterblocks route coolant through the backplate itself, providing direct liquid cooling to the back-side VRAM. This drops back-side VRAM temperatures by an additional 10-15C compared to passive backplates.

Bykski's active backplane (TC series) blocks for the RTX 3090 include this feature:

For used RTX 3090 buyers running large models where VRAM stability is critical, an active backplate block is the strongest thermal insurance available.

For the used RTX 3090 revival path, see our complete guide. For GPU-specific waterblock selection, browse our AI Workstation Cooling collection. And for the full cooling system design, read our radiator sizing guide and cost breakdown.
