What GPU waterblocks does FormulaMod carry?

FormulaMod stocks full-cover GPU waterblocks for over 500 graphics card models, covering RTX 5090, 4090, 3090, RX 9070 XT, and H100/H200 GPUs. We stock over 1,600 water cooling products with same-week worldwide shipping from Guangzhou.

How do I choose a waterblock for my GPU?

Match the waterblock to your exact GPU model and PCB variant. Reference-design cards (Founders Edition, reference Sapphire/PowerColor) use universal reference blocks. Non-reference cards from ASUS ROG Strix, MSI Gaming X Trio, Gigabyte AORUS, and EVGA FTW3 require model-specific full-cover blocks that match their unique PCB layouts. Always verify your GPU's exact model number before ordering.

Will a GPU waterblock reduce noise during AI workloads like Ollama or Stable Diffusion?

Yes. Stock GPU air coolers run at 45-65 dBA under sustained AI inference loads. A full-cover waterblock mounted to a 360mm radiator with a quiet fan curve typically reduces noise to under 30 dBA. VRAM temperatures also drop from 90°C+ to 60-70°C, which prevents throttling during long generation runs.

What components do I need for a complete custom water cooling loop?

A complete custom loop requires: (1) a full-cover GPU waterblock, (2) a radiator — 360mm recommended for single-GPU systems, 480mm for dual GPU or CPU+GPU loops, (3) a D5 or DDC pump with reservoir combo, (4) G1/4 threaded fittings — compression fittings for soft tubing or hard fittings for PETG/acrylic hard tubing, (5) tubing, and (6) premixed coolant. FormulaMod sells individual components and complete kits starting at $249 USD.

Does FormulaMod ship internationally?

Yes. FormulaMod ships worldwide from Guangzhou, China. Standard shipping to the US, EU, UK, Canada, and Australia typically takes 7-14 business days. Express DHL/FedEx options are available at checkout for 3-5 business day delivery. All orders include a tracking number.

How long has FormulaMod been operating?

FormulaMod has operated as an independent water-cooling specialist since 2013. Continuous online presence is publicly verifiable via the Internet Archive Wayback Machine. FormulaMod is a U.S. registered trademark (USPTO Reg. No. 6073949) and ships worldwide from our Guangzhou warehouse with full manufacturer warranty.

Dual RTX 3090 NVLink for 70B LLMs: The Cooling Guide

By Liang Huang, FormulaMod Technical Team · Published Apr 24, 2026 · Updated Apr 24, 2026


The migration from 32B to 70B language models is real, and it changes the hardware equation. A single RTX 4090 or 3090 with 24GB of VRAM can run a 70B model only at heavy quantization, which sacrifices quality. Two RTX 3090s connected with NVLink give you 48GB of combined VRAM (24GB per card), enough to run Llama 3.3 70B at Q4_K_M with a moderate context window (around 16K tokens) and quality that approaches the unquantized model. At $800-1,100 per used card, a dual-3090 NVLink build costs $1,600-2,200 for the GPUs alone. That is still cheaper than a single RTX 5090 at $2,900-3,500 street price, and it gives you 48GB of VRAM instead of 32GB. The catch is cooling: two 350W GPUs produce 700W of sustained heat in a single chassis, and air cooling two stacked 3090s in the same case is not practical for quiet 24/7 AI workloads, so this build calls for water cooling. This guide covers the complete build from hardware selection through loop design and expected performance.

Key Takeaways

Dual RTX 3090s with NVLink = 48GB combined VRAM for $1,600-2,200. This is the cheapest path to running 70B models without aggressive quantization.
NVLink is available on the RTX 3090 (NOT the 3090 Ti). The NVLink bridge connects both cards at 112.5 GB/s of total bidirectional bandwidth (about 56 GB/s in each direction).
Two 350W cards = 700W sustained heat output. Water cooling is the most practical option for this thermal load in a standard tower case, especially for quiet 24/7 operation.
The loop requires: 2 GPU waterblocks, a 480mm radiator (minimum), and a D5 pump. A distro plate or parallel water path is optional.
Expected performance: Llama 3.3 70B Q4_K_M at approximately 18 tok/sec, stable 24/7, at about 30 dBA noise level.

The 32B to 70B Migration: Why Single GPU Is Done

In 2024, 32B-parameter models were the frontier for local AI. Llama 2 70B existed but required extreme quantization on consumer hardware. By early 2026, the landscape had shifted. Llama 3.3 70B, Qwen2.5 72B, and DeepSeek-R1-Distill-Llama-70B set new quality benchmarks that made 32B models feel noticeably worse for complex reasoning, coding assistance, and document analysis.

The r/LocalLLaMA community consensus is clear: once you experience 70B model quality, going back to 32B feels like a downgrade. But a 70B model at Q4_K_M quantization needs roughly 40-43GB of VRAM for the weights alone, plus several more gigabytes of KV cache at reasonable context lengths. No single consumer GPU has that much VRAM. The RTX 5090 has 32GB, which cannot hold a 70B model at Q4_K_M; it needs roughly 3-bit quantization or partial CPU offloading. The RTX PRO 6000 has 96GB but costs $8,000+.
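To see where the 40-plus GB figure comes from, here is a rough estimate under stated assumptions: about 70.6B parameters, an average of about 4.85 bits per weight for Q4_K_M (an approximation, since the format mixes block types), and an fp16 KV cache. The layer count, KV-head count, and head dimension are Llama 3 70B's published architecture values. Runtime compute buffers are ignored, so real usage will be somewhat higher.

```python
# Rough VRAM estimate for Llama 3.3 70B at Q4_K_M.
# Assumptions: ~70.6B params, ~4.85 bits/weight average, fp16 KV cache.
PARAMS = 70.6e9
BITS_PER_WEIGHT = 4.85
N_LAYERS = 80      # Llama 3 70B
N_KV_HEADS = 8     # grouped-query attention
HEAD_DIM = 128
KV_BYTES = 2       # fp16
GB = 1e9

def weights_gb(params=PARAMS, bpw=BITS_PER_WEIGHT):
    """Size of the quantized weights in decimal gigabytes."""
    return params * bpw / 8 / GB

def kv_cache_gb(context_tokens):
    """K and V tensors for every layer and KV head, per token, times context length."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * KV_BYTES
    return per_token * context_tokens / GB

if __name__ == "__main__":
    for ctx in (4096, 16384, 131072):
        total = weights_gb() + kv_cache_gb(ctx)
        print(f"{ctx:>6} tokens: {weights_gb():.1f} GB weights + "
              f"{kv_cache_gb(ctx):.1f} GB KV = {total:.1f} GB")
```

Under these assumptions the estimate already nears the 48GB of two 3090s at about 16K tokens, and Llama 3.3's full 128K-token context would need roughly 86GB, which is why practical context windows on this build stay far below the model's maximum.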

Two RTX 3090s with NVLink give you 48GB of combined VRAM at a fraction of the cost.

Why Dual 3090 Beats Every Other Sub-$2,000 Path

Configuration	Total VRAM	GPU Cost	70B Q4 tok/sec	Notes
Dual RTX 3090 + NVLink	48GB	$1,600-2,200	~18	Cheapest 48GB path; NVLink bandwidth helps
Single RTX 5090	32GB	$2,900-3,500	~44	Faster, but 70B Q4_K_M (~42GB) does not fit in 32GB; needs ~3-bit quant or CPU offload
Single RTX 4090	24GB	$1,800-2,200	~35 (Q2 quant)	Requires Q2 quantization at 70B = quality loss
Dual RTX 4090 (no NVLink)	48GB (split)	$3,600-4,400	~30	No NVLink on 4090; PCIe tensor parallelism is slower
Mac Studio M4 Ultra (192GB)	192GB unified	$6,999+	~12	Huge VRAM but much slower inference than NVIDIA

Prices as of April 2026. tok/sec figures are estimates for Llama 3.3 70B on Ollama at Q4_K_M unless another quantization is noted.

The dual 3090 is not the fastest option. The single RTX 5090 generates tokens about 2.4x faster, at the smaller quantization that fits in its 32GB. But at the street prices above, the dual 3090 costs roughly 25-55% less (about 40% less comparing the midpoints of the two price ranges) and has 50% more VRAM. For workloads where VRAM capacity matters more than raw speed (serving a 70B model for personal use, running RAG with large context windows, or fine-tuning LoRAs on 70B base models), the dual 3090 is the practical choice.
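The price comparison can be checked directly from the ranges in the table. This small sketch compares the best case (cheapest pair of used 3090s against the most expensive 5090), the worst case (the reverse), and the midpoints of both ranges.

```python
DUAL_3090 = (1600, 2200)  # USD, two used cards (table above)
RTX_5090 = (2900, 3500)   # USD, one card (table above)

def savings_pct(cheaper, pricier):
    """Percentage saved by buying `cheaper` instead of `pricier`."""
    return 100 * (1 - cheaper / pricier)

best = savings_pct(DUAL_3090[0], RTX_5090[1])    # cheapest pair vs priciest 5090
worst = savings_pct(DUAL_3090[1], RTX_5090[0])   # priciest pair vs cheapest 5090
mid = savings_pct(sum(DUAL_3090) / 2, sum(RTX_5090) / 2)
vram_gain = 100 * (48 / 32 - 1)

print(f"savings: {worst:.0f}-{best:.0f}% (midpoint {mid:.0f}%), VRAM +{vram_gain:.0f}%")
# prints: savings: 24-54% (midpoint 41%), VRAM +50%
```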

NVLink on the 3090: What It Buys You

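NVLink on the RTX 3090 provides 112.5 GB/s of bidirectional bandwidth between two GPUs, or about 56 GB/s in each direction. PCIe 4.0 x16 provides about 31.5 GB/s in each direction (about 63 GB/s bidirectional), so for GPU-to-GPU communication NVLink is roughly 1.8x faster when both are measured the same way.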
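Bandwidth comparisons only make sense when both numbers are counted the same way. This sketch derives PCIe 4.0 x16's per-direction rate from its signaling rate (16 GT/s per lane with 128b/130b encoding, before protocol overhead) and halves NVIDIA's 112.5 GB/s NVLink total to get a per-direction figure.

```python
NVLINK_3090_TOTAL_GB_S = 112.5  # NVIDIA's figure for the 3090, both directions combined

def pcie_gb_s_per_direction(gt_per_s=16, lanes=16):
    """Per-direction PCIe payload rate with 128b/130b encoding (PCIe 3.0 and later)."""
    return gt_per_s * lanes * (128 / 130) / 8  # Gbit/s -> GB/s

pcie4 = pcie_gb_s_per_direction()
nvlink_per_dir = NVLINK_3090_TOTAL_GB_S / 2

print(f"PCIe 4.0 x16: {pcie4:.1f} GB/s per direction, {2 * pcie4:.1f} GB/s both ways")
print(f"NVLink (3090): {nvlink_per_dir:.2f} GB/s per direction")
print(f"Like-for-like ratio: {nvlink_per_dir / pcie4:.2f}x")
# Comparing NVLink's two-way total with PCIe's one-way rate inflates the gap:
print(f"Mixed comparison: {NVLINK_3090_TOTAL_GB_S / pcie4:.2f}x")
```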

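For language model inference, the second card and the NVLink bridge contribute in two ways:

Combined VRAM: Ollama and llama.cpp split a 70B model across both GPUs automatically, so the weights and KV cache can use the full 48GB total. The cards remain two separate 24GB devices rather than one flat memory pool, and this kind of split also works over PCIe; what NVLink adds is fast, direct peer-to-peer access between the two cards.
Faster inter-GPU communication: When a model is split across GPUs, the cards exchange intermediate results during every inference step. Splitting by layers (the llama.cpp and Ollama default) passes only a small activation vector at the boundary between cards, while tensor parallelism (for example, vLLM across two GPUs) exchanges data inside every layer. NVLink's higher bandwidth reduces the cost of these exchanges, and the benefit is largest for tensor-parallel backends.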
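A simplified sketch of layer splitting follows. `split_layers` is a hypothetical helper that gives each card a contiguous block of layers in proportion to its VRAM; llama.cpp exposes a comparable ratio through its `--tensor-split` option, but its internal assignment logic is not reproduced here. `boundary_bytes_per_token` estimates the activation handed from one GPU to the next per generated token, assuming Llama 3 70B's hidden size of 8192 and fp16 values.

```python
def split_layers(n_layers, vram_gb):
    """Hypothetical helper: contiguous layer ranges sized in proportion to each GPU's VRAM."""
    total = sum(vram_gb)
    shares = [n_layers * v / total for v in vram_gb]
    counts = [int(s) for s in shares]
    # Give layers lost to rounding down to the GPUs with the largest leftover share
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[: n_layers - sum(counts)]:
        counts[i] += 1
    ranges, start = [], 0
    for c in counts:
        ranges.append(range(start, start + c))
        start += c
    return ranges

def boundary_bytes_per_token(hidden_size=8192, bytes_per_value=2):
    """Size of the hidden-state vector passed between GPUs at a layer-split boundary."""
    return hidden_size * bytes_per_value

if __name__ == "__main__":
    for gpu, layers in enumerate(split_layers(80, [24, 24])):
        print(f"GPU{gpu}: layers {layers.start}-{layers.stop - 1}")
    print(f"~{boundary_bytes_per_token() / 1024:.0f} KiB crosses the GPU boundary per token")
```

A transfer of roughly 16 KiB per generated token helps explain why layer splitting works acceptably over PCIe, and why NVLink's extra bandwidth pays off mainly with tensor parallelism.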

Critical note: NVLink is available on the RTX 3090 but NOT on the RTX 3090 Ti. The 3090 Ti removed the NVLink connector. If you are buying used 3090s for a dual build, verify that you are getting the standard 3090, not the 3090 Ti. Also note that the NVLink bridge is a separate purchase -- NVIDIA's 3-slot and 4-slot bridges for the 3090 are available on the used market for $40-80.

The Cooling Math: 700W Sustained in One Case

Two RTX 3090s under sustained Ollama load draw approximately 300-350W each, totaling 600-700W of GPU heat alone. Add CPU, motherboard, and SSD power, and the total system heat output approaches 800-900W.
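The heat budget above is simple addition. This sketch uses the article's per-card range and assumes about 200W for everything else in the system, which is what the 800-900W total implies; essentially all of the electrical power drawn ends up as heat.

```python
GPU_COUNT = 2
GPU_W = (300, 350)       # per-card draw under sustained Ollama load (from the text)
REST_OF_SYSTEM_W = 200   # assumption: CPU, board, RAM, SSD, fans, pump

def heat_budget(gpus=GPU_COUNT, gpu_w=GPU_W, rest_w=REST_OF_SYSTEM_W):
    """Return (gpu_low, gpu_high, system_low, system_high) heat output in watts."""
    gpu_low, gpu_high = gpus * gpu_w[0], gpus * gpu_w[1]
    return gpu_low, gpu_high, gpu_low + rest_w, gpu_high + rest_w

gpu_low, gpu_high, sys_low, sys_high = heat_budget()
print(f"GPU heat: {gpu_low}-{gpu_high} W, whole system: {sys_low}-{sys_high} W")
```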

Air cooling this is not viable for 24/7 operation. Here is why:

Two triple-slot air coolers in the same case block each other's airflow. The top card's intake is the bottom card's exhaust. The top GPU runs 10-15C hotter than the bottom.
At 700W, case fans must run at maximum RPM to move enough air. Total noise: 65-75 dBA, comparable to a vacuum cleaner.
VRAM on both cards can reach 95-110C, triggering memory throttling on at least one card.

Water cooling solves this by moving heat out of the case entirely. Two slim waterblocks replace two bulky air coolers, and the heat is carried by liquid to a radiator that can be mounted anywhere -- top, front, or external. The GPU temperatures are determined by radiator size and fan speed, not by inter-card airflow dynamics.

Parts List

GPUs

Two RTX 3090s. Same AIB variant is ideal (simplifies waterblock selection), but mixed variants work as long as both are the standard 3090 (not 3090 Ti) with NVLink connectors.

Waterblocks (2x)

You need two waterblocks matched to your specific 3090 AIB variant. Our recommendations:

MSI Gaming X Trio / Suprim: 2x Bykski MSI 3090 waterblock with active backplane
ASUS TUF: 2x Bykski ASUS TUF 3090 waterblock with active backplane
Gigabyte Aorus: 2x Bykski Gigabyte Aorus 3090 waterblock with active backplane

Active backplane blocks are strongly recommended for a dual build. The 3090 has GDDR6X on both sides of the PCB, and with two cards running at full power, every degree of cooling margin matters.

Radiator

480mm is the minimum. A Bykski 480mm copper radiator handles 700W with fans at moderate speeds (1,000-1,200 RPM). For quieter operation (fans at 600-800 RPM), add a second 240mm or 360mm radiator. The ideal configuration is 480mm + 360mm = 840mm of total radiator length (seven 120mm fan positions), which keeps both GPUs under 55C with the fans barely audible.
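One way to compare radiator configurations is heat load per 120mm fan position. This sketch assumes 700W of GPU heat and counts each radiator in 120mm sections. A frequently quoted community rule of thumb is around 100W per section for quiet fan speeds, but treat that threshold as a rough assumption rather than a specification.

```python
SECTION_MM = 120

def watts_per_section(heat_w, radiators_mm):
    """Heat load per 120mm fan position across all radiators in the loop."""
    sections = sum(r // SECTION_MM for r in radiators_mm)
    return heat_w / sections

for config in ([480], [480, 240], [480, 360]):
    print(config, f"{watts_per_section(700, config):.0f} W per 120mm section")
```

Lower watts per section means each fan can spin slower for the same coolant temperature, which is where the quieter fan speeds for the larger configurations come from.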

Pump

A D5 pump is sufficient for a dual-GPU loop. The Barrow D5 pump + reservoir combo provides adequate flow for two waterblocks in series or parallel, with PWM control for noise management.

Distro Plate (Optional but Recommended)

A distro plate simplifies plumbing for a dual-GPU loop by providing a central distribution point for coolant. It eliminates the need for complex tubing runs between the pump, two GPU blocks, and the radiator. The Bykski Core P3 distro plate works in open-frame cases popular with multi-GPU builds.

Supporting Hardware

Motherboard: Must have two PCIe x16 slots whose spacing matches an available NVLink bridge (3-slot or 4-slot). ASUS WS boards, ASRock Rack, and Gigabyte Aorus Creator boards are popular choices.
CPU: AMD Threadripper or Ryzen 9 for PCIe lane count. Intel Core i9 works if the motherboard can split the CPU's x16 lanes into x8/x8 across the two GPU slots (bifurcation).
NVLink bridge: NVIDIA 3-slot or 4-slot NVLink bridge for RTX 3090 (not A-series or Quadro bridges). Available used for $40-80.
PSU: 1,200W minimum. Two 3090s plus system = 800-900W total draw. A single 1,200W PSU handles this. For extra safety margin, some builders use two 850W PSUs with a dual-PSU adapter.
Case: Full tower with space for 480mm radiator mounting. The Thermaltake Core P3/P5 open-frame cases are popular for multi-GPU water-cooled builds because they provide easy access and flexible radiator mounting options.

Building the Loop Step by Step

Loop Order

For a dual-GPU loop, the recommended flow order is: Pump → GPU 1 → GPU 2 → Radiator → Reservoir → Pump. Running both GPUs in series keeps the plumbing simple and gives both blocks the same flow rate.

An alternative is parallel flow: pump splits into both GPUs simultaneously, then both outputs merge before the radiator. This reduces flow restriction but requires a Y-splitter or distro plate. Parallel flow provides slightly more even temperatures between the two GPUs (1-2C difference) but adds complexity.
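The small temperature gap between the two GPUs in a series loop follows from the coolant's heat capacity: the second block receives coolant already warmed by the first. This sketch uses water's specific heat (about 4186 J/(kg*K)) and assumes a flow rate of 1 US gallon per minute (about 3.785 L/min) as a ballpark, not a measured value for this build.

```python
WATER_CP_J_PER_KG_K = 4186.0   # specific heat of water; premixed coolants may be somewhat lower
WATER_DENSITY_KG_PER_L = 1.0   # approximate

def coolant_rise_c(heat_w, flow_l_per_min):
    """Temperature rise of coolant absorbing heat_w watts at the given flow rate."""
    mass_flow_kg_s = flow_l_per_min * WATER_DENSITY_KG_PER_L / 60.0
    return heat_w / (mass_flow_kg_s * WATER_CP_J_PER_KG_K)

FLOW_L_PER_MIN = 3.785  # assumption: 1 US gallon per minute

after_gpu1 = coolant_rise_c(350, FLOW_L_PER_MIN)  # coolant entering GPU 2
after_gpu2 = coolant_rise_c(700, FLOW_L_PER_MIN)  # coolant entering the radiator
print(f"GPU 2 inlet: +{after_gpu1:.1f}C, radiator inlet: +{after_gpu2:.1f}C vs. GPU 1 inlet")
```

At that flow, coolant reaching the second block is only about 1.3C warmer than at the first, which is consistent with the 1-3C differences quoted for series loops; lower flow widens the gap proportionally.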

Installation Sequence

Install both 3090 waterblocks on the GPUs (fresh thermal pads and paste).
Install the NVLink bridge on the GPU connectors (before installing in the case -- it is easier to align).
Mount both GPUs in the PCIe slots. Verify the NVLink bridge is properly seated.
Mount the radiator(s) and pump in the case.
Connect tubing: pump out → GPU 1 in → GPU 1 out → GPU 2 in → GPU 2 out → radiator in → radiator out → reservoir → pump in.
Fill and leak test for 24 hours with only the pump running (PSU jumpered) and the GPUs and motherboard unpowered.
First boot: verify both GPUs appear in nvidia-smi and that NVLink is detected (nvidia-smi topo -m).
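The matrix printed by `nvidia-smi topo -m` shows an entry beginning with `NV` (for example `NV4`) between two GPUs when NVLink is active, and an entry such as `PHB`, `NODE`, or `SYS` when traffic goes over PCIe. This sketch parses that text; `parse_topology`, `has_nvlink`, and `read_topology` are hypothetical helper names, and the parser assumes the usual tab-separated matrix layout.

```python
import re
import subprocess

ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*m")  # strip terminal formatting codes, if any
GPU_LABEL = re.compile(r"GPU\d+")

def parse_topology(text):
    """Map each GPU row label to its matrix entries, e.g. {"GPU0": ["X", "NV4", ...]}."""
    rows = {}
    for line in ANSI_ESCAPE.sub("", text).splitlines():
        tokens = line.split()
        # Matrix rows are a GPU label followed by link entries; the header row is
        # a list of GPU labels, so checking the second token skips it.
        if len(tokens) > 1 and GPU_LABEL.fullmatch(tokens[0]) and not GPU_LABEL.fullmatch(tokens[1]):
            rows[tokens[0]] = tokens[1:]
    return rows

def has_nvlink(text, a=0, b=1):
    """True if the entry between GPU a and GPU b is an NVLink (NV#) connection."""
    entry = parse_topology(text)[f"GPU{a}"][b]
    return entry.startswith("NV")

def read_topology():
    """Run `nvidia-smi topo -m` on the build machine and return its text output."""
    result = subprocess.run(["nvidia-smi", "topo", "-m"], capture_output=True, text=True, check=True)
    return result.stdout
```

On the finished build, `print(has_nvlink(read_topology()))` should print `True`; `False` usually means the bridge is not seated or the cards are communicating over PCIe.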

Expected Performance: Ollama 70B on Dual 3090 NVLink

Metric	Single 3090 (24GB, Q2 Quant)	Dual 3090 NVLink (48GB, Q4_K_M)
Llama 3.3 70B Quantization	Q2_K (poor quality)	Q4_K_M (good quality)
VRAM Used	~22GB	~42GB (split across both GPUs)
Prompt Processing (tok/sec)	~80	~120
Generation (tok/sec)	~8	~18
Max Context Length	~4K tokens	~16K tokens
Quality vs Full Precision	~85%	~96%
GPU Temp (water cooled)	52C	55C (top) / 52C (bottom)
VRAM Junction Temp	65C	68C (top) / 64C (bottom)
Noise at 1m	28 dBA	30 dBA
24h Stability	Stable (but poor quality)	Stable (good quality)

Based on community testing and manufacturer specifications -- actual results vary by loop configuration.
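To measure speeds on your own build instead of relying on these estimates, Ollama's non-streaming `/api/generate` response includes token counts and durations in nanoseconds (`prompt_eval_count`, `prompt_eval_duration`, `eval_count`, `eval_duration`). This sketch converts them to tokens per second. The `generate` wrapper is hypothetical, and the `llama3.3:70b` tag and default `localhost:11434` address are assumptions about your setup.

```python
import json
import urllib.request

NS_PER_S = 1e9

def ollama_rates(resp):
    """Return (prompt tok/s, generation tok/s) from a final Ollama response dict."""
    def rate(count_key, dur_key):
        count, dur = resp.get(count_key), resp.get(dur_key)
        if not count or not dur:
            return None  # field absent or zero
        return count / (dur / NS_PER_S)
    return rate("prompt_eval_count", "prompt_eval_duration"), rate("eval_count", "eval_duration")

def generate(prompt, model="llama3.3:70b", host="http://localhost:11434"):
    """Hypothetical wrapper: one non-streaming request to a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(f"{host}/api/generate", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)
```

Usage: `prompt_tps, gen_tps = ollama_rates(generate("Explain NVLink in one paragraph."))`. Run several prompts of realistic length, since short prompts make prompt-processing rates noisy.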

The dual 3090 build at 18 tok/sec with Q4_K_M quantization delivers a qualitatively different experience from a single 3090 forced to use Q2 quantization. The Q4 model produces more coherent reasoning, better code generation, and more accurate document analysis. At 18 tok/sec, the response speed is comfortable for interactive use -- not fast, but fast enough that you are reading the output as it generates rather than waiting for it.

Thermal and Noise Under 24-Hour Load

With a 480mm + 240mm radiator setup and a D5 pump at 50% PWM, the dual 3090 build runs at approximately 30 dBA measured at 1 meter. The top GPU runs 2-3C warmer than the bottom GPU in a series loop configuration. Both GPUs stay well below throttle thresholds.

Over 24 hours of continuous Ollama serving, coolant temperature stabilizes at approximately 35-38C with room temperature at 22-24C. Token generation speed remains consistent from hour 1 to hour 24. There is no thermal drift, no throttling, and no instability -- exactly what you need from a build that serves as your personal AI assistant or team-shared inference server.

Common Pitfalls and How to Avoid Them

Buying 3090 Ti Instead of 3090

The RTX 3090 Ti does NOT have an NVLink connector. It looks similar, has better single-card performance, and costs about the same on the used market. Without NVLink, two 3090 Ti cards can still split a model across their combined 48GB over PCIe (Ollama and llama.cpp support this), but GPU-to-GPU transfers are slower: PCIe 4.0 x16 provides about 31.5 GB/s per direction, versus about 56 GB/s per direction (112.5 GB/s bidirectional) for NVLink. The penalty is largest with tensor-parallel backends. Always verify NVLink connector presence before buying a used 3090 for a dual build.

Mismatched NVLink Bridge Slot Spacing

NVLink bridges come in 3-slot and 4-slot variants, referring to the physical distance between the two GPU cards. Your motherboard's PCIe slot spacing determines which bridge you need. Most ATX motherboards with two x16 slots space them 4 slots apart. Micro-ATX and some workstation boards may use 3-slot spacing. Measure or count slots before ordering the bridge.

Insufficient PSU Wattage

Two 3090s under full AI load draw 600-700W from the GPU power connectors alone. Add CPU, RAM, fans, pump, and motherboard power, and total system draw reaches 800-900W. A 1,000W PSU technically handles this, but with no headroom for transient power spikes during model loading. A 1,200W PSU is the practical minimum. Some builders run dual 850W PSUs with a sync adapter to split the load and gain extra headroom.
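The PSU recommendation is about headroom. This sketch shows sustained load as a percentage of PSU rating for the 800-900W range given above; it does not model the short transient spikes 3090s are known for, which are the reason to leave margin in the first place.

```python
SYSTEM_W = (800, 900)  # sustained whole-system draw (from the text)

def load_pct(draw_w, psu_w):
    """Sustained draw as a percentage of the PSU's rated output."""
    return 100.0 * draw_w / psu_w

for psu_w in (1000, 1200):
    low, high = (load_pct(w, psu_w) for w in SYSTEM_W)
    spare = psu_w - SYSTEM_W[1]
    print(f"{psu_w} W PSU: {low:.0f}-{high:.0f}% sustained load, {spare} W spare at worst case")
```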

Coolant Flow Issues in Long Loop Runs

A dual-GPU loop with 480mm + 240mm radiators and a distro plate has more restriction than a simple single-GPU loop. A D5 handles this well: it moves a high volume of coolant quietly, with enough head pressure for two blocks and two radiators. (DDC pumps actually produce higher head pressure in a smaller package, but they tend to run hotter and louder.) If you notice temperature differences greater than 5C between the two GPUs in a series loop, increase pump speed or switch to parallel flow.
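To check the 5C guideline over a long run, you can log core temperatures with `nvidia-smi --query-gpu=index,temperature.gpu --format=csv,noheader,nounits -l 60` and scan the log afterwards. This sketch assumes that exact output format (lines like `0, 55`); the function names are hypothetical. It checks core temperature only, since nvidia-smi may not report memory junction temperature on GeForce cards.

```python
DELTA_LIMIT_C = 5.0  # guideline from the text for a series loop

def worst_gpu_delta(csv_text):
    """Largest core-temperature spread between GPUs within any single polling interval."""
    samples, current = [], {}
    for line in csv_text.strip().splitlines():
        idx_str, temp_str = (field.strip() for field in line.split(","))
        idx = int(idx_str)
        if idx in current:  # an index repeats, so a new polling interval has started
            samples.append(current)
            current = {}
        current[idx] = float(temp_str)
    if current:
        samples.append(current)
    return max(max(s.values()) - min(s.values()) for s in samples)

def needs_attention(csv_text, limit=DELTA_LIMIT_C):
    """True if any interval shows a spread above the limit."""
    return worst_gpu_delta(csv_text) > limit
```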

Ready to Build?

Find waterblocks with active backplane cooling for every major 3090 variant in our Used 3090 AI Revival collection. For the complete multi-GPU cooling solution -- 480mm radiator, D5 pump, distro plate, and all fittings -- explore the Sovereign AI Rig collection. Two used 3090s with NVLink and water cooling remain the most cost-effective path to running 70B models locally in 2026.


3 comments

There is a mistake in the PCIe 4.0 bandwidth. NVLink provides 56 GB/s (112 GB/s bidirectional) and PCIe 4.0 x16 provides 32 GB/s (64 GB/s bidirectional).
So NVLink is not 4 times faster, but less than 2 times faster.
It would also be good to provide a direct comparison: 2×3090 NVLink TP=2 vs 2×3090 PCIe 4.0 x16 TP=2 vs 2×3090 without TP (split by layers).

Not true, MKDM (if that’s your real name), and kind of unfair.

NVLink does literally extend a single shared address space across connected GPUs. Of course it is not flat, performance-wise, so (as is the case for all distributed systems and applications) careful placement of data and orchestration of computation and communication is crucial for optimal performance. Of course performance depends heavily on many attributes of distributed parallel programs and the data and how it is accessed.

I didn’t see any exaggerations or anything factually incorrect in this article and found the recommendations and links interesting, having just finished a dual-3090 NVLink-connected water-cooled build myself without their benefit. Your comments certainly weren’t helpful or illuminating and your noob comment just makes you sound like every other dick on the internet who thinks bringing others low is the same as raising yourself up.

Another noob who doesn't know what they're talking about.

- NVLink does not literally create one unified 48 GB VRAM pool like a single GPU. Software still has to split/model-parallelize across GPUs.
- Ollama can split models across GPUs, but that is not the same as transparent unified VRAM.
- NVLink can improve GPU-to-GPU communication, but the speedup depends heavily on backend, model, quantization, batch size, and whether peer-to-peer access is working.

Dual RTX 3090s with NVLink give you 48 GB total VRAM across two GPUs, not a true single 48 GB pool. For LLM inference, frameworks can split the model across both cards, and NVLink can reduce the performance penalty versus PCIe-only communication.
