Direct-to-Chip vs Immersion Cooling: Which One’s Actually Right For You?

We need liquid cooling! Cool. Which kind? This is where the conversation usually goes sideways. Because "liquid cooling" isn't one thing - it's like saying "I need a vehicle." Okay, but do you need a ute, a sedan, or a semi-truck? The two main technologies are direct-to-chip cooling and immersion cooling. They solve similar problems in very different ways. Let's break it down without the marketing BS.

Surgical precision with Direct-to-Chip - think of this like attaching a water-cooling system directly to your car's engine block instead of trying to cool the whole car. Here's how it works: small heat exchangers (cold plates) mount directly onto your hottest components - CPUs, GPUs, sometimes memory. Coolant flows through tiny channels in these cold plates, absorbs the heat, and carries it away to a Coolant Distribution Unit (CDU). The CDU is basically a big heat exchanger that dumps the heat into your facility's cooling water and sends cooled liquid back to the servers. The rest of the server - storage, networking, whatever - still uses regular air or passive cooling. You're only liquid-cooling what actually needs it.
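
If you like numbers, here's a rough sizing sketch in Python using the standard sensible-heat relation Q = ṁ·cp·ΔT. The rack load, liquid-capture fraction, coolant mix, and allowed temperature rise are all illustrative assumptions, not vendor specs:

```python
# Rough cold-plate loop sizing: how much coolant flow does a rack need?
# Sensible heat: Q = m_dot * c_p * delta_T.
# All numbers below are illustrative assumptions, not vendor specifications.

def required_flow_lpm(heat_kw: float, delta_t_c: float = 10.0,
                      cp_j_per_kg_k: float = 3900.0,   # ~25% glycol/water mix
                      density_kg_per_l: float = 1.03) -> float:
    """Liters per minute of coolant needed to carry heat_kw at a delta_t_c rise."""
    mass_flow_kg_s = (heat_kw * 1000.0) / (cp_j_per_kg_k * delta_t_c)
    return mass_flow_kg_s / density_kg_per_l * 60.0

# Example: a 120 kW rack where cold plates capture ~75% of the heat.
rack_kw = 120.0
liquid_fraction = 0.75          # the rest stays on air cooling
liquid_kw = rack_kw * liquid_fraction

print(f"Liquid-side load: {liquid_kw:.0f} kW")
print(f"Coolant flow needed: {required_flow_lpm(liquid_kw):.0f} L/min")
print(f"Residual air-cooling load: {rack_kw - liquid_kw:.0f} kW")
```

The residual air-cooling line is the point worth noticing: even a heavily liquid-cooled rack still leaves real heat for your air systems to handle.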

What it's good at:

  • Proven at scale: Microsoft, Meta, AWS are all using this for AI infrastructure

  • Retrofit-friendly: You can implement this in existing racks without rebuilding everything

  • Flexible: Cool just the hot stuff, leave the rest alone

  • Familiar operations: Your team understands servers; this is just servers with plumbing

  • OEM support: Dell, HPE, Supermicro all offer servers with cold plates built-in

Capacity: 50-200 kW per rack, PUE: 1.1-1.2 (when you're removing 60-80% of heat via liquid)

Where it struggles:

  • Can't handle extreme density (>200 kW racks)

  • Still needs some airflow for components not liquid-cooled

  • Cold plates need replacing every 3-5 years (thermal paste degrades)

  • More complex than air-cooled servers (more points of failure)

Best for:

  • Hyperscale AI training (NVIDIA GB200, AMD MI300X)

  • Colocation high-density zones

  • Retrofitting existing facilities

  • Mixed workload environments

Immersion Cooling is exactly what it sounds like: dunk the whole server in liquid. Servers go into tanks filled with non-conductive (dielectric) fluid, heat transfers directly from every component into the liquid, and the liquid circulates through heat exchangers where your facility cooling water removes the heat before it returns to the tank.
There are two flavors. Single-phase: the liquid stays liquid, just circulating and absorbing heat. Two-phase: the liquid boils at around 50-60°C, and the phase change (latent heat of vaporization) soaks up far more energy per kilogram. Vapor rises, hits condensers at the top, turns back to liquid, drips down. It's like a self-contained rain system.
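
Why does boiling capture so much more energy? Latent heat. Here's a back-of-the-envelope comparison in Python; the fluid properties are ballpark figures for engineered dielectric fluids, not any specific product's datasheet:

```python
# Why two-phase immersion moves more heat per kilogram of fluid:
# sensible heat (single-phase) vs latent heat of vaporization (two-phase).
# Fluid properties are ballpark assumptions for engineered dielectric fluids.

CP_J_PER_KG_K = 1100.0        # specific heat of a typical dielectric fluid
LATENT_J_PER_KG = 100_000.0   # heat of vaporization near the ~50-60 C boiling point
DELTA_T_C = 10.0              # temperature rise allowed in a single-phase loop

single_phase_kj = CP_J_PER_KG_K * DELTA_T_C / 1000   # kJ absorbed per kg, liquid only
two_phase_kj = LATENT_J_PER_KG / 1000                # kJ absorbed per kg at boiling

print(f"Single-phase: {single_phase_kj:.0f} kJ/kg over a {DELTA_T_C:.0f} C rise")
print(f"Two-phase:    {two_phase_kj:.0f} kJ/kg from boiling alone")
print(f"Ratio: ~{two_phase_kj / single_phase_kj:.0f}x more heat per kg of fluid")
```

Roughly an order of magnitude more heat per kilogram of fluid, which is why two-phase systems can chase the most extreme densities.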

What it's good at:

  • Extreme density: Can handle rack densities that are physically impossible otherwise (100-500+ kW)

  • Uniform cooling: Every component gets cooled equally—no hot spots

  • Best-in-class efficiency: PUE 1.02-1.08

  • Simplified servers: No fans, no heatsinks, less complexity

  • Silent: no fans means genuinely quiet operation

  • Harsh environment proof: Perfect for dusty, hot, extreme conditions

Capacity: 100-500+ kW per rack, PUE: 1.02-1.08
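
To see what those PUE figures mean in electricity spend, here's a quick sketch. The tariff and the air-cooled baseline PUE are placeholder assumptions for illustration, not figures from this post:

```python
# What a PUE improvement is worth in electricity, per MW of IT load.
# The tariff and the air-cooled baseline PUE are assumptions for illustration.

HOURS_PER_YEAR = 8760
TARIFF_AUD_PER_KWH = 0.15   # placeholder commercial tariff

def annual_energy_cost_aud(it_load_mw: float, pue: float) -> float:
    """Total facility energy cost: IT load scaled up by PUE."""
    facility_kw = it_load_mw * 1000 * pue
    return facility_kw * HOURS_PER_YEAR * TARIFF_AUD_PER_KWH

it_mw = 1.0
for label, pue in [("air-cooled (assumed)", 1.5),
                   ("direct-to-chip", 1.15),
                   ("immersion", 1.05)]:
    print(f"{label:22s} PUE {pue:.2f}: A${annual_energy_cost_aud(it_mw, pue):,.0f}/yr")
```

Per megawatt of IT load, every 0.1 of PUE is real money, year after year.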

Where it struggles:

  • Higher upfront cost (tanks, fluid, specialized servers)

  • Operational complexity (fluid management, handling procedures)

  • Limited OEM support (mostly custom/white-box servers)

  • Hard to retrofit (usually needs purpose-built infrastructure)

  • Less proven at hyperscale (technology still maturing)

Best for:

  • Ultra-high-density AI inference (future 600+ kW racks)

  • Cryptocurrency mining (proven use case)

  • Mining/industrial edge deployments (harsh environments)

  • Greenfield builds optimized for maximum efficiency

  • Research/HPC pushing density limits

The Cost Reality

Let's say you're deploying 5 MW of AI compute. Here's what each approach costs:

Direct-to-Chip:

  • Servers: ~$25M (including cold plates)

  • CDUs: $1.5M

  • Piping, electrical, controls: $2M

  • Total: $28.5M

  • Annual operating cost: $700K

Immersion:

  • Servers: ~$22M (simpler—no fans, heatsinks)

  • Tanks: $3M

  • Dielectric fluid: $1.5M

  • Facility work: $1.5M

  • Total: $28M

  • Annual operating cost: $550K (lower PUE)

They're surprisingly similar for new builds. Immersion spends more on infrastructure (tanks, fluid, facility work) but less on the servers themselves, so it comes out marginally cheaper upfront and about $150K a year cheaper to run. Direct-to-chip shifts the spend toward the servers and costs more to operate.
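
Plugging the figures above into a simple total-cost sketch shows how the gap widens over time (straight-line opex, no discounting, no fluid top-ups or cold-plate replacement cycles - deliberate simplifications):

```python
# Simple total-cost-of-ownership comparison using the figures above.
# Straight-line opex, no discounting, no fluid top-ups or replacement cycles:
# deliberate simplifications to show the shape of the comparison.

def tco_musd(capex_m: float, annual_opex_m: float, years: int) -> float:
    return capex_m + annual_opex_m * years

d2c = {"capex": 28.5, "opex": 0.70}   # direct-to-chip, $M
imm = {"capex": 28.0, "opex": 0.55}   # immersion, $M

for years in (1, 3, 5, 10):
    d = tco_musd(d2c["capex"], d2c["opex"], years)
    i = tco_musd(imm["capex"], imm["opex"], years)
    print(f"Year {years:2d}: direct-to-chip ${d:.1f}M vs immersion ${i:.1f}M "
          f"(immersion ahead by ${d - i:.2f}M)")
```

On these numbers the gap is well under 10% even at year ten, which is why the retrofit picture, not the new-build TCO, usually decides it.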

The real difference shows up in retrofits:

Retrofitting for direct-to-chip: $3-5M
Retrofitting for immersion: $8-12M+ (often needs major structural work)

Go with direct-to-chip if:

  • You're retrofitting an existing facility

  • You have mixed workload density (some standard, some high)

  • You're using OEM servers (Dell, HPE, etc.)

  • Your ops team is familiar with traditional data centers

  • You need moderate density (50-150 kW racks)

  • You want proven, low-risk technology

Go with immersion if:

  • You're building greenfield and can design around it

  • You need extreme density (200+ kW racks)

  • You're in harsh environments (mining, industrial, extreme heat)

  • Maximum efficiency is critical (sustainability mandates, ESG)

  • You're comfortable with operational complexity

  • You want to future-proof for 300-600 kW rack densities coming by 2027-2028

The Hybrid Approach

Here's what smart operators are doing: both.

  • Direct-to-chip for 60-70% of the facility (primary AI workloads)

  • Immersion for 10-20% (ultra-high-density specialized stuff)

  • Traditional air for 10-30% (legacy and standard workloads)

This maximizes flexibility. You're not over-investing in immersion for workloads that don't need it, but you have the capability when customers need extreme density.
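
For a concrete picture, here's what that split looks like on a 5 MW facility, using midpoints of the ranges above. The percentages are the actual decision; the code is just the arithmetic:

```python
# Hybrid capacity split for a 5 MW facility, using midpoints of the
# ranges above. Illustrative only; tune the shares to your customer mix.

FACILITY_MW = 5.0
split = {"direct-to-chip": 0.65, "immersion": 0.15, "air": 0.20}

for tech, share in split.items():
    print(f"{tech:15s} {share:>4.0%} -> {FACILITY_MW * share:.2f} MW")
assert abs(sum(split.values()) - 1.0) < 1e-9  # shares must cover the facility
```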

What We Actually Recommend

For most Australian operators: start with direct-to-chip.

Why?

  • Proven at hyperscale

  • Retrofit-friendly

  • OEM server support

  • Your team can operate it

  • Handles 90% of current AI workloads

Immersion makes sense for:

  • Greenfield builds designed for max efficiency

  • Remote/harsh edge deployments

  • Operators targeting ultra-high-density markets

  • Teams comfortable pioneering new tech

The landscape will shift by 2027-2028. Rack densities are heading toward 300-600 kW. At that point, direct-to-chip might hit physical limits and immersion becomes necessary rather than nice-to-have. But for now? Start with direct-to-chip. Build expertise. Evaluate immersion later for specific use cases.

Need Help Deciding?

Technology choice depends on your facility, customer mix, timeline, and risk tolerance. We do free assessments that model both approaches for your specific situation.
