
The Inception of AI Infrastructure: Bottlenecks All the Way Down

Dylan Patel's breakdown of the three big bottlenecks to scaling AI compute reveals a supply chain so deeply nested it feels like waking up from another level of Inception.
[Image: Nested layers of AI infrastructure bottlenecks]

I just watched Dylan Patel's deep dive on the three big bottlenecks to scaling AI compute, and I feel like I woke up from another level of Inception. Every time you think you've found the real constraint, you peel back another layer and discover something deeper.

Dylan is the founder and CEO of SemiAnalysis, and his analysis of the AI infrastructure buildout is the clearest picture I've seen of what's actually happening beneath the hype.

The Three Bottlenecks

The bottlenecks shift over time, but they stack on top of each other:

  1. Logic chips — GPUs and custom silicon
  2. Memory — HBM and DRAM
  3. Power — Electrical infrastructure and cooling

What makes this mind-bending is that solving one bottleneck just reveals the next one. TSMC ramps CoWoS packaging capacity? Great, now you're blocked on HBM supply. Memory vendors scale up? Now you can't get enough power to the data center. Get the power? You still can't get enough EUV lithography tools to make the chips in the first place.

ASML: The Bottleneck Beneath All Bottlenecks

This is where it gets Inception-level deep. Dylan argues that by 2028-2030, the ultimate constraint falls to ASML, the Dutch company that makes the world's most complicated machine: the EUV lithography tool.

The numbers are staggering:

  • ASML currently produces ~70 EUV tools per year
  • Even with aggressive scaling, they'll reach maybe 100 per year by the end of the decade
  • Each tool costs $300-400 million
  • A single gigawatt of AI compute requires roughly 3.5 EUV tools

So if you do the math: ~700 cumulative EUV tools by 2030, at ~3.5 tools per gigawatt, yields roughly 200 gigawatts maximum. Meanwhile, Sam Altman has talked about adding a gigawatt of compute per week, which works out to 52 gigawatts per year. The numbers don't add up.
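That back-of-envelope math is easy to check. A quick sketch, using the tool counts above and taking the 52 GW/year figure as the demand side:

```python
# EUV supply ceiling, using the figures from the talk:
# ~700 cumulative EUV tools by 2030, ~3.5 tools per gigawatt of AI compute.
cumulative_tools_2030 = 700
tools_per_gigawatt = 3.5
max_gigawatts = cumulative_tools_2030 / tools_per_gigawatt
print(max_gigawatts)  # 200.0

# Against Altman-scale ambitions of ~52 GW of new compute per year:
demand_per_year_gw = 52
print(max_gigawatts / demand_per_year_gw)  # less than 4 years of that demand
```

In other words, the entire decade's cumulative EUV output covers under four years of the stated demand.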

And here's the kicker — each EUV tool has 10,000+ suppliers across extraordinarily complex subsystems (Zeiss optics, Cymer light sources, mechanical stages with nanometer precision). You can't just throw money at this. The expertise required to build these machines takes years to develop.

The Leverage Ratio That Broke My Brain

Dylan drops a number that I keep coming back to: a gigawatt of data center capacity, roughly $50 billion of investment, depends on only about $1.2 billion in EUV tooling. That's a leverage ratio of more than 40 to 1. One company's production capacity, constrained by physics and supply chain complexity, determines whether tens of billions in infrastructure investment can actually produce useful compute.
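For the curious, the ratio itself, using the figures as quoted:

```python
# Leverage of EUV tooling on data center spend, per the figures quoted above.
datacenter_cost_per_gw = 50e9  # ~$50B per gigawatt of AI data center capacity
euv_tooling_per_gw = 1.2e9     # ~$1.2B of EUV tooling behind that gigawatt
leverage = datacenter_cost_per_gw / euv_tooling_per_gw
print(round(leverage, 1))  # 41.7: each tooling dollar gates ~$42 of buildout
```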

It's like discovering that the entire global economy runs through a single bridge, and that bridge can only handle so many cars per hour.

The GPU Depreciation Myth

One counterintuitive insight: GPUs aren't actually depreciating the way people assume. Dylan argues that an H100 is worth more today than when it launched, because newer models and architectures extract more value per chip. The software is getting better at using the hardware.

This matters because it means the trillion-dollar infrastructure buildout isn't a depreciating asset race. The chips retain value as long as the models keep improving their efficiency on existing hardware.

Memory Is About to Get Expensive

Memory vendors are expected to double or triple prices as HBM demand outstrips supply. The interesting adaptation: some inference workloads may shift to commodity DRAM, accepting latency tradeoffs for non-real-time agent applications. Not everything needs the fastest memory — a background agent processing your emails can wait a few extra milliseconds.
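The "not everything needs the fastest memory" idea is essentially routing by latency tolerance. A toy sketch of that decision (the tier names and the 500 ms cutoff are my illustrative assumptions, not from the video):

```python
# Toy router: reserve HBM-backed accelerators for real-time traffic and
# send latency-tolerant inference to cheaper commodity-DRAM capacity.
# The tier names and 500 ms threshold are illustrative assumptions.
def pick_memory_tier(max_latency_ms: float) -> str:
    if max_latency_ms < 500:
        return "hbm"   # real-time chat, copilots: needs full bandwidth
    return "dram"      # background agents, batch jobs: can wait

print(pick_memory_tier(80))      # hbm  -> interactive request
print(pick_memory_tier(60_000))  # dram -> email-triage agent
```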

Power: The 50GW Gap

By 2028, there's an estimated gap of 50+ gigawatts in power generation for AI data centers. The fundamental problem is a timing mismatch: AI companies want data centers built in 18 months, but adding power generation to the grid takes 5+ years on average.

Microsoft's annual CapEx is projected to surpass $80 billion (up from ~$15 billion five years ago). Total annual AI data center investment could reach $400-500 billion by mid-decade. All of it constrained by whether you can actually power the buildings.
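The implied growth rate behind those CapEx figures is worth spelling out:

```python
# Implied compound annual growth of Microsoft CapEx, per the post's figures:
# ~$15B five years ago to a projected $80B+ per year.
start_capex = 15e9
end_capex = 80e9
years = 5
cagr = (end_capex / start_capex) ** (1 / years) - 1
print(f"{cagr:.0%}")  # ~40% compound annual growth
```

Roughly 40% compounded annually, sustained for half a decade, by one of the largest companies on Earth.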

Why This Matters for Software Engineers

If you're building AI-powered products, this has practical implications:

  • Compute costs aren't going down anytime soon. Plan for expensive inference, especially for real-time applications.
  • Efficiency matters more than scale. The companies winning will be those extracting more value per FLOP, not just throwing more FLOPs at problems.
  • The agent paradigm helps. Async, non-real-time agent workloads can use cheaper compute tiers and commodity memory. Design your systems to be latency-tolerant where possible.
  • Edge inference is underrated. Anything you can push to the device sidesteps the entire data center bottleneck chain.
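The latency-tolerance point can be made concrete with a toy dispatcher: interactive requests hit the model directly, while background agent work is queued for cheaper, off-peak compute. All names here are my illustrative assumptions, not a real API:

```python
import queue

# Toy dispatcher: interactive traffic gets immediate (expensive) inference;
# latency-tolerant agent work is queued and drained later on cheaper tiers.
background_jobs: "queue.Queue[str]" = queue.Queue()

def run_inference_now(prompt: str) -> str:
    # Stand-in for a call to a premium, HBM-backed inference tier.
    return f"response to: {prompt}"

def handle_request(prompt: str, interactive: bool) -> str:
    if interactive:
        return run_inference_now(prompt)
    background_jobs.put(prompt)  # drained off-peak on commodity hardware
    return "queued"

print(handle_request("summarize my inbox", interactive=False))  # queued
```

The design choice that matters: making latency tolerance an explicit parameter at the API boundary, so the system can route work to whatever compute tier is actually available.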

The Inception Feeling

What gave me the Inception feeling isn't any single bottleneck — it's the recursive nesting. You think the problem is chips, but it's actually memory. You think it's memory, but it's actually power. You think it's power, but it's actually the machines that make the chips. And the machines that make the chips depend on optics from a single German company and light sources that push the boundaries of physics.

Each layer seems like the "real" world until you zoom out and realize you're still dreaming.

The AI infrastructure buildout is the largest industrial project in human history, and it's constrained by supply chains that were designed for a world that needed far less compute. We're trying to push exponential demand through linear supply chains. Something has to give.

Note: Some data points in this post come from supplementary SemiAnalysis research and other Dylan Patel appearances, not solely from this video.