
I just watched Dylan Patel's deep dive on the 3 big bottlenecks to scaling AI compute and I feel like I woke up from another level of Inception. Every time you think you've found the real constraint, you peel back another layer and discover something deeper.
Dylan is the founder and CEO of SemiAnalysis, and his analysis of the AI infrastructure buildout is the clearest picture I've seen of what's actually happening beneath the hype.
The bottlenecks shift over time, but they don't replace each other — they stack.
What makes this mind-bending is that solving one bottleneck just reveals the next one. TSMC ramps CoWoS packaging capacity? Great, now you're blocked on HBM supply. Memory vendors scale up? Now you can't get enough power to the data center. Get the power? You still can't get enough EUV lithography tools to make the chips in the first place.
This is where it gets Inception-level deep. By 2028-2030, Dylan argues the ultimate constraint falls to ASML — the Dutch company that makes the world's most complicated machine: the EUV lithography tool.
The numbers are staggering. Roughly 700 cumulative EUV tools in the field by 2030 translates to roughly 200 gigawatts of maximum AI data center capacity. Meanwhile, Sam Altman is talking about wanting 52 gigawatts per year. The numbers don't add up.
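A quick sanity check on those figures. The 700-tool and 200-gigawatt numbers come from the talk; the per-tool yield and per-year tool count are just division:

```python
# Back-of-the-envelope check on the EUV supply gap, using the
# approximate figures quoted above.
cumulative_euv_tools_2030 = 700  # ~700 EUV tools in the field by 2030
max_ai_capacity_gw = 200         # ~200 GW of AI capacity they can support

gw_per_tool = max_ai_capacity_gw / cumulative_euv_tools_2030
print(f"~{gw_per_tool:.2f} GW of AI capacity per EUV tool")  # ~0.29 GW

# Altman-scale ambition: 52 GW of new capacity per year.
desired_gw_per_year = 52
tools_needed_per_year = desired_gw_per_year / gw_per_tool
print(f"~{tools_needed_per_year:.0f} tools' worth of new output per year")  # ~182
```

At that rate, a single year of buildout would consume more than a quarter of every EUV tool ever shipped — which is exactly the mismatch Dylan is pointing at.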
And here's the kicker — each EUV tool has 10,000+ suppliers across extraordinarily complex subsystems (Zeiss optics, Cymer light sources, mechanical stages with nanometer precision). You can't just throw money at this. The expertise required to build these machines takes years to develop.
Dylan drops a number that I keep coming back to: a gigawatt of data center capacity — roughly $50 billion of buildout — depends on roughly $1.2 billion in EUV tooling. That's an insane leverage ratio. One company's production capacity, constrained by physics and supply chain complexity, determines whether tens of billions in infrastructure investment can actually produce useful compute.
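The leverage ratio follows directly from those two numbers:

```python
# Leverage of EUV tooling over data center investment, per gigawatt.
# Both figures are approximate, from the post.
datacenter_capex_per_gw = 50e9  # ~$50B of data center buildout per GW
euv_tooling_per_gw = 1.2e9      # ~$1.2B of EUV tools behind that GW

leverage = datacenter_capex_per_gw / euv_tooling_per_gw
print(f"~{leverage:.0f}x leverage")  # every EUV dollar gates ~$42 downstream
```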
It's like discovering that the entire global economy runs through a single bridge, and that bridge can only handle so many cars per hour.
One counterintuitive insight: GPUs aren't actually depreciating the way people assume. Dylan argues that an H100 is worth more today than when it launched, because newer models and architectures extract more value per chip. The software is getting better at using the hardware.
This matters because it means the trillion-dollar infrastructure buildout isn't a depreciating asset race. The chips retain value as long as the models keep improving their efficiency on existing hardware.
Memory vendors are expected to double or triple prices as HBM demand outstrips supply. The interesting adaptation: some inference workloads may shift to commodity DRAM, accepting latency tradeoffs for non-real-time agent applications. Not everything needs the fastest memory — a background agent processing your emails can wait a few extra milliseconds.
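To make that tradeoff concrete, here's a toy sketch of how an inference scheduler might route requests by latency budget. The function name, tier labels, and the 100 ms threshold are all hypothetical illustrations, not anything from the talk:

```python
# Toy memory-tier router: latency-tolerant agent workloads can run on
# cheaper commodity DRAM, while interactive traffic stays on HBM.
# The 100 ms threshold is a made-up illustration of the idea.

def choose_memory_tier(latency_budget_ms: float) -> str:
    """Pick a memory tier for an inference request given its latency budget."""
    if latency_budget_ms < 100:  # interactive: chat, autocomplete
        return "HBM"
    return "DRAM"                # background agents, batch jobs

print(choose_memory_tier(20))      # chat request -> HBM
print(choose_memory_tier(60_000))  # email-triage agent -> DRAM
```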
By 2028, there's an estimated gap of 50+ gigawatts in power generation for AI data centers. The fundamental problem is a timing mismatch: AI companies want data centers built in 18 months, but adding power generation to the grid takes 5+ years on average.
Microsoft's annual CapEx is projected to surpass $80 billion (up from ~$15 billion five years ago). Total annual AI data center investment could reach $400-500 billion by mid-decade. All of it constrained by whether you can actually power the buildings.
If you're building AI-powered products, this has practical implications: the price and availability of compute, memory, and power all trace back to these constraints.
What gave me the Inception feeling isn't any single bottleneck — it's the recursive nesting. You think the problem is chips, but it's actually memory. You think it's memory, but it's actually power. You think it's power, but it's actually the machines that make the chips. And the machines that make the chips depend on optics from a single German company and light sources that push the boundaries of physics.
Each layer seems like the "real" world until you zoom out and realize you're still dreaming.
The AI infrastructure buildout is the largest industrial project in human history, and it's constrained by supply chains that were designed for a world that needed far less compute. We're trying to push exponential demand through linear supply chains. Something has to give.
Note: Some data points in this post come from supplementary SemiAnalysis research and other Dylan Patel appearances, not solely from this video.