From Assumption to Evidence: Scaling AI Infrastructure With Data
Borrowing a page from evolutionary biology to understand what real scale looks like in AI infrastructure.
I’ll use this stock image of Lake Victoria to remind everyone that well-designed, closed-loop data centers consume less water than your neighborhood pizza parlor. Now back to the topic at hand.
Roughly 15,000 years ago, Lake Victoria nearly disappeared. A shift in East Africa's climate dried the basin down to a fraction of its former self, and when the rains returned and the lake refilled, a small founding population of unremarkable cichlid fish moved back into what was close to a blank slate. What happened next is one of the fastest evolutionary events ever documented. In the roughly 15,000 years since, that founding population radiated into somewhere around 500 distinct species: algae scrapers, snail crushers, open-water piscivores, insectivores. Each new species was shaped by the specific niche it settled into, and the whole radiation unfolded faster than scientists once believed evolution could plausibly work.
(Sure, this seems like a strange metaphor for the data center industry. But humor me. When an environment changes this abruptly, it's worth looking for parallels with evolution and natural selection.)
The emergence of modern AI is dramatically shifting the industry’s climate. For those of us building data centers, the environment hasn’t merely become more demanding. It has reset, all at once, and a new range of niches has opened up that the old dominant design was never built to fill. What's emerging in response isn't a bigger data center. It's a different species entirely, and the industry is still trying to figure out how to measure it.
Old Yardsticks
For three decades, data centers were a stable environment, and a stable valuation model matched it. Whatever the workload, the facility looked functionally the same: standardized "white space" filled with air-cooled racks. Pricing was just as stable: a leasing model borrowed from commercial real estate, where a facility's value came from the rent it collected and what comparable buildings had recently sold for, with occupancy as the underlying health metric that mattered most.
AI training workloads have broken that model. A modern GPU rack can draw well over a hundred kilowatts, and that heat is carried away by liquid delivered directly to the chip rather than by air blown across a room. The facility isn't a neutral container anymore: it has to be configured from the component level up around one specific workload, for one or two counterparties who account for nearly all of its revenue. A recent industry analysis from Global Data Center Hub made the underwriting consequence explicit: apply the traditional model (e.g., lease term, occupancy, building depreciation, power capacity, square footage) to one of these "compute factories," and the number it produces is wrong.
The analysis proposes a rebuilt framework that swaps in five variables that actually track value in this new environment. Offtake duration replaces lease term, since the offtake agreement is the cash flow. Counterparty credit quality replaces occupancy, since one or two tenants generate nearly all the revenue. The GPU hardware refresh cycle (every two to three years according to the analysis, a figure that is debatable) replaces building depreciation. The actual cost basis of power replaces raw capacity, since a spread of $0.01/kWh compounds into a durable margin advantage at scale. And, critically, design qualification (e.g., “can this facility support the current and next generation of AI hardware without a retrofit?”) replaces square footage.
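The power-cost point is easy to check with arithmetic. The minimal Python sketch below multiplies a facility's annual energy use by a per-kWh spread. The 100 MW facility size, the 90% load factor, and the ten-year horizon are illustrative assumptions, not figures from the Global Data Center Hub analysis, and `annual_spread_value` and `cumulative_spread_value` are hypothetical helper names.

```python
HOURS_PER_YEAR = 8_760  # 365 days x 24 h; leap years ignored


def annual_spread_value(facility_mw: float, spread_per_kwh: float,
                        load_factor: float = 1.0) -> float:
    """Dollars saved per year by paying `spread_per_kwh` less for power.

    facility_mw: average facility draw in megawatts (illustrative input)
    spread_per_kwh: price advantage in $/kWh, e.g. 0.01
    load_factor: share of the year the facility actually draws that load (assumed)
    """
    # Convert MW to kW, then multiply by hours to get kWh per year
    kwh_per_year = facility_mw * 1_000 * HOURS_PER_YEAR * load_factor
    return kwh_per_year * spread_per_kwh


def cumulative_spread_value(facility_mw: float, spread_per_kwh: float,
                            years: int, load_factor: float = 1.0) -> float:
    """Simple, undiscounted total over a multi-year offtake term."""
    return annual_spread_value(facility_mw, spread_per_kwh, load_factor) * years


if __name__ == "__main__":
    # Hypothetical 100 MW facility at a 90% load factor with a $0.01/kWh advantage
    print(f"${annual_spread_value(100, 0.01, 0.9):,.0f} per year")
    print(f"${cumulative_spread_value(100, 0.01, 10, 0.9):,.0f} over 10 years")
```

Under those assumptions the sketch prints about $7.9 million a year and about $78.8 million over ten years. That total is an undiscounted sum; a real underwriting model would discount future cash flows and would also need to check whether the contracted spread actually holds.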
That reframing is right for the data center real estate industry as it stands today, but it still prices a single facility in isolation, the way a naturalist might examine one fish and infer a species. It has no answer for what happens when the same design is dropped into a dozen different lakes at once and allowed to adapt.
Radiation, Not Redesign
The cichlids didn't converge on one perfect body plan once Lake Victoria refilled. Instead, they diversified into dozens of them, each refined by the specific conditions of the niche it occupied. Different conditions require different design features, and AI factories are no different. Build one compute factory, and many of the underwriting assumptions remain an engineer's best estimate, checked against a single data point once the facility is live. Build the same standardized, liquid-cooled design repeatedly across different power markets, climates, and counterparties, instrument every site the same way, and those variables stop being assumptions. They become measurements.
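One way to picture the shift from assumption to measurement: with one site, an underwriting variable is a single number with no error bar; with many consistently instrumented sites, you can estimate both its typical value and how confident you are in it. The minimal Python sketch below assumes each site reports the same normalized metric (for example, a hypothetical ratio of realized to modeled performance) and computes the fleet mean and its standard error. Treating sites as independent, comparable samples is a simplifying assumption, since sites deliberately differ in market and climate; `fleet_estimate` is a hypothetical name.

```python
import math
import statistics
from typing import Optional, Sequence, Tuple


def fleet_estimate(per_site_values: Sequence[float]) -> Tuple[float, Optional[float]]:
    """Mean of a per-site metric and the standard error of that mean.

    With one site there is nothing to estimate spread from, so the standard
    error is None: the figure is a single data point, not yet a measurement
    of how the design behaves in general.
    """
    if not per_site_values:
        raise ValueError("need at least one site")
    mean = statistics.fmean(per_site_values)
    if len(per_site_values) < 2:
        return mean, None
    sample_sd = statistics.stdev(per_site_values)  # sample standard deviation (n - 1)
    return mean, sample_sd / math.sqrt(len(per_site_values))


if __name__ == "__main__":
    # Hypothetical realized-to-modeled ratios for one underwriting variable
    print(fleet_estimate([1.03]))                          # one site: no error bar
    print(fleet_estimate([1.03, 0.97, 1.05, 0.99, 1.01]))  # five sites
```

The standard error shrinks roughly in proportion to one over the square root of the number of sites, which is one concrete sense in which evidence accumulates as the fleet grows.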
Take power cost. At a single facility, the number in the underwriting model is whatever rate was written into the contract at signing. Across a fleet of a dozen sites, that same figure becomes something you can actually verify. You can track the price actually paid against the price promised, market by market, and see which power deals held up under real conditions and which ones quietly eroded the margin they were supposed to protect.
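A minimal sketch of that market-by-market comparison, assuming each site can report both its contracted rate and the all-in rate it actually paid. The `PowerRecord` type, the `erosion_by_market` function, the market labels, and the $0.002/kWh tolerance are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class PowerRecord:
    site: str
    market: str
    contracted_per_kwh: float  # rate written into the contract at signing ($/kWh)
    realized_per_kwh: float    # all-in rate actually paid over the period ($/kWh)


def erosion_by_market(records: Iterable[PowerRecord],
                      tolerance: float = 0.002) -> Dict[str, Tuple[float, bool]]:
    """Average realized-minus-contracted gap per market, in $/kWh.

    A market is flagged (True) when its average gap exceeds `tolerance`,
    i.e. the deal is quietly eroding the margin it was meant to protect.
    """
    gaps: Dict[str, List[float]] = {}
    for r in records:
        gaps.setdefault(r.market, []).append(r.realized_per_kwh - r.contracted_per_kwh)
    report = {}
    for market, values in gaps.items():
        avg_gap = sum(values) / len(values)
        report[market] = (avg_gap, avg_gap > tolerance)
    return report


if __name__ == "__main__":
    fleet = [  # hypothetical sites and rates
        PowerRecord("site-01", "market-a", 0.045, 0.046),
        PowerRecord("site-02", "market-a", 0.045, 0.0455),
        PowerRecord("site-03", "market-b", 0.050, 0.058),
    ]
    for market, (gap, eroding) in erosion_by_market(fleet).items():
        print(f"{market}: {gap:+.4f} $/kWh{'  <- eroding' if eroding else ''}")
```

In the sample data, market-b's deal has drifted $0.008/kWh above its contract and gets flagged, while market-a stays within tolerance. Grouping by market rather than by site is what lets the comparison say something about the power deal itself rather than about one building.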
Design qualification works the same way. Instead of certifying, on paper, that a liquid-cooling architecture should support the next generation of AI hardware, an operator running the identical design in a desert site and a Midwest site gets to watch it perform under two genuinely different thermal loads. That operator carries real evidence, not hope, into every site that follows.
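Design qualification can be framed the same way: the fleet's harshest observed environment sets the bar. The Python sketch below assumes each site reports the highest per-rack heat load its cooling loop sustained within spec at that site's worst observed ambient conditions. The function name `qualify_design`, the kW figures, and the 10% margin are hypothetical.

```python
from typing import Dict, Tuple


def qualify_design(sustained_kw_by_site: Dict[str, float],
                   required_kw: float,
                   margin: float = 0.10) -> Tuple[bool, str, float]:
    """Check one cooling design against a next-generation rack requirement.

    sustained_kw_by_site: per-rack heat load (kW) the design held within its
        coolant temperature spec at each site's worst observed ambient conditions
    required_kw: per-rack load the next hardware generation is expected to need
    margin: extra headroom demanded on top of required_kw (10% is an assumption)

    The harshest site is the binding constraint, so the design qualifies only
    if the worst observed site clears the bar.
    """
    if not sustained_kw_by_site:
        raise ValueError("need telemetry from at least one site")
    worst_site = min(sustained_kw_by_site, key=sustained_kw_by_site.get)
    threshold = required_kw * (1 + margin)
    headroom = sustained_kw_by_site[worst_site] - threshold
    return headroom >= 0, worst_site, headroom


if __name__ == "__main__":
    observed = {"desert-site": 150.0, "midwest-site": 170.0}  # hypothetical telemetry
    print(qualify_design(observed, required_kw=130.0))
    print(qualify_design(observed, required_kw=140.0))
```

Because the check keys on the worst site, every new climate the design survives widens the range of conditions it has been proven against, rather than leaving qualification to a datasheet.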
GPU-refresh readiness compounds the same advantage. Proving out a hardware transition at the fourth facility removes the guesswork around whether the eleventh can absorb it, long before a capital partner or a counterparty has to ask. The same logic applies to a counterparty's actual behavior. Whether it runs equipment within the specified range, respects maintenance windows, and renews on schedule stops being a single, static credit rating. It becomes an observed pattern, built up across every facility that counterparty touches.
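The counterparty side of this can be sketched as a simple profile pooled across facilities. The Python below is illustrative only: `CounterpartyObservation`, its fields, and the three rates are hypothetical stand-ins for whatever an operator actually tracks, and a real credit view would weigh far more than this.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CounterpartyObservation:
    facility: str
    months_in_spec: int                # months equipment ran inside the agreed envelope
    months_observed: int
    maintenance_windows_respected: int
    maintenance_windows_scheduled: int
    renewed_on_schedule: Optional[bool] = None  # None until a renewal falls due


def _rate(numerator: float, denominator: float) -> Optional[float]:
    # Return None rather than dividing by zero when nothing has been observed yet
    return numerator / denominator if denominator else None


def counterparty_profile(observations: List[CounterpartyObservation]) -> Dict[str, Optional[float]]:
    """Pool one counterparty's behavior across every facility it occupies."""
    renewals = [o.renewed_on_schedule for o in observations
                if o.renewed_on_schedule is not None]
    return {
        "facilities": len(observations),
        "in_spec_rate": _rate(sum(o.months_in_spec for o in observations),
                              sum(o.months_observed for o in observations)),
        "maintenance_rate": _rate(sum(o.maintenance_windows_respected for o in observations),
                                  sum(o.maintenance_windows_scheduled for o in observations)),
        "on_time_renewal_rate": _rate(sum(renewals), len(renewals)),
    }


if __name__ == "__main__":
    history = [  # hypothetical tenant history at two facilities
        CounterpartyObservation("site-01", 22, 24, 5, 6, True),
        CounterpartyObservation("site-04", 12, 12, 3, 3),
    ]
    print(counterparty_profile(history))
```

Pooling raw counts across facilities, rather than averaging per-facility percentages, keeps a short tenancy at one site from carrying the same weight as years of history at another.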
Every additional site in a standardized, telemetry-equipped fleet isn't just more capacity. It's another observation in a radiation happening in real time, and an operator running a distributed fleet with consistent instrumentation gets to watch that radiation unfold. That's a structural advantage a single-campus developer cannot replicate, however well engineered its campus: that developer is making the best argument it can from assumptions developed under immense time pressure. Meanwhile, the fleet operator is making an argument from a track record that compounds with every new site. To a capital partner underwriting offtake durability, or a compute owner-operator choosing where to place a workload that needs to run for years, evidence beats a well-reasoned estimate every time.
The Winning Trait Isn't a Single Design
It's worth remembering what happened to the cichlids that didn't adapt: they weren't outcompeted by one superior fish. They were outcompeted by an entire radiation. Collectively, that population covered more of the new lake's niches than any single generalist body plan could. One species didn’t win out. Optimized diversity won out.
That's the more useful way to read where AI infrastructure is heading. The winners won't be defined by whoever engineers the single best compute factory. They'll be defined by whoever can stand up the same proven design across the widest range of power markets and counterparties fastest, and can prove empirically that each new site will perform the way the last one did. Or better. Lake Victoria's cichlids didn't know they were adapting; they simply kept producing offspring into every niche the new lake offered, and the ones that fit survived. In AI infrastructure, that advantage now belongs to whoever is paying close enough attention to know which niches are actually working, and designs the next facility with that knowledge already built in.