AI Workloads Don’t Just Scale Power, They Reshape It
AI workloads are reshaping grid dynamics: massive data centers can drop or swing hundreds of megawatts in seconds, threatening the stability of the power network. To address this, developers are exploring foundational changes to data center design and reinventing the way they are delivered.
In July 2024, an electrical event that lasted less than a second demonstrated the risks of repackaging old solutions for new problems. In Northern Virginia, a piece of lightning protection equipment mounted on a 230 kV transmission line briefly malfunctioned. The disruption was enough to send all data centers in Fairfax County onto backup power supplies. When the transmission line was automatically restored milliseconds later, the grid returned to the power levels it was supplying before the outage. But the data centers were no longer there.
In the Power Grid, A Delicate Balance
The Virginia data center facilities did what they were designed to do: immediately switch to backup power to keep the IT assets online, and remain on backup until facility managers manually return them to the primary grid supply. But from the grid’s perspective, this event represents a new kind of shock to our national infrastructure: power demand that doesn’t just dip but disappears suddenly, and all at once.
The load loss resulting from an event like the one in Fairfax County creates a mismatch between electricity supply and demand at the very moment the system is trying to stabilize. With too much generation online relative to load, system frequency rises, and grid operators must scramble to bring supply back into balance. Failure to do so quickly can trigger cascading damage to power plants or critical transmission infrastructure.
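The mechanics of that frequency rise can be illustrated with the aggregate swing equation, which relates the rate of change of grid frequency to the power imbalance and the system’s rotating inertia. The sketch below is purely illustrative: the inertia constant, system size, and load-loss figure are assumed round numbers, not measurements from the Fairfax event.

```python
# Illustrative sketch: frequency rise after a sudden load loss, using the
# aggregate swing equation  df/dt = f0 * (P_gen - P_load) / (2 * H * S_base).
# All numbers below are assumptions for illustration, not data from 2024.

F0 = 60.0        # nominal grid frequency, Hz
H = 4.0          # assumed aggregate inertia constant, seconds
S_BASE = 20_000  # assumed size of the affected regional system, MW

def frequency_after(load_loss_mw: float, seconds: float) -> float:
    """Frequency (Hz) after a step loss of load, before governors respond."""
    # With generation unchanged and load suddenly lower, the surplus power
    # accelerates the rotating machines, so frequency climbs roughly linearly.
    rocof = F0 * load_loss_mw / (2 * H * S_BASE)  # rate of change of frequency, Hz/s
    return F0 + rocof * seconds

# A hypothetical 1,500 MW step loss of load, held for 2 seconds:
print(round(frequency_after(1500.0, 2.0), 3))  # prints 61.125
```

Even a fraction of a hertz of deviation is significant; real systems trip protective relays well before the drift this toy model allows, which is why operators must rebalance within seconds.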
For operators of our power grid, it is not just about having “enough power.” It is about having the right amount of power, in the right place, at the right time. Balancing supply and demand is a continual imperative, and it becomes dramatically harder when a single site can represent hundreds of megawatts of demand and can change its behavior faster than the grid can comfortably adjust.
The sixty data centers that went offline in Fairfax County were largely filled with conventional IT infrastructure. Their combined power draw in 2024 was roughly equivalent to just one of the AI campuses under development in Texas, Ohio and elsewhere. Concentrating AI capacity into a handful of massive facilities magnifies the consequences of getting the timing wrong. In a world of many smaller loads, supply-demand mismatches are diluted across a region. In a world of gigawatt-scale campuses, mismatches can become “grid events.”
Even when an AI facility is operating normally, the underlying workloads can create fast and significant power swings. In the words of John Perella, CEO of Terraflow Energy, “if you have a gigawatt data center and you see anywhere from a 30% to 80% swing, you're talking about 300 megawatts to 800 megawatts of swing multiple times a minute. When I mean swing, I mean – immediately disappearing, off for a couple seconds, immediately spiking back on for a couple seconds. It looks like an EKG up and down and up and down.”
This “business as usual” volatility also threatens the grid’s supply/demand balance, and necessitates special data center equipment to help the facility act like a stable grid asset. Traditional backup generation isn’t sufficient. According to Perella, “the fastest generators on the planet - you're probably talking diesel or recips - take 25 to 30 seconds to ramp up. If you're running off a generator and you don't have something between the generator and the data center, you're literally going to break the crankshafts on the gen sets. It's like driving a supercar down the highway at 80 miles an hour and shifting from fifth gear to first gear over and over and over again. How many times is a car going to let you do that before you drop your clutch, or your engine just falls out and blows up?” In critical ways, the volatility of AI workloads requires a new, not incremental, approach to facility design.
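Perella’s supercar analogy can be made concrete with a toy simulation: a square-wave load that toggles every few seconds against a generator limited to a fixed ramp rate. All numbers are assumed for illustration (an 800 MW swing and a ramp of roughly full output in 25 to 30 seconds, loosely echoing the figures quoted above).

```python
# Illustrative only: a bursty AI load toggling every few seconds versus a
# generator constrained to a fixed ramp rate. Numbers are assumed, not measured.

def worst_mismatch(seconds: int, swing_mw: float, period_s: int,
                   ramp_mw_per_s: float) -> float:
    """Largest instantaneous gap (MW) between the load and the generator."""
    gen = 0.0
    worst = 0.0
    for t in range(seconds):
        load = swing_mw if (t // period_s) % 2 else 0.0  # square-wave burst
        # The generator chases the load, but can only move ramp_mw_per_s per step.
        step = max(-ramp_mw_per_s, min(ramp_mw_per_s, load - gen))
        gen += step
        worst = max(worst, abs(load - gen))
    return worst

# An 800 MW swing every 5 seconds against a ~30 MW/s generator ramp:
print(worst_mismatch(seconds=60, swing_mw=800.0, period_s=5,
                     ramp_mw_per_s=30.0))  # prints 770.0
```

The generator never comes close to tracking the load: nearly the full swing appears as an instantaneous mismatch, which is the stress that fast intermediate buffering (batteries, flywheels, supercapacitors) is meant to absorb.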
The flagship Stargate data center campus in Abilene, TX is planned for 1.2 gigawatts (1,200 MW) of total power capacity. This one campus will have roughly the same load profile as the sixty data centers in Northern Virginia that contributed to the July 2024 grid event.
Adapting to Change
It takes a different mindset to deliver an AI data center facility that is responsive to workload volatility and respectful of the power grid. This year, several novel concepts are gaining fresh momentum as large hyperscale projects stall and public discourse on data center construction intensifies. Below is a summary of three emerging trends that come up repeatedly in our discussions around AI infrastructure deployment.
Many and Small in lieu of Single and Large
The concept of deploying an interconnected fabric of small data centers is by no means new. In fact, established operators like Equinix, Digital Realty and NTT embrace the dispersion and interconnection of their fleets as core value propositions. But over the last 12 months, the number of organizations expressing intent to deploy similar fleets for AI compute has expanded significantly. Major telecommunications and crypto mining companies have indicated their intended participation in annual SEC filings. Inference platforms like Together AI, Fireworks AI and Modal are proliferating. Even companies with distributed portfolios of built assets completely unrelated to data centers (think retail, warehouses and the like) are positioning themselves for a hard pivot into AI infrastructure.
This trend is partly fueled by the scarcity of sites with abundant available power. It also signals a broader realization that future AI workloads may not require massive compute clusters. That is good for the country’s electrical infrastructure, because distributed deployments avoid concentrating a massive load in one part of the grid. They make local balancing easier and reduce the short-term consequences of supply-demand mismatches. As a result, smaller data centers can avoid some of the complex electrical systems that large facilities need to mitigate the impact of volatility on the supporting grid.
Resilience, Rehashed
Organizations that start down the path of distributed AI infrastructure quickly find that it opens the door to a reimagining of resilience and uptime. When IT assets are largely concentrated in one massive facility, the consequences of a prolonged outage can be catastrophic. As a result, the data center industry has spent decades developing system configurations to keep the IT online in spite of grid outages, equipment failures or emergency maintenance needs. These configurations require large amounts of redundant components as well as backup generation. These add-ons can constitute 50% or more of a facility’s final CapEx, to say nothing of the ongoing OpEx associated with testing and maintaining this largely-idle equipment.
When the same IT assets are geographically distributed across multiple sites in a wide area, the aggregate network is inherently resilient against the failure of any one component across the facility stack or the local power grid infrastructure. As a result, the stakes of a prolonged outage at any site are much lower. This has empowered organizations to rethink resilience. Instead of costly generators and redundant equipment, operators can opt for new and sophisticated orchestration platforms that automatically detect outages and direct critical AI workloads to facilities that are operational.
This trend is real, if nascent: small, distributed deployments can, in theory, meet end user service level agreements without onsite backup generation or many of the redundant systems common in conventional, large-scale data center facilities.
Facility as Product
A shift to distributed AI infrastructure requires a renewed focus on standardization to preserve the operational efficiency inherent in a single, large facility. A developer using conventional design and construction methods can quickly get bogged down across multiple site deployments. Once deployed, the facilities must support operations and maintenance that are as remote, automated and predictable as possible.
Many builders are coalescing around modular, off-site construction practices for fabrication, integration and commissioning of facilities. In many cases, up to 80% of the construction work can be strategically co-located in factories with skilled labor pools and equipment distributors. The factories offer robust opportunities to parallelize delivery activities and enhance construction quality.
Modular construction can be further enhanced through productization, in which the data center is largely designed as though it were a product, and in advance of Notice to Proceed on a specific deployment project. This mindset can drive the development of a truly standardized, industrialized platform for AI infrastructure. Because AI compute assets behave differently from traditional IT, the facilities must be intentionally and thoughtfully designed for their novel power densities, cooling requirements, workload volatility, networking specifications and operational protocols. The design phase of a traditional construction project does not permit the time and resources to solve these novel challenges on the critical path of deployment. Too often, project teams rely on conventional designs as a result of time pressure, inadvertently carrying forward outdated techniques inadequate for the needs of AI. Productization makes space for thoughtful design and amortizes the investment across many future site deployments.
Meeting the Moment
Productization is not a cost-optimization exercise layered on top of yesterday’s facility patterns. It is the prerequisite for building AI infrastructure that behaves responsibly on the grid and reliably under volatile, bursty workloads. If a campus can swing hundreds of megawatts in seconds, “standard” electrical architectures, protection schemes, and operating modes are no longer safe defaults. What is required is a first-principles redesign of the facility (or fabric of facilities) as a dynamic system that can absorb, shape, and communicate power behavior, rather than simply consume it.
That kind of reimagining does not fit cleanly into the traditional project cadence, where design hardens quickly after a site is selected and a schedule is already running. Once a project is in motion, the incentives favor speed, permitting certainty, and reuse of familiar components. The result is an understandable tendency to treat AI’s new constraints as exceptions to manage, instead of fundamentals to design around. A facility that is intended to be grid-friendly and workload-resilient needs deliberate time up front to test assumptions, productize interfaces, and validate performance before it is committed to steel, switchgear, and concrete.
This is why the organizations best positioned in this cycle are often the ones least burdened by entrenched reference designs, legacy assets, and mature delivery processes that penalize deviation. New entrants and teams with real risk appetite can make space for deep technical design, and can pursue architectures that avoid interconnection bottlenecks, reduce redundant infrastructure, and remain adaptable as compute and cooling technologies evolve. The reward is not just faster deployment. It is a platform that can scale into an inference-driven market where value accrues to distributed, interconnected AI infrastructure that can be deployed quickly, operated efficiently, and integrated into the grid with confidence.