| Fig. 1: Example of a rack's power use while training an AI model. Values are normalized as a proportion of the rack's maximum rated power use. (Image source: D. Jensen) |
Datacenter energy use has grown consistently since the advent of cloud computing. Driven by demand for artificial intelligence (AI) computation, the global electricity consumption of datacenters is projected to rise to nearly 3 PWh annually by 2030. [1] (For comparison, the United States as a whole used an estimated 4.18 PWh of electricity in 2023. [2]) This rapidly changing power landscape presents difficult new challenges for integrating datacenters with traditional, slow-changing power grids.
The first law of thermodynamics states that for a given volume, the enclosed energy changes at a rate equal to the difference between the rate at which energy enters the system and the rate at which it leaves. [3] On the power grid, this means that the electrical power being generated and transmitted through power lines around the country has to equal the electrical power being consumed at any given point in time. If the demand for electricity exceeds the available supply, there is a risk of blackouts or lasting damage to generators and other connected hardware. In one 2024 incident in the Eastern United States, roughly 1.5 GW of datacenter-related load abruptly disconnected from the grid, producing a roughly 7% drop in demand within seconds. Emergency measures were dispatched to address the situation, but grid voltage and frequency fluctuated for several minutes afterward. [4]
The power grid is not designed to handle abrupt changes in demand. Normal patterns of urban and industrial power use have historically been forecast well enough to bring generators online or offline as needed over the course of a day or week. Sudden, unpredictable changes in demand are difficult to respond to because large generators take a long time to ramp power output up or down---top-tier natural gas turbines, for example, take about six minutes to reach full power output even though they are designed to respond to grid-scale power fluctuations. [5]
| Fig. 2: Large AI training jobs can ramp much faster than traditional load-balancing generators with similar rated power output. (Image source: D. Jensen) |
The way that AI is trained can make it very difficult to balance supply and demand on the power grid, especially as more and more power goes toward AI datacenters. Traditional datacenter power loads have been predictable, with millions of server requests processed simultaneously and the overall power changing gradually from hour to hour. In contrast, the process of training a single AI model utilizes entire racks or rows of servers in the datacenter. The computers in these racks divide up the work of training an AI model, turning on together and periodically pausing to share their updated calculations with each other and divide up the next batch of work. [6] Fig. 1 shows an example of what a rack's power use might look like as it works on training an AI model. The rack runs at or near full power while running its own computations ("training"), then drops abruptly to a much lower power level while sharing data with the other racks ("checkpointing"). Regardless of the number of racks involved in a specific training job, the shape of the total job's power trace matches each rack's power trace because the units operate in sync.
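This alternating pattern can be sketched in a few lines of code. The phase lengths and power levels below are illustrative assumptions (not measured values); the point is the shape of the trace and the fact that synchronized racks simply scale it:

```python
# Sketch of a synthetic, normalized rack power trace during AI training.
# All numbers (phase lengths, power levels) are illustrative assumptions,
# not measurements from any real cluster.

def rack_power_trace(n_steps, train_len=20, checkpoint_len=5,
                     p_train=1.0, p_checkpoint=0.2):
    """Return normalized rack power at each time step.

    The rack alternates between a compute ("training") phase at p_train
    and a synchronization ("checkpointing") phase at p_checkpoint,
    matching the shape described for Fig. 1.
    """
    period = train_len + checkpoint_len
    trace = []
    for t in range(n_steps):
        phase = t % period
        trace.append(p_train if phase < train_len else p_checkpoint)
    return trace

trace = rack_power_trace(50)
# Because the racks operate in sync, the total power of N racks is just
# N times the single-rack trace (here, e.g., 10 racks):
total = [10 * p for p in trace]
```

Each training-to-checkpointing transition in this trace is a step change in demand, and the synchronized sum multiplies that step by the number of racks rather than smoothing it out.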
Experimental measurements reported by the North American Electric Reliability Corporation (NERC) demonstrated that an AI training cluster can transition from its maximum rated power to its minimum power in roughly 500 milliseconds. [7] Next-generation AI models could reasonably demand upwards of 500 MW of power during training. [8] A 500 MW swing in 500 milliseconds corresponds to a ramping rate of 1,000 MW/second. For comparison, the grid-balancing natural gas turbine discussed above has a maximum power output of 481 MW and a maximum ramping rate of 1.42 MW/second, nearly three orders of magnitude too slow to compensate for the AI-training power ramp. [5]
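The mismatch can be checked with a quick back-of-the-envelope calculation, using the cited figures as inputs:

```python
# Back-of-the-envelope comparison of an AI-training ramp rate against a
# grid-balancing gas turbine, using the figures cited in the text.

ai_power_swing_mw = 500.0   # projected training load swing [8]
transition_s = 0.5          # observed transition time [7]
ai_ramp = ai_power_swing_mw / transition_s   # MW/second

turbine_ramp = 1.42         # MW/second, turbine ramping rate [5]

ratio = ai_ramp / turbine_ramp
print(f"AI ramp: {ai_ramp:.0f} MW/s; turbine ramp: {turbine_ramp} MW/s")
print(f"Mismatch: ~{ratio:.0f}x (nearly three orders of magnitude)")
```

The ratio works out to roughly 700x, which is where the "nearly three orders of magnitude" figure comes from.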
Grid operators have taken notice of this issue, and many are starting to push back against proposed AI datacenter projects unless datacenter operators can demonstrate an ability to buffer the grid from the ramping rates of training workloads. The Alberta Electric System Operator (AESO), for example, recently proposed a maximum allowable ramp rate of 10 MW/minute (~0.17 MW/second) for transmission-connected datacenters. One solution offered by AESO is to include energy storage in the datacenter power system in order to minimize the risk that AI loads would otherwise pose to the integrity of the broader power grid. [9]
Considering just a single rack of computers in a datacenter, Fig. 3 illustrates how energy enters the rack from the power grid, while energy leaves the system as the computers and their supporting systems do work. If the power entering the rack is not equal to the power leaving it, some energy must be either stored in or discharged from the rack.
Applying this thought process to the case of training AI, the rack power may fluctuate dramatically over the course of a few seconds, while the grid is only able to respond to gradual changes in demand. Therefore, there has to be a means of storing energy somewhere in the system in order to allow the datacenter to operate without placing the broader power grid at risk. Fig. 4 illustrates this principle graphically, where some energy is diverted into storage after a rack power ramping event, giving time for the grid to adjust. This energy storage system has to be able to meet the following specifications in order to fill the role:
Power: The energy storage system has to be able to handle enough power to compensate for the entire difference between the rack's demand and the grid's supply.
Energy: The energy storage system has to have enough capacity that it is never completely full when it needs to receive more energy, or completely empty when power is needed. In the worst-case scenario, the rack of servers has been operating at maximum power for a long time and suddenly turns off and stays off, forcing the energy storage system to charge for however long it takes the power grid to ramp down to the new demand level.
Response Time: The energy storage system needs to be able to switch on and off fast enough that the power grid is never exposed to rapid swings in demand. Chemical batteries, capacitors, and other passive energy storage devices are well suited to this specification.
A combination of different systems may be necessary in order to meet all of these requirements at once.
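To get a feel for the scale these requirements imply, the worst case from the "Energy" specification can be estimated under simple assumptions: the full training load shuts off instantly, and grid generation ramps down linearly at the turbine-limited rate while the storage absorbs the surplus. The inputs are the illustrative figures cited earlier, not a design specification:

```python
# Rough sizing of the buffer storage for the worst case described above:
# a 500 MW training load shuts off instantly, and grid generation ramps
# down linearly at the turbine-limited rate until supply matches demand.
# All inputs are illustrative values from the text, not a design spec.

p_step_mw = 500.0   # instantaneous load drop [8]
grid_ramp = 1.42    # MW/second, generator ramp-down rate [5]

ramp_time_s = p_step_mw / grid_ramp          # time for grid to catch up
# The surplus power decays linearly from p_step_mw to zero, so the
# absorbed energy is the area of a triangle:
energy_mj = 0.5 * p_step_mw * ramp_time_s    # megajoules (MW * s)
energy_mwh = energy_mj / 3600.0

print(f"Grid ramp-down time: {ramp_time_s:.0f} s")
print(f"Energy the storage must absorb: ~{energy_mwh:.0f} MWh")
```

Under these assumptions the storage must absorb full rack power for several minutes and hold on the order of tens of megawatt-hours, which suggests that fast-responding capacitors alone are not enough and battery-scale capacity is needed as well.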
This analysis shows that the existing power grid is not well equipped to adapt to the emerging challenges presented by AI datacenter loads. Grid operators are beginning to push back against new datacenter builds because AI loads fluctuate faster than grid generators can adapt. Unless the way power is used while training AI models changes, some means of energy storage is required to buffer the slowly changing grid power supply from rapidly fluctuating datacenter power demands.
© Dillon Jensen. The author warrants that the work is the author's own and that Stanford University provided no input other than typesetting and referencing guidelines. The author grants permission to copy, distribute and display this work in unaltered form, with attribution to the author, for noncommercial purposes only. All other rights, including commercial rights, are reserved to the author.
[1] A. Katal, S. Dahiya, and T. Choudhury, "Energy Efficiency in Cloud Computing Data Centers: A Survey on Software Technologies," Clust. Comput. 26, 1845 (2023).
[2] "Electric Power Annual 2024," U.S. Energy Information Administration, October 2025, Table 1.1.
[3] A. Saggion, R. Faraldo, and M. Pierno, Thermodynamics: Fundamental Principles and Applications (Springer, 2019), p. 23.
[4] "Incident Review: Considering Simultaneous Voltage-Sensitive Load Reductions," North American Electric Reliability Corporation, January 2025.
[5] "Siemens HL-Class: The Next Generation of Siemens Advanced Air-Cooled Gas Turbines," Siemens AG, PGGT-T10031-00-7600, 2018.
[6] Z. Ye et al., "Deep Learning Workload Scheduling in GPU Datacenters: A Survey," ACM Comput. Surv. 56, 146 (2024).
[7] "Characteristics and Risks of Emerging Large Loads," North American Electric Reliability Corporation, July 2025.
[8] J. You et al., "Scaling Intelligence: The Exponential Growth of AI's Power Needs," Electric Power Research Institute, August 2024.
[9] "AESO Connection Requirements for Transmission-Connected Data Centres," Alberta Electric System Operator, August 2025.