Forecasting the Energy Cost of a Trillion-Parameter Future

By Amogh Garg

Abstract

The next decade’s alpha lies not in timing AI adoption, but in forecasting the megawatts that will make it possible. As artificial-intelligence workloads scale beyond a trillion parameters, electricity becomes the new constraint. Global data-center power demand is projected to reach ≈ 945 terawatt-hours (TWh) by 2030, more than doubling this decade and introducing a structural cost line item that hyperscalers cannot hedge [IEA, Electricity 2024, AI & Data Centre Outlook 2024]. This article develops two analytical lenses—the Watts-per-Token Forecast (WPTF) and Energy-Forecast Elasticity (EFE)—linking workload telemetry (token throughput, floating-point operations per query, duty-cycle variance) with infrastructure variables such as Power Usage Effectiveness (PUE) and Cooling Load Factor (CLF). Baseline modeling shows that every 0.1-point improvement in global PUE translates to ≈ $2.5–5 billion in annualized savings [Google’s fleet average of 1.09; electricity tariffs from EIA 2024], underscoring energy-forecast accuracy (Weighted Absolute Percentage Error < 5%) as a leading indicator of gross-margin stability. The marginal cost of intelligence now reflects a dual constraint—compute density and electrical intensity—making megawatts per query as material to unit economics as GPUs per workload.

1. The Financial Stakes of Power Forecasting

Electricity has become the largest variable cost driver in AI-era infrastructure. The International Energy Agency projects global data-centre consumption to surge from ≈ 415 TWh in 2024 to ≈ 945 TWh by 2030, effectively doubling within six years [IEA, Electricity 2024]. The U.S. Department of Energy expects data centers’ share of total U.S. electricity demand to rise from 4.4% in 2023 to 6.7–12% by 2028, a structural shift in the power market [DOE, 2024]. At industrial tariffs of US $0.04–0.08 per kWh, the projected 2030 demand translates to a US $40–75 billion annual energy bill by the end of the decade.

Meanwhile, rack-level power density has escalated from 5–10 kW to 100+ kW, with hyperscale campuses drawing hundreds of megawatts each [Ramboll, 2024]. At that scale, a 1% forecasting miss equals ≈ 9 TWh—or $350–700 million in unexpected cost. Cooling, UPS losses, and PUE fluctuations compound this exposure.
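The exposure is easy to reproduce. The sketch below (illustrative Python; the demand figure is the IEA projection above, and the tariff band is the industrial range used throughout this article) recovers both the annual bill and the cost of a 1% forecasting miss:

```python
# Annual energy bill and the cost of a 1% forecasting miss,
# using the IEA 2030 demand projection and industrial tariffs cited above.

DEMAND_TWH = 945           # projected global data-center demand, 2030 (IEA)
KWH_PER_TWH = 1e9          # 1 TWh = 1 billion kWh

for tariff in (0.04, 0.08):                      # USD per kWh
    annual_bill = DEMAND_TWH * KWH_PER_TWH * tariff
    miss_cost = 0.01 * annual_bill               # a 1% forecasting miss
    print(f"@ ${tariff:.2f}/kWh: bill ≈ ${annual_bill / 1e9:.0f}B/yr, "
          f"1% miss ≈ ${miss_cost / 1e6:.0f}M")
```

Run end to end, this reproduces the ≈ $40–75 billion annual bill and the ≈ $350–700 million per-point miss quoted above.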

For investors, energy-forecast precision is now a margin variable: the hyperscalers that model their megawatt curve most accurately will turn AI demand into cash flow—while others turn it into heat.

2. Energy as the New Capacity-Planning Variable 

For two decades, cloud capacity planning has centered on compute—forecasting GPU clusters, storage nodes, and network throughput. But in the AI era, energy has emerged as the next planning frontier. Hyperscalers now design around megawatt envelopes and grid latency as much as GPU counts—because even perfect compute forecasting fails if the power curve lags behind it: power has effectively become the new SKU.

This shift is redrawing the value chain. Firms like Vertiv (VRT), Schneider Electric (SU.PA), ABB (ABBN.SW), and Eaton (ETN)—once viewed as background infrastructure players—are now strategic partners in AI uptime. Their cooling, power-distribution, and UPS systems determine how efficiently megawatts become usable compute.

To support this transition, this article discusses two emerging forecasting lenses: the Watts-per-Token Forecast (WPTF)—a measure of the energy cost behind each AI inference—and Energy-Forecast Elasticity (EFE), which tracks how total power scales with workload growth. These frameworks elevate power forecasting from a facilities exercise to a core financial discipline—where each basis point of accuracy compounds into lower cost of compute and higher system resilience.

3. Analytical Frameworks: Watts-per-Token Forecast and Energy-Forecast Elasticity

3.1. Watts-per-Token Forecast (WPTF)

Watts-per-Token Forecast measures the marginal energy intensity of inference—how many watt-hours are consumed for every AI token, prompt, or query generated.

Watts-per-Token = Total Energy Consumed (Wh) / Tokens or Queries Generated

Recent public disclosures help calibrate the range:

  1. OpenAI (ChatGPT) ≈ 0.34 Wh per query [Sam Altman’s Blog, 2024]
  2. Google (Gemini) ≈ 0.24 Wh per median prompt [Fast Company, 2024]
  3. IEA/TrendForce data-center figures imply ≈ 0.009–0.01 Wh per token when averaged over billions of inferences.

These figures together define an operational envelope for inference efficiency—roughly 0.01–0.3 Wh per text-generation event, depending on model size and deployment density.

A simple scenario illustrates the metric: a 10.2 kW inference cluster sustaining 24,525 tokens per second [Inference Power Estimation, Token Generation Rate]. Because both energy and tokens scale with time, the ratio is the same over any interval; the calculation below uses one hour.

Watts-per-Token = (10.2 kW × 1 h) / (24,525 tokens/s × 3,600 s) = 10,200 Wh / 88.29 M tokens ≈ 0.000116 Wh/token
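For readers who want to adapt the calculation, a minimal helper (assuming only the cluster figures from the scenario above) looks like this:

```python
def watts_per_token(power_kw: float, tokens_per_sec: float) -> float:
    """Energy intensity of inference, in watt-hours per token.

    power_kw:        continuous power draw of the inference cluster
    tokens_per_sec:  aggregate token generation rate
    """
    energy_wh_per_hour = power_kw * 1000       # kW over one hour -> Wh
    tokens_per_hour = tokens_per_sec * 3600
    return energy_wh_per_hour / tokens_per_hour

# The scenario above: a 10.2 kW cluster generating 24,525 tokens/second.
print(f"{watts_per_token(10.2, 24_525):.6f} Wh/token")   # ≈ 0.000116
```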

Declining Watts-per-Token Forecast signals architectural efficiency—through better model compression, inference batching, or voltage/frequency scaling at the chip level. Tracking Watts-per-Token Forecast over time allows investors to gauge when a provider’s energy curve decouples from model-scale growth, a key indicator of sustainable margin expansion.

3.2. Energy-Forecast Elasticity (EFE)

EFE expresses how total facility power scales relative to AI workload growth:

EFE = (% Δ in Total Facility Power) / (% Δ in Workload, measured in compute or tokens processed)

An EFE of 1 implies linear scaling; < 1 signals efficiency leverage; > 1 flags power bottlenecks—when cooling, power-conversion, and ancillary loads rise disproportionately with compute growth. If an operator’s AI workload grows 40% while its power draw rises 25%, EFE ≈ 0.63—indicating that power demand is inelastic relative to workload growth, a sign of efficient scaling.
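A minimal sketch of the elasticity calculation, using the hypothetical 40%/25% growth figures above:

```python
def energy_forecast_elasticity(pct_delta_power: float,
                               pct_delta_workload: float) -> float:
    """EFE = % change in total facility power / % change in workload."""
    return pct_delta_power / pct_delta_workload

efe = energy_forecast_elasticity(0.25, 0.40)   # +25% power on +40% workload
print(f"EFE = {efe:.2f}")                      # 0.63 -> efficient scaling
if efe < 1:
    print("efficiency leverage")
elif efe > 1:
    print("power bottleneck: ancillary loads scaling superlinearly")
```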

Metric | Definition                | Ideal Range  | Financial Signal
WPTF   | Watts per token or query  | ↓ over time  | Lower energy intensity → higher margin
EFE    | % Δ power ÷ % Δ workload  | < 1          | Efficiency gain → forecasting edge

Together, Watts-per-Token Forecast (WPTF) and Energy-Forecast Elasticity (EFE) translate energy operations into financial leverage. WPTF captures instantaneous efficiency; EFE captures structural scalability. When monitored together, they turn power planning into a quantitative discipline—where each basis-point improvement compounds into lower cost of compute and higher resilience across hyperscale portfolios.

4. Linking Workload Telemetry to Infrastructure Energy: From Tokens to Megawatts

Energy forecasting in artificial-intelligence infrastructure begins at the workload layer. Every model request produces a measurable sequence of tokens—the atomic units of computation—and, for a given model, each token requires a roughly fixed number of floating-point operations, or FLOPs. Multiplying token throughput by FLOPs per token and the chip’s energy per FLOP yields the total energy consumed by the information-technology equipment:

IT Energy = Token Throughput × FLOPs per Token × Energy per FLOP

This equipment energy is then amplified by the efficiency of the physical facility. The Power Usage Effectiveness (PUE) expresses how much additional power is required beyond the IT load to run cooling, power-conversion, and auxiliary systems:

Total Facility Energy = IT Energy × PUE

A Cooling Load Factor (CLF) describes how the facility’s cooling demand rises with workload volatility. When AI workloads operate at uneven duty cycles—spiking during model retraining or bursty inference—the CLF increases, driving PUE upward.

These relationships connect operational telemetry to the forecasting metrics defined earlier:

Watts-per-Token Forecast = (Total Facility Energy Consumed) / (Tokens or Queries Generated) = FLOPs per Token × Energy per FLOP × PUE

As model duty cycles become smoother (that is, when GPU workloads maintain steadier utilization instead of spiking during bursts) or cooling systems more efficient, PUE falls and Watts-per-Token Forecast declines accordingly.
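Chaining these identities gives a compact end-to-end sketch. Every input below is an illustrative assumption (a hypothetical trillion-parameter model at roughly 2 FLOPs per parameter per token, an accelerator efficiency on the order of 10^-12 joules per FLOP, and a PUE of 1.2), not a disclosed figure from any operator:

```python
# From tokens to megawatts:
#   IT Energy = Token Throughput x FLOPs per Token x Energy per FLOP
#   Total Facility Energy = IT Energy x PUE
# All inputs are illustrative assumptions.

TOKENS_PER_SEC = 1e6        # hypothetical fleet-wide token throughput
FLOPS_PER_TOKEN = 2e12      # ~2 FLOPs/parameter for a 1T-parameter model
JOULES_PER_FLOP = 1e-12     # assumed accelerator efficiency (order of magnitude)
PUE = 1.2                   # assumed facility overhead

it_power_w = TOKENS_PER_SEC * FLOPS_PER_TOKEN * JOULES_PER_FLOP  # J/s = W
facility_power_w = it_power_w * PUE

# Watts-per-Token at the facility boundary (joules -> Wh via /3600):
wpt_wh = FLOPS_PER_TOKEN * JOULES_PER_FLOP * PUE / 3600

print(f"IT load:       {it_power_w / 1e6:.1f} MW")       # 2.0 MW
print(f"Facility load: {facility_power_w / 1e6:.1f} MW") # 2.4 MW
print(f"WPTF:          {wpt_wh:.6f} Wh/token")           # 0.000667
```

The same chain runs in reverse for planning: given a target Watts-per-Token and a demand forecast in tokens, the megawatt envelope falls out directly.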

The Energy-Forecast Elasticity (EFE) measures how total facility power scales as workloads expand. When Power Usage Effectiveness (PUE) remains stable or declines with growth, EFE stays below 1—signaling efficient scaling. When PUE rises alongside higher Cooling Load Factors (CLF), EFE exceeds 1, indicating diminishing energy returns at high utilization. EFE encapsulates the joint elasticity of PUE, hardware efficiency, and workload utilization, capturing how total facility energy responds to incremental compute demand.

5. The Responsibility Stack: Who Shapes the Energy Curve

For investors, efficiency in AI infrastructure isn’t additive—it’s interdependent.
A gain in one layer often erodes another. Each new rack generation raises performance-per-watt, but it also multiplies cooling and conversion losses, inflating total energy cost per inference. To model margin resilience accurately, investors must evaluate the entire responsibility stack—from hyperscalers and chipmakers to integrators and utilities—because each node in this chain pushes or pulls on the same megawatt curve.

5.1. Hyperscalers—Orchestrating Utilization Efficiency

Cloud leaders such as Amazon (AMZN), Microsoft (MSFT), and Google (GOOGL) control how smoothly compute loads translate into energy demand. Their scheduling algorithms, load-balancing systems, and model-duty optimizers determine whether GPU clusters idle or run at near-steady duty cycles. Every 1% improvement in utilization can save tens of millions in power and cooling costs annually [IEA, Electricity 2024].
By flattening thermal and compute volatility, hyperscalers push their Energy-Forecast Elasticity (EFE) toward or below 1.0—achieving steadier grid draws, lower facility wear, and more predictable margins.

5.2. Chipmakers—Shaping the Performance-per-Watt Frontier

Semiconductor designers—NVIDIA (NVDA), AMD (AMD), and Intel (INTC)—anchor the hardware side of the curve. Each new accelerator generation improves throughput per watt by 25–40% [NVIDIA Blog, 2024; AMD, 2025; Epoch AI GPU Analysis] but simultaneously raises rack density and thermal load. NVIDIA’s forthcoming Rubin Ultra NVL576 rack reportedly consumes about 600 kW per cabinet [Data Center Dynamics, 2025]—nearly 10× the power envelope of 2018-era Volta systems. Thus, while TFLOPs-per-watt improve, total energy intensity per floor tile can still climb, a reminder that efficiency at the chip level doesn’t automatically yield system-wide savings.

5.3. Infrastructure Integrators—Converting Capex into Predictable Energy Savings

Electrical and mechanical system providers like Vertiv (VRT), Schneider Electric (SU.PA), ABB (ABBN.SW), and Eaton (ETN) turn power into usable compute. Their high-voltage direct-current (HVDC) distribution, solid-state transformers, and liquid-cooling modules can cut conversion losses by 4–6%. Yet these same designs often require higher up-front capital and tighter thermal tolerances. Each megawatt deferred through efficiency translates into margin preservation—but if cooling or redundancy systems lag, the gain evaporates in downtime risk [Ramboll, 2025; IEA, 2024; Delta Electronics, 800 V HVDC architecture efficiency gains of over 4%].
Investors should view this segment as the “energy-to-compute exchange rate” that connects physical infrastructure to digital margin.

5.4. Energy Providers and Utilities—Setting the Macro Ceiling

At the top of the stack sit utilities and independent power producers such as NextEra Energy (NEE), Duke Energy (DUK), and Southern Company (SO). Their grid reliability, renewable mix, and latency define how low a hyperscaler’s Power Usage Effectiveness (PUE) can realistically fall [IEA, Electricity 2024; McKinsey, 2024].

Regions with abundant low-carbon baseloads—the Nordics, Pacific Northwest, and parts of Canada—enable hyperscalers to run smoother EFE curves and reduce exposure to price spikes.
But when local grids tighten, the hyperscaler’s efficiency narrative collapses into cost volatility, proving that the megawatt is the new unit of competitive advantage.

6. The Energy-Efficiency Paradox

Despite GPU efficiency improving nearly 3.5x since 2016, average rack power density has doubled from 6.1 kW to 12 kW. This paradox—where better chips yield higher bills—arises because hyperscalers reinvest efficiency gains into denser deployments rather than cost savings. Modern AI racks now pack multiple 700 W H100 GPUs where 250 W V100s once resided, pushing total facility loads upward even as performance-per-watt improves.
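The arithmetic of the paradox is easy to reproduce. In the sketch below, the per-GPU TDPs match the figures cited above, while the GPUs-per-rack counts and the 3.5x performance-per-watt multiple are illustrative assumptions drawn from the trend, not a specific vendor configuration:

```python
# Why better chips can still mean bigger bills: efficiency gains are
# reinvested in density. Counts and the perf/W multiple are illustrative.

generations = {
    # name: (gpus_per_rack, tdp_watts, relative_perf_per_watt)
    "V100 era": (8,  250, 1.0),
    "H100 era": (16, 700, 3.5),
}

for name, (gpus, tdp, perf_per_watt) in generations.items():
    rack_kw = gpus * tdp / 1000
    print(f"{name}: perf/W x{perf_per_watt:.1f}, "
          f"rack GPU load ≈ {rack_kw:.1f} kW")
# Perf/W rises 3.5x, yet the rack's GPU load grows from ~2 kW to ~11 kW:
# denser deployments swamp the per-chip gains.
```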

The graph below illustrates this divergence. The green line shows GPU efficiency climbing steadily across the Volta, Ampere, and Hopper generations. The red line reveals rack power density rising in parallel. Both trends are moving upward simultaneously, defying the intuitive expectation that efficiency gains should reduce total energy draw.

Figure 1. The Energy-Efficiency Paradox in AI Infrastructure

Source: AFCOM State of Data Center Reports (2016-2024), Uptime Institute Global Surveys, and Epoch AI GPU Analysis [AFCOM 2016, AFCOM Data 2021-2024, Epoch AI GPU Analysis]

At hyperscale, every additional 100 kW of continuous rack load translates to roughly $39k–$70k per year in direct IT energy at $0.045–$0.08/kWh (≈ $3.9k–$7.0k per 10 kW); grossed up by a facility PUE of ≈ 1.4, the all-in figure approaches $55k–$98k. For investors, this decoupling of chip-level efficiency from system-level costs means energy forecasting has become the dominant margin variable: hyperscalers that model their megawatt curve most accurately will capture AI economics, while others face compounding energy-cost surprises. The widening gap between GPU performance-per-watt and facility-level load is precisely what the Energy-Forecast Elasticity (EFE) captures—showing how subsystem gains can still inflate total power curves.
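The incremental-load arithmetic behind those figures, with the PUE gross-up shown separately (same assumed tariff band):

```python
HOURS_PER_YEAR = 8760

def annual_cost(load_kw: float, tariff_usd_kwh: float,
                pue: float = 1.0) -> float:
    """Annual electricity cost of a continuous load, optionally
    grossed up by PUE to capture cooling and conversion overhead."""
    return load_kw * pue * HOURS_PER_YEAR * tariff_usd_kwh

for tariff in (0.045, 0.08):
    it_only = annual_cost(100, tariff)            # 100 kW of rack load
    all_in = annual_cost(100, tariff, pue=1.4)    # facility overhead included
    print(f"@ ${tariff}/kWh: IT-only ≈ ${it_only / 1e3:.0f}k/yr, "
          f"at PUE 1.4 ≈ ${all_in / 1e3:.0f}k/yr")
```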

7. What’s Next: Forecasting the Trillion-Parameter Energy Economy

The trillion-parameter era will be defined by a dual race—compute acceleration and energy orchestration. As models expand toward 10¹³ parameters, their power envelope scales in lockstep. BloombergNEF now estimates AI data-center electricity demand could exceed 1,500 terawatt-hours (TWh) by 2035, while total compute throughput rises nearly 100x from 2023 levels [BloombergNEF, Power for AI: Easier Said Than Built; IEA Energy demand from AI; Epoch AI, How many AI models will exceed compute thresholds?; Training compute of frontier AI models grows by 4-5x per year]. Together, they form the two axes of the new AI production frontier: floating-point operations and megawatt-hours.

The hyperscalers that synchronize both curves—balancing GPU utilization with grid efficiency—will capture the durable margin advantage. A 10% improvement in power-forecast accuracy or utilization scheduling can unlock 300–500 basis points [derived from hyperscaler quarterly filings showing cloud EBITDA margins of 30–40%] of EBITDA margin, equivalent to billions in annual cash flow across platforms like Amazon (AMZN), Microsoft (MSFT), and Google (GOOGL).

Policy and capex are converging fast. The U.S. §48C clean-power credit and Europe’s Data Centre Green Deal earmark more than $15 billion for grid-interactive efficiency upgrades [U.S. §48C Advanced Energy Project Credit’s $10 billion allocation under the Inflation Reduction Act; European Digital Europe Programme’s €8.1 billion total budget for digital infrastructure modernization, 2021-2027]. Chipmakers and energy integrators—from NVIDIA (NVDA) to Vertiv (VRT) and Schneider Electric (SU.PA)—stand at the intersection of these incentives.

But the next horizon goes beyond megawatts and GPUs:

  1. AI workloads will become grid-interactive assets. Hyperscalers are beginning to co-locate with renewable plants, using real-time energy markets to modulate workloads and offset carbon intensity.
  2. Forecasting models will integrate grid signals. Energy price, carbon intensity, and renewable variability will feed directly into workload-scheduling algorithms.
  3. The next research frontier is multi-modal forecasting. Future models will jointly predict compute, cooling, and carbon, extending energy forecasting into a complete sustainability equation.

“Forecasting no longer stops at GPUs or megawatts—it extends to carbon, water, and grid time slots.”

For investors, the alpha is shifting from compute growth to forecasting precision. Publicly listed power-efficiency firms such as Vertiv (VRT) have outperformed the NASDAQ by roughly 150% annualized over the last two years [stock comparison tools], mirroring this structural re-rating of the energy layer. In upcoming earnings cycles, investors should track how hyperscalers disclose megawatt capacity, PUE improvements, and power-purchase agreements (PPAs)—because energy transparency may soon become the new compute metric on Wall Street.

Disclaimer:

This article reflects the author’s analysis and opinions. It is provided for informational purposes only and should not be taken as investment, financial, legal, tax, or professional advice. Mentions of companies or securities are used strictly to support the analytical discussion and should not be viewed as recommendations. The author does not provide personalized financial advice, and readers should consult a qualified advisor before making investment decisions. Past performance is not a guarantee of future results.

About the author
Amogh Garg is a Senior Member of Technical Staff in Salesforce’s Cloud Economics and Capacity Management (CECM) organization, where he leads forecasting and capacity planning for the infrastructure powering the company’s Agentforce AI platform. His work spans GPU fleet planning, inference growth modeling, and the economics of scaling distributed compute. Before Salesforce, Amogh led infrastructure forecasting and capacity initiatives across AWS, Meta, and Microsoft, supporting some of the world’s largest internet and AI platforms. His writing centers on the intersection of AI infrastructure, energy markets, and financial planning, with a focus on how forecasting is evolving to power the next decade of cloud and AI growth.