Research Article · Computation-Electricity Coordination
Chandan Chaudhary, Alaaeldein Abdelkader, Yansong Pei, Mohammed Benidris, Joydeep Mitra
Published 2026-06-12 · arXiv · Credibility S
Abstract
The proliferation of large-scale data centers introduces spatially correlated demand profiles that challenge the long-standing assumption of statistical independence of loads in power system analysis. This paper examines the emergence of such load correlations and evaluates their impact on data-center-dominated grids. Analytical derivations reveal that correlated load fluctuations amplify aggregate stochastic disturbances, reduce voltage stability margins through weakened reactive power stiffness, and degrade frequency stability margin by erosion of natural load diversity effects. Real-time digital simulation studies confirm that moderate spatial correlation in distributed data centers produces simultaneous frequency deviations and voltage fluctuations across multiple buses. The findings offer transmission system operators a physics-based perspective to interpret emerging oscillatory phenomena and establish stability planning criteria grounded in measurable load-correlation structures rather than traditional diversity assumptions.
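Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: the paper questions the long-standing assumption that loads are statistically independent, which spatially correlated data center demand undermines. Method: per the abstract, the authors combine analytical derivations with real-time digital simulation studies. Result: correlated load fluctuations amplify aggregate stochastic disturbances and reduce both voltage and frequency stability margins; moderate spatial correlation across distributed data centers produces simultaneous frequency deviations and voltage fluctuations at multiple buses. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.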
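As a rough illustration of why correlation matters for the aggregate disturbance described in the abstract, the sketch below computes the standard deviation of a sum of load fluctuations. It assumes every pair of sites shares a single correlation coefficient `rho`; the function name, the 10 MW figures, and the equicorrelation structure are illustrative assumptions, not the paper's model.

```python
import math

def aggregate_std(sigmas, rho):
    """Standard deviation of the summed load when every pair of sites
    shares the same correlation coefficient rho (an illustrative assumption)."""
    independent_part = sum(s * s for s in sigmas)
    total = sum(sigmas)
    # Sum over i != j of sigma_i * sigma_j equals (sum sigma)^2 - sum sigma^2.
    cross_part = rho * (total * total - independent_part)
    return math.sqrt(independent_part + cross_part)

# Three hypothetical data centers, each with a 10 MW (one sigma) fluctuation.
sigmas = [10.0, 10.0, 10.0]
for rho in (0.0, 0.3, 1.0):
    print(f"rho={rho:.1f}: aggregate sigma = {aggregate_std(sigmas, rho):.2f} MW")
```

With `rho = 0` the fluctuations partly cancel (the classical diversity effect); as `rho` approaches 1 they add almost linearly, which is the loss of diversity the abstract refers to.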
Reference
Chandan Chaudhary, Alaaeldein Abdelkader, Yansong Pei, et al. Spatial Load Correlation in AI Data-Center-Dominated Power Systems[J/OL]. (2026-06-12)[2026-06-16]. http://arxiv.org/abs/2606.13853v1.
Research Article · AI Operations Optimization
Roblex Nana Tchakoute, Claude Tadonki
Published 2026-05-23 · arXiv · Credibility S
Abstract
High-Performance Computing (HPC) has recently entered the Exascale era, and considerable efforts are being made to fully harness this potential power for large-scale applications, such as cutting-edge generative AI (training and exploitation). The corresponding energy consumption is very high, and forecasts are alarming, making this metric a critical systemic bottleneck. Addressing this issue presents a genuine challenge for the entire cloud-edge-HPC continuum at all scales, from low-power IoT microcontrollers to multi-megawatt data centers. Beyond financial costs, green computing is driven by considerations related to climate change and environmental concerns such as carbon footprint ($CO_2e$), as well as constraints on energy production and supply, leading to a real need to regulate information and communication technology (ICT) activities. This article presents a comprehensive overview of energy-efficient computing, taking into account the most recent and significant contributions. Based on this exploration of the state of the art, we design and describe a holistic taxonomy of the aforementioned publications, structured around various perspectives, including hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Particular emphasis is placed on large-scale AI, which receives significant attention due to its considerable resource requirements. We conclude with an analysis of a forward-looking roadmap that considers the main perspectives of sustainable computing.
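Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and AI operations, load forecasting, and facility tuning are becoming key variables in AI computing center design. Problem: the paper addresses the very high and still growing energy consumption of exascale HPC and large-scale AI, which the abstract calls a critical systemic bottleneck. Method: per the abstract, this is a survey that builds a holistic taxonomy of recent work, covering hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Result: the contribution is a systematic organization of energy-efficient computing techniques, with particular emphasis on large-scale AI, plus a forward-looking roadmap for sustainable computing. Significance: for readers of this daily digest, it can help judge whether AI tools can reduce operational complexity and improve availability. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.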
Reference
Roblex Nana Tchakoute, Claude Tadonki. Energy-Aware Computing in the Year 2026[J/OL]. (2026-05-23)[2026-06-16]. http://arxiv.org/abs/2605.24569v1.
Research Article · Chips and Compute
Minghao Li, Alicia Golden, Samuel Hsia, Michael Kuchnik, Adi Gangidi, Xu Zhang, Ashmitha Jeevaraj Shetty, Zachary DeVito
Published 2026-05-23 · arXiv · Credibility S
Abstract
The rapid scaling of large language model training requires distributing GPU resources across multiple data center buildings and regions. We refer to this paradigm as "scale-across" training. As infrastructure expands, the system design space becomes increasingly intricate, encompassing new model architectures, hardware heterogeneity, and evolving communication patterns. Drawing from Meta's production experience, we highlight the complexities of deploying training jobs across a few data centers housing hundreds of thousands of GPUs. To accelerate exploration of the large design space and to enable efficient training for frontier model development, we conduct in-depth characterization of three key design dimensions: parallelism placement, parallelism scheduling, and network layer technologies. We then propose ScaleAcross Explorer, an optimizer that considers the interplay of design dimensions and holistically optimizes scale-across training. Testbed experiments and simulations demonstrate up to 64.62% training speedups over production configuration and up to 37.59% training speedups over the state-of-the-art baseline across a wide range of design points.
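Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and chips, servers, and high-density compute deployment are becoming key variables in AI computing center design. Problem: large language model training now has to spread GPUs across multiple data center buildings and regions ("scale-across" training), and the resulting design space is increasingly intricate. Method: per the abstract, the authors characterize three design dimensions (parallelism placement, parallelism scheduling, and network layer technologies) and propose ScaleAcross Explorer, an optimizer evaluated through testbed experiments and simulations. Result: training speedups of up to 64.62% over the production configuration and up to 37.59% over the state-of-the-art baseline. Significance: for readers of this daily digest, it can help judge how changes in chip roadmaps and server density carry through to facility design. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.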
Reference
Minghao Li, Alicia Golden, Samuel Hsia, et al. ScaleAcross Explorer: Exploring Communication Optimization for Scale-Across AI Model Training[J/OL]. (2026-05-23)[2026-06-16]. http://arxiv.org/abs/2605.24326v1.
Research Article · Computation-Electricity Coordination
Chandan Chaudhary, Michael Murillo, Mohammed Ben-Idris, Joydeep Mitra, Dilip Pandit, Atri Bera
Published 2026-06-12 · arXiv · Credibility S
Abstract
Hyperscale AI data centers induce spatially and temporally correlated load fluctuations that violate classical independence assumptions and are not captured by time-averaged spectral methods. These correlations are episodic and non-stationary, requiring analysis that resolves transient structure. This paper applies Dynamic Mode Decomposition (DMD) to the temporal evolution of pairwise inter-bus correlation coefficients to form a low-dimensional state representation that enables modal analysis without a stationarity assumption. DMD eigenvalues encode the correlation regime: their location in the complex plane distinguishes sustained coherence, decaying transients, and intensifying events, while oscillation frequency maps to underlying physical coupling mechanisms. Using an IEEE 39-bus Real-Time Digital Simulator (RTDS) testbed with three converter-interfaced AI data center loads driven by synthetic workload profiles, global DMD provides a time-averaged modal baseline in which a slow thermal band ($f \approx 0.005$ Hz, $|μ| = 0.91$) captures 93.6% of total correlation energy. A sliding-window DMD formulation identifies transient intensification events: 51 of 775 windows (6.6%) satisfy the $|μ_k^{(n)}| > 1$ criterion, which aligns with stochastic workload coincidences. Cross-validation with RTDS voltage coherence confirms elevated coupling during these intervals. The proposed modal growth indicator provides an early-warning signal of correlation intensification prior to peak pairwise coherence.
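Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: load correlations induced by hyperscale AI data centers are episodic and non-stationary, so time-averaged spectral methods miss them. Method: per the abstract, the authors apply Dynamic Mode Decomposition (DMD) to the time evolution of pairwise inter-bus correlation coefficients, tested on an IEEE 39-bus RTDS testbed with three converter-interfaced AI data center loads driven by synthetic workload profiles. Result: a slow thermal band ($f \approx 0.005$ Hz, $|μ| = 0.91$) captures 93.6% of total correlation energy, sliding-window DMD flags 51 of 775 windows (6.6%) as transient intensification events, and the proposed modal growth indicator gives early warning before peak pairwise coherence. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.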
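The abstract's reading of DMD eigenvalues (magnitude sets the regime, angle sets the oscillation frequency) can be sketched in a few lines. This is only the standard discrete-time interpretation, not the paper's pipeline: the function names, the tolerance `tol`, and the 1 s sampling interval are assumptions chosen for illustration.

```python
import cmath
import math

def classify_mode(mu, tol=1e-3):
    """Label a discrete-time DMD eigenvalue by its magnitude.
    tol is a hypothetical tolerance for treating |mu| as equal to 1."""
    magnitude = abs(mu)
    if magnitude > 1.0 + tol:
        return "intensifying"
    if magnitude < 1.0 - tol:
        return "decaying"
    return "sustained"

def mode_frequency_hz(mu, dt):
    """Oscillation frequency implied by the eigenvalue's angle,
    where dt is the sampling interval in seconds."""
    return abs(cmath.phase(mu)) / (2.0 * math.pi * dt)

# Hypothetical eigenvalue with magnitude 0.91 and one cycle per 200 samples,
# assuming dt = 1 s (the sampling interval is not given in the abstract).
mu = cmath.rect(0.91, 2.0 * math.pi * 0.005)
print(classify_mode(mu), round(mode_frequency_hz(mu, dt=1.0), 4))

# Share of windows meeting the |mu| > 1 criterion, using the abstract's counts.
print(f"{51 / 775:.1%}")
```

In a sliding-window setting, the same check would be applied to the eigenvalues of each window, and a window would be flagged when any of its eigenvalues is labeled "intensifying".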
Reference
Chandan Chaudhary, Michael Murillo, Mohammed Ben-Idris, et al. Modal Analysis of Spatial Load Correlation in AI Data Center-Dominated Power Systems[J/OL]. (2026-06-12)[2026-06-16]. http://arxiv.org/abs/2606.13847v1.
Research Article · Computation-Electricity Coordination
Haoxiang Wan, Xingpeng Li
Published 2026-05-18 · arXiv · Credibility S
Abstract
Data center electricity consumption reached 4.4% of U.S. total in 2023 and is projected to grow to 6.7% to 12% by 2028, imposing increasing stress on transmission networks while representing a largely untapped source of controllable demand-side flexibility. This paper proposes a modular security-constrained unit commitment (SCUC) framework that coordinates flexible data center workloads with system-level scheduling to reduce renewable curtailment, alleviate congestion, and lower operating costs. Three mixed-integer linear programming (MILP) models are formulated: the Data Center Spatial model (DC-S), enabling instantaneous workload redistribution across geographically distributed sites; the Data Center Temporal model (DC-T), permitting each site to shift its deferrable load across time while preserving the daily energy balance; and the Data Center Spatio-Temporal model (DC-ST), jointly activating both mechanisms and spanning the largest feasible operating region. Case studies on a modified IEEE 24-bus reliability test system show that DC-ST eliminates all base-case and post-contingency transmission violations at a flexibility ratio of 40%, and reduces renewable curtailment by up to 84.4% at 30% relative to the inflexible baseline. Sensitivity analysis further reveals that moderate flexibility levels of 20% to 30% already capture most of the achievable benefits, supporting practical deployment with limited operational burden on data center operators.
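Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: growing data center demand stresses transmission networks, while its controllable flexibility remains largely untapped. Method: per the abstract, the authors build a modular security-constrained unit commitment (SCUC) framework with three MILP models (spatial DC-S, temporal DC-T, and spatio-temporal DC-ST) and test it on a modified IEEE 24-bus reliability test system. Result: DC-ST eliminates all base-case and post-contingency transmission violations at a 40% flexibility ratio and cuts renewable curtailment by up to 84.4% at 30%; flexibility levels of 20% to 30% already capture most of the achievable benefit. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.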
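To make the two flexibility mechanisms in the abstract concrete, the sketch below checks whether a shifted load profile respects them. It is not the paper's MILP: it assumes the flexibility ratio bounds each period's load to within plus or minus that fraction of the baseline, and all function names and numbers are hypothetical.

```python
def is_valid_temporal_shift(baseline, shifted, flex_ratio, tol=1e-9):
    """Check one site's shifted profile: each period stays within
    +/- flex_ratio of baseline and total daily energy is unchanged."""
    if len(baseline) != len(shifted):
        return False
    within_band = all(
        (1 - flex_ratio) * b - tol <= s <= (1 + flex_ratio) * b + tol
        for b, s in zip(baseline, shifted)
    )
    energy_preserved = abs(sum(baseline) - sum(shifted)) <= tol
    return within_band and energy_preserved

def is_valid_spatial_shift(baseline_by_site, shifted_by_site, tol=1e-9):
    """Check that redistribution across sites keeps the fleet total
    unchanged within a single period."""
    return abs(sum(baseline_by_site) - sum(shifted_by_site)) <= tol

# Hypothetical four-period profile (MW) with a 20% flexibility ratio.
baseline = [100.0, 100.0, 100.0, 100.0]
shifted = [80.0, 120.0, 110.0, 90.0]
print(is_valid_temporal_shift(baseline, shifted, flex_ratio=0.2))

# Hypothetical two-site redistribution in one period (MW).
print(is_valid_spatial_shift([50.0, 50.0], [30.0, 70.0]))
```

In an optimization model these checks would appear as constraints on decision variables; a combined spatio-temporal model would enforce a fleet-level balance rather than a per-site one.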
Reference
Haoxiang Wan, Xingpeng Li. Data Center Spatio-Temporal Load Flexibility in Security-Constrained Unit Commitment for Enhanced Grid Efficiency and Reliability[J/OL]. (2026-05-18)[2026-06-16]. http://arxiv.org/abs/2605.18517v1.
Research Article · Computation-Electricity Coordination
Yugui Liu, Yibo Ding, Xudong Li, Jing Qu, Wenyi Zhang, Tong Qian, Wuyou Xiao, Zhengyang Hu
Published 2026-06-03 · arXiv · Credibility S
Abstract
Energy-intensive data centers (DCs) have emerged as substantial and flexible loads in modern power systems, underscoring the critical need for computation-electricity coordination. Harnessing the spatio-temporal flexibility of DC workloads is a promising approach to facilitate this coordination. However, existing studies overlook the collaborative potential of computational resource sharing among geo-distributed DCs, thereby failing to fully unlock this flexibility. In this paper, a bi-level computation-electricity coordination framework is proposed to explicitly capture the bidirectional interactions between DCs and the power grid. Firstly, a peer-to-peer cloud service market (P2P-CSM) for geo-distributed DCs is proposed, which enables bilateral cloud service transactions to leverage regional heterogeneities (e.g., electricity prices, cooling efficiency). Secondly, locational marginal prices are embedded into the framework to reflect network congestion and nodal price disparities. Thirdly, a dual consensus alternating direction method of multipliers (ADMM)-based decentralized algorithm is developed as the P2P market clearing algorithm, and a bisection-assisted iterative algorithm is proposed to ensure rigorous convergence of the framework. Case studies conducted on a modified IEEE 30-bus system validate that the P2P-CSM achieves a win-win computation-electricity coordination: it not only increases total DC operational profit by 22.8%, but also effectively alleviates grid congestion and yields a 3.2% reduction in total energy consumption.
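Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: existing studies overlook computational resource sharing among geo-distributed data centers, so their spatio-temporal flexibility is not fully unlocked. Method: per the abstract, the authors propose a bi-level framework with a peer-to-peer cloud service market (P2P-CSM), embedded locational marginal prices, a dual consensus ADMM-based decentralized clearing algorithm, and a bisection-assisted iterative algorithm, tested on a modified IEEE 30-bus system. Result: total data center operational profit rises by 22.8%, grid congestion is alleviated, and total energy consumption falls by 3.2%. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.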
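The abstract names a dual consensus ADMM algorithm for market clearing but gives no equations. For readers unfamiliar with the family, the sketch below shows textbook global-consensus ADMM on a toy problem, not the paper's dual consensus variant: each agent `i` privately prefers a value `a_i` and all agents must agree on one shared value `z`. The quadratic cost, the function name, and the numbers are assumptions for illustration.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Global-consensus ADMM for: minimize sum_i (x_i - a_i)^2
    subject to x_i = z for every agent i. Returns the agreed value z."""
    n = len(targets)
    z = 0.0
    u = [0.0] * n  # scaled dual variables, one per agent
    for _ in range(iterations):
        # Local step: each agent solves its own small problem in closed form,
        # using only its private target and the last shared value.
        x = [(2.0 * a + rho * (z - ui)) / (2.0 + rho) for a, ui in zip(targets, u)]
        # Coordination step: the shared value is the average of x_i + u_i.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each agent's disagreement with the shared value.
        u = [ui + xi - z for ui, xi in zip(u, x)]
    return z

# Three hypothetical agents with private targets 1, 3, and 8.
print(round(consensus_admm([1.0, 3.0, 8.0]), 6))
```

For this toy cost the agreed value converges to the average of the targets; the point of the pattern is that agents exchange only `x_i` and `u_i`, never their private cost data.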
Reference
Yugui Liu, Yibo Ding, Xudong Li, et al. Peer-to-Peer Cloud Service Market for Data Centers Oriented to Computation-Electricity Coordination[J/OL]. (2026-06-03)[2026-06-16]. http://arxiv.org/abs/2606.04981v1.
Research Article · Computation-Electricity Coordination
Jiyong Lee, Melody Agustin, Joanne Langsdorf, Erhan Kutanoglu, Michael Baldea, Ilias Mitrai
Published 2026-05-28 · arXiv · Credibility S
Abstract
In this paper, we consider the expansion of power grids under emerging large loads from data centers and electrified manufacturing. We develop a multi-period grid capacity expansion model to determine optimal investment profiles for power generation, storage, and transmission capacity while accounting for hourly power dispatch, such that electricity demand is satisfied and the total planning and operation cost is minimized. We also propose a new modeling approach regarding the spatial distribution of demand from large loads. The model is used to analyze the expansion of a synthetic grid that follows key characteristics of the ERCOT system over a seven-year planning horizon, under loads from data centers and electrified oil refining, which account for 17.5% and 4.7% of total annual electricity demand by the end of the planning horizon. The optimal investment policy leads to an 83.6% increase in generation capacity and exploits the short construction times of solar and storage as well as the operational flexibility of thermal generators. Finally, sensitivity analysis reveals that the construction time of grid assets substantially impacts investment timing, generation technology mix, and transmission capacity expansion. The proposed modeling framework is general and can be extended to other grid systems, enabling the exploration of diverse demand scenarios, policy assumptions, and regional characteristics.
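Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: the paper asks how a power grid should expand under emerging large loads from data centers and electrified manufacturing. Method: per the abstract, the authors develop a multi-period capacity expansion model covering generation, storage, and transmission with hourly dispatch, add a new modeling approach for the spatial distribution of large-load demand, and apply it to a synthetic grid with key ERCOT characteristics over a seven-year horizon. Result: the optimal policy raises generation capacity by 83.6% and relies on the short construction times of solar and storage and the operational flexibility of thermal generators; asset construction time substantially affects investment timing, technology mix, and transmission expansion. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.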
Reference
Jiyong Lee, Melody Agustin, Joanne Langsdorf, et al. Grid Capacity Expansion under Data Centers and Electrified Manufacturing Large Loads[J/OL]. (2026-05-28)[2026-06-16]. http://arxiv.org/abs/2605.29053v2.
Research Article · Computation-Electricity Coordination
Haiyang You, Chengwei Lou, Jin Zhao, Yue Zhou, Lu Zhang, Jin Yang
Published 2026-05-25 · arXiv · Credibility S
Abstract
The expansion of data centers (DCs) drives a sustained increase in electricity demand and associated water withdrawals at generation sites. These withdrawals occur at generation sites and are virtually allocated to demand based on network power flows. Consequently, the actual water footprint of a specific load varies dynamically with generation dispatch and network conditions. Existing approaches typically rely on static statistical accounting to quantify these water footprints. However, such static methods fail to capture how dispatch optimization and workload relocation dynamically affect water withdrawals. As a result, static statistical accounting approaches remain decoupled from the optimization process, rendering them incapable of guiding workload relocation or power dispatch to mitigate water stress. To address this limitation, this paper develops an operational electricity-computation-water (ECW) nexus framework that internalizes virtual water impacts directly into power system dispatch. The framework represents dispatch optimization as a differentiable optimization layer embedded within a deep learning architecture, enabling efficient end-to-end learning of coordination policies while preserving operational feasibility. Combined with fixed-point coordination, the framework enforces consistency between virtual water attribution and physical generation-side withdrawals. Case studies on the IEEE 30-bus and 118-bus test systems demonstrate reliable convergence, exact power-water consistency, and reductions of approximately 3-5% in generation-related freshwater withdrawals under water-constrained conditions.
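Interpretation
Background: AI data center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load with grid-side resources is becoming a key variable in AI computing center design. Problem: static statistical accounting of water footprints is decoupled from dispatch optimization, so it cannot guide workload relocation or power dispatch to relieve water stress. Method: per the abstract, the authors build an operational electricity-computation-water (ECW) nexus framework that embeds dispatch as a differentiable optimization layer in a deep learning architecture and combines it with fixed-point coordination, tested on the IEEE 30-bus and 118-bus systems. Result: reliable convergence, exact power-water consistency, and reductions of approximately 3-5% in generation-related freshwater withdrawals under water-constrained conditions. Significance: for readers of this daily digest, it helps judge whether AI computing center construction is constrained by grid capacity, load fluctuation, and dispatch mechanisms. The conclusions still need to be checked against the full paper's experimental conditions, sample scope, and cost assumptions.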
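The idea of virtually allocating generation-side water withdrawals to loads can be illustrated with a simple proportional attribution. This is a sketch under strong assumptions, not the paper's framework: the share matrix `load_shares` is taken as given (the paper derives attribution from network power flows), and the function name, water intensities, and energy values are hypothetical.

```python
def attribute_water(generation_mwh, water_intensity, load_shares):
    """Attribute generation-side water withdrawals to loads.

    generation_mwh[g]  : energy produced by generator g
    water_intensity[g] : withdrawal per MWh at generator g (hypothetical)
    load_shares[g][d]  : fraction of generator g's output serving load d;
                         each row is assumed to sum to 1
    """
    n_loads = len(load_shares[0])
    footprint = [0.0] * n_loads
    for g, energy in enumerate(generation_mwh):
        withdrawal = energy * water_intensity[g]
        for d in range(n_loads):
            footprint[d] += withdrawal * load_shares[g][d]
    return footprint

# Two hypothetical generators and two loads (for example, two data centers).
generation = [100.0, 50.0]          # MWh
intensity = [2.0, 0.5]              # cubic meters per MWh
shares = [[0.6, 0.4], [0.2, 0.8]]   # rows sum to 1

footprint = attribute_water(generation, intensity, shares)
physical_total = sum(e * w for e, w in zip(generation, intensity))
print(footprint, physical_total)
```

Because each row of the share matrix sums to 1, the attributed footprints add up to the physical withdrawals, which is the kind of power-water consistency the abstract refers to. When dispatch changes, `generation` and `shares` change with it, which is why a static accounting figure cannot track the footprint.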
Reference
Haiyang You, Chengwei Lou, Jin Zhao, et al. From Accounting to Coordination: A Virtual Water-Aware Electricity-Computation-Water Nexus Framework for Data Center Dispatch[J/OL]. (2026-05-25)[2026-06-16]. http://arxiv.org/abs/2605.25854v1.