Research Article · Energy Efficiency Optimization
Xiang Liu, Shimiao Yuan, Zhenheng Tang, Peijie Dong, Kaiyong Zhao, Qiang Wang, Bo Li, Xiaowen Chu
Published 2026-05-12 · arXiv · Credibility S
Abstract
LLM inference is still evaluated mainly as a model or software problem: accuracy, latency, throughput, and hardware utilization. This is incomplete. At deployment scale, the relevant output is a quality-conditioned token produced under joint constraints from effective compute, delivered data-center power, cooling capacity, PUE, and utilization. We argue that the ML community should treat inference as \emph{energy-to-token production}. We formalize this view with a dimensionally consistent Token Production Function in which token rate is bounded by both compute-per-token and energy-per-token ceilings. Listed API prices vary by over an order of magnitude across providers, but we use price dispersion only as directional motivation, not as causal evidence of marginal cost. The core physical question is instead: under fixed quality and service targets, when does the binding constraint move from theoretical peak compute toward delivered power, cooling, and operational efficiency? Under this framing, system optimizations -- latent KV-cache compression, sparse or heavily compressed attention, quantization, routing, and difficulty-adaptive reasoning -- are not merely local engineering tricks. They are energy-to-token levers because they reduce FLOPs/token, joules/token, memory traffic, or utilization losses under fixed $(q^{*},s^{*})$. We therefore call for inference papers and benchmarks to report Joules/token, active binding constraint, PUE-adjusted delivered power, and utilization-adjusted token output alongside accuracy and latency.
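The two-ceiling idea can be sketched as follows. This is a minimal illustration that assumes a simple `min()` form; the function name `token_rate`, its parameters, and the example numbers are hypothetical and are not taken from the paper.

```python
def token_rate(effective_flops_per_s, flops_per_token,
               facility_power_w, pue, joules_per_token):
    """Return (tokens per second, binding constraint) under two ceilings.

    Assumed form, not the authors' exact function:
      compute ceiling = effective FLOP/s / FLOPs per token
      energy ceiling  = IT power / joules per token,
                        where IT power = facility power / PUE
    """
    compute_ceiling = effective_flops_per_s / flops_per_token
    it_power_w = facility_power_w / pue
    energy_ceiling = it_power_w / joules_per_token
    if compute_ceiling <= energy_ceiling:
        return compute_ceiling, "compute"
    return energy_ceiling, "energy"


# Hypothetical numbers, chosen only so that the energy ceiling binds.
rate, binding = token_rate(
    effective_flops_per_s=4e14,  # utilization-adjusted compute
    flops_per_token=2e11,
    facility_power_w=1200.0,     # delivered facility power
    pue=1.2,
    joules_per_token=1.0,
)
print(f"{rate:.0f} tokens/s, binding constraint: {binding}")
```

Lowering `joules_per_token` in this sketch (for example through quantization) raises the energy ceiling until the compute ceiling binds instead, which is the shift in binding constraint the abstract asks papers to report.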
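Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and PUE/WUE, energy-efficiency metrics, and operating-cost control are becoming key variables in the design of AI computing centers. Problem: LLM inference is still evaluated mainly on accuracy, latency, throughput, and hardware utilization, which leaves out delivered power, cooling capacity, and PUE. Method: According to the abstract, this position paper formalizes a dimensionally consistent Token Production Function in which token rate is bounded by compute-per-token and energy-per-token ceilings. Result: The authors call for reporting Joules/token, the active binding constraint, PUE-adjusted delivered power, and utilization-adjusted token output alongside accuracy and latency. Significance: For readers of this daily digest, it helps judge whether a given efficiency metric actually reflects energy savings and cost benefits. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.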
Reference
Xiang Liu, Shimiao Yuan, Zhenheng Tang, et al. Position: LLM Inference Should Be Evaluated as Energy-to-Token Production[J/OL]. (2026-05-12)[2026-05-30]. http://arxiv.org/abs/2605.11733v1.
Research Article · Thermal Management and Liquid Cooling
Viktor Danchev, Alex Dyer, Sebastian Grau, Guillaume Vazeille
Published 2026-05-07 · arXiv · Credibility S
Abstract
The Standard Model of particle Physics has been validated to extraordinarily high precision by the Large Hadron Collider (LHC). Yet it leaves some of the most fundamental questions in Physics unresolved: the nature of dark matter, the hierarchy problem, and the unification of forces. Multiple next-generation terrestrial colliders have been proposed such as the Future Circular Collider (FCC) which will reach centre-of-mass energies of $\approx$100 TeV, yet the energy scales at which hints of Grand Unified Theories (GUTs) and string theory are expected to be observed ($10^{11}-10^{13}$ TeV) remain orders of magnitude beyond the reach of any terrestrial facility. We argue that the path to these energy frontiers inevitably leads to Space. By examining the fundamental scaling law for circular proton colliders, we establish that colliders of radius $10^3-10^5$ km are required to enter the PeV-EeV regime. In addition, Space-based colliders benefit from virtually free ultra-high vacuum ($< 10^{10}$ particles/m$^3$ above 1000 km altitude), passive cryogenic cooling, reduction of geological and political constraints, and perhaps most importantly -- the substantial reduction of the thermodynamic penalty that dominates terrestrial cryogenic power budgets. We survey existing proposals for beyond-Earth colliders, derive order-of-magnitude requirements for an orbital collider constellation, and assess feasibility against current and near-term spacecraft capabilities in formation flying, power generation, and precision attitude control. We conclude that recent developments in orbital infrastructure -- particularly gigawatt-scale orbital power architectures being developed for Space-based data centers -- are converging with the needs of a Space-based mega collider, making serious feasibility studies warranted and promising a more certain path towards the core questions of modern Physics.
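The radius requirement can be checked to order of magnitude with the standard magnetic-rigidity relation p [GeV/c] ≈ 0.2998 · B [T] · ρ [m]. The sketch below assumes the bending radius equals the ring radius and uses an LHC-like 8.33 T dipole field; both are illustrative assumptions, and the abstract does not state which field strength the authors use.

```python
def beam_energy_tev(dipole_field_t, bending_radius_km):
    """Per-beam proton energy in TeV for an ultra-relativistic beam.

    Uses p[GeV/c] ~= 0.2998 * B[T] * rho[m]. Converting rho to km
    (x1000) and GeV to TeV (/1000) leaves the same numeric factor.
    """
    return 0.2998 * dipole_field_t * bending_radius_km


ASSUMED_FIELD_T = 8.33  # LHC-like dipole field, assumed for illustration

for radius_km in (1e3, 1e4, 1e5):
    energy_pev = beam_energy_tev(ASSUMED_FIELD_T, radius_km) / 1e3
    print(f"R = {radius_km:.0e} km -> {energy_pev:.1f} PeV per beam")
```

With these assumptions, radii of $10^3-10^5$ km give per-beam energies from a few PeV to a few hundred PeV, in line with the PeV-EeV regime named in the abstract.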
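Interpretation
Background: The energy scales at which hints of Grand Unified Theories and string theory are expected ($10^{11}-10^{13}$ TeV) lie orders of magnitude beyond any terrestrial collider, including the proposed $\approx$100 TeV FCC. Problem: Terrestrial colliders face geological and political constraints and a thermodynamic penalty that dominates cryogenic power budgets. Method: According to the abstract, the authors examine the scaling law for circular proton colliders, survey existing beyond-Earth proposals, derive order-of-magnitude requirements for an orbital collider constellation, and assess feasibility against spacecraft capabilities in formation flying, power generation, and precision attitude control. Result: Colliders of radius $10^3-10^5$ km are required to enter the PeV-EeV regime, and the authors conclude that serious feasibility studies are warranted. Significance: For readers of this daily digest, the link to thermal management is indirect: passive cryogenic cooling in orbit and the gigawatt-scale orbital power architectures being developed for space-based data centers. The abstract does not address liquid-cooling options or high-density deployment on the ground. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.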
Reference
Viktor Danchev, Alex Dyer, Sebastian Grau, et al. The Case for Space-Based Particle Colliders: Orbital Infrastructure as a Path to Grand Unification Energy Scales[J/OL]. (2026-05-07)[2026-05-30]. http://arxiv.org/abs/2605.08239v1.
Research Article · Energy Efficiency Optimization
Raphael Hendrigo de Souza Gonçalves, Wendel Marcos dos Santos
Published 2026-05-07 · arXiv · Credibility S
Abstract
This study proposes a scalable Digital Twin framework for energy optimization in data centers. The framework integrates IoT-based data acquisition, cloud computing, and machine learning techniques to enable real-time monitoring, forecasting, and intelligent energy management. A controlled small-scale data center environment was developed to monitor variables such as power consumption, temperature, and computational workload. Long Short-Term Memory (LSTM) models were employed to predict energy demand and support operational decision-making. Experimental results demonstrated improvements in energy efficiency, including reductions in power consumption and enhancements in Power Usage Effectiveness (PUE). Despite being evaluated in a constrained environment, the proposed framework demonstrates strong potential as a scalable and cost-effective solution for sustainable data center management.
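For reference, PUE is the ratio of total facility energy to IT equipment energy, so a lower value means less overhead. The sketch below shows the calculation; the energy figures are hypothetical placeholders, since the abstract gives no measured values.

```python
def pue(total_facility_kwh, it_equipment_kwh):
    """Power Usage Effectiveness: total facility energy / IT energy."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


# Hypothetical readings for the same IT load before and after tuning.
before = pue(total_facility_kwh=180.0, it_equipment_kwh=100.0)
after = pue(total_facility_kwh=150.0, it_equipment_kwh=100.0)
print(f"PUE before: {before:.2f}, after: {after:.2f}")
```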
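Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and PUE/WUE, energy-efficiency metrics, and operating-cost control are becoming key variables in the design of AI computing centers. Problem: The paper targets real-time monitoring, forecasting, and energy management in data centers. Method: According to the abstract, the authors build a Digital Twin framework that combines IoT-based data acquisition, cloud computing, and machine learning, and use LSTM models to predict energy demand in a controlled small-scale data-center environment. Result: The abstract reports reduced power consumption and improved PUE but gives no magnitudes. Significance: For readers of this daily digest, it helps judge whether a given efficiency metric actually reflects energy savings and cost benefits, bearing in mind the constrained test environment. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.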
Reference
Raphael Hendrigo de Souza Gonçalves, Wendel Marcos dos Santos. A Scalable Digital Twin Framework for Energy Optimization in Data Centers[J/OL]. (2026-05-07)[2026-05-30]. http://arxiv.org/abs/2605.05581v1.
Research Article · Compute-Power Coordination
Johnny R. Zhang, Gaoyuan Du, Qianyi Sun, Shiqi Wang, Jiaxuan Li, Xian Sun
Published 2026-05-05 · arXiv · Credibility S
Abstract
AI data centers are increasingly becoming tightly coupled compute--energy systems, where workload placement, cooling demand, electricity procurement, storage operation, and carbon emissions interact over time. This paper studies carbon-aware compute--power scheduling for geographically distributed AI data centers with microgrid prosumer capabilities. We propose a mixed-integer linear programming (MILP) framework that jointly schedules rigid training jobs, routes elastic inference workloads, dispatches local generation and battery storage, and manages bidirectional grid interaction under latency, continuity, power-balance, and carbon-budget constraints. The model captures two key features of emerging AI infrastructure: heterogeneous workload flexibility and site-level energy prosumer operation. Experiments on synthetic yet practically motivated instances show that the proposed joint MILP substantially improves total operational benefit over compute-only and energy-only baselines while reducing emissions. The results further indicate that inference-routing flexibility is a major source of value, battery storage provides useful temporal flexibility, and local-generation-rich settings are particularly favorable. The framework provides a tractable optimization abstraction for sustainable and grid-interactive AI data centers.
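The value of inference-routing flexibility can be illustrated with a much simpler stand-in for the MILP. The sketch below routes one hour of elastic demand by merit order on a carbon-adjusted unit cost. This greedy rule is optimal only for this linear, single-period toy case; it omits rigid training jobs, batteries, latency limits, and grid interaction, all of which the paper's model includes. Site data and the `route` helper are hypothetical.

```python
def unit_cost(price, carbon_intensity, carbon_price):
    """Cost per MWh: energy price plus priced emissions."""
    return price + carbon_price * carbon_intensity


def route(demand_mwh, sites, carbon_price):
    """Fill sites in order of increasing carbon-adjusted unit cost."""
    ranked = sorted(
        sites,
        key=lambda s: unit_cost(s["price"], s["carbon"], carbon_price),
    )
    plan, remaining = {}, demand_mwh
    for site in ranked:
        served = min(remaining, site["capacity"])
        plan[site["name"]] = served
        remaining -= served
    if remaining > 1e-9:
        raise ValueError("demand exceeds total site capacity")
    return plan


# Hypothetical sites: price in $/MWh, carbon in tCO2/MWh, capacity in MWh.
sites = [
    {"name": "A", "capacity": 6.0, "price": 40.0, "carbon": 0.50},
    {"name": "B", "capacity": 8.0, "price": 55.0, "carbon": 0.05},
]
print(route(10.0, sites, carbon_price=0.0))    # cheaper site A fills first
print(route(10.0, sites, carbon_price=100.0))  # cleaner site B fills first
```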
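Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load and grid-side resources is becoming a key variable in the design of AI computing centers. Problem: Workload placement, cooling demand, electricity procurement, storage operation, and carbon emissions interact over time, so compute and energy decisions need to be scheduled jointly. Method: According to the abstract, the authors propose a mixed-integer linear programming (MILP) framework that jointly schedules rigid training jobs, routes elastic inference workloads, dispatches local generation and battery storage, and manages bidirectional grid interaction under latency, continuity, power-balance, and carbon-budget constraints. Result: On synthetic instances, the joint MILP substantially improves total operational benefit over compute-only and energy-only baselines while reducing emissions, with inference-routing flexibility a major source of value. Significance: For readers of this daily digest, it helps judge how scheduling mechanisms, storage, and local generation constrain or benefit AI computing centers. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.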
Reference
Johnny R. Zhang, Gaoyuan Du, Qianyi Sun, et al. Carbon-Aware Compute--Power Scheduling for AI Data Centers with Microgrid Prosumer Operations[J/OL]. (2026-05-05)[2026-05-30]. http://arxiv.org/abs/2605.03751v2.
Research Article · Compute-Power Coordination
Fiaz Hossain, Nilanjan Ray Chaudhuri, Alok Sinha, Sai Gopal Vennelaganti, Mohammed E. Nassar
Published 2026-05-02 · arXiv · Credibility S
Abstract
A framework is established that assesses the impact of variations in artificial intelligence (AI) data center (DC) loads on the fatigue damage of steam/gas turbines of the synchronous generators (SGs) from torsional oscillations. Next, a simple three-step process that is supported by frequency-domain analysis is laid out to quantify the limits on fluctuations in AI DC loads. In the first step, the maximum allowable variation in electrical power output at each SG terminal is independently determined from first principles. This step needs only a lumped multi-mass model of the mechanical side of the SG. In the second step, we propose a new approach that relies on load flow to determine the so-called algebraic 'interaction factor' that maps the change in AI DC load at a given bus to the corresponding change in each of the SG power outputs. In the third step, we propose a screening method to rank the candidate buses to site AI DCs and solve an optimization problem to determine the optimal allowable fluctuations in the AI DCs. We demonstrate the applicability of the proposed approach through frequency-domain and time-domain analyses in the modified IEEE 4-machine and IEEE 68-bus systems using a dynamic phasor framework. Finally, we demonstrate the scalability of the proposed approach on the synthetic 2000-bus Texas system.
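How the first two steps combine can be sketched for a single candidate bus. The sketch assumes a linear map in which a load change at the bus produces a change of `k[i]` times that amount at generator `i`; the function name and the numbers are hypothetical, and the paper's third step solves a fuller optimization across several data centers.

```python
def max_load_swing_mw(interaction_factors, sg_limits_mw):
    """Largest load swing at one bus that keeps every SG within its limit.

    interaction_factors[i]: assumed change in SG i output per unit
        change in data-center load at this bus (step 2).
    sg_limits_mw[i]: maximum allowable output variation of SG i (step 1).
    """
    bounds = [
        limit / abs(k)
        for k, limit in zip(interaction_factors, sg_limits_mw)
        if k != 0
    ]
    # With no coupled generator, this sketch imposes no limit.
    return min(bounds, default=float("inf"))


# Hypothetical values for three generators seen from one candidate bus.
factors = [0.5, 0.3, 0.2]
limits_mw = [20.0, 15.0, 30.0]
print(f"Allowable swing: {max_load_swing_mw(factors, limits_mw):.1f} MW")
```

In this toy case the first generator is the binding one, because it takes half of every load change yet tolerates only 20 MW of variation.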
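Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load and grid-side resources is becoming a key variable in the design of AI computing centers. Problem: Variations in AI data-center load can excite torsional oscillations that cause fatigue damage in the steam/gas turbines of synchronous generators. Method: According to the abstract, the authors build a framework and a three-step, frequency-domain process: derive the allowable power variation at each generator, compute load-flow-based interaction factors, then screen candidate buses and optimize the allowable load fluctuations. Result: The approach is demonstrated on the modified IEEE 4-machine and IEEE 68-bus systems and scaled to the synthetic 2000-bus Texas system, quantifying how AI load fluctuations affect the life of grid equipment. Significance: For readers of this daily digest, it helps judge whether building an AI computing center is constrained by grid capacity, load fluctuations, and scheduling mechanisms. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.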
Reference
Fiaz Hossain, Nilanjan Ray Chaudhuri, Alok Sinha, et al. Limiting the Impact of AI Data Centers on Fatigue Life of Thermal Turbine Generators in the Grid: A Frequency-Domain Approach[J/OL]. (2026-05-02)[2026-05-30]. http://arxiv.org/abs/2605.01173v1.
Research Article · Thermal Management and Liquid Cooling
Jacob Morrison, Noah A. Smith, Emma Strubell
Published 2026-05-02 · arXiv · Credibility S
Abstract
Modern language model development extends far beyond pretraining, yet environmental reporting remains narrowly focused on the cost of training a single final model. In this work, we provide the first detailed breakdown of the environmental impact of a full model development pipeline, from pretraining through supervised fine-tuning, preference optimization, and reinforcement learning, for Olmo 3, a family of 7 billion and 32 billion parameter models in both instruction-following and reasoning variants. We find that reasoning models are 17x more expensive to post-train than their instruction-tuned counterparts in terms of datacenter energy, driven by reinforcement learning rollout generation. Development costs (including experimentation, failed runs, and ablations) account for 82.2% of total compute, a roughly 65% increase over the ~50% reported for pretraining-focused pipelines in prior work. In total, we estimate our model development process consumed ~12.3 GWh of datacenter energy, emitted 4,251 tCO2eq, and consumed 15,887 kL of water, with water consumption driven entirely by power generation infrastructure rather than data center cooling. These costs, which are almost entirely unreported by model developers, are growing rapidly as post-training pipelines become more complex, and must be accounted for in environmental reporting standards and by the research community working to reduce AI's environmental impact.
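The headline totals imply average intensities that are easy to check. The snippet below derives them from the figures quoted in the abstract; the per-kWh values are computed here rather than reported by the authors, and the 50% baseline is the abstract's approximate figure.

```python
# Totals quoted in the abstract.
energy_kwh = 12.3e6         # ~12.3 GWh of datacenter energy
emissions_kg = 4251 * 1000  # 4,251 tCO2eq
water_l = 15887 * 1000      # 15,887 kL

# Implied averages (derived here, not reported in the abstract).
carbon_kg_per_kwh = emissions_kg / energy_kwh
water_l_per_kwh = water_l / energy_kwh

# Development share of compute: 82.2% versus the ~50% from prior work.
relative_increase = (82.2 - 50.0) / 50.0

print(f"Carbon intensity: {carbon_kg_per_kwh:.3f} kgCO2eq/kWh")
print(f"Water intensity:  {water_l_per_kwh:.2f} L/kWh")
print(f"Relative increase in development share: {relative_increase:.1%}")
```

The last figure comes out near 64%, consistent with the "roughly 65% increase" stated in the abstract.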
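Interpretation
Background: Environmental reporting for language models remains focused on the cost of training a single final model, although model development extends far beyond pretraining. Problem: The costs of post-training and of development work such as experimentation, failed runs, and ablations are almost entirely unreported by model developers. Method: According to the abstract, the authors break down the environmental impact of the full development pipeline for Olmo 3, from pretraining through supervised fine-tuning, preference optimization, and reinforcement learning. Result: Reasoning models are 17x more expensive to post-train than instruction-tuned counterparts in datacenter energy; development costs account for 82.2% of total compute; the process consumed ~12.3 GWh, emitted 4,251 tCO2eq, and consumed 15,887 kL of water. Significance: For readers of this daily digest, it informs energy and water accounting for AI workloads. The reported water consumption is driven entirely by power generation infrastructure rather than data-center cooling, so the abstract offers little guidance on liquid-cooling choices. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.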
Reference
Jacob Morrison, Noah A. Smith, Emma Strubell. The Hidden Cost of Thinking: Energy Use and Environmental Impact of LMs Beyond Pretraining[J/OL]. (2026-05-02)[2026-05-30]. http://arxiv.org/abs/2605.01158v1.
Research Article · Compute-Power Coordination
Jiyong Lee, Melody Agustin, Joanne Langsdorf, Erhan Kutanolgu, Michael Baldea, Ilias Mitrai
Published 2026-05-28 · arXiv · Credibility S
Abstract
In this paper, we consider the expansion of power grids under emerging large loads from data centers and electrified manufacturing. We develop a multi-period grid capacity expansion model to determine optimal investment profiles for power generation, storage, and transmission capacity while accounting for hourly power dispatch, such that electricity demand is satisfied and the total planning and operation cost is minimized. We also propose a new modeling approach regarding the spatial distribution of demand from large loads. The model is used to analyze the expansion of a synthetic grid that follows key characteristics of the ERCOT system over a seven-year planning horizon, under loads from data centers and electrified oil refining, which account for 17.5% and 4.7% of total annual electricity demand by the end of the planning horizon. The optimal investment policy leads to an 83.6% increase in generation capacity and exploits the short construction times of solar and storage as well as the operational flexibility of thermal generators. Finally, sensitivity analysis reveals that the construction time of grid assets substantially impacts investment timing, generation technology mix, and transmission capacity expansion. The proposed modeling framework is general and can be extended to other grid systems, enabling the exploration of diverse demand scenarios, policy assumptions, and regional characteristics.
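Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load and grid-side resources is becoming a key variable in the design of AI computing centers. Problem: Power grids must expand to serve emerging large loads from data centers and electrified manufacturing. Method: According to the abstract, the authors develop a multi-period capacity expansion model that chooses investments in generation, storage, and transmission while accounting for hourly dispatch, propose a new way to model the spatial distribution of large-load demand, and apply it to a synthetic grid with key ERCOT characteristics over a seven-year horizon. Result: With data centers and electrified oil refining reaching 17.5% and 4.7% of annual demand, the optimal policy raises generation capacity by 83.6% and relies on the short construction times of solar and storage; asset construction time substantially affects investment timing, technology mix, and transmission expansion. Significance: For readers of this daily digest, it helps judge whether building an AI computing center is constrained by grid capacity, load growth, and planning mechanisms. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.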
Reference
Jiyong Lee, Melody Agustin, Joanne Langsdorf, et al. Grid Capacity Expansion under Data Centers and Electrified Manufacturing Large Loads[J/OL]. (2026-05-28)[2026-05-30]. http://arxiv.org/abs/2605.29053v1.
Research Article · Compute-Power Coordination
Denisa-Andreea Constantinescu, David Atienza
Published 2026-05-26 · arXiv · Credibility S
Abstract
At global scale, data-center electricity demand is growing faster than the grids that supply it, while system operators increasingly require large flexible loads that can adjust power within seconds to absorb variable wind and solar generation. For multi-megawatt AI/HPC facilities, the key unresolved question is practical and measurable: how quickly can the software stack translate a grid request into a real change in GPU power at the facility meter, where commitments are settled? We answer this on real hardware with GridPilot, a three-tier predictive controller operating across milliseconds, seconds, and hours, augmented by a deterministic safety-island bypass for fast response. On a three-GPU NVIDIA V100 testbed, GridPilot achieves a measured end-to-end trigger-to-target response of 97.2 ms, which is 6.9x faster than the 700 ms requirement of Nordic Fast Frequency Reserve. We further incorporate an instantaneous Power Usage Effectiveness (PUE) correction so dispatched commitments remain robust at meter level rather than only at IT load level. In replay experiments across six representative European grids (from Sweden to Poland), the PUE-aware controller closes 2.5-5.8 percentage points of cooling-overhead drag. GridPilot is released as open source and serves as a proof of concept that MW-scale AI/HPC demand can be engineered as controllable, grid-responsive flexibility by design.
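Why a PUE correction matters at the meter can be sketched with the PUE definition alone. The snippet assumes PUE stays constant over the control step, so that meter power equals IT power times PUE. This is one simple reading of such a correction, not GridPilot's actual controller, and the function name and numbers are hypothetical.

```python
def it_setpoint_change_kw(meter_request_kw, pue):
    """IT-level power change needed for a given change at the meter.

    Assumes meter power = IT power * PUE with PUE held constant, so
    cooling overhead scales with the IT load during the step.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1")
    return meter_request_kw / pue


# Hypothetical request: shed 100 kW at the facility meter at PUE 1.25.
delta_it_kw = it_setpoint_change_kw(-100.0, pue=1.25)
print(f"Required IT power change: {delta_it_kw:.1f} kW")
```

Under this assumption a controller that applied the full 100 kW at the IT level would overshoot the metered commitment, which is the kind of mismatch a meter-level correction is meant to remove.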
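Interpretation
Background: AI data-center load, power density, and energy constraints are rising together, and coordinated scheduling of compute load and grid-side resources is becoming a key variable in the design of AI computing centers. Problem: It is unresolved how quickly the software stack of a multi-megawatt AI/HPC facility can turn a grid request into a real change in GPU power at the facility meter. Method: According to the abstract, the authors validate GridPilot experimentally: a three-tier predictive controller with a deterministic safety-island bypass, run on a three-GPU NVIDIA V100 testbed and in replay experiments across six European grids. Result: The measured trigger-to-target response is 97.2 ms against the 700 ms requirement of Nordic Fast Frequency Reserve, and the PUE-aware controller closes 2.5-5.8 percentage points of cooling-overhead drag. Significance: For readers of this daily digest, it helps judge whether AI computing centers can serve as controllable, grid-responsive loads; the abstract itself presents the work as a proof of concept. The full text should still be checked for experimental conditions, sample scope, and cost assumptions.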
Reference
Denisa-Andreea Constantinescu, David Atienza. GridPilot: Real-Time Grid-Responsive Control for AI Supercomputers[J/OL]. (2026-05-26)[2026-05-30]. http://arxiv.org/abs/2605.26384v1.