Intelligent Computing Center Paper Watch | 2026-06-22

Current Issue

Volume 2026 · Issue 06-22

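Papers in this issue are organized in journal volume, issue, and page style. Each entry uses only the traceable public sources already listed in the daily report and adds no unverified facts.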

Research Article · Chips and Compute

System-Level Thermal Validation of 2.5D Packages in GPU Servers: Impact of TCB vs HCB HBM Platforms

Woohyun Park, Youchang Na, S. Hong, Yoko Tomo, H. Yu, Yanggyoo Jung, Gyungbum Kim, H. Kang

Published 2026-05-26 · Semantic Scholar · Credibility S

Abstract

The scalability and long-term reliability of 2.5D System-in-Package (SiP) platforms are increasingly governed by complex thermal management requirements, particularly as the integration of High-Bandwidth Memory (HBM) introduces concentrated heat profiles that challenge the system’s operational limits. The package platform, Thermo-Compression Bonding (TCB) versus Hybrid Copper Bonding (HCB) of HBM, strongly influences intra- and inter-package thermal behavior. This work implements 2.5D SiP thermal test vehicles (TTVs) in an Open Compute Project (OCP)-standard GPU server with embedded sensors and controllable heaters across HBM stacks and GPU dies, faithfully mirroring functional heterogeneous package floorplans. Experimental results demonstrate thermal nonlinearity that is strongly platform- and cooling-dependent. At 1030 W per package, HCB reduces intra-package GPU-to-HBM thermal crosstalk versus TCB by 2.2% under air cooling and 9.8% under liquid cooling, while inter-package thermal crosstalk varies by up to 13.7% across cooling conditions. Comparative evaluation confirms that HCB measurably improves thermal conduction, reducing both intra- and inter-package thermal resistance. From a data-center perspective, the reduction in GPU-to-HBM crosstalk resistance enables up to 0.9°C higher allowable coolant inlet temperature in liquid cooling relative to the TCB baseline, which translates to approximately 3% cooling power reduction and PUE improvement from 1.26 to 1.24. For a 1000-rack AI cluster, this corresponds to roughly 31 GWh annual energy savings. Measured thermal trends further indicate that as AI infrastructure evolves toward inference-heavy, memory-focused workloads with increased HBM base-die power, HCB platforms will deliver progressively larger thermal benefits due to the shift toward more vertical-resistance-limited behavior.
This study establishes GPU server-integrated 2.5D SiP TTV methodology as a robust platform for system-level thermal validation and demonstrates that HBM platform selection directly impacts data-center operational efficiency and future inference scalability.

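Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making chips, servers, and high-density compute deployment key variables in intelligent computing center design. Problem: the paper examines how the HBM bonding platform (TCB versus HCB) affects intra- and inter-package thermal behavior in 2.5D packages at the system level. Method: according to the abstract, the authors install 2.5D SiP thermal test vehicles with embedded sensors and controllable heaters in an OCP-standard GPU server and compare air and liquid cooling experimentally. Result: at 1030 W per package, HCB reduces intra-package GPU-to-HBM thermal crosstalk relative to TCB by 2.2% under air cooling and 9.8% under liquid cooling; the abstract estimates that this allows up to 0.9°C higher coolant inlet temperature, about 3% less cooling power, and a PUE improvement from 1.26 to 1.24. Significance: for readers of the daily report, it helps judge how chip roadmaps and changes in server density carry through to machine-room design. The figures should still be checked against the full text's experimental conditions, sample scope, and cost assumptions.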
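The PUE and energy figures quoted in the abstract can be cross-checked with simple arithmetic. The sketch below assumes, as a simplification not stated in the abstract, that IT load is constant all year and that the whole 31 GWh saving comes from the PUE change; the function names are illustrative only.

```python
HOURS_PER_YEAR = 8760

def implied_it_energy_gwh(savings_gwh, pue_before, pue_after):
    """Annual IT energy implied by a facility-energy saving and a PUE change.

    Facility energy = PUE * IT energy, so
    saving = (pue_before - pue_after) * IT energy.
    """
    return savings_gwh / (pue_before - pue_after)

def implied_rack_power_kw(it_energy_gwh, racks):
    """Average IT power per rack, assuming constant load (an assumption)."""
    total_kw = it_energy_gwh * 1e6 / HOURS_PER_YEAR  # GWh -> kWh, then per hour
    return total_kw / racks

# Figures taken from the abstract: 31 GWh/year, PUE 1.26 -> 1.24, 1000 racks
it_gwh = implied_it_energy_gwh(31.0, 1.26, 1.24)
rack_kw = implied_rack_power_kw(it_gwh, 1000)
print(f"implied IT energy: {it_gwh:.0f} GWh/year")
print(f"implied average rack power: {rack_kw:.0f} kW")
```

Under these assumptions the quoted saving implies about 1,550 GWh of annual IT energy, or roughly 177 kW of average IT power per rack. The full text may use a different accounting, so this is a plausibility check rather than a reproduction.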

Reference

Woohyun Park, Youchang Na, S. Hong, et al. System-Level Thermal Validation of 2.5D Packages in GPU Servers: Impact of TCB vs HCB HBM Platforms[J/OL]. Electronic Components and Technology Conference. (2026-05-26)[2026-06-22]. https://www.semanticscholar.org/paper/2ca4f8beb1ea19fe6d038cdee022de662a80ecd6.

Research Article · Compute and Power Coordination

Contextual Robust Optimization for AI Data Center Scheduling with Statistical Guarantees

Yijie Yang, Xi Weng, Yue Chen

Published 2026-06-16 · arXiv · Credibility S

Abstract

The rapid growth of AI workloads is substantially increasing data center electricity demand and carbon emissions, motivating the development of carbon-aware scheduling methods. However, effective scheduling is challenging because renewable generation and AI workloads are subject to forecast errors, while training and inference workloads exhibit heterogeneity in computational characteristics. This paper proposes a contextual robust optimization framework for AI data center operation. The proposed model explicitly captures the heterogeneous computational characteristics of AI training and inference workloads. To deal with renewable generation and workload forecast errors, we develop loss-based uncertainty learning models that directly map contextual features to covariate-dependent uncertainty sets. The resulting contextual joint chance-constrained scheduling problem is reformulated into a tractable robust optimization problem, and a calibration algorithm is developed to provide finite-sample probabilistic feasibility guarantees for multiple joint chance constraints. Numerical experiments based on real-world AI workload traces and renewable generation data show that the proposed method reduces operating costs by an average of 5.57% compared to benchmark methods while maintaining reliable feasibility and strong computational scalability.

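Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making the coordinated scheduling of compute load and grid-side resources a key variable in intelligent computing center design. Problem: carbon-aware scheduling is difficult because renewable generation and AI workloads carry forecast errors, and training and inference workloads differ in computational characteristics. Method: according to the abstract, the authors build a contextual robust optimization model, learn covariate-dependent uncertainty sets with loss-based models, reformulate the joint chance-constrained problem as a tractable robust problem, and calibrate it for finite-sample feasibility guarantees. Result: numerical experiments on real-world AI workload traces and renewable generation data show an average 5.57% reduction in operating cost against benchmark methods while maintaining feasibility. Significance: for readers of the daily report, it helps judge whether intelligent computing center build-out is constrained by grid capacity, load fluctuation, and scheduling mechanisms. The figures should still be checked against the full text's experimental conditions, sample scope, and cost assumptions.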
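The abstract mentions a calibration step with finite-sample guarantees but gives no algorithmic detail. As a generic illustration only, and not the authors' algorithm, the sketch below uses split conformal calibration to size a simple interval uncertainty set around a single forecast: the radius is an empirical quantile of held-out absolute forecast errors. The error values and function names are hypothetical, and the paper's setting (multiple joint chance constraints, contextual sets) is richer than this scalar case.

```python
import math

def conformal_radius(calibration_errors, alpha):
    """Radius r such that a new error satisfies |e| <= r with probability
    at least 1 - alpha, assuming calibration and test errors are
    exchangeable (the standard split conformal assumption)."""
    scores = sorted(abs(e) for e in calibration_errors)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    if k > n:
        return math.inf  # too few calibration samples for this alpha
    return scores[k - 1]

def uncertainty_interval(forecast, radius):
    """Symmetric interval uncertainty set around a point forecast."""
    return (forecast - radius, forecast + radius)

# Hypothetical held-out renewable forecast errors in MW (made-up values)
errors = [0.4, -1.2, 0.9, -0.3, 2.1, -0.7, 1.5, 0.2, -1.8, 0.6]
r = conformal_radius(errors, alpha=0.2)
print(uncertainty_interval(50.0, r))
```

A robust scheduler would then require its constraints to hold for every value in the interval. With only ten calibration points the radius is coarse; more held-out data tightens it.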

Reference

Yijie Yang, Xi Weng, Yue Chen. Contextual Robust Optimization for AI Data Center Scheduling with Statistical Guarantees[J/OL]. (2026-06-16)[2026-06-22]. http://arxiv.org/abs/2606.17466v1.

Research Article · AI Operations Optimization

Energy-Aware Computing in the Year 2026

Roblex Nana Tchakoute, Claude Tadonki

Published 2026-05-23 · arXiv · Credibility S

Abstract

High-Performance Computing (HPC) has recently entered the Exascale era, and considerable efforts are being made to fully harness this potential power for large-scale applications, such as cutting-edge generative AI (training and exploitation). The corresponding energy consumption is very high, and forecasts are alarming, making this metric a critical systemic bottleneck. Addressing this issue presents a genuine challenge for the entire cloud-edge-HPC continuum at all scales, from low-power IoT microcontrollers to multi-megawatt data centers. Beyond financial costs, green computing is driven by considerations related to climate change and environmental concerns such as carbon footprint (CO2e), as well as constraints on energy production and supply, leading to a real need to regulate information and communication technology (ICT) activities. This article presents a comprehensive overview of energy-efficient computing, taking into account the most recent and significant contributions. Based on this exploration of the state of the art, we design and describe a holistic taxonomy of the aforementioned publications, structured around various perspectives, including hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Particular emphasis is placed on large-scale AI, which receives significant attention due to its considerable resource requirements. We conclude with an analysis of a forward-looking roadmap that considers the main perspectives of sustainable computing.

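Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making AI operations, load forecasting, and facility tuning key variables in intelligent computing center design. Problem: the paper treats energy consumption as a systemic bottleneck across the cloud-edge-HPC continuum. Method: according to the abstract, this is a survey; the authors review recent work on energy-efficient computing and organize it into a taxonomy covering hardware and software aspects, measurement instrumentation, software optimizations, dynamic task scheduling, voltage scaling, workload consolidation, federated learning, and cooling. Result: the output is a taxonomy and a forward-looking roadmap, with particular emphasis on large-scale AI; the abstract reports no quantitative results. Significance: for readers of the daily report, it helps judge whether AI tools can reduce operations complexity and improve availability. The conclusions should still be checked against the full text, in particular the scope of the surveyed literature.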

Reference

Roblex Nana Tchakoute, Claude Tadonki. Energy-Aware Computing in the Year 2026[J/OL]. (2026-05-23)[2026-06-22]. http://arxiv.org/abs/2605.24569v1.

Research Article · Chips and Compute

AI-on-Chip Systems: A Cross-Layer Review of Architectures, Interconnects, Design Automation, and Embedded Intelligence

Mohamed M. Morsy

Published 2026-06-15 · Semantic Scholar · Credibility S

Abstract

The rapid growth of artificial intelligence (AI) workloads is reshaping semiconductor design across architecture, interconnect, memory hierarchy, packaging, timing, and design automation. Rather than converging on a single hardware solution, the field is expanding into a heterogeneous ecosystem that includes data-center graphics processing units (GPUs), edge neural processing units (NPUs), and application-specific integrated circuits (ASICs), field-programmable gate array (FPGA)-based and hybrid AI system-on-chip (SoC) platforms, chiplet-enabled systems, and emerging beyond-conventional-silicon approaches such as photonic, neuromorphic, and analog in-memory processors. This paper presents a comprehensive review of AI-on-chip systems from a cross-layer perspective. It examines AI chip architectures and hardware platforms, network-on-chip (NoC) designs for AI communication patterns, and algorithm–hardware co-design methods for model acceleration, including compression, quantization, and sparsity-aware optimization. It also reviews clocking, synchronization, and clock-domain-crossing (CDC) challenges in large heterogeneous systems and chiplets, as well as manufacturing, advanced packaging, and reliability issues, including two-and-a-half-dimensional (2.5D) and three-dimensional (3D) integration, thermal and mechanical constraints, assembly quality, and long-term yield considerations. In parallel, the paper surveys the growing role of AI in chip design itself, covering machine-learning-assisted analysis, Bayesian and reinforcement-learning-based optimization, and the emerging use of large language models (LLMs) and AI agents for register-transfer level (RTL) generation, design-space exploration, and autonomous electronic design automation (EDA) workflows. Finally, it discusses beyond-silicon AI chip directions and the broader economic and industry context shaping cloud, on-premises, and edge deployment. 
By integrating these topics into a unified framework, this review highlights the key technological drivers, system-level tradeoffs, and future research directions that will define next-generation scalable, reliable, and energy-efficient AI-on-chip systems.

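Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making chips, servers, and high-density compute deployment key variables in intelligent computing center design. Problem: AI hardware is expanding into a heterogeneous ecosystem rather than converging on a single solution, which complicates design across layers. Method: according to the abstract, this is a cross-layer review covering chip architectures and platforms, network-on-chip designs, algorithm-hardware co-design (compression, quantization, sparsity-aware optimization), clocking and clock-domain crossing, 2.5D and 3D packaging and reliability, and the use of AI in chip design itself. Result: the review integrates these topics into a unified framework and highlights technological drivers, system-level trade-offs, and future research directions; the abstract reports no quantitative results. Significance: for readers of the daily report, it helps judge how chip roadmaps and changes in server density carry through to machine-room design. The conclusions should still be checked against the full text, in particular the scope of the surveyed literature.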

Reference

Mohamed M. Morsy. AI-on-Chip Systems: A Cross-Layer Review of Architectures, Interconnects, Design Automation, and Embedded Intelligence[J/OL]. Electronics. (2026-06-15)[2026-06-22]. https://www.semanticscholar.org/paper/6559f17a3e4aaa83cbf55ab2f8c0657056399288.

Research Article · Compute and Power Coordination

From Tokens to Energy Flexibility: Quantization-Enabled Demand Response for Data Centers with LLM Inference Workloads

Bojun Du, Xiaoyi Fan, Ershun Du, Long Chen, Jianpei Han, Qingchun Hou, Ning Zhang, Chongqing Kang

Published 2026-06-17 · arXiv · Credibility S

Abstract

The rapid growth of large language model (LLM) inference is creating significant data-center loads that face increasing energy-management challenges under tightening grid conditions and demand response (DR) requirements. Conventional data-center energy management mainly relies on temporal and spatial workload shifting and campus-level energy asset scheduling, but it usually treats LLM inference demand as an aggregate load. As a result, these approaches fail to exploit the internal characteristics of LLM serving and therefore overlook the flexibility offered by LLM-specific techniques such as model quantization. To unlock this flexibility, this paper proposes a quantization-enabled energy management framework for grid-responsive LLM inference data centers. First, a quantization-to-power model is established to map each model-quantization configuration to a compact set of dispatchable parameters. Second, a two-stage quantization-enabled DR model is developed to account for model instance switching, request routing, and precision selection. Third, a multi-campus co-optimization method is introduced for DR participation by integrating grid-side electricity and carbon signals with the quantization-enabled DR model. Case studies show that the proposed framework reduces total data-center operating cost by 34.3% without curtailing served token volume, validating model quantization as an effective flexibility lever for grid-responsive LLM data-center energy management.

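Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making the coordinated scheduling of compute load and grid-side resources a key variable in intelligent computing center design. Problem: conventional data-center energy management treats LLM inference demand as an aggregate load and so overlooks the flexibility offered by LLM-specific techniques such as model quantization. Method: according to the abstract, the authors build a quantization-to-power model, a two-stage quantization-enabled demand response model covering model instance switching, request routing, and precision selection, and a multi-campus co-optimization method that integrates grid-side electricity and carbon signals. Result: case studies show a 34.3% reduction in total data-center operating cost without curtailing served token volume. Significance: for readers of the daily report, it helps judge whether intelligent computing center build-out is constrained by grid capacity, load fluctuation, and scheduling mechanisms. The figures should still be checked against the full text's experimental conditions, sample scope, and cost assumptions.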
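To make the idea of precision selection as a flexibility lever concrete, here is a toy sketch, not the paper's model: each quantization configuration is reduced to two dispatchable parameters (power and throughput per serving instance), and a brute-force search picks the instance mix that serves a required token rate at minimum power. All numbers and names are made up for illustration, and output quality differences between precisions are ignored.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Config:
    name: str
    power_kw: float      # power per serving instance (made-up value)
    tokens_per_s: float  # throughput per serving instance (made-up value)

CONFIGS = [
    Config("fp16", 10.0, 1000.0),
    Config("int8", 7.0, 900.0),
    Config("int4", 5.0, 700.0),
]

def cheapest_mix(demand_tokens_per_s, max_instances=6):
    """Brute-force the instance counts that meet demand at minimum power.

    Returns (total_power_kw, counts) with counts ordered like CONFIGS,
    or None if demand cannot be met within max_instances.
    """
    best = None
    for counts in product(range(max_instances + 1), repeat=len(CONFIGS)):
        if sum(counts) > max_instances:
            continue
        tokens = sum(n * c.tokens_per_s for n, c in zip(counts, CONFIGS))
        if tokens < demand_tokens_per_s:
            continue
        power = sum(n * c.power_kw for n, c in zip(counts, CONFIGS))
        if best is None or power < best[0]:
            best = (power, counts)
    return best

power, counts = cheapest_mix(4000.0)
print(power, dict(zip((c.name for c in CONFIGS), counts)))
```

With these invented parameters, serving 4,000 tokens per second takes four fp16 instances at 40 kW but only 30 kW with six int4 instances, which is the kind of headroom a demand response event could draw on. A realistic model would also price switching cost, routing, and accuracy loss.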

Reference

Bojun Du, Xiaoyi Fan, Ershun Du, et al. From Tokens to Energy Flexibility: Quantization-Enabled Demand Response for Data Centers with LLM Inference Workloads[J/OL]. (2026-06-17)[2026-06-22]. http://arxiv.org/abs/2606.18851v1.

Research Article · Chips and Compute

Wafer-Level Integrated 1200 V SiC MOSFET Package with Room-Temperature Wafer Bonding and Embedded Microfluidic Cooling

Jiajing Nie, Jiuyang Tang, Hao Guan, Xinyue Wang, Tao Jiang, Junran Zhang, Guoqi Zhang, Guangyin Lei

Published 2026-05-26 · Semantic Scholar · Credibility S

Abstract

The rising demand for high-power semiconductor devices in sectors such as electric vehicles (EVs), renewable energy conversion, and data centers highlights the need for efficient and reliable thermal management technologies. In this work, we present a simulation-based study of a 1200 V SiC MOSFET wafer-level power package that integrates chip–package co-design, room-temperature wafer bonding, and embedded microfluidic cooling. By utilizing a room-temperature bonding process to mitigate fabrication-induced warpage and optimizing the chip geometry to balance thermal spreading with mechanical stress, this proposed architecture ensures structural integrity while maximizing heat transfer efficiency. Thermal-fluid-mechanical multiphysics modeling results revealed that the proposed wafer-level microfluidic package achieved a 35.14% reduction in total thermal resistance compared with conventional SiC MOSFET power modules. The design demonstrates improvements in junction temperature uniformity and overall heat dissipation efficiency, which is promising for next-generation high-power density applications.

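Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making chips, servers, and high-density compute deployment key variables in intelligent computing center design. Problem: high-power semiconductor devices in electric vehicles, renewable energy conversion, and data centers need efficient and reliable thermal management. Method: according to the abstract, this is a simulation-based study using thermal-fluid-mechanical multiphysics modeling of a 1200 V SiC MOSFET wafer-level package that combines chip-package co-design, room-temperature wafer bonding, and embedded microfluidic cooling. Result: the modeled package shows a 35.14% reduction in total thermal resistance compared with conventional SiC MOSFET power modules, along with better junction temperature uniformity; these are simulation results, not measurements. Significance: for readers of the daily report, it helps judge how chip roadmaps and changes in server density carry through to machine-room design. The figures should still be checked against the full text's modeling conditions, sample scope, and cost assumptions.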

Reference

Jiajing Nie, Jiuyang Tang, Hao Guan, et al. Wafer-Level Integrated 1200 V SiC MOSFET Package with Room-Temperature Wafer Bonding and Embedded Microfluidic Cooling[J/OL]. Electronic Components and Technology Conference. (2026-05-26)[2026-06-22]. https://www.semanticscholar.org/paper/11fa662b073d777b3f9125fd8ef8a3bb5cf601cc.

Research Article · Compute and Power Coordination

Revisiting "Cooler is Better": ITD-Aware Per-CPU Thermal Optimization for Sustainable Data Center Operation

Jason Crop, Hayden Moore, Sudeep Pasricha

Published 2026-06-10 · arXiv · Credibility S

Abstract

As data center energy demand approaches grid-level constraints, optimizing conventional server infrastructure is essential for sustainable growth. The long-standing assumption that "cooler is better", i.e., lower CPU temperatures reduce power, does not fully hold for modern low-voltage CPUs, where inverse temperature dependence (ITD) drives higher supply voltages at lower temperatures. This creates a non-monotonic performance-per-watt curve where efficiency peaks at an intermediate thermal point. In this paper, for the first time, we empirically characterize ITD on production Intel Xeon CPUs and demonstrate that efficiency-optimal temperatures are CPU part-specific, and frequently higher than typical data center operating conditions. Measurements from commercial cloud data center platforms (Amazon, Equinix) reveal that approximately half of modern high-power CPUs operate about 10°C below their efficiency-optimal thermal point. By implementing ITD-aware thermal grouping of CPUs and inlet temperature adjustments, data center operators can optimize facility-level cooling and overall sustainability. Our case study shows that this approach can reduce total data center energy by 4-13% without sacrificing performance or reliability.

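Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making the coordinated scheduling of compute load and grid-side resources a key variable in intelligent computing center design. Problem: the assumption that cooler is better does not fully hold for modern low-voltage CPUs, where inverse temperature dependence (ITD) drives higher supply voltages at lower temperatures. Method: according to the abstract, the authors empirically characterize ITD on production Intel Xeon CPUs, take measurements from commercial cloud data center platforms (Amazon, Equinix), and evaluate ITD-aware thermal grouping with inlet temperature adjustment in a case study. Result: approximately half of modern high-power CPUs run about 10°C below their efficiency-optimal thermal point, and the case study reports a 4-13% reduction in total data center energy without sacrificing performance or reliability. Significance: for readers of the daily report, it helps judge whether intelligent computing center build-out is constrained by grid capacity, load fluctuation, and scheduling mechanisms. The figures should still be checked against the full text's experimental conditions, sample scope, and cost assumptions.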
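The non-monotonic efficiency curve described in the abstract can be illustrated with a toy model. The sketch below is not the authors' characterization: it assumes a dynamic power term that falls as temperature rises (because ITD lets supply voltage drop) and a leakage term that grows exponentially with temperature, and every coefficient is invented for illustration.

```python
import math

def cpu_power_w(temp_c):
    """Toy CPU power at fixed performance; every coefficient is made up."""
    # ITD: supply voltage is higher when the die is colder, so the
    # dynamic term falls as temperature rises.
    voltage_scale = 1.0 - 0.002 * (temp_c - 40.0)
    dynamic_w = 150.0 * voltage_scale ** 2
    # Leakage grows roughly exponentially with temperature.
    leakage_w = 10.0 * math.exp(0.03 * (temp_c - 40.0))
    return dynamic_w + leakage_w

def efficiency_optimal_temp(candidates):
    """Temperature with the lowest power, i.e. best performance per watt."""
    return min(candidates, key=cpu_power_w)

# Search whole-degree temperatures from 40 to 90 degrees Celsius
t_opt = efficiency_optimal_temp(range(40, 91))
print(f"toy efficiency-optimal temperature: {t_opt} °C")
```

In this toy the minimum falls strictly between the coldest and hottest candidates, which is the qualitative point of the abstract. Real optimal temperatures are part-specific and must come from measurement, and a facility-level analysis would add cooling energy to the objective.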

Reference

Jason Crop, Hayden Moore, Sudeep Pasricha. Revisiting "Cooler is Better": ITD-Aware Per-CPU Thermal Optimization for Sustainable Data Center Operation[J/OL]. (2026-06-10)[2026-06-22]. http://arxiv.org/abs/2606.11163v1.

Research Article · Chips and Compute

Heat transfer and flow characteristics of bionic Victoria Amazonica liquid cooling plate for thermal management of chips in data centers

Feng Zhou, Wenlong Gu, Wenlong Li, G. Ma

Published 2026-06-01 · Semantic Scholar · Credibility S

Abstract

Semantic Scholar did not provide a displayable abstract; open the paper link to read the full abstract.

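Interpretation

Background: AI data center loads, power density, and energy constraints are rising together, making chips, servers, and high-density compute deployment key variables in intelligent computing center design. Problem, method, and result: no abstract is available, so these cannot be summarized here; the title indicates a study of the heat transfer and flow characteristics of a bionic Victoria Amazonica liquid cooling plate for chip thermal management in data centers. Significance: for readers of the daily report, it may help judge how chip roadmaps and changes in server density carry through to machine-room design. Because the abstract is missing, open the original paper first to check its methods, data, and boundary conditions.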

Reference

Feng Zhou, Wenlong Gu, Wenlong Li, et al. Heat transfer and flow characteristics of bionic Victoria Amazonica liquid cooling plate for thermal management of chips in data centers[J/OL]. International Communications in Heat and Mass Transfer. (2026-06-01)[2026-06-22]. https://www.semanticscholar.org/paper/11f6857398316b362b30dcdbd0b233df7100bb1e.
