Planning and Design of Liquid Cooling Systems for Hyperscale Data Centers: Fundamentals for Mission-Critical Facilities
Context: The Technical Imperative of Liquid Cooling in 2026
The global liquid cooling market for data centers reached USD 6 billion in 2026 and is projected to grow at a compound annual growth rate (CAGR) of 18.2% through 2035, when it is expected to reach USD 27.1 billion (Global Market Insights, 2026). This growth is driven not by speculative trends but by an inescapable technical necessity: thermal load densities in processing racks have exceeded the operational limits of conventional air cooling.
While traditional configurations operated at densities between 5 and 15 kW per rack, contemporary artificial intelligence and high-performance computing (HPC) installations demand systems capable of dissipating between 30 and 120 kW per rack (Uptime Institute, 2024). GPUs used in AI clusters dissipate between 500 and 700 watts per unit, and chip thermal design power (TDP) has surpassed 280 watts, with projections exceeding 700 watts by 2025 (Energy and Built Environment, 2024).
Air cooling faces insurmountable physical limitations in these scenarios. The heat transfer capacity of air is roughly 1,000 times lower than that of liquids, which implies far greater requirements for air volume, flow velocity, and fan power to achieve the same heat dissipation. In operational terms, this translates into unsustainable increases in PUE (Power Usage Effectiveness) and total energy consumption.
The planning and conceptual design phase determines the performance, energy efficiency, and economic viability of the entire cooling infrastructure throughout its operational lifecycle. Reaclima, with over 50 years of experience and a track record that includes projects such as the Foxconn GDL Vesta 8 and Amazon AWS Querétaro data centers, applies engineering methodologies based on international standards and structured risk analysis to deliver installations that meet operational continuity, energy efficiency, and regulatory compliance objectives.
Performance Objectives Definition: PUE, Load Density, and Operating Temperature
The first step in designing a liquid cooling system consists of establishing quantifiable performance objectives. These objectives determine the system architecture, applicable technologies, and required redundancy levels.
Power Usage Effectiveness (PUE)
PUE quantifies the total energy efficiency of the data center as the ratio of total energy consumed by the facility to energy consumed by IT equipment. A PUE of 1.0 represents perfect theoretical efficiency, where all energy goes to processing. Conventional air-cooled facilities operate with PUE between 1.5 and 2.0, while well-designed liquid cooling systems achieve values between 1.05 and 1.2.
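As a quick illustration of the metric, the sketch below computes PUE from the two metered quantities; the load figures are hypothetical and not drawn from any specific facility.

```python
# Minimal PUE sketch; the kW figures below are hypothetical.

def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw

# Example: a 1,000 kW IT load plus 150 kW of cooling, distribution losses,
# and lighting gives PUE = 1,150 / 1,000 = 1.15.
print(pue(total_facility_kw=1_150, it_load_kw=1_000))  # 1.15
```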
Germany's EnEfG regulation establishes PUE targets of ≤1.2 for new data centers built from 2026 and ≤1.3 for facilities in operation from 2027 (Global Market Insights, 2026). ASHRAE Standard 90.4 defines the Mechanical Load Component (MLC) as a complementary metric for evaluating the efficiency of mechanical cooling systems, establishing performance thresholds that guide equipment selection (ASHRAE, 2024).
In projects such as the one executed for Amazon AWS Querétaro, Reaclima integrated cooling systems that made it possible to meet PUE objectives consistent with international standards, combining indirect evaporative cooling, air-side economizers, and zone-based thermal management. The transition to liquid cooling in extreme-density installations supports projected PUE values below 1.15 under design conditions.
Thermal Load Density per Rack
Load density per rack determines the applicable liquid cooling technology. Current configurations fall into three categories; a simple selection sketch follows the list:
- Moderate density (15-30 kW/rack): Can be served with rear-door heat exchangers (RDHx) or cold plates on specific components, maintaining air cooling for peripheral equipment.
- High density (30-80 kW/rack): Requires direct-to-chip (D2C) cooling through cold plates installed on CPUs, GPUs, memory modules, and voltage regulators. These systems remove between 70% and 80% of the thermal load directly at the point of generation.
- Extreme density (>80 kW/rack): Demands single-phase or two-phase immersion, where complete servers are submerged in dielectric fluids. Two-phase immersion can achieve a PUE of 1.02-1.03 through evaporation and condensation of the dielectric fluid.
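As a simple illustration of this classification, the sketch below maps a rack density to a candidate technology. The thresholds mirror the list above; a real selection would also weigh facility constraints, fluid compatibility, and vendor portfolios.

```python
# Hypothetical selection helper: maps rack density (kW) to a candidate
# cooling approach using the density bands described in this article.

def candidate_technology(kw_per_rack: float) -> str:
    if kw_per_rack < 15:
        return "conventional air cooling"
    if kw_per_rack <= 30:
        return "rear-door heat exchanger (RDHx) or targeted cold plates"
    if kw_per_rack <= 80:
        return "direct-to-chip (D2C) cold plates"
    return "single-phase or two-phase immersion"

for density in (12, 25, 60, 110):
    print(f"{density} kW/rack -> {candidate_technology(density)}")
```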
The design must consider not only average density but also transient thermal load peaks. AI and HPC workloads present significant load variations depending on the type of processing executed, which demands systems with dynamic response capability.
Operating Temperature Ranges: ASHRAE Thermal Guidelines and H1 Envelope
ASHRAE Technical Committee 9.9 publishes the Thermal Guidelines for Data Processing Environments, which define recommended and allowable temperature and humidity ranges for different classes of IT equipment. The 2021 edition introduced the H1 envelope, a new class of high-density equipment that integrates processors, accelerators, memory, and network controllers in thermally demanding configurations.
The H1 envelope specifies narrower recommended air temperature ranges than conventional classes, with upper limits of 25°C (77°F) in the allowable range. However, liquid cooling systems operate directly on components, which allows raising ambient air return temperatures without compromising chip junction temperature.
In installations with liquid cooling, thermal design must consider the following parameters (the flow-rate sketch after this list illustrates the trade-offs):
- Coolant supply temperature: Typically between 18°C and 30°C, depending on the technology. Higher supply temperatures allow greater use of free cooling and reduce compression work in chillers.
- Delta T (difference between supply and return): Values between 8°C and 15°C are common. A higher Delta T reduces the required volumetric flow and pumping power, but may require larger exchange surfaces in cold plates.
- Condensing/heat rejection temperature: Determines the efficiency of chillers or cooling towers. Favorable climatic conditions allow operation with reduced condensing temperatures, improving the system COP (Coefficient of Performance).
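To illustrate the relationship between heat load, Delta T, and flow, the sketch below estimates the required coolant flow from Q = ρ·V̇·cp·ΔT. The fluid properties assume plain water and the rack load is hypothetical; water-glycol mixtures have a lower specific heat and higher viscosity, so real figures should come from fluid datasheets.

```python
# Illustrative sizing sketch: coolant volumetric flow needed to absorb a given
# rack heat load at a given Delta T, using Q = rho * V_dot * cp * dT.

RHO_WATER = 997.0    # kg/m^3 at ~25 C (assumed water loop)
CP_WATER = 4_180.0   # J/(kg*K)

def flow_lpm(heat_kw: float, delta_t_k: float) -> float:
    """Volumetric flow in litres per minute to absorb heat_kw at delta_t_k."""
    v_dot_m3s = (heat_kw * 1_000.0) / (RHO_WATER * CP_WATER * delta_t_k)
    return v_dot_m3s * 60_000.0  # m^3/s -> L/min

# An 80 kW rack at Delta T = 10 K needs roughly 115 L/min;
# widening Delta T to 15 K cuts that to about 77 L/min.
print(round(flow_lpm(80, 10), 1), round(flow_lpm(80, 15), 1))
```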
System Architecture: FWS and TCS Demarcation
ASHRAE TC 9.9 recommends the use of Coolant Distribution Units (CDU) to establish physical and functional demarcation between the Facility Water System (FWS) and the Technology Cooling System (TCS). This separation presents operational and risk management advantages:
The FWS comprises the facility-level cooling infrastructure: chillers, cooling towers, primary piping, distribution pumps, and water treatment systems. This system operates with standard process water and handles significant fluid volumes.
The TCS is limited to the secondary circuit that connects the CDU with cold plates or immersion systems installed in racks. This circuit operates with specialized fluids (water-glycol mixtures, dielectric fluids) in a closed low-volume loop. TCS pressure and temperature are regulated independently, allowing fine adjustments without impacting FWS operation.
The CDU functions as a heat exchanger between both systems and as a monitoring, filtration, and coolant quality control point. Modern units integrate:
- High-efficiency plate heat exchangers
- Variable speed pumps for flow adjustment
- Filtration systems (typically 5-10 microns)
- Temperature, pressure, flow, and coolant quality sensors
- Automatic isolation valves upon anomaly detection
- Thermal expansion and volume compensation systems
Current CDUs reach capacities of 2 MW per unit, which allows a single unit to serve between 25 and 40 high-density racks (Data Center Dynamics, 2025). Providers such as Schneider Electric, Vertiv, CoolIT, and Rittal offer scalable solutions with modular architectures.
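As a rough consistency check of those figures, the sketch below estimates how many racks a single CDU can serve at a given density; the 20% headroom reserved for N+1 load pickup is an assumption for illustration.

```python
# Simple CDU capacity check: racks served at a given density, optionally
# reserving a fraction of capacity as redundancy headroom (assumed value).

def racks_served(cdu_capacity_kw: float, rack_kw: float, headroom: float = 0.0) -> int:
    """Whole racks a CDU can serve after reserving the given fractional headroom."""
    usable_kw = cdu_capacity_kw * (1.0 - headroom)
    return int(usable_kw // rack_kw)

# A 2 MW CDU covers 25 racks at 80 kW/rack or 40 racks at 50 kW/rack;
# reserving 20% headroom for N+1 load pickup reduces the first case to 20.
print(racks_served(2_000, 80), racks_served(2_000, 50), racks_served(2_000, 80, headroom=0.2))
```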
Risk Management: Thermal Inertia, Redundancy, and Transient Modeling
Liquid cooling systems for mission-critical installations must incorporate risk mitigation strategies from the design phase.
Thermal Inertia Enhancement
ASHRAE TC 9.9 recommends increasing system thermal inertia to avoid thermal damage from abrupt load changes or temporary cooling loss. Thermal inertia is achieved through:
- TCS coolant volume: Increasing the total volume of the secondary loop through larger diameter piping or thermal expansion tanks.
- Thermal storage capacity: Integration of chilled water tanks or phase change materials (PCM) that act as thermal buffers.
- Component thermal mass: Materials with high specific heat capacity (copper, aluminum) in cold plates and heat sinks increase response time to thermal transients.
In critical installations such as financial or telecommunications data centers, cooling loss can cause thermal throttling of processors in less than 60 seconds, with potential thermal shutdown in 2-3 minutes. Increasing thermal inertia extends this margin to 5-10 minutes, allowing activation of redundant systems or load migration protocols.
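A first-order estimate of this margin can be made with a lumped-capacitance calculation. The sketch below assumes a water-based loop with hypothetical volumes, allowable temperature rise, and load, and ignores the thermal mass of servers and cold plates, so it errs on the conservative side.

```python
# Lumped-capacitance sketch of ride-through time after total cooling loss:
# t = m * cp * dT_allowable / Q. All input values are illustrative assumptions.

CP_WATER = 4_180.0   # J/(kg*K)
RHO_WATER = 997.0    # kg/m^3

def ride_through_s(loop_volume_l: float, allowable_rise_k: float, load_kw: float) -> float:
    """Seconds until the coolant warms by allowable_rise_k under load_kw."""
    mass_kg = (loop_volume_l / 1_000.0) * RHO_WATER
    return mass_kg * CP_WATER * allowable_rise_k / (load_kw * 1_000.0)

# 500 L of TCS coolant absorbing a 10 K rise against a 300 kW load buys ~69 s;
# tripling the volume with a buffer tank extends that to roughly 3.5 minutes.
print(round(ride_through_s(500, 10, 300)), round(ride_through_s(1_500, 10, 300)))
```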
Active Redundancy in CDUs and Pumps
Redundancy must be designed as active redundancy (N+1 or 2N) rather than passive (standby) redundancy. In an N+1 configuration, all CDUs operate simultaneously at partial capacity, so that the loss of one unit is compensated by an automatic load increase in the remaining units, without service interruption.
Circulation pumps must be installed in redundant configuration with automatic isolation valves. Advanced systems incorporate variable speed pumps with PID (Proportional-Integral-Derivative) control that adjust flow according to actual thermal demand, reducing energy consumption during partial load periods.
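As an illustration of this control strategy, the sketch below implements a minimal PID loop that modulates pump speed based on the TCS return temperature. The gains, setpoint, and clamping limits are assumptions; production controllers add anti-windup, rate limiting, and minimum-flow protection.

```python
# Minimal PID sketch for pump speed control driven by TCS return temperature.
# Gains, setpoint, and limits are illustrative assumptions only.

class PID:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd, self.setpoint = kp, ki, kd, setpoint
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measured, dt):
        error = measured - self.setpoint          # positive when the return runs hot
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=5.0, ki=0.2, kd=0.5, setpoint=40.0)   # target return temperature, C
speed_pct = min(100.0, max(20.0, 50.0 + pid.update(measured=43.5, dt=1.0)))
print(round(speed_pct, 1))  # pump speed command in percent, clamped to 20-100%
```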
Transient Modeling and Predictive Validation
ASHRAE recommends transient modeling to verify the performance of systems and components for which no empirical test data are available (ASHRAE TC 9.9, 2026). Transient modeling uses computational fluid dynamics (CFD) and thermal simulation software to predict:
- System response to sudden load changes (AI processing startup, VM migration)
- Thermal stabilization time after CDU or pump loss
- Temperature distribution in racks and localized hot spots
- Effectiveness of residual hot air containment strategies
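Before committing to full CFD studies, a reduced-order screening model can approximate loop behavior under a sudden load step. The sketch below lumps the coolant and wetted metal into a single thermal mass and assumes a CDU heat-exchanger conductance to the FWS; both parameters are illustrative, and such a model complements rather than replaces CFD.

```python
# Reduced-order transient sketch (explicit Euler) of TCS loop temperature after a
# sudden IT load step, e.g. an AI training job starting. Parameters are assumed.

CP = 4_180.0           # J/(kg*K), water
MASS_KG = 800.0        # coolant + wetted metal lumped together (assumption)
UA_W_PER_K = 25_000.0  # CDU heat-exchanger conductance to the FWS (assumption)
T_FWS = 25.0           # facility water temperature, C

def simulate(load_kw, t0=32.0, dt=1.0, duration_s=600):
    """Return the loop temperature trace after stepping the load to load_kw."""
    temps, t = [], t0
    for _ in range(int(duration_s / dt)):
        q_in = load_kw * 1_000.0
        q_out = UA_W_PER_K * (t - T_FWS)
        t += (q_in - q_out) * dt / (MASS_KG * CP)
        temps.append(t)
    return temps

# Step the load to 400 kW: the loop settles toward ~41 C with a time constant
# of a little over two minutes (m*cp/UA ~ 134 s).
trace = simulate(load_kw=400)
print(round(trace[-1], 1))
```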
Reaclima employs LiDAR scanning and BIM (Building Information Modeling) to capture the exact geometry of installations before executing work. The digital models are integrated with CFD simulations to validate piping design, CDU placement, and coolant distribution strategies, reducing the risk of interferences during construction and optimizing commissioning time.
Regulatory Compliance: ASHRAE 90.4, NOM-035-ENER-2025, and ISO Certifications
Design must guarantee compliance with energy efficiency standards and applicable national regulations.
ASHRAE Standard 90.4: Energy Standard for Data Centers
ASHRAE 90.4 establishes minimum energy efficiency requirements for mechanical systems installed in data centers. The standard defines three compliance segments: process cooling, process heating, and process ventilation. For liquid cooling systems, the process cooling segment is determinant.
The standard employs the concept of the Mechanical Load Component (MLC), calculated as the annual energy used for cooling, ventilation, and pumping divided by the annual IT equipment energy. Systems must be designed to comply with maximum MLC values according to the installation configuration and geographic location.
The use of economizers (direct air-side, indirect evaporative, waterside) allows reducing MLC by taking advantage of favorable climatic conditions. In regions with low wet-bulb temperatures throughout the year, such as Querétaro or Hermosillo, economizers can operate during 60-70% of annual hours, reducing dependence on mechanical cooling.
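As an illustration of how economizer hours influence the metric, the sketch below annualizes the MLC under partial economizer operation. The power draws and economizer fraction are assumptions for illustration, not measurements from any project, and a compliance calculation would follow the full ASHRAE 90.4 procedure.

```python
# Hedged MLC sketch: annual mechanical energy (cooling + fans + pumps)
# divided by annual IT energy, with part of the year on economizer operation.
# All power figures and the economizer fraction are illustrative assumptions.

HOURS_PER_YEAR = 8_760

def mlc(it_kw, mech_kw_compressor, mech_kw_econ, econ_fraction):
    """Annualized Mechanical Load Component with partial economizer operation."""
    econ_hours = HOURS_PER_YEAR * econ_fraction
    mech_kwh = mech_kw_econ * econ_hours + mech_kw_compressor * (HOURS_PER_YEAR - econ_hours)
    it_kwh = it_kw * HOURS_PER_YEAR
    return mech_kwh / it_kwh

# 2 MW IT load: 65% of the year on economizer (120 kW of fans/pumps),
# the rest on mechanical cooling (420 kW) -> MLC around 0.11.
print(round(mlc(2_000, mech_kw_compressor=420, mech_kw_econ=120, econ_fraction=0.65), 3))
```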
NOM-035-ENER-2025: Energy Efficiency in Unitary Air Conditioners
The Mexican Official Standard NOM-035-ENER-2025, published in the Official Gazette of the Federation on August 20, 2025, establishes minimum levels of Integrated Energy Efficiency Ratio (IEER) for unitary air conditioners with nominal capacities between 19,050 W (65,000 BTU/h) and 70,340 W (240,000 BTU/h). Although this standard applies specifically to commercial air conditioning equipment, its entry into force on February 15, 2026 reflects Mexico's regulatory commitment to energy efficiency in high-demand installations.
Data centers incorporating hybrid systems (liquid cooling for high-density components, air conditioning for peripheral equipment) must verify that unitary air equipment complies with IEER requirements, test methods, and labeling established in the standard.
ISO 50001 Certifications and Energy Management
Reaclima operates under ISO certifications that guarantee standardized quality management and energy efficiency processes. Liquid cooling projects are developed following energy management protocols aligned with ISO 50001, which establish procedures for measurement, analysis, and continuous improvement of energy performance.
Existing Infrastructure Integration Considerations
Many installations face the challenge of incorporating liquid cooling in data centers originally designed for air cooling. Retrofitting presents restrictions in space, electrical capacity, and service distribution.
Design must consider:
- Cooling generation capacity: Existing chillers may require expansion or replacement to handle increased thermal loads. Heat rejection capacity in cooling towers must be evaluated.
- Piping distribution: Primary (FWS) and secondary (TCS) piping routing must avoid interferences with cable trays, fire protection piping, and electrical systems. Use of BIM allows identifying conflicts before construction.
- Raised floor height: Liquid cooling systems require under-floor space for piping, manifolds, and quick connections. Raised floor heights below 45 cm may present restrictions.
- UPS capacity and electrical distribution: Circulation pumps and CDUs consume electrical power. Although total consumption is lower than air systems, available capacity in UPS system and PDUs must be verified.
Conclusion: Precision Engineering for Operational Continuity
Planning and design of liquid cooling systems for hyperscale data centers demands technical rigor, adherence to international standards, and predictive performance analysis. PUE objectives, load density, operating temperature, and redundancy must be quantitatively defined from the conceptual phase, guiding architecture decisions, technology selection, and system configuration.
Reaclima applies engineering methodologies based on ASHRAE TC 9.9, ASHRAE 90.4, and national regulations such as NOM-035-ENER-2025, integrating BIM modeling tools and CFD simulation to validate design before execution. The experience accumulated in mission-critical projects such as Foxconn GDL and Amazon AWS Querétaro supports the capability to deliver installations that guarantee operational continuity, energy efficiency, and regulatory compliance.
Does your installation face load densities that exceed air cooling capacity? Let's discuss liquid cooling options that align with your operational and budgetary objectives.
References
- ASHRAE Technical Committee 9.9. (2026). Technical Bulletin: Liquid Cooling Resiliency. American Society of Heating, Refrigerating and Air-Conditioning Engineers.
- ASHRAE. (2024). ANSI/ASHRAE/IES Standard 90.4-2022: Energy Standard for Data Centers.
- Energy and Built Environment. (2024). Liquid cooling of data centers: A necessity facing challenges. ScienceDirect.
- Global Market Insights. (2026). Data Center Liquid Cooling Market Size & Share 2026-2035.
- Uptime Institute. (2024). Data Center Rack Power Density Trends 2022-2024.
- Diario Oficial de la Federación. (2025). Mexican Official Standard NOM-035-ENER-2025: Energy efficiency in unitary air conditioners.