By Bob Haley and Chad Harrill
SUMMARY
Most, if not all, organizations are seeking to maximize their computing capabilities while controlling energy consumption. Today’s data-driven environment demands efficient and immediate access to applications, information, data, and data analysis. The optimization of data center energy infrastructure has emerged as a critical component in reducing a facility’s carbon footprint while meeting digital demand. We will explore various strategies and technologies available today that can be considered in a holistic data center consolidation and optimization approach: an approach that enhances digital resource availability and increases energy efficiency, scalability, redundancy, and overall operational effectiveness.
PRELUDE
This exploration provides an agnostic overview of the important role data center optimization plays in energy conservation, its significance in supporting digital operations, and the desire of organizations to meet the ever-evolving demands for information. None of the individual technologies discussed will solve these issues as a standalone solution, yet when combined they offer organizations opportunities to increase their digital capabilities while controlling overall energy consumption costs.
The purpose of this thought piece is to offer technical solutions that address the challenges facing data center operators, Information Technology (IT) managers, and their organizations; to raise awareness of the growing potential to reduce our carbon footprint and data center energy consumption; and to prompt dialogue between IT managers and data center operators. A collaborative solution can be developed when these teams work together to explore strategies for the consolidation, optimization, and convergence of data center, IT hardware, applications, and facilities control systems.
The total number of global data center facilities is on the rise, resulting in an expected 35-40 gigawatts (GW) of electrical power consumption by 2030, more than double the 17 GW consumed in 2022 and an anticipated 2.2% of the world’s power consumption, according to a McKinsey analysis article published January 17, 2023. For reference, one gigawatt-hour (GWh) is equal to 1 million kWh. A constant 1 GW supply could power approximately 876,000 households for one year if each household consumes 10,000 kWh per year, as referenced in a study by Zach Stein, Carbon Collective, July 19, 2023.
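The household figure above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes the load is held constant for a full 8,760-hour year.

```python
HOURS_PER_YEAR = 8_760       # 365 days x 24 hours
KWH_PER_GWH = 1_000_000      # 1 GWh = 1 million kWh


def households_powered(continuous_gw, kwh_per_household_per_year=10_000):
    """Households that a constant load of `continuous_gw` could supply for one year."""
    annual_kwh = continuous_gw * HOURS_PER_YEAR * KWH_PER_GWH
    return annual_kwh / kwh_per_household_per_year


# Compare the 2022 figure and the 2030 range quoted in the text.
for gw in (1, 17, 35, 40):
    print(f"{gw:>2} GW is equivalent to {households_powered(gw):,.0f} households")
```

For 1 GW this gives 876,000 households, matching the figure cited above.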
This growth underlines the importance of developing and implementing a comprehensive Facilities Operational Environmental Control Management (FOECM) strategy. Such a strategy includes an IT systems Geographical Virtualization Load Balancing (GVLB) approach, coupled with the use of renewable energy sources when possible, to reduce data centers’ current power demand and their anticipated overall global energy consumption.
DATA CENTER INFRASTRUCTURE AND CONSOLIDATION OPTIMIZATION
This naturally leads to exploring techniques to consolidate and optimize the physical infrastructure of the data center, such as rack configurations, power, and cooling systems. Data center operators face numerous challenges, including meeting their facility’s electrical, cooling, and redundancy needs while supplying a resilient, robust, and scalable facility. The digital mediums regularly used in our everyday lives continue to grow exponentially. Data center operators and IT managers are being asked to meet these demands while controlling operational costs, maintaining energy efficiency, and offering highly resilient, redundant infrastructures.
The opportunity exists for collaboration between IT managers, facilities managers, and control systems vendors to develop and deploy a holistic Data Center Infrastructure Management (DCIM) system that is consolidated into the data center’s Building and Power Management Systems (BMS/PMS). DCIM focuses on monitoring, managing, and optimizing the physical infrastructure, including power distribution, cooling systems, and equipment management. DCIM incorporates real-time analytics that let facilities operators see overall energy consumption, allowing decisions to be made that better optimize their facilities.
In larger, more complicated facilities, a Supervisory Control and Data Acquisition (SCADA) system manages industrial processes such as power generation and water treatment. The SCADA system offers a centralized monitoring platform to gather and use real-time information to manage operations using remote devices and Programmable Logic Controllers (PLCs) with Human Machine Interfaces (HMIs). SCADA and BMS systems use different network protocols to communicate. Modbus, DNP3, and OPC are very prevalent in SCADA systems, while BACnet is common in BMS systems.
Bringing a SCADA system into an integrated optimization strategy adds a level of complexity to the implementation of automated controls. Although these protocols can interoperate through a gateway device, protocol translation may be necessary, adding another layer of complexity and potential security challenges to the overall optimization strategy. Integrating DCIM, BMS/PMS, and IT systems provides an overarching configuration that enhances operational capabilities while reducing the multi-system complexity that most data center operators encounter. This integrated approach results in improved coordination between data center cooling and electrical systems, reducing consumption and therefore costs.
Optimizing server, storage, and networking hardware is an ongoing and evolving process. As hardware and energy saving technology advances, equipment life cycle management analysis will become a large part of the strategy. New equipment will offer the latest in energy saving components as well as software to manage the equipment.
SOFTWARE DEFINED DATA CENTER (SDDC) IT INFRASTRUCTURES
The key components of a Software Defined Data Center (SDDC) can create an agile and scalable data center IT hardware infrastructure. These components include Software Defined Network (SDN), Software Defined Data Storage systems (SDS), and Software Defined Computing (SDC). Virtualized server, data storage, and network routing hardware has revolutionized data center IT hardware allocations by enabling the efficient utilization of server and data storage resources, a practice which allows underutilized and idle server and storage demands to be load balanced, or dynamically reallocated to devices internal to the data center or at another location. By reducing the total number of servers and storage systems operating, we can minimize overall IT physical infrastructure demand within the data center. Reducing the total number of IT systems (network, server, and storage devices) decreases the amount of heat generated by this equipment, and in turn diminishes the electrical and cooling requirements.
Server, storage, and network virtualization allows the physical hardware to be optimized through the dynamic resource allocation capabilities of the computing resources. Adjusting the operation of the IT equipment based on true demand avoids over-provisioning of equipment and thus reduces energy waste.
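The consolidation idea described above can be sketched as a simple packing problem: place virtual workloads on as few physical hosts as possible so that the remaining hosts can be idled or powered off. The example below is a minimal sketch using a first-fit decreasing heuristic. The `Host` class, the `consolidate` function, the capacity units, and the 20% headroom are all hypothetical; they do not represent any particular virtualization platform.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    capacity: float                               # usable capacity, arbitrary units (e.g. vCPUs)
    workloads: list = field(default_factory=list)  # (workload name, demand) pairs

    @property
    def used(self):
        return sum(demand for _, demand in self.workloads)


def consolidate(workloads, hosts, headroom=0.2):
    """Place workloads with first-fit decreasing; return (active hosts, idle hosts)."""
    # Largest demands first, so big workloads are placed while space is plentiful.
    for name, demand in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        for host in hosts:
            # Keep a safety margin on every host instead of filling it completely.
            if host.used + demand <= host.capacity * (1 - headroom):
                host.workloads.append((name, demand))
                break
        else:
            raise ValueError(f"no host can fit workload {name!r}")
    active = [h for h in hosts if h.workloads]
    idle = [h for h in hosts if not h.workloads]
    return active, idle


hosts = [Host(f"host-{i}", capacity=100) for i in range(1, 5)]
demands = {"a": 30, "b": 25, "c": 20, "d": 15, "e": 10, "f": 10}
active, idle = consolidate(demands, hosts)
print("active:", [h.name for h in active])
print("candidates to idle or power off:", [h.name for h in idle])
```

In this made-up case, six workloads fit on two of the four hosts, leaving two hosts as candidates for idling. A production scheduler would also weigh memory, storage, network, and redundancy constraints.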
Deploying power management features, such as power capping, that are inherent within the server, storage, and networking virtualization software platforms allows the IT manager to automatically power off or idle server, network, and storage hardware when it is not in demand, and to dynamically reroute applications. The software also allows the operator to preconfigure and cap the overall data center power and cooling consumption. Dynamically reallocating IT application demand during peak energy consumption hours gives data center operators the ability to consolidate their infrastructure. This reduces the facility’s overall electrical and cooling requirements, thus greatly reducing overall or time-of-day energy consumption.
DCIM-controlled power capping can be especially useful. This approach avoids power availability and cooling capacity constraints by prioritizing power allocation to critical systems, assisting in the control of overall energy costs and preventing thermal overload or outages within the data center. When administrators cap the total electrical usage of a facility, the automatic reallocation of IT demand to another internal or external infrastructure is triggered once the threshold is met, controlling demand as well as electrical and cooling loads.
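The threshold-triggered reallocation described above can be illustrated with a small sketch. It assumes each workload has a measured power draw and a flag marking it as critical; the `Workload` class, the `plan_reallocation` function, and all figures are hypothetical and do not correspond to any specific DCIM product.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    power_kw: float
    critical: bool      # critical workloads keep their local power allocation


def plan_reallocation(workloads, cap_kw):
    """Return (names of workloads to move elsewhere, remaining local draw in kW)."""
    total = sum(w.power_kw for w in workloads)
    to_move = []
    movable = sorted(
        (w for w in workloads if not w.critical),
        key=lambda w: w.power_kw,
        reverse=True,
    )
    # Greedy heuristic: move the largest non-critical consumers first.
    for w in movable:
        if total <= cap_kw:
            break
        to_move.append(w.name)
        total -= w.power_kw
    if total > cap_kw:
        raise RuntimeError("critical load alone exceeds the configured cap")
    return to_move, total


site = [
    Workload("billing", 120, critical=True),
    Workload("analytics", 90, critical=False),
    Workload("batch", 60, critical=False),
    Workload("dev", 30, critical=False),
]
moved, remaining = plan_reallocation(site, cap_kw=200)
print(f"move {moved}; local draw falls to {remaining} kW")
```

With a 200 kW cap and a 300 kW starting load, the sketch moves the two largest non-critical workloads and leaves the critical one untouched. Where the moved workloads land (another hall or another site) is outside the scope of this example.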
Convergence of the Information Technology (IT) and Operational Technology (OT) platforms onto an IP-based (Layer 3) Ethernet platform enhances the software consolidation capabilities of IT and facilities control systems. Operating on a shared communications platform allows the various systems to use a common IT platform to integrate monitoring and mitigation for both IT and facilities system failures, which helps reduce security risks.
In conjunction with the implementation of an optimization strategy, the purchase and use of energy-saving equipment components such as high-efficiency power supplies, fans, and solid-state storage should be a key component. Right-sizing servers and data storage systems for optimal virtualization allows the hardware to scale either vertically (internally) or horizontally (geographically) to meet the demands of the application without over-provisioning. Using horizontal demand allocation will require planning to ensure the data transmission media can meet any data synchronization or latency challenges.
In larger facilities, the use of higher-voltage component power supplies (415/480 V) could reduce the need for additional electrical supply transformation, thus reducing energy waste and heat generation within the server hall. Using 480 V circuits also reduces resistive I²R (copper) losses by a factor of roughly four to five compared with 240 V or 208 V circuits. This is because a 480 V circuit requires less current to deliver the same amount of power, and resistive heating falls with the square of the current. This approach is evidenced in the research paper Improving High Performance Computing Efficiency with 480V Power Supplies, by Giri Chukkapalli and Maria McLaughlin, Cray Inc., 2013. Combined with the use of power capping, the higher voltage would offer greater savings while maintaining optimal systems performance.
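The size of the copper-loss reduction follows from the relationship between voltage, current, and resistive heating. The sketch below is a simplified illustration: it assumes the same delivered power, the same conductor resistance, and the same power factor at both voltages. The function name is hypothetical.

```python
def relative_copper_loss(v_low, v_high):
    """Ratio of I^2R loss at v_low to the loss at v_high, for equal power and resistance."""
    # I = P / V, so loss = I^2 * R = P^2 * R / V^2; P and R cancel in the ratio.
    return (v_high / v_low) ** 2


for volts in (240, 208):
    ratio = relative_copper_loss(volts, 480)
    print(f"{volts} V circuit: {ratio:.2f}x the resistive loss of a 480 V circuit")
```

Under these assumptions the ratio is 4.00 for 240 V and about 5.33 for 208 V, consistent with the range cited above. Real installations will differ with conductor sizing and circuit topology.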
Given the importance of real-time access to data, digital application managers are continually looking for ways to improve performance. At times, application managers oversize their Redundant Array of Independent Disks (RAID) storage configurations. Reanalyzing the need for RAID 5 configurations, with a view to using less demanding storage, will reduce the overall storage capacity needed to meet demand, improve the responsiveness of data access, and lessen the over-provisioning of storage systems.
Implementing Solid-State Disks (SSDs), either within the server or in an independent data storage array in the facility, offers considerable energy consumption savings by reducing the heat generated within equipment. Although more expensive, SSDs increase performance while reducing the overall carbon footprint of the equipment.
OPTIMIZING INFORMATION AND OPERATIONAL TECHNOLOGY
Three server rack cooling strategies have become conventional in today’s data centers: hot aisle containment, cold aisle containment, and a combination of the two. By separating the hot and cold air streams, physical aisle containment minimizes the mixing of hot and cold air, preventing hot spots and reducing energy waste. This approach gives operations managers more precise control of cooling, improves equipment performance, and extends the life cycle of the equipment.
Hot aisle containment keeps equipment exhaust from mixing with the cool air by deploying physical barriers that contain the warm air and route it to the cooling units. This containment strategy introduces an issue for operators and maintenance staff, who must work in a warm environment that can exceed 100 degrees Fahrenheit. Local and federal requirements for employee working environments should be closely followed.
Cold aisle containment is less popular, yet remarkably similar to hot aisle containment, except that it contains the cold air the servers consume. However, in the event of a catastrophic cooling failure, servers can overheat very quickly, causing a total system failure. Deploying a cold-aisle-only containment strategy introduces unnecessary operational risk. Instead, cold aisle containment should be a component of an overall optimization strategy.
With the advancement of real-time Artificial Intelligence (AI) predictive analysis capabilities, operators can use this information to manage their risk more efficiently. A consolidated IT and operational systems optimization strategy will introduce the option to use an advanced containment cooling strategy. It is widespread practice to install either a hot or a cold aisle containment strategy.
Deploying a combined hot and cold aisle rack infrastructure cooling strategy offers the greatest opportunity for energy savings, yet introduces risk to the server infrastructure. In the event of a failure of the cooling system, whether water or refrigerant based, servers would quickly consume the existing contained cool air, resulting in a catastrophic hardware system failure. However, as a component of a holistic facilities and IT infrastructure redeployment strategy, this risk can be mitigated.
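To give a sense of how quickly contained cool air can be used up, the sketch below estimates the time for the air inside a containment zone to warm by a given amount once cooling stops. It is a rough lower bound under stated assumptions. The air volume, IT load, and allowable temperature rise are hypothetical figures, the air properties are standard approximations, and the thermal mass of the equipment, the structure, and any residual chilled water is ignored.

```python
AIR_DENSITY_KG_M3 = 1.2        # approximate density of air at room conditions
AIR_SPECIFIC_HEAT_J_KG_K = 1005  # approximate specific heat of air at constant pressure


def ride_through_seconds(volume_m3, it_load_kw, allowed_rise_k):
    """Seconds for the contained air to warm by `allowed_rise_k` with no cooling."""
    air_mass_kg = AIR_DENSITY_KG_M3 * volume_m3
    # Energy the contained air can absorb before reaching the temperature limit.
    absorbable_joules = air_mass_kg * AIR_SPECIFIC_HEAT_J_KG_K * allowed_rise_k
    return absorbable_joules / (it_load_kw * 1000)


# Hypothetical zone: 30 m^3 of contained air, 100 kW of IT load, 15 K allowable rise.
print(f"{ride_through_seconds(30, 100, 15):.1f} s")
```

For these illustrative inputs the estimate is only a few seconds, which is why the mitigation measures discussed here matter. Actual ride-through time depends on the thermal mass and airflow of the specific facility.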
CONCLUSION
Developing a server and IT hardware consolidation plan requires careful planning to ensure that data synchronization, latency, security, and systems redundancy requirements are considered. Workload distribution, virtual server settings, facilities interconnectivity speeds, and overall power management policies are all crucial considerations. Continued, regular monitoring and analysis will be necessary to adapt to changing workloads and application requirements in the effort to strike a balance between power efficiency and the optimal performance of critical systems.
Data centers manage critical and sensitive information, making security a critical factor in optimization efforts. Convergence of the DCIM, IT, and OT networks onto a single platform requires careful planning and continual monitoring. Efficient real-time monitoring and analytics are critical for identifying potential security issues and performance bottlenecks, optimizing resources, and predicting failures.
Insights and guidance in the development of a holistic strategy can be gained by continued exploration of the various aspects of data center optimization outlined in this paper. A more informed understanding and implementation of integrated strategies will lead to improved efficiencies, systems performance, redundancy, and the overall reduction of power consumption.
Bob Haley is Mission Critical Facilities Director at HDR. He can be reached at [email protected].
Chad Harrill is Mission Critical Facilities Project Manager at HDR. He can be reached at [email protected].