Microsoft and Meta join Google in using AI to help run their data centers
Data centers, which power the apps, websites, and services that billions of people use every day, can be hazardous places for the workers who build and maintain them. Workers sometimes have to service a data center’s electrical equipment while it is energized. They can also be exposed to chemicals such as chlorine, which is used to sterilize the water circulated through liquid cooling systems for computers and servers. In June 2015, five people were taken to a hospital after a chlorine gas leak at an Apple data center in Maiden, North Carolina.
Data centers are safer than they used to be. But some tech giants say they’re exploring how AI can be applied to prevent safety issues before they arise. For example, Microsoft is developing an AI system that analyzes data from a range of sources and generates alerts for data center construction and operations teams to “prevent or mitigate the impact of safety incidents.” A separate but related system, also under development, attempts to detect and predict impacts to data center construction schedules.
“These initiatives are both in early testing phases and are expected to begin expanding into our production environments later this year,” a Microsoft spokesperson told TechCrunch via email.
Meta also says it’s investigating how AI can anticipate the ways its data centers will operate under “extreme environmental conditions” that might create unsafe work environments. The company has been developing physical models to simulate extreme conditions, and it feeds this data to the AI models responsible for optimizing power consumption, cooling, and airflow across its servers.
“We have significant operational data from our data centers, in some areas at high frequency with built-in sensors in servers, racks, and in our data halls,” a Meta spokesperson told TechCrunch. “Each server and network device, taking on different workloads, will consume different amounts of power, generate different amounts of heat, and make different amounts of airflow in the data centers. Our [infrastructure] team collects all the data from each server and then develops AI models which can allocate our servers and racks in the data centers and send workloads into these servers to optimize [for] performance and efficiency.”
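Meta hasn’t published details of these models, but the allocation problem the spokesperson describes can be sketched in miniature: given per-rack power budgets, assign workloads so that no rack runs out of headroom. The toy greedy placer below is purely illustrative; the rack names, power budgets, and strategy are assumptions, not Meta’s actual system, which also weighs heat, airflow, and performance.

```python
# Toy greedy workload placement by power headroom. Purely illustrative;
# all rack names, budgets, and workloads here are invented.
from dataclasses import dataclass, field

@dataclass
class Rack:
    name: str
    power_budget_kw: float              # rack-level power cap
    used_kw: float = 0.0
    jobs: list = field(default_factory=list)

    def headroom(self) -> float:
        return self.power_budget_kw - self.used_kw

def place(workloads, racks):
    """Assign the largest workloads first, each to the rack with the most headroom."""
    for job, draw_kw in sorted(workloads, key=lambda w: -w[1]):
        best = max(racks, key=Rack.headroom)
        if best.headroom() < draw_kw:
            raise RuntimeError(f"no rack has capacity for {job}")
        best.used_kw += draw_kw
        best.jobs.append(job)

racks = [Rack("rack-a", 12.0), Rack("rack-b", 12.0)]
place([("web", 4.0), ("cache", 3.0), ("batch", 6.0), ("ml", 5.0)], racks)
for rack in racks:
    print(rack.name, rack.used_kw, rack.jobs)
```

A production placer would optimize over many more signals than power draw alone, but the basic shape, telemetry in, placement decisions out, is the one the quote describes.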
Of course, companies have motivations aside from safety to keep data centers in peak condition. Outages are expensive, and they’re becoming more frequent. According to a 2020 survey by the Uptime Institute, an IT consulting firm, a third of data center owners and operators experienced a major outage over the previous 12 months. One in six said their outage cost them more than $1 million, up from one in ten in 2019.
Meta has more than 20 data centers in operation around the world, including new projects in Texas and Missouri estimated to cost $1.6 billion combined. Microsoft, meanwhile, manages more than 200 data centers and says it’s on pace to build between 50 and 100 new data centers each year for the foreseeable future.
AI also promises to surface energy (and therefore cost) savings in the data center that would otherwise fly under the radar, another appealing aspect for corporations. In 2018, Google claimed that AI systems developed by its DeepMind affiliate delivered energy savings of 30% on average compared with its data centers’ historical energy usage.
When reached for comment, DeepMind said that it had no updates to share beyond the initial announcement. IBM and Amazon didn’t respond to inquiries. But both Meta and Microsoft say they’re now using AI for similar energy-tuning purposes.
Microsoft launched AI “anomaly detection methods” in late 2021 to detect and mitigate unusual power and water usage events within the data center, using telemetry data from electrical and mechanical devices. The company is also using AI-based approaches to identify and fix issues with power meters in the data center, and to identify ideal spots to place servers in order to minimize wasted power, network, and cooling capacity.
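Microsoft hasn’t said which techniques underpin these methods. For a sense of the shape of the problem, here is a minimal anomaly-detection sketch using scikit-learn’s IsolationForest over simulated power and water telemetry; the features, thresholds, and values are all illustrative assumptions, not Microsoft’s system.

```python
# Minimal sketch of telemetry anomaly detection; not Microsoft's system.
# All sensor names and numbers are invented for illustration.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Simulated telemetry under normal operation: (power_kw, water_gpm) samples.
normal = np.column_stack([
    rng.normal(400, 20, 1000),   # power draw, kilowatts
    rng.normal(50, 5, 1000),     # water flow, gallons per minute
])

model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# New readings, including one abnormal spike (high power, low water flow).
readings = np.array([[405.0, 51.0], [398.0, 49.0], [620.0, 12.0]])
flags = model.predict(readings)   # -1 marks an anomaly, 1 marks normal

for sample, flag in zip(readings, flags):
    if flag == -1:
        print(f"ALERT: unusual power/water reading {sample}")
```

The appeal of this style of approach is that it needs no labeled failure data: the model learns what “normal” telemetry looks like and flags departures from it.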
Meta, for its part, says it has been leveraging reinforcement learning to reduce the amount of air it pumps into data centers for cooling. (At a high level, reinforcement learning is an AI technique in which a system learns to solve a problem through trial and error.) Most of the company’s data centers use outdoor air and evaporative cooling systems, making optimizing airflow a high priority.
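Meta hasn’t disclosed its training setup, but the trial-and-error loop is easy to see in a toy example. The sketch below uses tabular Q-learning, one simple form of reinforcement learning, to pick an airflow level for a simulated server hall; the thermal dynamics and reward function are invented for illustration and bear no relation to Meta’s production controllers.

```python
# Toy Q-learning sketch of airflow control. The thermal model and reward
# are invented for illustration; this is not Meta's production system.
import random

AIRFLOW_LEVELS = [0, 1, 2]          # actions: low, medium, high airflow
TEMPS = list(range(18, 31))         # discretized hall temperature, Celsius

def step(temp, airflow):
    """Invented dynamics: servers add heat, airflow removes it."""
    new_temp = max(18, min(30, temp + 2 - airflow * 2 + random.choice([-1, 0, 1])))
    # Reward: penalize overheating heavily, penalize fan energy mildly.
    reward = -(5 if new_temp > 27 else 0) - airflow
    return new_temp, reward

q = {(t, a): 0.0 for t in TEMPS for a in AIRFLOW_LEVELS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

temp = 24
for _ in range(50_000):
    # Epsilon-greedy action selection: mostly exploit, occasionally explore.
    if random.random() < epsilon:
        action = random.choice(AIRFLOW_LEVELS)
    else:
        action = max(AIRFLOW_LEVELS, key=lambda a: q[(temp, a)])
    new_temp, reward = step(temp, action)
    best_next = max(q[(new_temp, a)] for a in AIRFLOW_LEVELS)
    q[(temp, action)] += alpha * (reward + gamma * best_next - q[(temp, action)])
    temp = new_temp

# Inspect the learned policy at a few temperatures.
for t in (20, 25, 29):
    print(t, max(AIRFLOW_LEVELS, key=lambda a: q[(t, a)]))
```

The agent learns to spend fan energy only when the simulated hall runs hot, the same trade-off, between cooling and energy, that a real airflow controller has to make.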
A reduced environmental footprint is an added benefit of these energy-regulating AI systems. Data centers accounted for about 1% of global electricity demand and roughly 0.3% of all carbon dioxide emissions in 2020, according to the International Energy Agency. And the typical data center uses 3 million to 5 million gallons of water per day, as much as a city of 30,000 to 50,000 people.
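(At the midpoints, that works out to 4 million gallons across 40,000 people, or about 100 gallons per person per day, roughly in line with average per-capita water use in the US.)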
Microsoft has previously said that it plans to have all of its data centers running on 100% renewable energy by 2025. Meta claimed to have achieved the feat in 2020.