In This Issue
Winter Bridge on The Grainger Foundation Frontiers of Engineering
December 13, 2024 Volume 54 Issue 4
This issue features articles by The Grainger Foundation US Frontiers of Engineering 2024 symposium participants. The articles examine cutting-edge developments in microbiology and health, artificial intelligence, the gut-brain connection, and digital twins.

Scaling AI Sustainably

Thursday, December 12, 2024

Authors: Carole-Jean Wu, Bilge Acun, Ramya Raghavendra, and Kim Hazelwood

It is essential that AI, the 21st century’s most important technology, be developed with sustainability in mind.

The past 50 years have seen a dramatic increase in the amount of compute capability per person. Since the introduction of the first commercially available microprocessor (the Intel 4004, released in 1971), the number of transistors on a microprocessor die has increased more than a millionfold. Accelerating knowledge discovery in science and engineering demands computation capability well beyond what a single microprocessor can offer.

High-performance computing infrastructures, from supercomputers tackling weather forecasting, molecular modeling, and other complex computational problems to AI training clusters, connect tens of thousands of processors using advanced networking gear to further scale up computation capability. For example, the Frontier supercomputer, the world's first exascale supercomputer, comes with more than 37,000 graphics processing units (GPUs) to deliver more than one quintillion calculations per second. And, to propel progress toward artificial general intelligence, AI superclusters, such as Google's Tensor Processing Unit-based[1] AI Hypercomputer or Meta's Research SuperCluster,[2] also rely on horizontal scaling to achieve high performance, scalability, and efficiency.

With domain-specific hardware specialization in data centers, the computing industry has achieved many orders-of-magnitude efficiency improvements through decades of technological innovation. Such efficiency improvement is key to keeping the global energy use of data centers nearly constant. Between 2010 and 2018, compute instances in global data centers increased by 5.5 times while overall energy use grew by merely 6% (Masanet et al. 2020). AI is revolutionizing entire industries, from education and medicine to e-commerce, finance, and entertainment. OpenCatalyst[3] uses AI to discover new electrocatalysts for efficient renewable energy storage. AlphaFold[4] uses AI to rapidly predict protein structures, with the potential to revolutionize the entire domain of biological science. FarmBeats[5] uses AI to improve farming efficiency. AI is profoundly changing the way we live, learn, communicate, and interact with each other.

Despite these positive societal benefits, the development of AI technologies necessitates an increase in data center energy use and associated greenhouse gas emissions from the rising demand for computing resources. Between 2019 and 2021, the amount of data used for AI model training more than doubled, corresponding to a more than 20-fold increase in model size (Wu et al. 2022). In fact, since the inception of AlexNet in 2012, the parameter counts of state-of-the-art AI models have been increasing exponentially, reaching the scale of trillions of parameters and requiring terabytes of memory capacity for storage. The AI scaling trend is pushing the frontier of computing infrastructures.

The first step toward sustainable AI is to understand AI's carbon impact holistically across its lifecycle. AI's lifecycle carbon impact comes from the use of AI systems (the operational carbon footprint) as well as from the manufacturing of AI systems and data center construction materials (the embodied carbon footprint).

Operational Carbon Footprint

Operational carbon footprint is defined as the carbon emissions (or carbon dioxide equivalent [CO2e]) associated with the electricity consumed. For a key production recommendation model, the energy consumption breakdown is roughly 30:30:40 across the phases of data, experimentation/training, and inference. For example, the operational carbon footprint of Llama 3 model training[6] is estimated at 2,290 tonnes of carbon emissions, assuming a GPU thermal design power of 700 W and the average emission factor of US grids. In reality, AI workflows typically involve dozens, if not hundreds, of training runs to produce a final model. Once a model is trained to the desired accuracy level, it is further optimized to serve various product use cases. In the case of a language translation model, the inference footprint can be double the training carbon footprint over the model's entire lifetime.
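
The arithmetic behind such an estimate is simple enough to sketch. In the minimal Python sketch below, the 700 W TDP and the use of an average US grid emission factor come from the text above; the GPU-hour total and the numeric value of the grid factor are assumptions, chosen to be consistent with the cited 2,290-tonne figure.

```python
# Minimal operational-carbon sketch. The 700 W TDP and the use of an
# average US grid factor come from the text; GPU_HOURS and the numeric
# grid factor are assumptions consistent with the 2,290-tonne estimate.
GPU_HOURS = 7.7e6            # assumed total GPU-hours for the training run
TDP_KW = 0.700               # GPU thermal design power: 700 W
GRID_KG_CO2E_PER_KWH = 0.42  # assumed average US grid emission factor

energy_kwh = GPU_HOURS * TDP_KW                           # roughly 5.4 GWh
emissions_tonnes = energy_kwh * GRID_KG_CO2E_PER_KWH / 1000.0

print(f"Operational footprint: ~{emissions_tonnes:,.0f} tCO2e")
# -> ~2,264 tCO2e, on the order of the 2,290-tonne estimate in the text
```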

Embodied Carbon Footprint

AI’s carbon impact goes beyond the emissions associated with the electricity needed to power the models (i.e., operational energy use) to include the embodied emissions of the required infrastructure, such as semiconductor manufacturing and the steel and cement used for data center construction. AI systems used for model training and inference at scale carry manufacturing carbon emissions produced during the production of system hardware. Carbon embodied in AI system hardware constitutes a substantial portion of AI’s overall carbon footprint. Taking the multilingual translation task as an example, the model’s overall lifecycle carbon footprint is approximately four times the operational carbon footprint of model training.

For consumer electronics, the embodied carbon footprint is more significant than the operational carbon footprint over the computing device’s lifecycle, with a rough 80:20 split (Gupta et al. 2022). As consumer electronics evolve into wearables that seed the next wave of computing, additional data center capacity will be required to support the new computing paradigm, and we expect a similar embodied-to-operational breakdown for wearables, such as smart glasses. In the near future, we will have augmented reality with contextual AI capability. This is an important time for us, computer system designers, to innovate with sustainability in mind. What do sustainability-first design principles look like for the next wave of computing?

Bending the Demand Curve with Efficiency

To sustainably develop AI, this century’s most important technology, we must make AI, and computing more broadly, efficient and flexible. Ample efficiency optimization opportunities are present across the entire AI model life cycle, spanning data, algorithms, and system hardware.

Data Efficiency

AI models train on large amounts of data. When designed well, data scaling, sampling, and selection strategies can lead to faster training time and higher model quality at the same time (Sachdeva et al. 2022). Realizing this potential is not straightforward in the real world, where data formats are highly fragmented, data modalities are diverse, and the quality of training samples is uncertain. This is why a common metadata format for AI, such as Croissant (Akhtar et al. 2024), is paramount to managing the growing infrastructure demands of data. Complementary to AI data optimization, the data storage and ingestion pipelines for AI demand significant power capacity (Zhao et al. 2022). An optimized composite data storage infrastructure using novel application-aware cache policies can reduce input/output traffic by more than three times relative to a baseline least-recently-used (LRU) flash cache, reducing power demand in a petabyte-scale production AI training cluster by 29% (Zhao et al. 2023).
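
To make the intuition concrete, here is a toy Python sketch of why an application-aware admission policy can beat plain LRU for training data: if the training plan already tells us how often each sample will be re-read, the cache can decline to admit samples it will never see again. The trace generator, capacity, and admission rule are all illustrative assumptions; this is not the production policy of Zhao et al. (2023).

```python
import random
from collections import Counter, OrderedDict

class LRUCache:
    """Plain LRU flash cache: admits every item on a miss."""
    def __init__(self, capacity):
        self.capacity, self.store, self.misses = capacity, OrderedDict(), 0

    def admit(self, key):
        return True

    def access(self, key):
        if key in self.store:
            self.store.move_to_end(key)         # mark most recently used
            return
        self.misses += 1                        # a read from the backing tier
        if self.admit(key):
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # evict least recently used
            self.store[key] = True

class ReuseAwareCache(LRUCache):
    """Admits a sample only if the (assumed known) plan will read it again."""
    def __init__(self, capacity, trace):
        super().__init__(capacity)
        self.remaining = Counter(trace)         # future reads per sample

    def access(self, key):
        self.remaining[key] -= 1                # this read is no longer future
        super().access(key)

    def admit(self, key):
        return self.remaining[key] > 0          # skip never-again samples

# Skewed access trace: a few hot samples, a long tail of cold ones.
random.seed(0)
trace = [int(random.paretovariate(1.2)) % 500 for _ in range(50_000)]

lru, aware = LRUCache(50), ReuseAwareCache(50, trace)
for k in trace:
    lru.access(k)
    aware.access(k)
print("LRU misses:", lru.misses, " reuse-aware misses:", aware.misses)
```

Because the reuse-aware variant never spends cache space on samples with no future reads, the hot samples stay resident longer and fewer reads fall through to the backing store.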

Model Efficiency

Making machine learning models parameter-efficient can contribute to more effective use of energy. Taking the Llama family of foundation language models (Touvron et al. 2023) as an example, LLaMA-13B outperforms GPT-3 (175B) on a variety of tasks while consuming approximately 24 times less energy. As a parameter-efficient model for language tasks, Llama is superior across the key design dimensions of accuracy, training time, energy consumption, and operational carbon footprint. Model optimization for efficient inference is fertile ground. By clustering redundant attention heads over tokens in transformer-based large language models, a simple inference-time model pruning technique, such as CHAI, can improve inference efficiency significantly (Agarwal et al. 2024). Meanwhile, additional inference efficiency improvements can be achieved with techniques, such as LayerSkip, that eliminate the need for a separate model for faster token generation (Elhoushi et al. 2024). Model efficiency is more important now than ever to help bend AI’s rising energy demand curve.
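
The core idea behind head clustering is easy to illustrate. The toy Python sketch below synthesizes redundant per-head attention maps and greedily merges heads whose patterns are nearly identical by cosine similarity. It illustrates the spirit of clustered head attention, not the actual CHAI algorithm; all shapes, thresholds, and data here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_heads, seq_len = 16, 32

# Synthetic redundancy: 16 per-head attention maps drawn from only 4
# underlying patterns, plus small noise (heads x flattened attention map).
base = rng.random((4, seq_len * seq_len))
attn = (base[rng.integers(0, 4, num_heads)]
        + 0.01 * rng.random((num_heads, seq_len * seq_len)))

def cluster_heads(attn, threshold=0.99):
    """Greedy clustering: a head joins the first cluster whose
    representative head it matches by cosine similarity."""
    normed = attn / np.linalg.norm(attn, axis=1, keepdims=True)
    reps, assignment = [], []
    for h in range(len(attn)):
        for c, rep in enumerate(reps):
            if normed[h] @ normed[rep] >= threshold:
                assignment.append(c)
                break
        else:                       # no close cluster: start a new one
            reps.append(h)
            assignment.append(len(reps) - 1)
    return reps, assignment

reps, assignment = cluster_heads(attn)
print(f"{num_heads} heads -> {len(reps)} clusters; "
      f"~{1 - len(reps) / num_heads:.0%} of head computation shareable")
```

On this synthetic input the 16 heads collapse back to the 4 underlying patterns, so roughly three quarters of the per-head computation could be shared at inference time.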

System Efficiency

The MAD-Max performance modeling framework enables agile design space exploration for large-model acceleration on distributed systems. Machine learning researchers can use the framework to navigate the design space of model parallelization and hardware deployment strategies along the Pareto frontier of training time and energy use holistically (Hsia et al. 2024). For the multilingual language model, a combination of data locality optimization, GPU acceleration, low-precision data formats, and algorithmic optimization can together deliver an over 800-fold energy efficiency improvement (Wu et al. 2022). In addition, optimization frameworks, such as Zeus, let model developers navigate the tradeoff between operational energy consumption and performance by automatically finding optimal job- and GPU-level configurations for recurring deep neural network (DNN) training jobs, thus reducing inefficient energy use in model training (You et al. 2023).
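
As a flavor of the kind of knob such frameworks automate, the Python sketch below sweeps candidate GPU power limits and picks the one minimizing a blended cost of energy and time, loosely following the cost formulation described in the Zeus paper. The `measure_epoch` profiler and its synthetic performance curve are hypothetical stand-ins for real measurements.

```python
def measure_epoch(power_limit_w):
    """Hypothetical profiler returning (epoch_time_s, energy_j).
    Synthetic curve: above an assumed 280 W knee the GPU draws more power
    without running faster; below it, epoch time stretches proportionally."""
    knee_w, base_time_s = 280.0, 100.0
    epoch_time = base_time_s * max(1.0, knee_w / power_limit_w)
    return epoch_time, power_limit_w * epoch_time

def best_power_limit(candidates, eta=0.5, max_power_w=400.0):
    """Pick the power limit minimizing a Zeus-style blend of energy and
    time: cost = eta * energy + (1 - eta) * max_power * time."""
    def cost(pl):
        epoch_time, energy = measure_epoch(pl)
        return eta * energy + (1 - eta) * max_power_w * epoch_time
    return min(candidates, key=cost)

limits_w = [150, 200, 250, 300, 350, 400]
print("chosen GPU power limit:", best_power_limit(limits_w), "W")  # -> 300 W
```

With this assumed curve, the sweep settles near the knee: capping power below the default saves energy at little cost in time, while capping too aggressively stretches the epoch enough that the blended cost rises again.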

Going Beyond Efficiency

Making AI more flexible and resilient to a changing environment also helps. Scale matters: as model training grows from hundreds to tens of thousands of GPUs, training workflow failures occur more frequently, with substantial training time degradation (Kokolis et al. 2024). Fault-tolerant, resilient distributed training frameworks are thus becoming more important, not only for large-scale machine learning models but also for distributed training environments, such as federated learning at the edge (Huba et al. 2022). The existing software system stack does not yet support flexible computation shifting; when available, such a feature would give application developers and data center operators an important lever to explicitly annotate and expose flexibility at the level of functions, programs, services, or workloads. It would enable better control and management decisions in response to dynamic signals, such as compute resource availability, electricity availability, or the carbon intensity of energy, for improved resiliency and sustainability (Xing et al. 2023).
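
A minimal sketch of this lever, assuming a hypothetical day-ahead carbon-intensity forecast: a flexible job annotated with a deadline is simply started in the lowest-carbon window that still meets the deadline. This illustrates the idea of exposing workload flexibility; it is not the system of Xing et al. (2023).

```python
def best_start_hour(forecast_g_per_kwh, duration_h, deadline_h):
    """Pick the start hour minimizing average carbon intensity over the
    job's run, subject to finishing by the deadline."""
    latest_start = min(deadline_h, len(forecast_g_per_kwh)) - duration_h
    def avg_intensity(start):
        window = forecast_g_per_kwh[start:start + duration_h]
        return sum(window) / duration_h
    return min(range(latest_start + 1), key=avg_intensity)

# Hypothetical 24-hour forecast (gCO2e/kWh): cleaner midday, dirtier evening.
forecast = [450, 440, 430, 420, 410, 400, 380, 340, 300, 260, 230, 210,
            200, 205, 220, 260, 320, 380, 430, 460, 470, 465, 455, 450]

start = best_start_hour(forecast, duration_h=4, deadline_h=24)
print(f"start the 4-hour flexible job at hour {start} "
      f"(~{sum(forecast[start:start + 4]) / 4:.0f} gCO2e/kWh average)")
# -> hour 11, ~209 gCO2e/kWh instead of ~435 if run immediately
```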

AI can be part of the solution by accelerating grid decarbonization and advancing renewable energy storage technology. A significant fraction (5 to 15%) of the renewable energy generated in grids around the world today goes to waste due to curtailment. More accurate energy demand and emission forecasting methods give power grid operators effective signals to improve grid-level demand-response management. Emission forecasting, however, is a dynamic and complex problem: predicting energy demand in a highly interconnected grid transmission network, as well as renewable energy availability under changing weather conditions, requires building and running computationally intensive physical models and simulations. This is where AI can help: predicting when and where renewable energy is stranded (Acun et al. 2023). With the capability to accurately forecast when and where renewable surpluses occur, AI can improve renewable energy utilization and enable more effective demand-response management at the grid level, thereby accelerating grid decarbonization. What other complex climate problems can we leverage AI to solve?
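
As a toy illustration of the forecasting task, the Python sketch below fits a standard regressor to predict curtailed (stranded) renewable energy from weather and demand features. The features, the synthetic data generator, and the model choice are all illustrative assumptions, not the method of Acun et al. (2023).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 5000
wind_ms = rng.uniform(0, 25, n)       # wind speed (m/s)
solar_wm2 = rng.uniform(0, 1000, n)   # solar irradiance (W/m^2)
demand_gw = rng.uniform(20, 60, n)    # regional demand (GW)

# Synthetic ground truth: surplus is curtailed when renewable generation
# outruns the share of demand the grid can absorb (coefficients assumed).
generation = 1.2 * wind_ms + 0.03 * solar_wm2
curtailed = np.maximum(generation - 0.6 * demand_gw, 0.0) + rng.normal(0, 0.5, n)

X = np.column_stack([wind_ms, solar_wm2, demand_gw])
model = GradientBoostingRegressor().fit(X[:4000], curtailed[:4000])
print("held-out R^2:", round(model.score(X[4000:], curtailed[4000:]), 3))
```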

Last but not least, lowering the carbon embodied in hardware helps mitigate AI’s carbon impact (Gupta et al. 2022). Comparing Apple’s iPhone 3GS from 2009 with the iPhone 11 released a decade later, the operational carbon footprint improved by 1.6 times while the manufacturing carbon footprint increased by 3 times. This is primarily driven by more advanced hardware architectures, with a much larger collection of application-specific accelerators, and by a higher semiconductor manufacturing environmental footprint. More than 80% of the iPhone 11’s lifecycle carbon footprint is embodied in the hardware, while less than 20% comes from operational use. This shift opens up new design possibilities for computer system designers. How do we minimize the lifecycle carbon emissions of AI and computing by lowering the carbon embodied in hardware? What do sustainability-driven computer systems look like when end-of-life processing is kept in mind at the design stage? How do we weigh and co-design operational and embodied carbon footprints to minimize AI and computing’s lifecycle carbon and environmental footprint?
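
A back-of-the-envelope check of the iPhone example: if we assume (our assumption; the text does not state it) a roughly even embodied-to-operational split for the 2009 phone, applying the stated factors reproduces the greater-than-80% embodied share.

```python
# Assumed 50:50 lifecycle split for the 2009 phone (normalized units).
embodied_2009, operational_2009 = 0.5, 0.5

embodied_2019 = embodied_2009 * 3.0        # manufacturing footprint up 3x
operational_2019 = operational_2009 / 1.6  # operational footprint improved 1.6x

share = embodied_2019 / (embodied_2019 + operational_2019)
print(f"embodied share of the 2019 phone's lifecycle: {share:.0%}")  # -> 83%
```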

Looking Forward

The past decades have witnessed impressive technology advances that bootstrapped the AI revolution. A laser focus on performance and energy efficiency optimization has made computing capable, cost-effective, and prevalent. As AI and its applications continue to grow, we must maintain a holistic focus on efficiency optimization across the AI model development cycle and across the hardware-software system stack.

To sustainably scale this century’s most important technology, we must go beyond efficiency (Wu et al. 2024). We need AI to be flexible, to be part of the solution, and to come with a minimal manufacturing environmental footprint in order to achieve an environmentally sustainable future for computing. To do so, we must be able to measure the environmental impact of AI. This will not be straightforward: a standard, scientifically rigorous emission accounting methodology for AI has yet to be developed. A common, easy-to-use carbon accounting standard is a key step toward systematic and transparent assessment of modern AI use cases and toward incentivizing meaningful climate action.

Characterizing and analyzing carbon emissions is a complex process. While initial efforts propose ways to measure the power consumption of AI systems, there are not yet standardized metrics or tools; this is an area where active research is needed. Carbon Explorer (Acun et al. 2023) and ACT (Gupta et al. 2022) are initial steps toward arming the computing industry (Meta 2024; Rivalin et al. 2024) and research community with design space exploration tools that treat carbon as a first-class design principle. Considering carbon alongside performance, power, and cost efficiency opens up new opportunities in AI system stack design (Elgamal et al. 2023; Wu and Gupta 2022). And as we enable more sustainable scaling for AI, plenty of impactful opportunities remain to deploy AI to tackle climate challenges.

References

Acun B, Lee B, Kazhamiaka F, Maeng K, Gupta U, Chakkaravarthy M, Brooks D, Wu C-J. 2023. Carbon explorer: A holistic framework for designing carbon aware datacenters. ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems 2:118–32.

Acun B, Morgan B, Richardson H, Steinsultz N, Wu C-J. 2023. Unlocking the potential of renewable energy through curtailment prediction (proposals track). Proceedings of the NeurIPS Workshop: Tackling Climate Change with Machine Learning.

Agarwal S, Acun B, Hosmer B, Elhoushi M, Lee Y, Venkataraman S, Papailiopoulos D, Wu C-J. 2024. CHAI: Clustered head attention for efficient LLM inference. Proceedings of the 41st International Conference on Machine Learning. PMLR 235:291–312.

Akhtar M, Benjelloun O, Conforti C, Gijsbers P, Giner-Miguelez J, Jain N, Kuchnik M, Lhoest Q, Marcenac P, Maskey M, and 11 others. 2024. Croissant: A metadata format for ML-ready datasets. DEEM ‘24: Proceedings of the Eighth Workshop on Data Management for End-to-End Machine Learning:1–6.

Elgamal M, Carmean D, Ansari E, Zed O, Peri R, Manne S, Gupta U, Wei G-Y, Brooks D, Hills G, Wu C-J. 2023. Carbon-efficient design optimization for computing systems. HotCarbon ‘23: Proceedings of the 2nd Workshop on Sustainable Computer Systems:1–7.

Elhoushi M, Shrivastava A, Liskovich D, Hosmer B, Wasti B, Lai L, Mahmoud A, Acun B, Agarwal S, Roman A, and 3 others. 2024. LayerSkip: Enabling early exit inference and self-speculative decoding. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers):12622–42.

Gupta U, Elgamal M, Hills G, Wei G-Y, Lee H-HS, Brooks D, Wu C-J. 2022. ACT: Designing sustainable computer systems with an architectural carbon modeling tool. Proceedings of the 49th Annual International Symposium on Computer Architecture:784–99.

Gupta U, Kim YG, Lee S, Tse J, Lee H-HS, Wei G-Y, Brooks D, Wu C-J. 2022. Chasing carbon: The elusive environmental footprint of computing. IEEE Micro 42(4):37–47.

Hsia S, Golden A, Acun B, Ardalani N, DeVito Z, Wei G-Y, Brooks D, Wu C-J. 2024. MAD-Max beyond single-node: Enabling large machine learning model acceleration on distributed systems. Proceedings of the ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA):818–33.

Huba D, Nguyen J, Malik K, Zhu R, Rabbat M, Yousefpour A, Wu C-J, Zhan H, Ustinov P, Srinivas H, and 4 others. 2022. PAPAYA: Practical, private, and scalable federated learning. Proceedings of Machine Learning and Systems 4 (MLSys 2022).

Kokolis A, Kuchnik M, Hoffman J, Kumar A, Malani P, Ma F, DeVito Z, Sengupta S, Saladi K, Wu C-J. 2024. Revisiting reliability in large-scale machine learning research clusters. arXiv:2410.21680. Online at https://arxiv.org/abs/2410.21680.

Masanet E, Shehabi A, Lei N, Smith S, Koomey J. 2020. Recalibrating global data center energy-use estimates. Science 367(6481):984–86.

Meta. 2024. 2024 Sustainability Report. Online at https://sustainability.atmeta.com/2024-sustainability-report/.

Rivalin L, Yi L, Diefenbach M, Bruefach A, Amatruda F, Tiecke T. 2024. Estimating embodied carbon in data center hardware, down to the individual screws. Meta, 2024.

Sachdeva N, Wu C-J, McAuley J. 2022. On sampling collaborative filtering datasets. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining:842–50.

Touvron H, Lavril T, Izacard G, Martinet X, Lachaux M-A, Lacroix T, Rozière B, Goyal N, Hambro E, Azhar F, and 4 others. 2023. LLaMA: Open and efficient foundation language models. arXiv:2302.13971. Online at https://doi.org/10.48550/arXiv.2302.13971.

Wu C-J, Acun B, Raghavendra R, Hazelwood K. 2024. Beyond efficiency: Scaling AI sustainably. IEEE Micro (early access):1–8.

Wu C-J, Gupta U. 2022. Designing low-carbon computers with an architectural carbon modeling tool. Meta, 2022.

Wu C-J, Raghavendra R, Gupta U, Acun B, Ardalani N, Maeng K, Chang G, Behram FA, Huang J, Bai C, and 15 others. 2022. Sustainable AI: Environmental implications, challenges and opportunities. Proceedings of Machine Learning and Systems 4 (MLSys 2022).

Xing J, Acun B, Sundarrajan A, Brooks D, Chakkaravarthy M, Avila N, Wu C-J, Lee BC. 2023. Carbon responder: Coordinating demand response for the datacenter fleet. arXiv:2311.08589. Online at https://doi.org/10.48550/arXiv.2311.08589.

You J, Chung J-W, Chowdhury M. 2023. Zeus: Understanding and optimizing GPU energy consumption of DNN training. Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation.

Zhao M, Agarwal N, Basant A, Gedik B, Pan S, Ozdal M, Komuravelli R, Pan J, Bao T, Lu H, and 7 others. 2022. Understanding data storage and ingestion for large-scale deep recommendation model training: Industrial product. ISCA ’22: Proceedings of the 49th Annual International Symposium on Computer Architecture:1042–57.

Zhao M, Pan S, Agarwal N, Wen Z, Xu D, Natarajan A, Kumar P, Shankar PS, Tijoriwala R, Asher K, and 8 others. 2023. Tectonic-Shift: A composite storage fabric for large-scale ML training. Proceedings of the 2023 USENIX Annual Technical Conference.


[1]  https://cloud.google.com/blog/products/ai-machine-learning/google-supercharges-machine-learning-tasks-with-custom-chip

[2]  https://ai.meta.com/blog/ai-rsc/

[3]  https://ai.meta.com/research/impact/open-catalyst/

[4]  https://www.nature.com/articles/s41586-021-03819-2

[5]  https://www.microsoft.com/en-us/research/project/farmbeats-iot-agriculture/

[6]  https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md

About the Authors: Carole-Jean Wu is research director, Bilge Acun is research scientist, Ramya Raghavendra is technical program director, and Kim Hazelwood is senior director, all at Meta.