I received this intriguing question from Daria Marchenko, a contributor to Moteur de Recherche on Canadian public radio. I went down a deep rabbit hole trying to find a good answer, especially since she also interviewed the wonderful Sasha Luccioni, an expert on ML and climate change. Most of the estimates I found online seemed off by several orders of magnitude, so I'm reproducing my analysis here so that other journalists and curious people can find this information easily.
TL;DR: ChatGPT's energy use might have peaked in February, when a trillion tokens were produced by a server farm of several thousand A100 GPUs. This would have cost OpenAI a 6-figure hosting bill per day and led to 7 to 15 metric tons of CO2 per day, roughly the emissions of 400-800 US households. That's not nothing, but in the grand scheme of things it is fairly modest, especially compared to more profligate uses of computing like cryptocurrency. What most surprised me is the sheer speed of ChatGPT's adoption, which shifts the bulk of the cost from training to inference. In February, ChatGPT saw about 1% of Google's traffic, or 25% of Wikipedia's, which is frankly shocking for a three-month-old product.
There's nothing very open about OpenAI, and most details are under wraps. Still, we can estimate its energy use from the probable architecture, the cost to users, and public releases, as well as from what we know about similar models.
The original ChatGPT is powered by GPT-3.5, a version of GPT-3 retrained with reinforcement learning from human feedback (RLHF) to be more useful. We'll assume this is a 175-billion-parameter model similar to davinci.
Patterson et al. (2022) estimated that the original GPT-3 cost 502 tons of CO2 to train. RLHF would add a bit of overhead on top of that, perhaps on the order of 1% of the original cost.
The more interesting question is the cost of serving the model. It's been reported that ChatGPT is the fastest app ever to reach 100 million users. Similarweb reported 1 billion visits to chat.openai.com in February, compared to about 4 billion for Wikipedia and 80 billion for Google. With an average visit of just under 10 minutes, it seems reasonable to assume that around 1,000 tokens were generated per visit, or 1 trillion tokens in total.
We can estimate what these 1 trillion tokens would cost in inference from the performance of the similarly sized, open-source Bloom. It can produce about 1,400 tokens per second on a server with 8x A100 GPUs. To produce a trillion tokens in 28 days, we would need 300 8xA100 servers on average, or 2,400 A100 GPUs. That's enormous!
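The arithmetic behind those server counts can be reproduced in a few lines. The visit count, tokens-per-visit figure, and Bloom throughput are the assumptions stated above:

```python
SECONDS_PER_DAY = 24 * 3600
days = 28  # February

visits = 1e9              # Similarweb's February estimate for chat.openai.com
tokens_per_visit = 1_000  # assumed from ~10-minute average visits
total_tokens = visits * tokens_per_visit  # 1e12, i.e. one trillion

bloom_tokens_per_sec = 1_400  # reported Bloom throughput on one 8xA100 server
required_rate = total_tokens / (days * SECONDS_PER_DAY)  # average tokens/s needed
servers = required_rate / bloom_tokens_per_sec
gpus = servers * 8

print(f"{required_rate:,.0f} tokens/s -> {servers:.0f} servers -> {gpus:.0f} A100s")
```

This works out to roughly 295 servers, which I round up to 300 (2,400 GPUs) to keep the numbers simple.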
Obviously, capacity would need to be higher at the end of the month than at the start as more people adopted the product. Given the rapid growth, I'll assume that end-of-February capacity was 1.4x the monthly average. We can plug this into the CO2 emissions calculator from Luccioni et al. Assuming inference was done on A100 SXM4 80GB GPUs on Azure cloud in the West US region, this comes to 7 metric tons of CO2 per day at the end of February. As a side note, leasing this capacity on the public cloud with a 3-year commitment would cost OpenAI $90,000 per day!
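The 7-ton figure comes from Luccioni et al.'s calculator, but the shape of the calculation is easy to sketch. The board power, PUE, and grid carbon intensity below are my own rough assumptions, not the calculator's exact inputs:

```python
gpus = 2400 * 1.4        # assumed end-of-February capacity (1.4x monthly average)
watts_per_gpu = 400      # A100 SXM4 80GB board power, assumed fully utilized
pue = 1.12               # assumed datacenter power usage effectiveness
grid_kg_per_kwh = 0.2    # assumed grid carbon intensity for Azure West US

kwh_per_day = gpus * watts_per_gpu / 1000 * 24 * pue
tons_per_day = kwh_per_day * grid_kg_per_kwh / 1000
print(f"{kwh_per_day:,.0f} kWh/day -> {tons_per_day:.1f} t CO2/day")
```

With these inputs the sketch lands at about 7 tons per day, consistent with the calculator's output.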
On March 1st, OpenAI switched ChatGPT to a new model, gpt-3.5-turbo. This new model costs the end user 1/10th as much as davinci, and it is reasonable to assume it has, consequently, 1/10th the carbon intensity. My hunch is that gpt-3.5-turbo is an RLHF version of the curie model. Thus, in March, OpenAI's emissions must have dropped drastically. This freed up capacity for GPT-4, which is likely at least as carbon-intensive as the original davinci GPT-3, and possibly more. Because GPT-4 is gated behind a paywall, total energy outlay must have gone down in March, but in the long term it should reach and then exceed its past peak, as higher capacity encourages new use cases. This is a good example of Jevons' paradox in action.
As a sanity check, I came up with an alternative figure of 15 tons of CO2 per day by extrapolating Tom Goldstein's work, so I think my estimate is in the right order of magnitude.
I was surprised by the numbers I came up with. Inference is now a big chunk of the cost of LLMs, whereas most of the literature has focused on training. In the current regime, with these numbers, total carbon output from serving will overshadow training by a large margin over the lifetime of a model. My estimate is several orders of magnitude higher than some fanciful numbers in the top 5 Google results, and several orders of magnitude lower than others. I won't link to the specific estimates, but I am frustrated by OpenAI's lack of transparency about its hardware use; it has encouraged a cottage industry of data scientists on Medium multiplying big numbers by tiny numbers without double-checking their results.
For what it's worth, I think the estimate is bounded by the number of GPUs involved. Based on availability, it cannot be much more than 10,000 A100s (somebody has to pay that capital expense at the end of the day, and 10,000 A100s cost around $100M). It also can't be much less than 1,000, since ChatGPT is reported to cost OpenAI 6 or 7 figures per day.
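Both bounds are easy to sanity-check; the per-GPU purchase price here is a rough assumption on my part:

```python
a100_price = 10_000                        # assumed rough purchase price per A100, USD
upper_capex = 10_000 * a100_price          # a 10k-GPU fleet: $100M of capital expense

# Lower bound: scale the $90k/day lease figure from above down to 1,000 GPUs
lease_per_gpu_day = 90_000 / 2400
daily_cost_1k = 1_000 * lease_per_gpu_day  # only 5 figures per day

print(f"${upper_capex / 1e6:.0f}M capex ceiling, ${daily_cost_1k:,.0f}/day floor")
```

A 1,000-GPU fleet would cost well under $100k per day, which conflicts with the reported 6-to-7-figure daily bill, hence the lower bound.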
In terms of sheer expensive hardware that needs to be bought or leased, 2,400 GPUs is a very large amount. The Jean Zay supercomputer used to train Bloom has about 3,000 GPUs. Granted, OpenAI's inference servers don't need interconnect bandwidth as high as training does, but it's still a lot of machines going brrr.
On the other hand, it's still a fairly small amount of total CO2, all things considered. In its current state, running ChatGPT for a year is in the same ballpark, in terms of carbon intensity, as running one large international conference (e.g., NeurIPS). Even if ChatGPT grew to Google's scale (100x growth) at the intensity of the old gpt-3.5, it would emit about 255 kilotons of CO2 per year: a large amount, but still [100X lower than the estimated cost of cryptocurrency in the US alone](https://www.whitehouse.gov/ostp/news-updates/2022/09/08/fact-sheet-climate-and-energy-implications-of-crypto-assets-in-the-united-states/#:~:text=Crypto-asset activity in the,railroads in the United States.).
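The 255-kiloton projection is just the end-of-February run rate scaled linearly, which is itself an assumption (per-token intensity would likely improve at that scale):

```python
tons_per_day = 7      # end-of-February estimate from above
growth = 100          # ChatGPT traffic reaching Google scale

kilotons_per_year = tons_per_day * growth * 365 / 1000
print(f"{kilotons_per_year:.0f} kt CO2/year")
```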
A big caveat, however, is that this doesn't take into account the embodied carbon cost of the GPUs used for inference. Sasha Luccioni believes it could be very high, saying "it's going to be bananas". I will join her in stating that NVIDIA should release estimates of the embodied carbon footprint of its products.
You can listen to the segment from Daria Marchenko that aired on Radio-Canada here. In addition to being the founder of a non-profit on digital sobriety and a radio contributor, Daria is a talented photographer. She took the professional headshots that grace this website; you can see her portfolio here, featuring the likes of Justin Trudeau and Charles Aznavour.