Fig. 1: CMOS inverter. [2] (Source: Wikimedia Commons)
The continuous scaling of CMOS (Complementary Metal-Oxide-Semiconductor) technology has been a given for decades, driving corresponding improvements in computing performance. However, the early 2000s marked the arrival of the "power wall," a critical barrier in CMOS technology where increasing performance through clock speed and transistor density resulted in excessive power consumption and heat generation. [1] This power wall forced the semiconductor industry to pivot from raw performance scaling to energy efficiency as a primary design goal. Today, energy-efficient CMOS chip design not only addresses thermal and power constraints but also paves the way for sustainable computing at scales demanded by artificial intelligence and data centers.
Fig. 1 shows an example CMOS inverter; all CMOS gates follow this same structure: some combination of pMOS transistors is connected to the upper supply, Vdd, and a complementary combination of nMOS transistors is connected to the lower supply, Vss. These groupings are called the pull-up network and pull-down network, respectively. The networks are complementary in the sense that, at steady state, exactly one of them conducts, so no current flows from Vdd to Vss. Indeed, this fact is a major reason for the dominance of CMOS over earlier digital circuit technologies (e.g. nMOS, pseudo-nMOS, BJT-based circuits). [2] A minimal sketch of this complementarity follows.
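As an illustration, the following Python sketch models the complementarity for a 2-input NAND gate at the switch level; this is the author's illustrative logical abstraction, not an electrical simulation.

# Switch-level model of CMOS complementarity for a 2-input NAND gate.
# The pull-down network is two nMOS in series (conducts only when both
# inputs are high); the pull-up network is two pMOS in parallel (conducts
# when either input is low). At steady state exactly one network conducts.
for a in (0, 1):
    for b in (0, 1):
        pull_down = bool(a and b)        # series nMOS path to Vss
        pull_up = (not a) or (not b)     # parallel pMOS path to Vdd
        assert pull_up != pull_down      # never both on: no static Vdd-to-Vss path
        q = 1 if pull_up else 0          # output driven high or low, never floating
        print(f"A={a} B={b} -> Q={q}")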
In static CMOS circuits, there are two broad categories of energy dissipation: static and dynamic. These in turn break down into the following sources of power dissipation.
(Dynamic) Charging and discharging of load capacitances as logic gates switch. In Fig. 1, this occurs when node A goes low, turning on the pMOS so that node Q is charged up from the supply, Vdd. (A back-of-the-envelope power estimate follows this list.)
(Dynamic) Short-circuit current, which flows during input transitions, when the transistors connected to each rail are briefly on at the same time.
(Static) Various leakage components, including gate leakage (from node A to rail Vss in Fig. 1), subthreshold leakage (between node Q and whichever rail is supposed to be disconnected from node Q), and leakage into the substrate (MOSFETs have intrinsic pn junctions which also leak; this fact is not evident at the level of description in Fig. 1).
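To make these terms concrete, here is the back-of-the-envelope sketch referenced above, using the standard first-order models; every parameter value (activity factor, load capacitance, supply voltage, clock rate, gate count, leakage current) is an assumed, order-of-magnitude figure, not data for any particular chip.

# Rough chip-level power breakdown using the standard first-order models.
# All numbers below are assumed, order-of-magnitude values for illustration.
ALPHA = 0.05        # activity factor: fraction of gates switching per cycle (assumed)
C_LOAD = 1e-15      # average switched load capacitance per gate, ~1 fF (assumed)
VDD = 0.8           # supply voltage, volts (assumed)
F_CLK = 3e9         # clock frequency, 3 GHz (assumed)
N_GATES = 1e9       # gate count (assumed)
I_LEAK = 10e-9      # average leakage current per gate, 10 nA (assumed)

# Dynamic switching power: each full charge/discharge cycle of a load
# dissipates C * Vdd^2, so P_dyn = alpha * C * Vdd^2 * f over all gates.
p_dynamic = ALPHA * C_LOAD * VDD**2 * F_CLK * N_GATES

# Static (leakage) power: P_stat = Vdd * I_leak over all gates.
p_static = VDD * I_LEAK * N_GATES

print(f"dynamic: {p_dynamic:.0f} W, static: {p_static:.0f} W")  # ~96 W and ~8 W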
The power wall emerged as a consequence of the pursuit of a generalized Moore's Law: that computing ability would double approximately every 2 years. From the 1970s to the early 2000s, increasing clock speeds were a primary driver of computational performance. [1] However, by the early 2000s, power dissipation had become a critical limitation. At the time, transistor counts on chips were doubling approximately every two years, but this resulted in power densities that threatened to exceed practical cooling capabilities.
The industry's solution was a paradigm shift toward multi-core processors, with each core operating at a clock frequency capped by the power wall. By dividing workloads among multiple cores, chips could increase overall performance without a proportional increase in clock speed or power consumption. This architectural change marked a turning point in energy-efficient design. The trend is stark in plots of clock rate and core count over time: CPU clock rates have stagnated at about 5 GHz since the start of the millennium, and that stagnation coincides precisely with the rise in core counts. This increase in core count has been one of a few ways by which the industry has maintained the breathless pace of Moore's Law.
In contrast to the high-level architectural change noted above, at the circuit-design level four main strategies for energy saving are used in industry today:
Dynamic Voltage and Frequency Scaling (DVFS): DVFS enables chips to adjust their supply voltage and clock frequency dynamically based on workload requirements. Since the current drive of a transistor increases monotonically with supply voltage (the exact relationship is complicated and depends on the transistor generation), voltage and frequency can be reduced together during periods of low activity, sharply cutting dynamic power consumption (see the sketch following this list). [3]
Clock Gating: Clock gating reduces dynamic power by selectively disabling the clock signal to inactive portions of the chip. By ensuring that only active components receive the clock, this technique eliminates unnecessary switching activity, a major source of power dissipation. [3]
Low-Power Process Technologies: Advances in process technology have played a critical role in reducing power consumption. Techniques such as high-k dielectrics and metal gates reduced gate leakage currents significantly. Similarly, FinFET and gate-all-around transistor designs have minimized leakage power and improved switching efficiency in sub-20 nm nodes. [4]
Variable-Threshold Voltage Computing: Information flows through digital circuits via many different paths. Often one such path, the so-called critical path, dominates chip timing, while the others are somewhat faster and so do not limit operation. There is then an opportunity to raise the threshold voltage of gates off the critical path, slowing those paths only to the point where they are still not critical. Raising the threshold voltage reduces leakage power exponentially, as the sketch below illustrates. [4]
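The following Python sketch quantifies two of these strategies under the same assumed, illustrative parameters as before: the roughly cubic power savings of DVFS and the exponential leakage reduction from raising the threshold voltage.

import math

def dynamic_power(vdd, freq, alpha=0.05, c_load=1e-15, n_gates=1e9):
    """First-order dynamic power, P_dyn = alpha * C * Vdd^2 * f over all gates (assumed parameters)."""
    return alpha * c_load * vdd**2 * freq * n_gates

# DVFS: scaling voltage and frequency together by a factor k cuts dynamic
# power by roughly k^3, since P_dyn ~ Vdd^2 * f.
nominal = dynamic_power(vdd=0.8, freq=3e9)
scaled = dynamic_power(vdd=0.6, freq=2.25e9)      # both scaled to 75%
print(f"DVFS: {nominal:.0f} W -> {scaled:.0f} W ({scaled / nominal:.2f}x, i.e. 0.75^3)")

# Variable Vt: subthreshold leakage falls exponentially in threshold voltage,
# I_sub ~ exp(-Vt / (n * kT/q)), with ideality n ~ 1.5 and kT/q ~ 26 mV at
# room temperature (assumed values).
n_ideality, v_thermal = 1.5, 0.026
reduction = math.exp(0.100 / (n_ideality * v_thermal))  # raise Vt by 100 mV
print(f"Raising Vt by 100 mV cuts subthreshold leakage ~{reduction:.0f}x")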
In 2024, by far the most active area of computing innovation is in ASICs for deep learning and AI more generally. Such workloads demand enormous numbers of floating-point operations (FLOPs), representing numbers with e.g. 16 (FP16) or 4 (FP4) bits. The current state of the art is the H100 GPU from Nvidia, which for dense floating-point operations achieves about (10^15 FP16 s^-1)/(700 W) ≈ 1.4 × 10^12 FP16 J^-1. [5] A recent IEEE review [6] of future CMOS scaling collated other reviews on energy scaling for different parts of a chip, covering both transistor and interconnect scaling in CMOS. Under its methodology, an ultimate CMOS energy efficiency of 2.9 × 10^14 FP16 J^-1 (4.7 × 10^15 FP4 J^-1 under the quadratic scaling of energy with bit width usually assumed for FP representations) is possible. This suggests that a 290/1.4 ≈ 207-fold improvement in energy efficiency for these crucial FLOPs is possible. This matters because it places an upper bound on how much computing capacity (generally thought to scale with the capability of the model being trained) can be purchased for a finite energy budget, limiting future developments unless electricity costs fall correspondingly.
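The headroom arithmetic can be checked directly; the throughput, power, and limit figures are those quoted above from [5] and [6], and the quadratic bit-width scaling is the assumption named in [6].

# Checking the efficiency-headroom arithmetic quoted in the text.
h100_throughput = 1e15        # dense FP16 throughput, FLOP/s [5]
h100_power = 700.0            # board power, W [5]
h100_efficiency = h100_throughput / h100_power   # ~1.4e12 FP16/J

cmos_limit_fp16 = 2.9e14      # projected ultimate CMOS efficiency, FP16/J [6]
print(f"H100: {h100_efficiency:.1e} FP16/J")
print(f"Headroom: {cmos_limit_fp16 / h100_efficiency:.0f}x")   # ~203x; the text's 207 uses the rounded 1.4e12

# FP4 limit under quadratic scaling of energy with bit width: a 4-bit
# operation is assumed to cost (4/16)^2 of a 16-bit one, i.e. 16x more ops/J.
print(f"FP4 limit: {cmos_limit_fp16 * (16 / 4)**2:.1e} FP4/J")  # ~4.6e15, cf. 4.7e15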
As CMOS technology approaches its physical and practical limits, future advancements will require a combination of materials innovation, architectural shifts, and potentially new computing paradigms altogether. In a world increasingly reliant on sustainable energy practices, energy-efficient CMOS design will be critical to enabling the computational infrastructure of the future.
© Jonathan Sharir-Smith. The author warrants that the work is the author's own and that Stanford University provided no input other than typesetting and referencing guidelines. The author grants permission to copy, distribute and display this work in unaltered form, with attribution to the author, for noncommercial purposes only. All other rights, including commercial rights, are reserved to the author.
[1] J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2nd Ed. (Pearson, 2002).
[2] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th Ed. (Pearson, 2010).
[3] S. Saurabh, Introduction to VLSI Design Flow (Cambridge University Press, 2023).
[4] J. del Alamo, Integrated Microelectronic Devices: Physics and Modeling, 1st Ed. (Pearson, 2017).
[4] "Nvidia H100 Tensor Core GPU Architecture, Nvidia 2023.
[5] A. Ho, E. Erdil, and T. Besiroglu, "Limits to the Energy Efficiency of CMOS Microprocessors," IEEE 10386559 International Conference on Rebooting Computing, 5 Dec 23.