Next-generation AI processors need 48V



AI processors draw enormous amounts of power, and every loss in the power distribution network (PDN) translates directly into lower energy efficiency. How can designers maintain efficiency while still delivering the power that demanding algorithms require? Robert Gendron, Vice President of PE for Vicor, noted in an interview with EEWeb that rack power in data centers has jumped more than 200%, to the 20 kW range, with the addition of artificial intelligence, machine learning, and deep learning. This led Vicor to re-evaluate its PDNs around 48-V solutions. The redesign solved the immediate PDN problem, but it also created new challenges in power conversion.

Figure 1: Growth in peak current requirements for CPUs/FPGAs (Source: Vicor)

The increasing demands on PDN power delivery and efficiency are becoming the biggest concerns in large-scale computing systems (Figure 1). Processor power consumption has risen sharply with the advent of ASICs and GPUs that handle complex AI functions, and rack power requirements have scaled in proportion to the AI capability deployed in large-scale learning and inference applications. In most cases, power delivery is now the limiting factor in computing performance, as new CPUs draw ever-increasing currents. Optimum power delivery involves not only power distribution but also efficiency, size, cost, and thermal performance.

Supporting this volume of computation places huge power demands on traditional PDNs, which in turn complicates thermal management. There are two ways to cut PDN losses: lower the PDN resistance by using heavier cables and busbars, or raise the operating voltage so that the same power is delivered at a lower current. Modern designs are adopting the second option to meet the stringent requirements of data centers more effectively.
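The trade-off between the two options can be checked with the standard resistive-loss relation, loss = I²R with I = P/V. The sketch below is illustrative only: the 1 kW load, the 12 V baseline, and the 1 mΩ PDN resistance are assumed values rather than Vicor figures, and `pdn_loss_watts` is a hypothetical helper name.

```python
def pdn_loss_watts(load_power_w: float, bus_voltage_v: float,
                   pdn_resistance_ohm: float) -> float:
    """Resistive loss (I^2 * R) in a PDN that delivers load_power_w at bus_voltage_v."""
    current_a = load_power_w / bus_voltage_v
    return current_a ** 2 * pdn_resistance_ohm


# Assumed illustration values (not taken from the article).
LOAD_W = 1_000.0
BASE_V = 12.0
BASE_R = 0.001  # 1 milliohm

baseline = pdn_loss_watts(LOAD_W, BASE_V, BASE_R)
half_resistance = pdn_loss_watts(LOAD_W, BASE_V, BASE_R / 2)  # option 1: more copper
double_voltage = pdn_loss_watts(LOAD_W, BASE_V * 2, BASE_R)   # option 2: higher voltage

print(f"baseline loss:          {baseline:.3f} W")
print(f"half the resistance:    {half_resistance:.3f} W")
print(f"double the bus voltage: {double_voltage:.3f} W")
```

Halving the resistance halves the loss, while doubling the voltage cuts it to a quarter, because the loss scales with the square of the current. That quadratic term is why raising the voltage tends to be the more effective lever.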
“Currently, power requirements far outstrip traditional power delivery networks,” Gendron said. “Switching to a 48V architecture and adopting more innovative methods of power delivery is the only way to deliver high-performance power to meet the amazing AI/HPC demands.”

When processor power began to increase dramatically in 2015, the Open Compute Project (OCP) consortium, whose members include the largest number of cloud, server, and CPU companies, continued to develop a 12-V rack design. Its response was to switch from cables to busbars and to deploy more single-phase AC-to-12-V converters inside the rack, shortening the PDN and lowering its impedance to the server blades. The main change was that, because of the increased power, the single-phase AC was now derived from the individual phases of the rack's three-phase supply.

Then the arrival of AI in data centers, with processors drawing 500 A to 1,000 A, pushed some companies to switch to 48-V distribution. This reduced the high-current PDN problem to 250 A for a 12 kW rack, but it presented new power conversion challenges for the entire system. Once the PDN feeding the blades switches to 48 V, the power conversion on the blade must change as well. In any case, moving from 12-V to 48-V distribution reduces the input current requirement by a factor of 4 and the resistive losses by a factor of 16.

48V architecture adoption

48 V has long been used in telecom equipment powered by rechargeable backup battery systems. A common architecture in these systems is the intermediate bus architecture, in which an isolated, unregulated bus converter steps 48 V down to 12 V. The 12 V then feeds a bank of multiphase buck regulators, which handle the step-down from 12 V and the regulation at the point of load (PoL). As the currents drawn by AI processors and CPUs increase, the density of the power delivery solution at the PoL has become the most important element in AI applications, because of the PDN resistance between the regulator and the PoL.
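The 250 A, factor-of-4 and factor-of-16 figures quoted above for the move from 12 V to 48 V follow from I = P/V and from the fact that resistive loss scales with I². A minimal check, assuming the 12 kW rack from the text and an unchanged PDN resistance (`bus_current_a` is a hypothetical helper name):

```python
RACK_POWER_W = 12_000.0  # the 12 kW rack mentioned in the text


def bus_current_a(power_w: float, voltage_v: float) -> float:
    """Current drawn from a DC bus delivering power_w at voltage_v."""
    return power_w / voltage_v


i_12 = bus_current_a(RACK_POWER_W, 12.0)
i_48 = bus_current_a(RACK_POWER_W, 48.0)

current_ratio = i_12 / i_48       # how much less current the 48 V bus carries
loss_ratio = current_ratio ** 2   # I^2 * R loss ratio for the same resistance

print(f"12 V bus: {i_12:.0f} A, 48 V bus: {i_48:.0f} A")
print(f"current reduced {current_ratio:.0f}x, resistive loss reduced {loss_ratio:.0f}x")
```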
PDN losses are a dominant factor in the efficiency and performance of a DC/DC regulator design. To reduce them, Vicor suggests a 48-V pre-regulator module (PRM) followed by a fixed-ratio (1/K) voltage transformation module (VTM). Splitting regulation and transformation in this way allows each stage to be optimized separately. The PRM uses a zero-voltage switching topology, while the VTM uses Vicor's high-frequency Sine Amplitude Converter (SAC) topology. The VTM can be viewed as a DC/DC converter with a ratio of 1/K for voltage and K for current: it divides the voltage by K and multiplies the current by K. It offers high power density and can be placed very close to the processor.

Because the VTM uses a SAC topology, its emissions are low and narrowband compared with those of multiphase switching regulators and their associated inductors. It also offers greater power density than multiphase designs, since a single VTM replaces a six-phase regulator. The VTM fits into a small footprint, within the layout constraints of advanced processors that support four-channel memory, without encroaching on the memory subsystem layout areas.

Figure 2: Lateral power delivery (Source: Vicor)

In lateral power delivery (LPD), high current is delivered through modular current multiplier (MCM) modules positioned adjacent to the processor, either on the motherboard or on the processor substrate. Placing MCMs on the substrate reduces PDN losses and reduces the number of processor-substrate BGA pins required for power. LPD is designed to support the unique form factor and power delivery requirements of OCP Accelerator Module cards and custom AI accelerator cards.

Figure 3: Vertical power delivery (Source: Vicor)

Vertical power delivery (VPD) all but eliminates power distribution losses and the PCB area consumed by the VR. VPD is similar in design to the Vicor LPD solution, with the addition of bypass capacitance integrated into the current multiplier, which is then called a geared current multiplier (GCM). Depending on the processor voltage, engineers can choose between LPD or VPD.
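The "1/K for voltage, K for current" behavior described above can be sketched as an ideal, lossless transformer model. This is a simplification under stated assumptions, not Vicor's model: `IdealCurrentMultiplier` is a hypothetical name, K = 48 and the 480 A load are illustrative values, and real modules have losses and output resistance that are ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdealCurrentMultiplier:
    """Lossless fixed-ratio stage: voltage is divided by k, current is multiplied by k."""
    k: float  # step-down ratio (assumed value, chosen for illustration)

    def output_voltage(self, input_voltage_v: float) -> float:
        return input_voltage_v / self.k

    def input_current(self, output_current_a: float) -> float:
        # The 48 V side carries only 1/k of the current delivered to the load.
        return output_current_a / self.k


stage = IdealCurrentMultiplier(k=48.0)
v_in, i_out = 48.0, 480.0  # assumed operating point

v_out = stage.output_voltage(v_in)
i_in = stage.input_current(i_out)

print(f"output: {v_out:.2f} V at {i_out:.0f} A")
print(f"input:  {v_in:.0f} V at {i_in:.1f} A")
print(f"power in = {v_in * i_in:.0f} W, power out = {v_out * i_out:.0f} W")
```

Because the high current exists only on the output side of the stage, the length of the high-current path is set by how close the multiplier sits to the processor. That is the difference between placing it beside the processor (LPD) and beneath it (VPD).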
With LPD, the current multiplier sits next to the AI processor, either on the same substrate or directly on the motherboard within a few millimeters, which allows the PDN resistance to be reduced to about 50 µΩ. For higher performance, VPD places the current multiplier directly under the processor and also integrates high-frequency bypass capacitors. This type of current multiplier is called a geared current multiplier. VPD reduces the impedance of the PDN to 5-7 µΩ, giving AI processors the freedom to harness their full power.

Figure 4: This AI solution highlights the Vicor 48-V direct-to-load solution that supports up to 650 A continuous and over 1,000 A peak current delivery. (Source: Vicor)

Maximizing AI processor performance

Figure 4 shows a typical Vicor VR solution for advanced AI processor accelerators. The Vicor VR consists of three powertrain modules, one modular current driver (MCD) and two MCM modules, which together form a 48-V input, 0.8-V output VR capable of up to 650 A continuous and more than 1,000 A peak current delivery. Much like jet fuel for an aircraft, this level of power delivery ensures that the AI processor can operate at optimal clock rates and deliver higher performance.

“If our technology is not used in these advanced AI applications, the number of multiphase VR devices will exceed the size of the board and will not maintain the same form factor,” Gendron said. “In addition, the noise contribution is likely to be too high to maintain signal integrity.”

The Vicor NBM2317 maintains compatibility with legacy 12-V server rack power distribution by stepping 12 V up to the 48 V that the Vicor VR requires. The converter can also be run in the “reverse” direction, converting 48 V to 12 V. Traditional power architectures are not keeping pace with today's power-hungry AI processors and their adoption in cloud computing. Vicor's approach combines 48-V distribution with a VR that supports advanced AI processing needs.
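To put the PDN figures in perspective, the resistive loss in the final high-current path can be estimated as I²R at the 650 A continuous rating. This is a rough sketch, not a Vicor calculation. It assumes the "about 50" figure quoted for LPD is in µΩ, like the 5-7 µΩ given for VPD, and it treats the PDN as a pure resistance. `pdn_loss_w` is a hypothetical helper name.

```python
def pdn_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Resistive loss (I^2 * R) in the path between the multiplier and the processor."""
    return current_a ** 2 * resistance_ohm


CONTINUOUS_A = 650.0  # continuous rating quoted in the text

# PDN resistances in ohms; the LPD unit (micro-ohms) is an assumption.
pdn_cases = {
    "LPD, ~50 uOhm (assumed unit)": 50e-6,
    "VPD, 5 uOhm": 5e-6,
    "VPD, 7 uOhm": 7e-6,
}

losses = {name: pdn_loss_w(CONTINUOUS_A, r) for name, r in pdn_cases.items()}

for name, loss in losses.items():
    print(f"{name}: {loss:.2f} W lost at {CONTINUOUS_A:.0f} A")
```

Under these assumptions the loss drops from roughly 21 W with LPD to between 2 W and 3 W with VPD. At the 1,000 A peak the same resistances would dissipate about 50 W and 5 W to 7 W respectively.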
Unlike the traditional multi-stage design used with CPUs, the Vicor solution has been developed specifically to address a new class of processors that are rapidly migrating into the cloud. A new approach is needed to run AI/HPC workloads. It is no longer possible to keep distributing 12 V within the cloud server rack as the leading companies ramp up the power. Running ASICs and GPUs today requires more than increasing power by swapping parts. The most efficient solutions start with higher-voltage power, incorporate innovative architectures and topologies, and use highly efficient, high-density power modules.


Joseph
