In our previous blog, we discussed the urgent need for a new digital infrastructure foundation to support the growing demands of generative AI. While developing specialized chips for AI workloads is critical, solving the technology’s infrastructure challenges requires a far broader approach. The bottlenecks extend well beyond hardware, touching every layer of the infrastructure stack, from networking and storage to the software that orchestrates these systems. To scale generative AI successfully, we must address both the physical and software layers simultaneously.
The Physical Layer – Beyond Chips
When we talk about AI infrastructure, it’s easy to focus solely on chips. GPUs, TPUs, and ASICs are vital for training and running AI models, but the physical layer of AI infrastructure encompasses much more than raw compute power. Networking, power, and storage each play a critical role in determining whether AI workloads can scale effectively.
Networking
Generative AI models are so large that they typically don’t fit on a single chip, requiring multiple processors to work together in distributed computing clusters. This makes networking infrastructure as critical as the compute hardware itself.
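As a rough sketch of what this looks like in practice, the snippet below spreads training of a small placeholder model across several GPUs using PyTorch’s DistributedDataParallel. The model, hyperparameters, and launch setup are illustrative assumptions only; a real generative model would additionally be sharded with tensor or pipeline parallelism because it exceeds a single GPU’s memory.

```python
# Minimal sketch of distributed data-parallel training in PyTorch.
# Assumes multiple GPUs and a launcher such as torchrun, which sets
# RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Stand-in model; real generative models are far larger and sharded further.
    model = torch.nn.Linear(4096, 4096).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(32, 4096, device=f"cuda:{local_rank}")

    # Each step runs a local forward/backward pass; DDP then all-reduces
    # gradients over the interconnect, which is where network bandwidth
    # becomes the limiting factor.
    for _ in range(10):
        optimizer.zero_grad()
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched with something like `torchrun --nproc_per_node=8 train.py`, each process drives one GPU, and every training step synchronizes gradients across the cluster’s network fabric.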
Conventional interconnects use copper for electrical signal routing between chips and servers, while optical interconnects handle longer distances. However, traditional networking systems cannot meet the high-bandwidth, low-latency demands of AI workloads. This challenge has led to new AI-first networking technologies that improve power and bandwidth efficiency at the chip level using advanced electrical and optical methods. Innovations in photonics are enabling faster optical communication, connecting boards, servers, and racks at the speed of light. Photonic interconnects can break the bandwidth limitations of electronic interconnects, potentially enabling data movement that is orders of magnitude faster.
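To get a feel for why bandwidth matters so much, here is a back-of-envelope estimate of how long it takes to synchronize gradients for a large model over links of different speeds. All of the numbers (model size, precision, link speeds) are illustrative assumptions, not figures from this post.

```python
# Rough estimate of per-step gradient synchronization time.
params = 70e9          # assumed model size: 70B parameters
bytes_per_param = 2    # fp16 gradients
grad_bytes = params * bytes_per_param

# A ring all-reduce moves roughly 2x the gradient volume per GPU.
traffic_per_gpu = 2 * grad_bytes

for name, bits_per_s in [("100 Gb/s electrical link", 100e9),
                         ("800 Gb/s optical link", 800e9)]:
    seconds = traffic_per_gpu / (bits_per_s / 8)   # convert bits/s to bytes/s
    print(f"{name}: ~{seconds:.1f} s per gradient sync")
```

Even under these simplified assumptions, the synchronization time shrinks in direct proportion to link bandwidth, which is why faster interconnects translate directly into higher cluster utilization.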
Power and Thermal Management
Power consumption is another critical issue. Running AI models at scale consumes enormous amounts of electricity. As data centers dedicated to AI expand, the need for energy-efficient solutions becomes paramount. Emerging innovations, such as improved power-conversion technologies and on-site power generation, offer promising paths forward. For instance, novel battery chemistries could provide low-cost grid storage to reduce dependency on traditional power sources.
These and other innovations have the potential to reduce the energy footprint of AI infrastructure if adopted; however, the transition from R&D to adoption has historically been difficult in this sector. It can often take ten years or more to scale supply chains and to demonstrate that a power technology is reliable enough to support the processors and algorithms it serves over their entire lifetimes.
Power is not only needed for compute, but also for the cooling infrastructure surrounding AI chips and racks. Innovations in thermal management, like liquid cooling, are already allowing data centers to operate more efficiently by minimizing the power required to keep servers cool, which can amount to up to 40% of a data center’s total power consumption. Immersion cooling systems, which submerge components in non-conductive fluids, have the potential to drastically reduce these energy costs.
As the power rating of individual GPUs and ASICs scales toward several kilowatts, large-scale, fan-less liquid cooling has emerged as an alternative to the traditional air cooling used by data centers, with further innovation to come in the type of coolant used and in how close to the compute chips it is applied. Techniques that extract more heat from servers, such as fluid-jet spraying and immersion in non-conductive fluids, have the potential to drastically improve heat-transfer efficiency and reduce the overall cost of cooling. It is not enough to innovate at the server level, however: innovation must take a systems-level approach that treats the data center as a whole, optimizing heat transfer across all channels within the building and accommodating the variety of ambient environments in which data centers are located.
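As a simple, assumption-laden illustration of why cooling overhead matters at the facility level, the snippet below treats cooling as the only overhead on top of a fixed IT load. The 10 MW load and the 10% liquid-cooling figure are made up for the example; the 40% figure echoes the one cited above.

```python
# If cooling consumes a fraction f of total facility power and the IT load is
# fixed (and other overheads are ignored), total power is it_load / (1 - f).
it_load_mw = 10.0  # assumed compute (IT) load in MW

for label, f in [("air cooling (~40% of total power)", 0.40),
                 ("efficient liquid cooling (~10% of total power)", 0.10)]:
    total_mw = it_load_mw / (1 - f)
    cooling_mw = total_mw - it_load_mw
    print(f"{label}: total {total_mw:.1f} MW, of which cooling {cooling_mw:.1f} MW")
```

Under these assumptions, the same compute load requires roughly 16.7 MW versus 11.1 MW of total facility power, which is why cooling efficiency is treated as a first-order design constraint rather than an afterthought.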
Storage and Memory
Storage and memory are also significant bottlenecks. While compute has advanced rapidly, memory technology has struggled to keep pace. AI models process vast amounts of data, and without fast, high-bandwidth memory (HBM), performance is severely limited. In fact, the current slowdown in the scaling of AI models is the result of what is known as the “Memory Wall”: the growing gap between compute speed and memory access times.
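A quick back-of-envelope calculation, using assumed rather than vendor-specific numbers, shows why memory bandwidth rather than raw compute often sets the pace, for example during token-by-token generation, where each weight is reused only once per token.

```python
# Illustrative "memory wall" check; all figures are assumptions for the example.
peak_flops = 1.0e15      # assumed peak compute: 1 PFLOP/s at low precision
hbm_bandwidth = 3.0e12   # assumed HBM bandwidth: 3 TB/s

params = 70e9                    # assumed model size: 70B parameters
weights_bytes = 2 * params       # fp16 weights streamed from memory per token
flops_needed = 2 * params        # ~2 FLOPs per parameter per token

compute_time = flops_needed / peak_flops
memory_time = weights_bytes / hbm_bandwidth
print(f"compute-bound time per token: {compute_time * 1e3:.2f} ms")
print(f"memory-bound time per token:  {memory_time * 1e3:.2f} ms")
```

With these assumptions, moving the weights through memory takes hundreds of times longer than the arithmetic itself, which is exactly the gap the Memory Wall describes.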
Emerging memory technologies, such as FeRAM (ferroelectric RAM) and advanced packaging techniques that combine memory and logic chips, could help reduce the effect of this Wall. Additionally, innovations in cold storage (i.e., storage designed for infrequently accessed data) are addressing the need for long-term, energy-efficient data retention. Such storage is typically required for large, internet-scale data archives that are becoming very important due to generative AI’s immense training data needs.
The Vital Importance of Software
Hardware alone cannot solve AI’s infrastructure bottlenecks. The software layer is equally crucial, particularly when it comes to orchestrating distributed systems and optimizing infrastructure for specific AI workloads. Without the right infrastructure software, even the most advanced hardware will fail to meet AI’s demands.
Orchestration and Optimization Are Critical
AI workloads are often distributed across hundreds, or even thousands, of GPUs and other chips, typically across geographically dispersed areas. Coordinating these workloads effectively is a massive challenge. This is where advanced orchestration tools are indispensable—they manage the distribution of data and computation across systems to avoid delays and inefficiencies.
Without effective orchestration, AI models can suffer from delays, data starvation, and inefficient GPU utilization. Improved orchestration needs to be supported by specialized hardware networking infrastructure, highlighting the need for tight integration between hardware and software.
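The sketch below illustrates one small piece of this picture, keeping a GPU fed with data by overlapping host-to-device transfers with compute so the accelerator is not left idle between batches. The toy dataset, model, and loader settings are placeholders chosen for the example.

```python
# Sketch of avoiding data starvation: worker processes and pinned memory keep
# batches staged ahead of the GPU, and non-blocking copies overlap with compute.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(10_000, 1024),
                        torch.randint(0, 10, (10_000,)))
loader = DataLoader(dataset, batch_size=256, num_workers=4, pin_memory=True)

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

for x, y in loader:
    # non_blocking=True lets the host-to-GPU copy overlap with the previous step.
    x = x.cuda(non_blocking=True)
    y = y.cuda(non_blocking=True)
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
```

Cluster-level orchestration applies the same principle at a much larger scale, scheduling data movement and computation across many nodes so that no accelerator sits idle waiting for inputs.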
Optimization is another critical factor in ensuring that AI models can scale efficiently. AI-specific libraries, compilers, and profilers help streamline the development process, ensuring that AI models can fully leverage the underlying hardware. These tools also reduce the time and resources needed to train models, making it easier to fine-tune systems for both performance and energy efficiency. Without them, organizations will struggle to make the most of their hardware investments.
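As a small, hedged example of what such tooling looks like in practice, the snippet below uses PyTorch’s built-in compiler (torch.compile) and profiler on a toy model. The model and tensor sizes are placeholders; a real workflow would compile and profile an actual training or inference loop.

```python
# Compile a toy model and profile where time is spent on CPU and GPU.
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
).cuda()

# The compiler fuses and specializes kernels so the model makes better use
# of the underlying hardware.
compiled_model = torch.compile(model)

x = torch.randn(64, 1024, device="cuda")

# The profiler reports per-operator timings, guiding further optimization.
with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU,
                torch.profiler.ProfilerActivity.CUDA]
) as prof:
    for _ in range(5):
        compiled_model(x)

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```

Reading the profiler’s table makes it clear which kernels dominate runtime, which is precisely the feedback loop that lets teams tune models for both performance and energy efficiency.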
Lastly, AI frameworks and software libraries play an essential role in enabling large-scale AI training and inference. Tools like TensorFlow, PyTorch, and specialized AI software libraries are designed to handle the massive scale and complexity of generative AI models. These frameworks need to evolve alongside the hardware to provide seamless integration and ensure maximum performance across a system. NVIDIA’s CUDA and cuDNN libraries are prime examples of how long-term investment in software, tightly coupled with the hardware, can unlock the full performance of the underlying silicon.
Why System Co-optimization is Key
The most critical lesson from AI infrastructure development is that no single component can address scaling challenges on its own. Chips, power systems, networking, and software must all evolve together. This concept of co-optimization—the simultaneous improvement of both hardware and software layers—is essential to building scalable AI infrastructure.
For example, improving a chip’s performance will yield diminishing returns if the networking layer can’t handle the increased data flow or if orchestration software fails to optimize workloads at scale. Similarly, even the best cooling systems are ineffective without hardware capable of high utilization at scale. A cohesive, system-wide optimization strategy is required.
Solving AI’s infrastructure bottlenecks requires more than incremental improvements to chips or software—it demands a full-stack approach. As the demand for generative AI continues to rise, the importance of co-optimizing both physical and software layers cannot be overstated. Only through comprehensive, system-wide development can we address the bottlenecks facing AI and unlock its full and immensely exciting potential.