Join Patrick Kenedy of Serve the Home as he discusses AI storage with Kevin Deierling of NVIDIA and Greg Matson of Solidigm. In this conversation, they break down one of the biggest shifts happening in AI infrastructure today: The rise of storage as a critical performance layer in AI factories.
As AI moves from training to inference and now to agentic workflows, the demands on memory and data infrastructure are exploding. From KV cache reuse to context memory, this discussion explores why traditional architectures are no longer enough and how a new storage tier is emerging to keep GPUs fully utilized.
You’ll also hear how innovations like liquid-cooled SSDs, extreme co-design, and full-stack optimization are shaping the future of AI systems from hyperscale data centers to edge deployments.
The AI revolution has a storage problem and the industry is finally saying it out loud.
At major tech gatherings in early 2026, a striking consensus has emerged between some of the most influential players in silicon and systems: We are living through a pivotal inflection point for memory and storage.
Kevin Deierling, SVP at NVIDIA, put it plainly: "It's the year of memory and storage, and that's important because storage is an instrumental tier. It's so important. We just don't have enough, and AI needs more memory and storage."
It's a statement that cuts to the heart of the AI scaling challenge. For years, the headlines went to GPUs, model architectures, and training compute. But as models grow larger and inference demands explode, the bottleneck has quietly shifted. Data has to live somewhere and getting it to the compute fast has become the defining engineering challenge of the decade.
Solidigm, one of the world's leading NAND flash manufacturers, has been making this case for some time. Our perspective is straightforward: AI workloads are insatiable consumers of storage, from the petabytes needed to train foundation models to the low-latency retrieval demands of real-time inference and retrieval-augmented generation (RAG). The storage tier isn't peripheral any longer, it's foundational.
What makes 2026 different is the convergence of urgency and acknowledgment. Hyperscalers are racing to build AI factories. And every AI factory needs a memory and storage backbone capable of keeping pace with accelerators that can process trillions of operations per second. The gap between compute capability and storage bandwidth has never been more visible or more consequential.
The message from both Solidigm and NVIDIA is the same. The chips get the glory, but storage is what makes AI actually work.
For decades, the storage hierarchy was a settled matter: blazing-fast DRAM at the top, dense and affordable NAND flash below, and tape or cloud archival at the bottom. Each tier had its role. Each trade-off was understood.
As inference workloads scale and context windows stretch into the millions of tokens, a new architectural pressure has emerged one that neither DRAM nor traditional SSDs was designed to solve. The industry needed something in between: high enough capacity to hold massive AI context, fast enough to feed hungry accelerators, and freed from the overhead of full durability guarantees.
Nvidia's Kevin Deierling gave the clearest articulation of what's taking shape: "We now have that really nice tier of new storage that gives us the capacity and the performance that's in between and it doesn't have to be durable because we can always recompute it."
That last clause is the key insight. Traditional storage assumes data must be preserved at all costs. But AI context, the activations, KV cache, and intermediate states that keep a model grounded in a conversation or task is inherently ephemeral. If it's lost, you don't restore it from backup. You recompute. That changes the design requirements entirely, unlocking a class of storage solutions optimized for throughput and density rather than endurance and persistence.
This is where Solidigm and the broader NAND ecosystem see a significant opportunity. Purpose-built context storage, designed around the actual read/write patterns of AI inference, can dramatically reduce the memory pressure on DRAM while keeping latency low enough to stay out of the critical path.
The storage hierarchy isn't being replaced. It's gaining a new and essential layer built specifically for new AI workloads.
Building an AI factory is not a procurement exercise. It is an engineering “Act One” where every component, from the copper traces on a PCB to the airflow patterns in a server chassis, must be deliberately designed to work in concert. The era of mixing and matching off-the-shelf parts and hoping for the best is over.
That's the core philosophy behind what NVIDIA describes as "extreme co-design" a methodology that goes far beyond traditional vendor partnerships. As Kevin Deierling, SVP at NVIDIA, explains, "Extreme co-design is something together that really is optimized for the market. It has the right density, it has the right power, the right thermals. All of it needs to come together to build the most efficient AI factories."
The word "extreme" is doing real work in that sentence. Co-design at this level means storage vendors like Solidigm aren't simply responding to GPU specs after the fact. They are at the table early, shaping form factors, power envelopes, and thermal profiles alongside the compute architects who define what an AI accelerator demands from its surrounding infrastructure.
Density, power, and thermals form a tightly coupled triangle. Push density too hard without addressing power draw, and you create thermal problems that throttle performance. Optimize for cooling without considering physical layout, and you sacrifice the rack efficiency that makes AI factories economically viable. Every decision cascades.
For Solidigm, this means engineering NAND solutions that don't just meet spec on paper. They fit the actual physical and electrical reality of next-generation AI infrastructure: The right capacity in the right footprint, drawing the right power, dissipating heat in ways the system was designed to handle.
As AI adoption accelerates, the concept of the “AI factory” is rapidly evolving from centralized hyperscale infrastructure into a globally distributed intelligence platform. In conversations with Greg Matson and Kevin Deierling of NVIDIA, a key theme emerges: The next generation of AI infrastructure will be defined not only by scale, but by how efficiently compute, networking, storage, and power work together as a unified system.
Gigawatt-scale AI factories are becoming the new backbone of model training and large-scale inference, requiring unprecedented advances in energy efficiency, liquid cooling, high-performance interconnects, and data movement. The challenge is no longer simply building bigger clusters, it’s architecting infrastructure capable of sustaining continuous intelligence production at an industrial scale.
At the same time, the future of AI will not live solely inside massive, centralized data centers. Intelligence is increasingly moving closer to where data is created, and decisions are made across enterprises, factories, healthcare systems, financial markets, research institutions, and edge environments.
This shift toward “intelligence everywhere” is reshaping infrastructure design priorities. AI factories must support both centralized training at an extreme scale and highly distributed inference with low latency and real-time responsiveness.
As Greg Matson and Kevin Deierling discuss, success in this next era will depend on balancing performance, power, and data accessibility across the entire AI stack. The organizations that can efficiently orchestrate data from edge to core to cloud will define the next wave of AI innovation.
Subscribe for product updates, technical resources, and insights on accelerating AI workloads with Solidigm Storage