NVIDIA CMX™ (Context Memory eXtension) is an AI-native storage tier engineered to store and manage Key-Value (KV) cache for long-context, multi-turn, and agentic AI inference. By extending GPU memory with a dedicated pod-level flash tier ("G3.5"), CMX addresses the "memory wall" bottleneck, allowing AI models to maintain massive conversational histories and complex reasoning states without exhausting expensive HBM (High Bandwidth Memory).
CMX (formerly known as ICMS or Inference Context Memory Storage) is a specialized storage platform designed to offload and reuse the KV cache generated during AI inference. It sits between fast GPU memory and traditional backend storage, acting as a "pod-level context tier" that enables AI agents to retain long-term memory. NVIDIA has reported up to 5x higher tokens-per-second and 5x better power efficiency for CMX-based inference, though the underlying workload, configuration, and baseline have not been publicly disclosed.2
Originally introduced as ICMS, the technology was rebranded by NVIDIA to CMX to emphasize its role as a persistent context memory layer. This shift marks the transition from treating context as a temporary session artifact to treating it as a strategic, reusable asset.2
Modern AI "agents" don't just answer questions; they perform multi-step reasoning that requires millions of tokens of context.
The CMX architecture functions as a disaggregated memory tier between Tier 3 and Tier 4, known as Tier G3.5 memory. CMX uses NVIDIA BlueField-4 STX data processing units (DPUs) to manage NVMe SSDs over a Spectrum-X Ethernet fabric.
CMX relies on two specialized software components to manage how memory moves behind the scenes. DOCA Memos provides a key-value API that lets inference frameworks read and write KV cache blocks to the CMX tier without involving the host CPU. NIXL (NVIDIA Inference Transfer Library) coordinates the timing, so the GPU has the data it needs before it asks for it instead of sitting idle while it waits.2
The data being moved is the KV cache: the AI agent's short-term memory of the current conversation or context. Moving it in pre-organized blocks makes transfers more efficient, so when the GPU needs to "remember" earlier parts of a conversation, that memory is already waiting and doesn't slow down generation of the next result.
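To make that concrete, here is a minimal Python sketch of what a block-organized KV cache offload interface could look like. The class and method names (`BlockKey`, `CMXClient`, `put_block`, `prefetch`, `get_block`) and the key layout are illustrative assumptions for this article, not the actual DOCA Memos or NIXL APIs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BlockKey:
    """Identifies one pre-organized KV cache block.

    Hypothetical key layout for this sketch: a session, a model layer,
    and a fixed-size block index within that layer's token sequence.
    """
    session_id: str
    layer: int
    block_index: int


class CMXClient:
    """Illustrative stand-in for a pod-level context-tier client.

    Real deployments would go through NVIDIA's libraries (DOCA Memos for
    the key-value I/O plane, NIXL for transfer scheduling); this in-memory
    dict only mimics the put/get/prefetch shape of such an interface.
    """

    def __init__(self) -> None:
        self._store: dict[BlockKey, bytes] = {}

    def put_block(self, key: BlockKey, kv_bytes: bytes) -> None:
        # Offload a finished KV block from GPU memory to the context tier.
        self._store[key] = kv_bytes

    def prefetch(self, keys: list[BlockKey]) -> None:
        # A real client would start transfers back toward the GPU ahead of
        # need; here it is a placeholder that just touches the entries.
        for key in keys:
            self._store.get(key)

    def get_block(self, key: BlockKey) -> bytes | None:
        # Read a block back when the GPU actually needs it.
        return self._store.get(key)


# Usage: offload blocks as a turn finishes, prefetch before the next turn.
client = CMXClient()
key = BlockKey(session_id="chat-42", layer=0, block_index=0)
client.put_block(key, b"\x00" * 4096)  # placeholder KV bytes
client.prefetch([key])                 # warm the path before decode starts
assert client.get_block(key) is not None
```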
CMX introduces a new layer into the standard data center stack.
CMX delivers enhanced performance through hardware-accelerated KV cache placement, RDMA-based data transfers via Spectrum-X, and seamless orchestration through NVIDIA Dynamo. It’s designed to maximize GPU utilization by eliminating idle time or "stalls" caused by context recomputation, and provides a secure, multi-tenant environment for large-scale enterprise AI factories.
| Metric | Traditional Storage | NVIDIA CMX Platform |
| --- | --- | --- |
| Throughput (tokens per second) | Baseline (1x) | Up to 5x higher2 |
| Power efficiency | Standard | Up to 5x better2 |
| Time to first token (TTFT) | High (recompute) | Low (cache reuse) |
| Scaling logic | General-purpose | AI-native (KV-aware) |
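As a rough illustration of the TTFT row above, the sketch below compares time to first token when a long prompt's KV cache must be recomputed versus when a stored copy is read back over the fabric. Every number (prefill throughput, KV footprint per token, fabric bandwidth) is a hypothetical placeholder chosen for readability, not a measured CMX figure.

```python
# Rough time-to-first-token (TTFT) model: recompute the prompt's KV cache
# versus read a stored copy back over the fabric. Every number below is an
# illustrative assumption for this sketch, not a measured NVIDIA figure.

prompt_tokens = 200_000           # long multi-turn context (assumed)
prefill_tokens_per_s = 50_000     # assumed prefill throughput per GPU
kv_bytes_per_token = 160 * 1024   # assumed KV cache footprint per token
fabric_bytes_per_s = 40e9         # assumed effective fabric bandwidth

ttft_recompute_s = prompt_tokens / prefill_tokens_per_s

kv_total_bytes = prompt_tokens * kv_bytes_per_token
ttft_reuse_s = kv_total_bytes / fabric_bytes_per_s

print(f"recompute: {ttft_recompute_s:.2f} s, cache reuse: {ttft_reuse_s:.2f} s")
```

With these placeholder values the reuse path is several times faster; the real ratio depends entirely on model size, context length, and fabric configuration.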
NVIDIA CMX uses NIXL (NVIDIA Inference Transfer Library) to turn Ethernet-attached flash into a context pool, allowing AI agents to resume complex tasks instantly instead of recomputing or re-reading their full context. This "instant resume" capability enables truly agentic workflows, where an AI can pause a task, wait for external input, and resume with its full cognitive state intact.
To understand the impact of CMX, it is helpful to view it as the "missing link" in the modern data center. Traditionally, a sizable performance gap existed between fast DRAM memory in Tier 3 and Tier 4’s standard network storage. CMX introduces a new specialized layer, referred to as Tier G3.5 memory, designed specifically to handle the "hot" situational data that AI agents need to recall instantly.
The hierarchy is structured to balance speed, capacity, and cost, with each tier increasing in both size and latency. By inserting CMX into this hierarchy, NVIDIA allows the GPU to offload the "state" of a conversation to a cost-effective flash tier. When the user returns to the chat or the agent moves to the next step of a task, NIXL pulls that specific memory back into the GPU instantly, bypassing the need for expensive recomputation.2
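A minimal sketch of that lookup order, assuming a simple three-step fallback (GPU memory, then the CMX tier, then full recomputation); the function names and dict-based caches are placeholders for the real Dynamo/NIXL orchestration:

```python
def load_context(session_id: str, hbm_cache: dict, cmx_tier: dict) -> bytes:
    """Illustrative tier walk: GPU memory first, then the pod-level CMX
    tier, and only a full prefill recompute if neither holds the KV cache.
    Plain dicts stand in for GPU memory and Ethernet-attached flash."""
    if session_id in hbm_cache:           # already resident on the GPU
        return hbm_cache[session_id]
    if session_id in cmx_tier:            # Tier G3.5: pull back over the fabric
        kv = cmx_tier[session_id]
        hbm_cache[session_id] = kv        # promote for the active turn
        return kv
    kv = recompute_prefill(session_id)    # the expensive path CMX avoids
    hbm_cache[session_id] = kv
    return kv


def recompute_prefill(session_id: str) -> bytes:
    # Placeholder for a full prefill pass over the session's history.
    return b"recomputed-kv"
```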
CMX is the foundational infrastructure for long-context reasoning, "instant resume" sessions, and multi-agent collaboration. It is ideal for enterprises deploying trillion-parameter models across sessions spanning billions of tokens, where maintaining a persistent, high-speed memory layer is the only way to scale without incurring prohibitive costs or latency.
In complex legal or medical analysis, an agent may need to "remember" thousands of pages of documentation across days of interaction. CMX ensures this context is "pre-staged" to the GPU the moment the user interacts, making the AI agent feel responsive and deeply knowledgeable.
For organizations with thousands of concurrent users, CMX keeps the "memory wall" from capping how many sessions the system can serve. By offloading KV cache to the CMX tier, the system can support more users per GPU, significantly lowering the Total Cost of Ownership (TCO).
Implementing CMX requires NVIDIA's BlueField-4 STX (Storage Technology eXtensions) processor as the foundation, paired with E3.S NVMe SSDs in liquid-cooled JBOF enclosures. NVIDIA provides the modular reference architecture; manufacturing and storage partners build the platforms. Compute pods access CMX over Spectrum-X Ethernet using RDMA, and KV cache movement is orchestrated by NVIDIA Dynamo with DOCA Memos handling the I/O plane on BF4.
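For orientation, the components named above can be summarized in a short sketch. The field names below are this article's informal shorthand, not an NVIDIA configuration schema.

```python
# Informal summary of the reference architecture components named above,
# written as a plain Python dict. Field names are this article's shorthand,
# not an NVIDIA configuration schema.
cmx_pod_sketch = {
    "dpu": "NVIDIA BlueField-4 STX",
    "media": "E3.S NVMe SSDs in a liquid-cooled JBOF",
    "fabric": "Spectrum-X Ethernet with RDMA",
    "io_plane": "DOCA Memos (KV cache reads/writes handled on the DPU)",
    "transfer_library": "NIXL (schedules KV cache movement to and from GPUs)",
    "orchestration": "NVIDIA Dynamo (decides what to cache, where, and when)",
}
```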
NVIDIA does not build the SSD enclosures itself. The STX reference architecture is implemented by partners across several layers, with platforms beginning to ship in the second half of 2026.4
The right SSD for a CMX deployment depends on where in the KV cache lifecycle the workload spends most of its time and whether the constraint is latency headroom or rack-level density. Solidigm offers two SSDs that map to opposite ends of this design space.
The Solidigm D7-PS1010 is a PCIe Gen5 TLC NVMe SSD built for high throughput and predictable latency under sustained, real-world inference load. In long-context reasoning, multi-turn agentic sessions, and high-concurrency pods where stalls translate directly into idle GPU cycles, the D7-PS1010 is the preferred choice. Its performance profile is designed for the latency-sensitive reads that sit on the critical path of token generation, exactly the conditions a pod-level context tier must serve.
The Solidigm D5-P5336 is a high-density QLC NVMe SSD available in capacities up to 122TB. In CMX deployments where the constraint is terabytes-per-rack, the D5-P5336 maximizes density inside tight rack and power envelopes. It also anchors the Tier 4 network storage layer that feeds the CMX tier above, making it a natural fit for organizations building out the full inference storage hierarchy with a single vendor.
As a general guideline:

- Choose the Solidigm D7-PS1010 when the CMX tier sits on the latency-critical path of token generation and sustained read performance under concurrent load is the constraint.
- Choose the Solidigm D5-P5336 when capacity per rack is the constraint, or when building out the Tier 4 network storage layer that feeds the CMX tier above it.
For a deeper discussion of why flash — and specifically these design tradeoffs — is the right answer to the inference memory wall, see the Solidigm article Inference Context Memory Storage (ICMS): Why AI Inference is Becoming a Problem Only Flash Can Solve.
The transition from the traditional four-tier memory hierarchy to CMX represents a pivotal shift in how the industry handles the "memory" of artificial intelligence. By moving beyond the limitations of traditional GPU VRAM and system DRAM, CMX provides the high-bandwidth, low-latency foundation required for the next generation of agentic AI.
As models evolve to handle trillions of parameters and millions of tokens in a single session, the ability to store and reuse Key-Value cache effectively is no longer an optimization; it is a requirement. CMX ensures that AI factories can scale to meet this demand with 5x better power efficiency2 and significantly higher throughput, effectively breaking the "memory wall" that has previously constrained long-context reasoning.
For enterprises building AI at the edge of innovation, CMX is the cognitive infrastructure that transforms a stateless chatbot into a persistent, reasoning tool. By integrating this specialized "G3.5" tier into the data center stack, organizations can finally deliver AI experiences that are as deeply contextual as they are computationally powerful.
No, CMX is much more than hardware. While CMX uses NVMe SSDs like the Solidigm D7-PS1010 or D5-P5336, the "magic" lies in the BlueField-4 DPU and the software stack (DOCA Memos, NIXL, and Dynamo). This combination allows the system to understand the specific structure of KV cache and move it between GPUs and storage without involving the host CPU, which traditional SSDs cannot do.
GPUs often sit idle while waiting for context data to be recomputed or fetched from slow storage. CMX keeps this data in a "hot" tier that is specifically tuned for AI workloads. By reusing precomputed KV cache, the GPU spends less time on redundant work and more time generating new tokens.
NVIDIA categorizes data center memory into tiers. G1 and G2 are on-chip and on-node memory. G3 is traditionally local DRAM. CMX creates "G3.5" as a new category of Ethernet-attached, pod-level context memory that is faster and more efficient than traditional networked storage (G4) but more scalable than local SSDs.
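For reference, here is the tier taxonomy from that answer expressed as a small Python mapping; the short descriptions paraphrase the text above and are not official NVIDIA definitions.

```python
# The tier taxonomy described above, as a simple lookup table. Descriptions
# paraphrase this article; they are not official NVIDIA definitions.
MEMORY_TIERS = {
    "G1":   "on-chip memory",
    "G2":   "on-node memory",
    "G3":   "local DRAM",
    "G3.5": "CMX: Ethernet-attached, pod-level context memory on flash",
    "G4":   "traditional networked storage",
}
```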
The rebranding to CMX was likely done for clarity and market alignment. "Context Memory Storage" or "Context Memory eXtension" is a more descriptive term for the technology's role in the AI stack, highlighting that it is a specialized storage platform for the "memory" of AI models rather than just a management system.
Yes, CMX is built to run on the NVIDIA Spectrum-X Ethernet platform. This is critical because it uses RDMA (Remote Direct Memory Access) to transfer data with zero-copy efficiency. Without the low-latency, lossless fabric of Spectrum-X, the performance benefits of the CMX tier would be bottlenecked by network jitter.
Currently, CMX is a full-stack NVIDIA solution. It is designed to work within the Vera Rubin and Blackwell platforms, leveraging NVIDIA-specific libraries like NIXL and DOCA. It is tightly integrated with the NVIDIA ecosystem to provide the sub-millisecond latency required for inference.
Data in the CMX tier is treated as "ephemeral context." Depending on the policy set in the orchestration layer (like NVIDIA Dynamo), the context can be cached for future reuse, moved to cold storage for long-term archiving, or deleted to free up space for new sessions.
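A minimal sketch of what such a retention policy could look like; the action names and thresholds are invented for illustration, since real policies are configured in the orchestration layer (such as NVIDIA Dynamo) rather than hard-coded:

```python
from enum import Enum


class ContextAction(Enum):
    KEEP_HOT = "keep in the CMX tier for reuse"
    ARCHIVE = "move to cold network storage"
    DELETE = "free the space for new sessions"


def retention_policy(idle_hours: float, likely_to_resume: bool) -> ContextAction:
    """Illustrative lifecycle decision for ephemeral context. The thresholds
    are arbitrary placeholders; real policies live in the orchestration
    layer rather than being hard-coded like this."""
    if likely_to_resume and idle_hours < 24:
        return ContextAction.KEEP_HOT
    if idle_hours < 24 * 30:
        return ContextAction.ARCHIVE
    return ContextAction.DELETE


# Example: a session idle for two days with no expected resume gets archived.
print(retention_policy(idle_hours=48, likely_to_resume=False))
```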
Recomputing a million tokens of context every time a user asks a new question consumes a massive amount of electricity. By storing the precomputed state in CMX and simply "reading" it back, the system uses significantly less power than it would by running the full inference calculation again.
NVIDIA CMX and the underlying BlueField-4 STX architecture are expected to be available through hardware and storage partners starting in the second half of 2026. Major vendors like AIC, Supermicro, and QCT have already showcased the first CMX-compatible storage servers.
Jeff Harthorn is AI Applied Research Lead at Solidigm. His work focuses on the relationship between AI workloads and storage architecture, with an emphasis on inference, context memory, and data pipeline design. Jeff combines applied research, benchmarking, and technical storytelling to turn complex infrastructure topics into actionable insights for customers, collaborators, and senior leadership. He holds a Bachelor of Science in Computer Engineering from California State University, Sacramento.
Cecily Whiteside is Search and Content Specialist at Solidigm. She writes for technology, lifestyle, and health & wellness websites and publications. Cecily has been managing editor at several magazines and contributed as a writer and photographer in others, both in the US and abroad.