The rapid growth in the scale and complexity of Graph Neural Networks (GNNs) has intensified demands on AI training infrastructure. Heterogeneous datasets like IGBH, with billions of edges and high-dimensional features, require massive amounts of data movement between storage and GPU memory. Traditional storage infrastructure, where the CPU handles all data transfer, creates data bottlenecks and stalls pipelines, leading to underutilized GPUs and extended training times. This bottleneck is addressed through two open-source research projects:
BaM (Big accelerator Memory) rethinks data placement by enabling the GPU, rather than the CPU, to manage storage interactions with NVMe SSDs. BaM bypasses legacy drivers with a custom software stack that exploits GPU thread parallelism for bulk transfers from storage devices.
Complementing BaM is the GIDS (GPU Initiated Direct Storage) dataloader, which is built on top of the BaM software architecture. GIDS enables the GPU to directly initiate and manage storage I/O, addressing the specific requirements of GNN training and eliminating CPU intervention entirely. Together, BaM and GIDS minimize latency, maximize throughput, and keep GPUs saturated with data for optimal utilization.
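To make the division of labor concrete, the sketch below shows how a GIDS-style loader could slot into a DGL training loop, with node feature fetches served by GPU-initiated NVMe reads instead of CPU-mediated copies. The `gids` module, `GIDSLoader` class, and their parameters are illustrative placeholders assumed for this sketch, not the published GIDS API.

```python
import torch
import dgl

# Hypothetical GIDS-style bindings; the module, class, and method names below
# are illustrative placeholders, not the actual GIDS/BaM Python API.
import gids

# Graph topology is loaded normally; node features stay on the NVMe SSD.
graph = dgl.load_graphs("igbh_topology.bin")[0][0]

feature_loader = gids.GIDSLoader(
    feature_file="/mnt/nvme/igbh_node_feats.bin",  # raw feature array on the D7-PS1010
    feature_dim=1024,       # illustrative feature width
    cache_size_gb=8,        # GPU software cache backed by BaM
)

sampler = dgl.dataloading.NeighborSampler([15, 10, 5])
dataloader = dgl.dataloading.DataLoader(
    graph, torch.arange(graph.num_nodes()), sampler,
    batch_size=1024, shuffle=True, device="cuda")

for input_nodes, output_nodes, blocks in dataloader:
    # GPU threads issue the NVMe reads for the sampled nodes directly;
    # the CPU never copies or stages the feature data.
    feats = feature_loader.fetch_features(input_nodes)
    # ... forward/backward pass over `blocks` using `feats` ...
```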
The Solidigm™ D7-PS1010 SSD used in our testing enables BaM and GIDS to function as tangible accelerators for real-world GNN training workloads, such as training on the IGB heterogeneous (IGBH) dataset. This white paper presents the findings from BaM and GIDS technology evaluations conducted using Solidigm D7-PS1010 SSDs.
GIDS is designed to accelerate GNN training by shifting data loading from the CPU to the GPU. Unlike the traditional CPU-bound memory-mapped (mmap) approach in the Deep Graph Library (DGL), GIDS delivers faster end-to-end data loading by harnessing direct GPU data transfers from Solidigm D7-PS1010 SSDs through BaM, enabling greater efficiency for GNN training.
GIDS optimizes storage access for graph workloads via four key innovations:
DGL mmap (memory-mapped I/O in the Deep Graph Library) is the traditional, CPU-driven approach to graph data loading. Unlike GIDS, it comes with performance tradeoffs, especially in large-scale GPU workloads.
Key characteristics of DGL mmap:
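To illustrate the CPU-driven path, here is a minimal sketch using numpy memory mapping as a stand-in for DGL's mmap-backed feature storage (file names and shapes are illustrative assumptions):

```python
import numpy as np
import torch

# CPU-driven baseline: node features are memory-mapped on the host, so every
# feature gather is served by the CPU and the OS page cache, then copied to
# the GPU over PCIe.
NUM_NODES, FEAT_DIM = 100_000_000, 1024   # illustrative shapes, not IGBH's exact sizes
feats = np.memmap("igbh_node_feats.bin", dtype=np.float32,
                  mode="r", shape=(NUM_NODES, FEAT_DIM))

def load_minibatch_features(node_ids: torch.Tensor) -> torch.Tensor:
    # Fancy indexing materializes the rows in host memory (page faults reach the
    # SSD through the page cache), then the batch is transferred to the GPU.
    cpu_batch = torch.from_numpy(feats[node_ids.cpu().numpy()])
    return cpu_batch.to("cuda", non_blocking=True)
```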
We compare the end-to-end time for 100 training steps to assess data-loading efficiency, using GIDS' GPU-centric architecture with Solidigm D7-PS1010 SSDs versus DGL mmap. We tested both the small and full heterogeneous datasets from the Illinois Graph Benchmark (IGBH) for GNN training.
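A minimal sketch of how such a 100-step end-to-end measurement can be taken with standard PyTorch timing (the model, optimizer, and loader objects are schematic stand-ins for whichever configuration is under test):

```python
import time
import torch

def time_100_steps(dataloader, model, optimizer, num_steps=100):
    """Wall-clock time for num_steps end-to-end training steps (data loading + compute)."""
    it = iter(dataloader)
    torch.cuda.synchronize()                 # start from a quiescent GPU
    start = time.perf_counter()
    for _ in range(num_steps):
        batch = next(it)                     # data-loading cost is included in the timing
        loss = model(batch)                  # schematic: model returns the training loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    torch.cuda.synchronize()                 # wait for all queued GPU work to finish
    return time.perf_counter() - start
```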
The accuracy target for the benchmark was set to 70% during training to align with the MLPerf target for IGBH. Note: MLPerf is an industry-standard benchmark suite for evaluating machine learning performance across diverse workloads; this threshold indicates that the model has reached a meaningful level of convergence and quality.
We used publicly accessible code repositories to enable and run GIDS and BaM, with datasets from the Illinois Graph Benchmark suite. See appendix for links.
The training accuracy for both data loaders meets the baseline threshold of 70%, validating the trained model.
When using the Solidigm D7-PS1010 PCIe 5.0 SSD, GIDS cuts load time by almost 2x for small graphs by minimizing overhead and exploiting GPU parallelism.
The training accuracy for both data loaders meets the baseline threshold of 70%, validating the trained model.
GIDS scales with the growing edge count, loading data almost 9x faster than DGL mmap.
| Dataset | Cache Hit Rate | Cache Miss Rate |
|---|---|---|
| Small | 77.49% | 22.51% |
| Full | 10.80% | 89.20% |
Table 1. GPU cache metrics achieved when using BaM and GIDS with the IGBH dataset
Table 1 compares GPU cache performance metrics when training a GNN with the GIDS data loader on two scales (small vs. full) of the IGBH heterogeneous graph. The metrics are the GPU software cache hit rate and miss rate.
Cache hit and miss rates differ drastically between the small and full datasets (77% hits for the small run versus about 11% for the full run). However, this does not translate into a comparably large end-to-end time gap, because GPU compute dominates the training loop and overlaps with I/O. Since the feature data is striped across storage and accessed through BaM with prefetching, SSD reads are serviced efficiently during computation, reducing the impact of cache misses on overall runtime.
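As a quick check of the arithmetic behind Table 1, each rate is simply its counter divided by the total number of software-cache accesses (the counter values below are illustrative, chosen only to reproduce the small-dataset split):

```python
def cache_rates(hits: int, misses: int) -> tuple[float, float]:
    """Hit and miss rates as fractions of all GPU software-cache accesses."""
    total = hits + misses
    return hits / total, misses / total

# Illustrative counter values that reproduce the small-dataset split in Table 1.
hit_rate, miss_rate = cache_rates(hits=7_749, misses=2_251)
print(f"hit rate {hit_rate:.2%}, miss rate {miss_rate:.2%}")  # hit rate 77.49%, miss rate 22.51%
```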
The DGL mmap method is not included in the table because it has no GPU software cache or SSD-aware prefetching mechanism, so there are no software cache accesses to report.
The Solidigm D7-PS1010 SSD unlocks performance gains when paired with BaM and GIDS technologies for GNN training.
For the largest evaluated dataset, IGBH-full (5.8 billion edges), GIDS sustains high-speed data loading that is 9 times faster than the legacy method. Our work on BaM and GIDS lays a solid foundation for the readiness of Solidigm D7-PS1010 SSDs in future storage configurations.
Ashwin Pai is a System Validation Engineer at Solidigm, with nearly a decade of experience in software, hardware, and systems engineering. He focuses on validating next-generation SSD technologies across diverse platforms, including those optimized for AI and data-intensive workloads. Ashwin collaborates across cross-functional teams utilizing advanced AI methodologies and breakthrough innovations to enhance the capabilities of Solidigm SSDs in AI-driven environments. He holds a Bachelor of Engineering in Electronics from VES Institute of Technology and an M.S. in Computer Engineering from North Carolina State University.
Akhil Srinivas is an Electrical & Systems Engineer at Solidigm. He collaborates with industry-leading ecosystem vendors to validate Solidigm SSDs for cutting-edge storage solutions. He leverages emerging AI technologies and pathfinding innovations to position Solidigm SSDs as critical components in next-generation platforms, strengthening partnerships in the AI space. Beyond the enterprise, he indulges in culinary adventures, exploring popular food trucks and restaurants across the country. Akhil holds a Bachelor of Telecommunications Engineering from R.V. College of Engineering and an M.S. in Electrical and Computer Engineering from University of California, Davis.
The Solidigm D7-PS1010 E1.S is the leading SSD for AI workloads. It is offered in the following form factors, targeted at specific AI server environments:
| Form Factor | Capacities |
|---|---|
| E1.S 9.5mm | 3.84TB, 7.68TB |
| E1.S 15mm | 3.84TB, 7.68TB |
For more details about the Solidigm 3.84TB E1.S D7-PS1010, please visit:
https://www.solidigm.com/E1.S-D7-PS1010-3.84TB
We have referred to the following links for BaM and GIDS tools and scripts.
©2026, Solidigm. “Solidigm” is a registered trademark of SK hynix NAND Product Solutions Corp. (d/b/a Solidigm) in the United States, People’s Republic of China, Singapore, Japan, the European Union, the United Kingdom, Mexico, and other countries.
Other names and brands may be claimed as the property of others.
Solidigm may make changes to specifications and product descriptions at any time, without notice.
Tests document the performance of components on a particular test, in specific systems.
Differences in hardware, software, or configuration will affect actual performance.
Consult other sources of information to evaluate performance as you consider your purchase.
These results are preliminary and provided for information purposes only. These values and claims are neither final nor official.
Drives are considered engineering samples. Refer to roadmap for production guidance.