Los Alamos National Laboratory (LANL) stands at the forefront of national security research. Since its founding in 1943, LANL has consistently pushed the boundaries of scientific understanding and played a pivotal role in shaping the modern world. From its historic contributions to the Manhattan Project to its ongoing leadership in computational science, LANL has established itself as a cornerstone of scientific progress.
Today, LANL’s High-Performance Computing (HPC) division exemplifies this commitment to innovation. It continuously pushes the boundaries of extreme-scale supercomputing, enabling researchers to manage and analyze the exabytes of data generated by cutting-edge scientific research.
This drive to adopt new technologies led LANL to the Versity S3 Gateway, which bridges the gap between object protocols and a variety of storage backends, including computational storage. With it, researchers can access and query simulation data directly from NVMe storage devices using S3 commands and workflows. Pushing data reduction functions closer to the storage devices saves power and time, because the remaining analytics can then run on a much smaller analytics cluster rather than on traditional ‘big iron’ HPC machines.
Scientific research routinely generates massive datasets. A single time step of one simulation often reaches petabytes in size, and a simulation may capture thousands of time steps. This sheer volume of data poses significant challenges for scientific data analytics.
Moving these datasets to analytics applications is time-consuming and expensive, especially since scientific queries typically target only a small portion of the data. Legacy data analysis workflows make the problem worse: they transfer all of the raw scientific data associated with a query result to the application, which must then execute its analysis code on the entire dataset. The result is unnecessary overhead and undue strain on computational resources.
To address these limitations, LANL developed a novel approach. It envisioned a system in which, when a query is initiated, data processing occurs directly on a dedicated computational storage device. The device then transmits only the relevant results to the host application, significantly reducing unnecessary data movement.
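The contrast between the legacy workflow and query pushdown can be illustrated with a toy model. This Python sketch is hypothetical: `Cell`, `StorageDevice`, the `energy` field, and the fixed 16-byte record size are assumptions made for illustration, not part of LANL’s OCS or the Versity S3 Gateway. It only counts the bytes that cross the device-to-host link when the host filters everything itself versus when the predicate runs on the device.

```python
from dataclasses import dataclass
from typing import Callable, List, NamedTuple


class Cell(NamedTuple):
    """One hypothetical simulation record: a cell ID and a scalar field value."""
    cell_id: int
    energy: float


RECORD_BYTES = 16  # assumption: 8-byte int ID + 8-byte float value per record


@dataclass
class StorageDevice:
    """Toy stand-in for a computational storage device (hypothetical, not a real API)."""
    records: List[Cell]
    bytes_sent: int = 0

    def read_all(self) -> List[Cell]:
        # Legacy path: every record crosses the link to the host.
        self.bytes_sent += len(self.records) * RECORD_BYTES
        return list(self.records)

    def query(self, predicate: Callable[[Cell], bool]) -> List[Cell]:
        # Pushdown path: the predicate is evaluated next to the data,
        # and only the matching records are transmitted.
        matches = [r for r in self.records if predicate(r)]
        self.bytes_sent += len(matches) * RECORD_BYTES
        return matches


def host_side_filter(dev: StorageDevice, threshold: float) -> List[Cell]:
    """Pull everything to the host, then filter there."""
    return [r for r in dev.read_all() if r.energy > threshold]


def pushed_down_filter(dev: StorageDevice, threshold: float) -> List[Cell]:
    """Ship the predicate to the device and receive only the results."""
    return dev.query(lambda r: r.energy > threshold)


if __name__ == "__main__":
    data = [Cell(i, float(i % 1000)) for i in range(100_000)]
    legacy, pushdown = StorageDevice(data), StorageDevice(data)
    assert host_side_filter(legacy, 990.0) == pushed_down_filter(pushdown, 990.0)
    print(f"host-side filter moved {legacy.bytes_sent:,} bytes")
    print(f"pushdown moved         {pushdown.bytes_sent:,} bytes")
```

With 100,000 records and a threshold of 990.0, both paths return the same 900 matching records, but the host-side filter moves 1,600,000 bytes while the pushed-down query moves 14,400. A real device would receive some encoded query rather than a Python closure; the sketch only shows where the filtering happens.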
LANL uses an object-based computational storage (OCS) infrastructure, which lets NVMe devices directly access and interpret data blocks, a prerequisite for query pushdown. Compared with traditional file systems, this design simplifies the mapping between data objects and NVMe blocks. LANL partnered with SK Hynix, leveraging SK Hynix’s advanced memory solutions, to develop a computational storage device capable of query pushdown and data analytics.
However, pushing analytic functions down from the logical object view that users have of their data to block-based NVMe storage requires a translation. The Versity S3 Gateway facilitates this by enabling seamless communication between disparate storage systems and enhancing query pushdown capabilities. Combined with Apache columnar analytics tools, it bridges the gap between storage technologies and enables efficient analysis of massive datasets.
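To make the translation step concrete, the following Python sketch maps a byte range within an object onto the NVMe logical block addresses (LBAs) that hold it. Everything here is an assumption for illustration: the 4 KiB block size, the `Extent` layout, the example object key, and the `byte_range_to_blocks` helper are hypothetical and do not describe how LANL’s OCS or the Versity S3 Gateway actually lays out data.

```python
from typing import Dict, List, NamedTuple, Tuple

BLOCK_SIZE = 4096  # assumption: 4 KiB logical blocks


class Extent(NamedTuple):
    """A contiguous run of NVMe logical blocks backing part of an object (hypothetical)."""
    start_lba: int
    num_blocks: int


# Hypothetical object-key -> extent-list map; the real OCS layout is not described here.
ExtentMap = Dict[str, List[Extent]]


def byte_range_to_blocks(extents: List[Extent], offset: int, length: int) -> List[Tuple[int, int]]:
    """Translate object bytes [offset, offset + length) into (lba, block_count) reads.

    Assumes the object's bytes are laid out sequentially across its extents, in order.
    """
    if length <= 0:
        return []
    first_blk = offset // BLOCK_SIZE
    last_blk = (offset + length - 1) // BLOCK_SIZE  # inclusive, object-relative
    reads: List[Tuple[int, int]] = []
    obj_blk = 0  # object-relative index of the first block in the current extent
    for ext in extents:
        ext_first, ext_last = obj_blk, obj_blk + ext.num_blocks - 1
        lo, hi = max(first_blk, ext_first), min(last_blk, ext_last)
        if lo <= hi:
            # Overlap between the request and this extent: convert to a physical LBA.
            reads.append((ext.start_lba + (lo - ext_first), hi - lo + 1))
        obj_blk += ext.num_blocks
        if obj_blk > last_blk:
            break
    if obj_blk <= last_blk:
        raise ValueError("byte range extends past the end of the object")
    return reads


if __name__ == "__main__":
    layout: ExtentMap = {"sim/step-0001/pressure": [Extent(1000, 4), Extent(5000, 4)]}
    # Read 8 KiB starting 12 KiB into the object; the range straddles both extents.
    print(byte_range_to_blocks(layout["sim/step-0001/pressure"], 12288, 8192))
```

Reading 8 KiB at offset 12,288 touches the last block of the first extent and the first block of the second, so the helper returns `[(1003, 1), (5000, 1)]`. A pushed-down function could then operate on just those blocks instead of the whole object.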
The Versity S3 Gateway streamlines scientific workflows by eliminating data transfers between object storage and NVMe. It removes server input/output (I/O) bottlenecks and shortens data access times, allowing a single host to manage petabyte-scale data volumes efficiently. This marks a significant advance in object data processing, resulting in faster analysis, improved research productivity, and deeper scientific insights.
“We are thankful that Versity engaged to produce a flexible and performant S3 gateway that enabled our exploration of push-down analytics at scale,” said Dominic Manno, lead of hot storage research at LANL. “Versity’s open community gateway technology has and will play a part in our journey toward providing next-generation at-scale analytics that leverage the Apache ecosystem.”
Scientific research, particularly at institutions like LANL, often grapples with managing and analyzing massive datasets. These exabyte-sized datasets can be prohibitively expensive to move and analyze, hindering the pace of scientific discovery.
The Versity S3 Gateway bridges the traditional gaps between disparate storage technologies, significantly improving the efficiency and scalability of LANL’s HPC applications. By streamlining the integration of object storage with NVMe-based computational storage devices, the Gateway accelerates data access, reduces bottlenecks, and enables researchers to handle large data volumes more effectively.
As LANL continues to lead in computational science and national security research, the Versity S3 Gateway stands out as a critical component of its technological arsenal, driving faster research outcomes and enabling deeper scientific discoveries. This advancement underscores LANL’s commitment to remaining a cornerstone of global scientific progress and innovation.