Efficient metadata indexing for HPC storage systems

AK Paul, B Wang, N Rutman, C Spitz… - 2020 20th IEEE/ACM …, 2020 - ieeexplore.ieee.org
The increase in data generation rate along with the scale of today's high performance
computing (HPC) storage systems make finding and managing files extremely difficult …

An integrated indexing and search service for distributed file systems

H Sim, A Khan, SS Vazhkudai, SH Lim… - … on Parallel and …, 2020 - ieeexplore.ieee.org
Data services such as search, discovery, and management in scalable distributed
environments have traditionally been decoupled from the underlying file systems, and are …

MIQS: Metadata indexing and querying service for self-describing file formats

W Zhang, S Byna, H Tang, B Williams… - Proceedings of the …, 2019 - dl.acm.org
Scientific applications often store datasets in self-describing data file formats, such as HDF5
and netCDF. Regrettably, to efficiently search the metadata within these files remains …

Strategy for research data management services in Indonesia

E Marlina, B Purwandari - Procedia Computer Science, 2019 - Elsevier
Research data management (RDM) ensures the availability of data access and long term
data preservation. Its practices are common in developed countries. On the other hand, it is …

SciSpace: A scientific collaboration workspace for geo-distributed HPC data centers

A Khan, T Kim, H Byun, Y Kim - Future Generation Computer Systems, 2019 - Elsevier
Future terabit networks are committed to dramatically improving big data motion between
geographically dispersed HPC data centers. The scientific community takes advantage of …

GUFI: fast, secure file system metadata search for both privileged and unprivileged users

D Manno, J Lee, P Challa, Q Zheng… - … Conference for High …, 2022 - ieeexplore.ieee.org
Modern High-Performance Computing (HPC) data centers routinely store massive data sets
resulting in millions of directories and billions of files. To efficiently search and sift through …

A content fingerprint-based cluster-wide inline deduplication for shared-nothing storage systems

A Khan, P Hamandawana, Y Kim - IEEE Access, 2020 - ieeexplore.ieee.org
Deduplication has been principally employed in distributed storage systems to improve
storage space efficiency. Traditional deduplication research ignores the design …

Exploring metadata search essentials for scientific data management

W Zhang, S Byna, C Niu, Y Chen - 2019 IEEE 26th …, 2019 - ieeexplore.ieee.org
Scientific experiments and observations store massive amounts of data in various scientific
file formats. Metadata, which describes the characteristics of the data, is commonly used to …

SCANNS: Towards scalable and concurrent data indexing and searching in high-end computing system

AI Orhean, A Giannakou… - 2022 22nd IEEE …, 2022 - ieeexplore.ieee.org
Increasing data volumes, particularly in science and engineering, has resulted in the
widespread adoption of parallel and distributed file systems for data storage and access …

Hades: A context-aware active storage framework for accelerating large-scale data analysis

J Cernuda, L Logan, A Gainaru… - 2024 IEEE 24th …, 2024 - ieeexplore.ieee.org
Modern simulation workflows generate and analyze massive amounts of data using I/O
libraries like Adios2 and NetCDF. Although extensive work has optimized the I/O processes …