Proactive Data Containers for Scientific Storage
J Soumagne, R Warren, J Mu, N Bagha, V Vishwanath… - 2019 - osti.gov
Emerging HPC systems are expected to be deployed with an unprecedented level of complexity, due to a deep system memory/storage hierarchy and heterogeneity of the storage hardware. This hierarchy is expected to range from CPU cache through several levels of volatile memory to nonvolatile memory, traditional hard disks, and tape. Simple and efficient methods of data management and movement through this hierarchy are critical for scientific applications using exascale systems. Existing storage system and I/O (SSIO) technologies face severe challenges in meeting these requirements. The POSIX and MPI I/O standards, which are the basis for existing I/O libraries and parallel file systems, present fundamental challenges in the areas of scalable metadata operations, semantics-based data movement performance tuning, asynchronous operation, and support for scalable consistency of distributed operations. Moving toward new paradigms for SSIO in the extreme-scale era, we proposed to investigate novel object-based data abstractions and storage mechanisms that take advantage of the deep storage hierarchy and enable proactive automated performance tuning. To achieve these overarching goals, we initiated an effort to develop a fundamentally new data abstraction, called Proactive Data Containers (PDC). A PDC is a container within a locus of storage (memory, NVRAM, disk, etc.) that stores science data in an object-centric manner. Managing data as objects enables powerful optimization opportunities for data movement and transformations. The R&D foci of this project are: 1) formulation of object-oriented PDCs and their mapping to different levels of the exascale storage hierarchy; 2) efficient strategies for moving data in deep storage hierarchies using PDCs; 3) techniques for transforming and reorganizing data based on application requirements; and 4) novel analysis paradigms for enabling data transformations and user-defined analysis on data in PDCs.
Toward achieving these overarching goals, we designed an object-centric application programming interface (API) for HPC, scalable metadata management for object-centric storage systems, and data movement optimizations such as Data Elevator, which moves data between two levels of storage devices, and TAPIOCA, which efficiently aggregates data on compute nodes. We then implemented several components of the PDC system, including metadata management, data placement services, remote procedure calls, and data aggregation, and integrated them into the overall PDC framework.
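The object-centric idea described above can be illustrated with a minimal sketch. It is not the PDC API: every name here (`Tier`, `DataObject`, `Container`, `put`, `find`, `move`) is hypothetical and chosen only for illustration. The sketch assumes that a container holds named objects, that each object carries its own metadata, and that an object can be relocated between levels of the storage hierarchy. Here relocation only changes a label; no data is actually moved.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Tier(IntEnum):
    """Storage levels, fastest first (hypothetical names, not PDC's)."""
    MEMORY = 0
    NVRAM = 1
    DISK = 2
    TAPE = 3


@dataclass
class DataObject:
    """A unit of science data plus the metadata used to find it."""
    name: str
    data: bytes
    metadata: dict = field(default_factory=dict)
    tier: Tier = Tier.DISK  # assumed default placement


@dataclass
class Container:
    """A named collection of objects (illustrative stand-in for a container)."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, obj: DataObject) -> None:
        self.objects[obj.name] = obj

    def find(self, **tags) -> list:
        """Return objects whose metadata matches every given tag."""
        return [
            o for o in self.objects.values()
            if all(o.metadata.get(k) == v for k, v in tags.items())
        ]

    def move(self, name: str, target: Tier) -> None:
        """Relocate an object to another tier (only the label changes here)."""
        self.objects[name].tier = target


container = Container("climate-run")
container.put(DataObject("temperature", b"\x00" * 8, {"step": 10}))
container.put(DataObject("pressure", b"\x00" * 8, {"step": 20}))

# Promote one object to a faster tier, then look it up by metadata.
container.move("temperature", Tier.MEMORY)
hits = container.find(step=10)
```

Because lookups go through metadata rather than file paths, and placement is a property of each object, a runtime could in principle decide where an object lives without the application changing how it names or finds the data.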