H5Spark: bridging the I/O gap between Spark and scientific data formats on HPC systems
The Spark framework has been tremendously powerful for performing Big Data analytics in
distributed data centers. However, using Spark to analyze large-scale scientific data on HPC …
A data-driven approach to nation-scale building energy modeling
In 2019, 125 million US residential and commercial buildings consumed $412 billion in
energy bills. These buildings currently consume 40% of the nation's primary energy, 73% of …
Client-side straggler-aware I/O scheduler for object-based parallel file systems
Object-based parallel file systems have emerged as promising storage solutions for high-
performance computing (HPC) systems. Despite the fact that object storage provides a …
Log-assisted straggler-aware I/O scheduler for high-end computing
Object-based parallel file systems have emerged as promising storage solutions for high-
end computing (HEC) systems. Despite the fact that object storage provides a flexible …
Concurrent dynamic memory coalescing on GoblinCore-64 architecture
The majority of modern microprocessors are architected to utilize multi-level data caches as
a primary optimization to reduce the latency and increase the perceived bandwidth from an …
Accelerating Columnar Storage Based on Asynchronous Skipping Strategy
W Li, Z Yang, L Deng, Z Cheng, W Wen, Y He - Big Data Research, 2023 - Elsevier
Many database applications, such as OnLine Analytical Processing (OLAP), web-based
information extraction or scientific computation, need to select a subset of fields based on …
In situ storage layout optimization for AMR spatio-temporal read accesses
Analyses of large simulation data often concentrate on regions in space and in time that
contain important information. As simulations adopt Adaptive Mesh Refinement (AMR), the …
Heavy-tailed distribution of parallel I/O system response time
Estimating the I/O time of applications is critical for computing system research and
development, such as performance tuning and job scheduling. Parallel I/O systems on …
Debugging in Parallel or Sequential: An Empirical Study
Faults need to be identified, localized, and removed from programs. Empirical studies show
that coverage-based fault localization techniques effectively target bugs, even in the presence of …
Distributed NoSQL storage for extreme-scale system services
Today, with rapidly accumulating data, data-driven applications are emerging in science
and commercial areas. On both HPC systems and clouds, the continuously widening …