Matrix factorizations at scale: A comparison of scientific data analytics in Spark and C+MPI using three case studies
We explore the trade-offs of performing linear algebra using Apache Spark, compared to
traditional C and MPI implementations on HPC platforms. Spark is designed for data …
ArrayUDF: User-defined scientific data analysis on arrays
User-Defined Functions (UDF) allow application programmers to specify analysis operations
on data, while leaving the data management tasks to the system. This general approach …
A high performance query analytical framework for supporting data-intensive climate studies
Climate observations and model simulations produce vast amounts of data. The
unprecedented data volume and the complexity of geospatial statistics and analysis require …
The case for alternative web archival formats to expedite the data-to-insight cycle
The WARC file format is widely used by web archives to preserve collected web content for
future use. With the rapid growth of web archives and the increasing interest to reuse these …
Spark and HPC for high energy physics data analyses
S Sehrish, J Kowalkowski… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
A full High Energy Physics (HEP) data analysis is divided into multiple data reduction
phases. Processing within these phases is extremely time consuming, therefore …
Zero-cost, Arrow-enabled data interface for Apache Spark
SA Rodriguez, J Chackrabroty, A Chu… - … Conference on Big …, 2021 - ieeexplore.ieee.org
Distributed data processing ecosystems are widespread and their components are highly
specialized, such that efficient interoperability is urgent. Recently, Apache Arrow was …
SciDP: Support HPC and big data applications via integrated scientific data processing
Modern High Performance Computing (HPC) applications, such as Earth science
simulations, produce large amounts of data due to the surging of computing power, while big …
Distributed interactive visualization using GPU-optimized Spark
With the advent of advances in imaging and computing technologies, large-scale data
acquisition and processing have become commonplace in many science and engineering …
FITS data source for Apache Spark
J Peloton, C Arnault, S Plaszczynski - Computing and Software for Big …, 2018 - Springer
We investigate the performance of Apache Spark, a cluster computing framework, for
analyzing data from future LSST-like galaxy surveys. Apache Spark attempts to address big …
Towards Implicit Parallel Programming for Systems
S Ertel - 2019 - core.ac.uk
Processor architectures have reached a physical boundary that prevents scaling
performance with the number of transistors. Effectively, this means that the sequential …