SystemDS: A declarative machine learning system for the end-to-end data science lifecycle

M Boehm, I Antonov, S Baunsgaard, M Dokter… - arXiv preprint arXiv …, 2019 - arxiv.org
Machine learning (ML) applications become increasingly common in many domains. ML
systems to execute these workloads include numerical computing frameworks and libraries …

How to architect a query compiler, revisited

RY Tahboub, GM Essertel, T Rompf - Proceedings of the 2018 …, 2018 - dl.acm.org
To leverage modern hardware platforms to their fullest, more and more database systems
embrace compilation of query plans to native code. In the research community, there is an …

Babelfish: Efficient execution of polyglot queries

PM Grulich, S Zeuch, V Markl - Proceedings of the VLDB Endowment, 2021 - dl.acm.org
Today's users of data processing systems come from different domains, have different levels
of expertise, and prefer different programming languages. As a result, analytical workload …

HetExchange: Encapsulating heterogeneous CPU-GPU parallelism in JIT compiled engines

P Chrysogelos, M Karpathiotakis… - Proceedings of the …, 2019 - infoscience.epfl.ch
Modern server hardware is increasingly heterogeneous as hardware accelerators, such as
GPUs, are used together with multicore CPUs to meet the computational demands of …

Filter before you parse: Faster analytics on raw data with Sparser

S Palkar, F Abuzaid, P Bailis, M Zaharia - Proceedings of the VLDB …, 2018 - dl.acm.org
Exploratory big data applications often run on raw unstructured or semi-structured data
formats, such as JSON files or text logs. These applications can spend 80-90% of their …

JSON tiles: Fast analytics on semi-structured data

D Durner, V Leis, T Neumann - … of the 2021 International Conference on …, 2021 - dl.acm.org
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …

On optimizing operator fusion plans for large-scale machine learning in SystemML

M Boehm, B Reinwald, D Hutchison… - arXiv preprint arXiv …, 2018 - arxiv.org
Many large-scale machine learning (ML) systems allow specifying custom ML algorithms by
means of linear algebra programs, and then automatically generate efficient execution …

The case for heterogeneous HTAP

R Appuswamy, M Karpathiotakis… - … on Innovative Data …, 2017 - infoscience.epfl.ch
Modern database engines balance the demanding requirements of mixed, hybrid
transactional and analytical processing (HTAP) workloads by relying on i) global shared …

Adaptive partitioning and indexing for in situ query processing

M Olma, M Karpathiotakis, I Alagiannis… - The VLDB Journal, 2020 - Springer
The constant flux of data and queries alike has been pushing the boundaries of data
analysis systems. The increasing size of raw data files has made data loading an expensive …

Slalom: Coasting through raw data via adaptive partitioning and indexing

M Olma, M Karpathiotakis, I Alagiannis… - Proceedings of the …, 2017 - dl.acm.org
The constant flux of data and queries alike has been pushing the boundaries of data
analysis systems. The increasing size of raw data files has made data loading an expensive …