JSON tiles: Fast analytics on semi-structured data

D Durner, V Leis, T Neumann - … of the 2021 International Conference on …, 2021 - dl.acm.org
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …

Efficient streaming subgraph isomorphism with graph neural networks

CT Duong, TD Hoang, H Yin, M Weidlich… - Proceedings of the …, 2021 - dl.acm.org
Queries to detect isomorphic subgraphs are important in graph-based data management.
While the problem of subgraph isomorphism search has received considerable attention for …

GridFormation: Towards self-driven online data partitioning using reinforcement learning

GC Durand, M Pinnecke, R Piriyev, M Mohsen… - Proceedings of the First …, 2018 - dl.acm.org
In this paper we define a research agenda to develop a general framework supporting
online autonomous tuning of data partitioning and layouts with a reinforcement learning …

Accelerating raw data analysis with the ACCORDA software and hardware architecture

Y Fang, C Zou, AA Chien - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
The data science revolution and growing popularity of data lakes make efficient processing
of raw data increasingly important. To address this, we propose the ACCelerated Operators …

Resource monitoring framework for big raw data processing

M Patel, M Bhise - International Journal of Big Data …, 2024 - inderscienceonline.com
Scientific experiments, simulations, and modern applications generate large amounts of
data. Analysing resources required to process such big datasets is essential to identify …

Intermittent query processing

D Tang, Z Shang, AJ Elmore, S Krishnan… - Proceedings of the …, 2019 - dl.acm.org
Many applications ingest data in an intermittent, yet largely predictable, pattern. Existing
systems tend to ignore how data arrives when making decisions about how to update (or …

Generating application-specific data layouts for in-memory databases

C Yan, A Cheung - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
Database applications are often developed with object-oriented languages while using
relational databases as the backend. To accelerate these applications, developers would …

ParPaRaw: Massively parallel parsing of delimiter-separated raw data

E Stehle, HA Jacobsen - arXiv preprint arXiv:1905.13415, 2019 - arxiv.org
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading,
and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major …

In-memory caching for multi-query optimization of data-intensive scalable computing workloads

P Michiardi, D Carra, S Migliorini - Proceedings of the Workshops of the …, 2019 - iris.univr.it
In modern large-scale distributed systems, analytics jobs submitted by various users often
share similar work. Instead of optimizing jobs independently, multi-query optimization …

A cost-based storage format selector for materialized results in big data frameworks

RF Munir, A Abelló, O Romero, M Thiele… - Distributed and Parallel …, 2020 - Springer
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-
scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of …