JSON tiles: Fast analytics on semi-structured data
Developers often prefer flexibility over upfront schema design, making semi-structured data
formats such as JSON increasingly popular. Large amounts of JSON data are therefore …
Efficient streaming subgraph isomorphism with graph neural networks
Queries to detect isomorphic subgraphs are important in graph-based data management.
While the problem of subgraph isomorphism search has received considerable attention for …
GridFormation: Towards self-driven online data partitioning using reinforcement learning
GC Durand, M Pinnecke, R Piriyev, M Mohsen… - Proceedings of the First …, 2018 - dl.acm.org
In this paper we define a research agenda to develop a general framework supporting
online autonomous tuning of data partitioning and layouts with a reinforcement learning …
Accelerating raw data analysis with the ACCORDA software and hardware architecture
The data science revolution and growing popularity of data lakes make efficient processing
of raw data increasingly important. To address this, we propose the ACCelerated Operators …
Resource monitoring framework for big raw data processing
Scientific experiments, simulations, and modern applications generate large amounts of
data. Analysing resources required to process such big datasets is essential to identify …
Intermittent query processing
Many applications ingest data in an intermittent, yet largely predictable, pattern. Existing
systems tend to ignore how data arrives when making decisions about how to update (or …
Generating application-specific data layouts for in-memory databases
C Yan, A Cheung - Proceedings of the VLDB Endowment, 2019 - dl.acm.org
Database applications are often developed with object-oriented languages while using
relational databases as the backend. To accelerate these applications, developers would …
ParPaRaw: Massively parallel parsing of delimiter-separated raw data
E Stehle, HA Jacobsen - arXiv preprint arXiv:1905.13415, 2019 - arxiv.org
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading,
and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major …
In-memory caching for multi-query optimization of data-intensive scalable computing workloads
In modern large-scale distributed systems, analytics jobs submitted by various users often
share similar work. Instead of optimizing jobs independently, multi-query optimization …
A cost-based storage format selector for materialized results in big data frameworks
Modern big data frameworks (such as Hadoop and Spark) allow multiple users to do large-
scale analysis simultaneously, by deploying data-intensive workflows (DIWs). These DIWs of …