A survey on advancing the DBMS query optimizer: Cardinality estimation, cost model, and plan enumeration

H Lan, Z Bao, Y Peng - Data Science and Engineering, 2021 - Springer
Query optimizer is at the heart of the database systems. Cost-based optimizer studied in this
paper is adopted in almost all current database systems. A cost-based optimizer introduces …

The art of balance: a RateupDB™ experience of building a CPU/GPU hybrid database product

R Lee, M Zhou, C Li, S Hu, J Teng, D Li… - Proceedings of the VLDB …, 2021 - dl.acm.org
GPU-accelerated database systems have been studied for more than 10 years, ranging from
prototyping development to industry products serving in multiple domains of data …

Predicate pushdown for data science pipelines

C Yan, Y Lin, Y He - Proceedings of the ACM on Management of Data, 2023 - dl.acm.org
Predicate pushdown is a widely adopted query optimization. Existing systems and prior work
mostly use pattern-matching rules to decide when a predicate can be pushed through …

Phoebe: a learning-based checkpoint optimizer

Y Zhu, M Interlandi, A Roy, K Das, H Patel… - arXiv preprint arXiv …, 2021 - arxiv.org
Easy-to-use programming interfaces paired with cloud-scale processing engines have
enabled big data system users to author arbitrarily complex analytical jobs over massive …

SlabCity: Whole-Query Optimization Using Program Synthesis

R Dong, J Liu, Y Zhu, C Yan, B Mozafari… - Proceedings of the VLDB …, 2023 - dl.acm.org
Query rewriting is often a prerequisite for effective query optimization, particularly for poorly-written
queries. Prior work on query rewriting has relied on a set of "rules" based on syntactic …

Unshackling Database Benchmarking from Synthetic Workloads

P Negi, L Bindschaedler, M Alizadeh… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
Introducing new (learned) features into a DBMS requires considerable experimentation and
benchmarking to avoid regressions in production (customer) workloads. Using standard …

The Cosmos big data platform at Microsoft: Over a decade of progress and a decade to look forward

C Power, H Patel, A Jindal, J Leeka, B Jenkins… - Proceedings of the …, 2021 - dl.acm.org
The twenty-first century has been dominated by the need for large scale data processing,
marking the birth of big data platforms such as Cosmos. This paper describes the evolution …

Welding Natural Language Queries to Analytics IRs with LLMs.

K Rajan, A Rastogi, A Lal, S Rajendra, K Subramanian… - CIDR, 2024 - cidrdb.org
From the recent momentum behind translating natural language to SQL (nl2sql), to
commercial product offerings such as Co-Pilot for Microsoft Fabric, Large Language Models …

Optimizing ETL Processes for Big Data Applications

HG Kola - International Journal of Engineering and Management …, 2024 - indianjournals.com
Optimizing large-scale data processing has become crucial in the area of data management
due to the constantly growing quantity and complexity of data. Big data analysis involves …

Computation reuse via fusion in Amazon Athena

N Bruno, J Debrodt, C Song… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Amazon Athena is a serverless, interactive query service that allows efficiently analyzing
large volumes of data stored in Amazon S3 using ANSI SQL. Some design choices in the …