Adaptive query processing

A Deshpande, Z Ives, V Raman - Foundations and Trends® …, 2007 - nowpublishers.com
As the data management field has diversified to consider settings in which queries are
increasingly complex, statistics are less available, or data is stored remotely, there has been …

Presto: SQL on everything

R Sethi, M Traverso, D Sundstrom… - 2019 IEEE 35th …, 2019 - ieeexplore.ieee.org
Presto is an open source distributed query engine that supports much of the SQL analytics
workload at Facebook. Presto is designed to be adaptive, flexible, and extensible. It supports …

Dryad: distributed data-parallel programs from sequential building blocks

M Isard, M Budiu, Y Yu, A Birrell, D Fetterly - Proceedings of the 2nd …, 2007 - dl.acm.org
Dryad is a general-purpose distributed execution engine for coarse-grain data-parallel
applications. A Dryad application combines computational "vertices" with communication "…

MapReduce online

T Condie, N Conway, P Alvaro, JM Hellerstein… - NSDI, 2010 - usenix.org
MapReduce is a popular framework for data-intensive distributed computing of batch jobs.
To simplify fault tolerance, many implementations of MapReduce materialize the entire …

StreamCloud: An elastic and scalable data streaming system

V Gulisano, R Jimenez-Peris… - … on Parallel and …, 2012 - ieeexplore.ieee.org
Many applications in several domains, such as telecommunications, network security, and
large-scale sensor networks, require online processing of continuous data flows. They produce …

TimeStream: Reliable stream computation in the cloud

Z Qian, Y He, C Su, Z Wu, H Zhu, T Zhang… - Proceedings of the 8th …, 2013 - dl.acm.org
TimeStream is a distributed system designed specifically for low-latency continuous
processing of big streaming data on a large cluster of commodity machines. The unique …

Out-of-order processing: a new architecture for high-performance stream systems

J Li, K Tufte, V Shkapenyuk, V Papadimos… - Proceedings of the …, 2008 - dl.acm.org
Many stream-processing systems enforce an order on data streams during query evaluation
to help unblock blocking operators and purge state from stateful operators. Such in-order …
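The snippet's point, that a stream system needs some guarantee about timestamp order before a blocking operator can emit results and discard state, can be illustrated with a minimal sketch. It assumes a hypothetical tumbling-window count operator (`WindowedCounter`, with an assumed window `width`) that accepts tuples in any arrival order and relies on an explicit progress marker (a "low watermark" passed to the hypothetical `on_watermark` method) asserting that no earlier timestamp will arrive. This is an illustration of the general idea only, not the architecture proposed in the paper.

```python
from collections import defaultdict


class WindowedCounter:
    """Hypothetical tumbling-window count operator (illustrative only).

    Tuples may arrive out of order. A separate progress marker (a
    "low watermark") promises that no tuple with an earlier timestamp
    will arrive, so any window ending at or before it is complete:
    its result can be emitted and its state purged.
    """

    def __init__(self, width):
        self.width = width              # assumed fixed window width
        self.counts = defaultdict(int)  # window start -> tuple count

    def on_tuple(self, ts):
        # Assign the tuple to its window regardless of arrival order.
        start = ts - ts % self.width
        self.counts[start] += 1

    def on_watermark(self, wm):
        # Collect every window whose end is <= wm, in timestamp order,
        # then emit its count and remove its state.
        closed = sorted(s for s in self.counts if s + self.width <= wm)
        return [(s, self.counts.pop(s)) for s in closed]
```

For example, after feeding timestamps 3, 12, 7, 15, 1 into `WindowedCounter(10)`, a watermark of 10 releases `[(0, 3)]` while window 10 stays open until a later watermark. The sketch ignores tuples that arrive after their window was closed; a real system would need an explicit policy for such late data.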

A survey of state management in big data processing systems

QC To, J Soto, V Markl - The VLDB Journal, 2018 - Springer
The concept of state and its applications vary widely across big data processing systems.
This is evident in both the research literature and existing systems, such as Apache Flink …

StreamScope: Continuous Reliable Distributed Processing of Big Data Streams

W Lin, Z Qian, J Xu, S Yang, J Zhou… - 13th USENIX Symposium …, 2016 - usenix.org
STREAMSCOPE (or STREAMS) is a reliable distributed stream computation engine that has
been deployed in shared 20,000-server production clusters at Microsoft. STREAMS provides …

Scientific workflow design for mere mortals

T McPhillips, S Bowers, D Zinn, B Ludäscher - Future Generation Computer …, 2009 - Elsevier
Recent years have seen a dramatic increase in research and development of scientific
workflow systems. These systems promise to make scientists more productive by automating …