Big data systems: A software engineering perspective

A Davoudian, M Liu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big Data Systems (BDSs) are an emerging class of scalable software technologies whereby
massive amounts of heterogeneous data are gathered from multiple sources, managed …

Recent advancements in event processing

M Dayarathna, S Perera - ACM Computing Surveys (CSUR), 2018 - dl.acm.org
Event processing (EP) is a data processing technology that conducts online processing of
event information. In this survey, we summarize the latest cutting-edge work done on EP …

Samza: stateful scalable stream processing at LinkedIn

SA Noghabi, K Paramasivam, Y Pan… - Proceedings of the …, 2017 - dl.acm.org
Distributed stream processing systems need to support stateful processing, recover quickly
from failures to resume such processing, and reprocess an entire data stream quickly. We …

What's really new with NewSQL?

A Pavlo, M Aslett - ACM SIGMOD Record, 2016 - dl.acm.org
A new class of database management systems (DBMSs) called NewSQL tout their ability to
scale modern on-line transaction processing (OLTP) workloads in a way that is not possible …

Realtime data processing at Facebook

GJ Chen, JL Wiener, S Iyer, A Jaiswal, R Lei… - Proceedings of the …, 2016 - dl.acm.org
Realtime data processing powers many use cases at Facebook, including realtime reporting
of the aggregated, anonymized voice of Facebook users, analytics for mobile applications …

Consistency and completeness: Rethinking distributed stream processing in Apache Kafka

G Wang, L Chen, A Dikshit, J Gustafson… - Proceedings of the …, 2021 - dl.acm.org
An increasingly important system requirement for distributed stream processing applications
is to provide strong correctness guarantees under unexpected failures and out-of-order data …

A survey on the evolution of stream processing systems

M Fragkoulis, P Carbone, V Kalavri, A Katsifodimos - The VLDB Journal, 2024 - Springer
Stream processing has been an active research field for more than 20 years, but it is now
witnessing its prime time due to recent successful efforts by the research community and …

Data Ingestion for the Connected World.

J Meehan, C Aslantas, S Zdonik, N Tatbul, J Du - CIDR, 2017 - people.csail.mit.edu
In this paper, we argue that in many “Big Data” applications, getting data into the system
correctly and at scale via traditional ETL (Extract, Transform, and Load) processes is a …

Beyond analytics: The evolution of stream processing systems

P Carbone, M Fragkoulis, V Kalavri… - Proceedings of the 2020 …, 2020 - dl.acm.org
Stream processing has been an active research field for more than 20 years, but it is now
witnessing its prime time due to recent successful efforts by the research community and …

A survey of state management in big data processing systems

QC To, J Soto, V Markl - The VLDB Journal, 2018 - Springer
The concept of state and its applications vary widely across big data processing systems.
This is evident in both the research literature and existing systems, such as Apache Flink …