A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

Automatic performance tuning for distributed data stream processing systems

H Herodotou, L Odysseos, Y Chen… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Distributed data stream processing systems (DSPSs) such as Storm, Flink, and Spark
Streaming are now routinely used to process continuous data streams in (near) real-time …

Deductive verification of active objects with Crowbar

E Kamburjan, M Scaletta, N Rollshausen - Science of Computer …, 2023 - Elsevier
We present Crowbar, a deductive verification tool for the Active Object language ABS.
Crowbar implements novel specification approaches specifically for distributed systems. For …

A Survey of Actor-Like Programming Models for Serverless Computing

J Spenger, P Carbone, P Haller - Active Object Languages: Current …, 2024 - Springer
Serverless computing promises to significantly simplify cloud computing by providing
Functions-as-a-Service where invocations of functions, triggered by events, are …

KORDI: A Framework for Real-Time Performance and Cost Optimization of Apache Spark Streaming

A Kordelas, T Spyrou, S Voulgaris… - … Analysis of Systems …, 2023 - ieeexplore.ieee.org
Apache Spark is one of the most commonly used frameworks for Big Data processing.
Research on the provided streaming dynamic resource allocation feature has been shown …

A configurable and executable model of Spark Streaming on Apache YARN

JC Lin, MC Lee, IC Yu… - International Journal of …, 2020 - inderscienceonline.com
Streams of data are produced today at an unprecedented scale. Efficient and stable
processing of these streams requires a careful interplay between the parameters of the …

An innovative parameter optimization of Spark Streaming based on D3QN with Gaussian process regression

H Zhang, Z Xu, Y Wang, Y Shen - Math. Biosci. Eng., 2023 - aimspress.com
Nowadays, Spark Streaming, a computing framework based on Spark, is widely used to
process streaming data such as social media data, IoT sensor data or web logs. Due to the …

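Distributed index construction for big data streams

杨良怀, 卢晨曦, 范玉雷, 朱镇洋, 潘建 - Journal of Software (软件学报), 2021 - jos.org.cn
Efficient storage and indexing of big data streams is a major challenge in today's data field.
For data streams with a time attribute, the stream is partitioned into consecutive time windows according to that attribute, and a distributed index structure based on a two-layer B+ tree is proposed, WB …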

Simulating user journeys with active objects

P Kobialka, R Schlatte, GR Bergersen… - Active Object …, 2024 - Springer
The servitization of business makes companies increasingly dependent on providing
carefully designed user experiences for their service offerings. User journeys model services …

On combining system and machine learning performance tuning for distributed data stream applications

L Odysseos, H Herodotou - Distributed and Parallel Databases, 2023 - Springer
The growing need to identify patterns in data and automate decisions based on them in
near-real time has stimulated the development of new machine learning (ML) applications …