Big data systems meet machine learning challenges: towards big data science as a service

R Elshawi, S Sakr, D Talia, P Trunfio - Big Data Research, 2018 - Elsevier
Recently, we have been witnessing huge advancements in the scale of data we routinely
generate and collect in pretty much everything we do, as well as our ability to exploit modern …

A survey on spatio-temporal data analytics systems

MM Alam, L Torgo, A Bifet - ACM Computing Surveys, 2022 - dl.acm.org
Due to the surge of spatio-temporal data volume, the popularity of location-based services
and applications, and the importance of extracted knowledge from spatio-temporal data to …

One trillion edges: Graph processing at Facebook-scale

A Ching, S Edunov, M Kabiljo, D Logothetis… - Proceedings of the …, 2015 - dl.acm.org
Analyzing large graphs provides valuable insights for social networking and web companies
in content ranking and recommendations. While numerous graph processing systems have …

In-memory big data management and processing: A survey

H Zhang, G Chen, BC Ooi, KL Tan… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Growing main memory capacity has fueled the development of in-memory big data
management and processing. By eliminating disk I/O bottleneck, it is now possible to support …

The stratosphere platform for big data analytics

A Alexandrov, R Bergmann, S Ewen, JC Freytag… - The VLDB Journal, 2014 - Springer
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …

Sprocket: A serverless video processing framework

L Ao, L Izhikevich, GM Voelker, G Porter - Proceedings of the ACM …, 2018 - dl.acm.org
Sprocket is a highly configurable, stage-based, scalable, serverless video processing
framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …

Semeru: A Memory-Disaggregated managed runtime

C Wang, H Ma, S Liu, Y Li, Z Ruan, K Nguyen… - … USENIX Symposium on …, 2020 - usenix.org
Resource-disaggregated architectures have risen in popularity for large datacenters.
However, prior disaggregation systems are designed for native applications; in addition, all …

Shark: SQL and rich analytics at scale

RS Xin, J Rosen, M Zaharia, MJ Franklin… - Proceedings of the …, 2013 - dl.acm.org
Shark is a new data analysis system that marries query processing with complex analytics
on large clusters. It leverages a novel distributed memory abstraction to provide a unified …

The MADlib analytics library or MAD skills, the SQL

J Hellerstein, C Ré, F Schoppmann, DZ Wang… - arXiv preprint arXiv …, 2012 - arxiv.org
MADlib is a free, open source library of in-database analytic methods. It provides an
evolving suite of SQL-based algorithms for machine learning, data mining and statistics that …

MLbase: A Distributed Machine-learning System.

T Kraska, A Talwalkar, JC Duchi, R Griffith, MJ Franklin… - CIDR, 2013 - i.stanford.edu
Machine learning (ML) and statistical techniques are key to transforming big data into
actionable knowledge. In spite of the modern primacy of data, the complexity of existing ML …