Big data systems meet machine learning challenges: towards big data science as a service
Recently, we have been witnessing huge advancements in the scale of data we routinely
generate and collect in pretty much everything we do, as well as our ability to exploit modern …
A survey on spatio-temporal data analytics systems
Due to the surge of spatio-temporal data volume, the popularity of location-based services
and applications, and the importance of extracted knowledge from spatio-temporal data to …
One trillion edges: Graph processing at Facebook-scale
Analyzing large graphs provides valuable insights for social networking and web companies
in content ranking and recommendations. While numerous graph processing systems have …
In-memory big data management and processing: A survey
Growing main memory capacity has fueled the development of in-memory big data
management and processing. By eliminating disk I/O bottleneck, it is now possible to support …
The Stratosphere platform for big data analytics
We present Stratosphere, an open-source software stack for parallel data analysis.
Stratosphere brings together a unique set of features that allow the expressive, easy, and …
Sprocket: A serverless video processing framework
Sprocket is a highly configurable, stage-based, scalable, serverless video processing
framework that exploits intra-video parallelism to achieve low latency. Sprocket enables …
Semeru: A Memory-Disaggregated managed runtime
Resource-disaggregated architectures have risen in popularity for large datacenters.
However, prior disaggregation systems are designed for native applications; in addition, all …
Shark: SQL and rich analytics at scale
Shark is a new data analysis system that marries query processing with complex analytics
on large clusters. It leverages a novel distributed memory abstraction to provide a unified …
The MADlib analytics library or MAD skills, the SQL
MADlib is a free, open source library of in-database analytic methods. It provides an
evolving suite of SQL-based algorithms for machine learning, data mining and statistics that …
MLbase: A Distributed Machine-learning System.
Machine learning (ML) and statistical techniques are key to transforming big data into
actionable knowledge. In spite of the modern primacy of data, the complexity of existing ML …