From infrastructure to culture: A/B testing challenges in large scale social networks

Y Xu, N Chen, A Fernandez, O Sinno… - Proceedings of the 21th …, 2015 - dl.acm.org
A/B testing, also known as bucket testing, split testing, or controlled experiment, is a
standard way to evaluate user engagement or satisfaction from a new service, feature, or …

From theory to practice: Efficient join query evaluation in a parallel database system

S Chu, M Balazinska, D Suciu - Proceedings of the 2015 ACM SIGMOD …, 2015 - dl.acm.org
Big data analytics often requires processing complex queries using massive parallelism,
where the main performance metric is the communication cost incurred during data …

Towards scalability and data skew handling in groupby-joins using mapreduce model

MAH Hassan, M Bamha - Procedia Computer Science, 2015 - Elsevier
For over a decade, MapReduce has become the leading programming model for parallel
and massive processing of large volumes of data. This has been driven by the development …

Sasm: Improving spark performance with adaptive skew mitigation

J Yu, H Chen, F Hu - … conference on progress in informatics and …, 2015 - ieeexplore.ieee.org
Skew is a common phenomenon widely existing in parallel computing platforms, resulting in
slowing down the entire complete time and many idle resources. We present Spark Adaptive …

An efficient MapReduce cube algorithm for varied data distributions

T Milo, E Altshuler - Proceedings of the 2016 international conference on …, 2016 - dl.acm.org
Data cubes allow users to discover insights from their data and are commonly used in data
analysis. While very useful, the data cube is expensive to compute, in particular when the …

Computing marginals using MapReduce

FN Afrati, S Sharma, JR Ullman, JD Ullman - Journal of Computer and …, 2018 - Elsevier
We consider the problem of computing data-cube marginals by a single round of
MapReduce, focusing on the relationship between the reducer size and the replication rate …

Materialized views in distributed key-value stores

J Adler - 2020 - mediatum.ub.tum.de
Distributed key-value stores have become the solution of choice for warehousing large
volumes of data. However, their architecture is not suitable for real-time analytics. To …

Scalability and optimisation of groupby-joins in mapreduce

M Bamha, MAH Hassan - Technical report LIFO, Université d'…, 2015 - researchgate.net
For over a decade, MapReduce has become the leading programming model for parallel
and massive processing of large volumes of data. This has been driven by the development …

Scalability and Optimisation of GroupBy-Joins in MapReduce

M Bamha, MAH Hassan - 2015 - hal.science
For over a decade, MapReduce has become the leading programming model for parallel
and massive processing of large volumes of data. This has been driven by the development …

Computing Marginals Using MapReduce: Keynote talk paper

FN Afrati, S Sharma, JD Ullman… - Proceedings of the 20th …, 2016 - dl.acm.org
We consider the problem of computing the data-cube marginals of a fixed order k (i.e., all
marginals that aggregate over k dimensions), using a single round of MapReduce. The …