The anatomy of big data computing

R Kune, PK Konugurthi, A Agarwal… - Software: Practice …, 2016 - Wiley Online Library
Advances in information technology and its widespread growth in several areas of business,
engineering, medical, and scientific studies are resulting in information/data explosion …

Apache spark: a unified engine for big data processing

M Zaharia, RS Xin, P Wendell, T Das… - Communications of the …, 2016 - dl.acm.org
Communications of the ACM, November 2016, vol. 59, no. 11, p. 56 (contributed articles).
DOI:10.1145/2934664 This …

Security and privacy aspects in MapReduce on clouds: A survey

P Derbeko, S Dolev, E Gudes, S Sharma - Computer science review, 2016 - Elsevier
MapReduce is a programming system for distributed processing of large-scale data in an
efficient and fault tolerant manner on a private, public, or hybrid cloud. MapReduce is …

Polylogarithmic-time deterministic network decomposition and distributed derandomization

V Rozhoň, M Ghaffari - Proceedings of the 52nd Annual ACM SIGACT …, 2020 - dl.acm.org
We present a simple polylogarithmic-time deterministic distributed algorithm for network
decomposition. This improves on a celebrated $2^{O(\sqrt{\log n})}$-time algorithm of Panconesi …

Scalable k-means++

B Bahmani, B Moseley, A Vattani, R Kumar… - arXiv preprint arXiv …, 2012 - arxiv.org
Over half a century old and showing no signs of aging, k-means remains one of the most
popular data processing algorithms. As is well-known, a proper initialization of k-means is …

Communication steps for parallel query processing

P Beame, P Koutris, D Suciu - Journal of the ACM (JACM), 2017 - dl.acm.org
We study the problem of computing conjunctive queries over large databases on parallel
architectures without shared storage. Using the structure of such a query q and the skew in …

Massively parallel computation: Algorithms and applications

S Im, R Kumar, S Lattanzi, B Moseley… - … and Trends® in …, 2023 - nowpublishers.com
The algorithms community has been modeling the underlying key features and constraints of
massively parallel frameworks and using these models to discover new algorithmic …

[BOOK][B] Data-intensive text processing with MapReduce

J Lin, C Dyer - 2022 - books.google.com
Our world is being revolutionized by data-driven methods: access to large amounts of data
has generated new insights and opened exciting new opportunities in commerce, science …

Counting triangles and the curse of the last reducer

S Suri, S Vassilvitskii - Proceedings of the 20th international conference …, 2011 - dl.acm.org
The clustering coefficient of a node in a social network is a fundamental measure that
quantifies how tightly-knit the community is around the node. Its computation can be reduced …

The k-clique densest subgraph problem

C Tsourakakis - Proceedings of the 24th international conference on …, 2015 - dl.acm.org
Numerous graph mining applications rely on detecting subgraphs which are large
near-cliques. Since formulations that are geared towards finding large near-cliques are hard and …