RankReduce: processing k-nearest neighbor queries on top of MapReduce

A Stupar, S Michel, R Schenkel - Large-Scale Distributed Systems for …, 2010 - Citeseer
We consider the problem of processing K-Nearest Neighbor (KNN) queries over large data
sets where the index is jointly maintained by a set of machines in a computing cluster. The …
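For orientation, the generic way to answer an exact KNN query when data is partitioned across machines is for each machine to return its local top k and for a single reducer to merge them. The global top k is always contained in the union of the local top-k lists. This is a minimal sketch under that assumption. It is not RankReduce's own method, and the function names (`map_local_knn`, `reduce_global_knn`, `knn`) are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_local_knn(partition: List[Tuple[str, Point]], query: Point, k: int) -> List[Tuple[float, str]]:
    """Hypothetical 'map' step: one machine returns its k closest points as (distance, id)."""
    return heapq.nsmallest(k, ((euclidean(p, query), pid) for pid, p in partition))

def reduce_global_knn(partials: List[List[Tuple[float, str]]], k: int) -> List[Tuple[float, str]]:
    """Hypothetical 'reduce' step: merge all per-machine candidates and keep the global top k."""
    return heapq.nsmallest(k, (cand for part in partials for cand in part))

def knn(partitions: List[List[Tuple[str, Point]]], query: Point, k: int) -> List[Tuple[float, str]]:
    # Each partition stands in for the data held by one machine in the cluster.
    return reduce_global_knn([map_local_knn(part, query, k) for part in partitions], k)

if __name__ == "__main__":
    partitions = [
        [("a", (0.0, 0.0)), ("b", (5.0, 5.0))],
        [("c", (1.0, 0.0)), ("d", (9.0, 9.0))],
        [("e", (0.0, 2.0))],
    ]
    print([pid for _, pid in knn(partitions, (0.0, 0.0), k=3)])  # ['a', 'c', 'e']
```

Each machine sends at most k candidates, so the reducer handles at most k times the number of machines, whatever the data size.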

Max-cover in map-reduce

F Chierichetti, R Kumar, A Tomkins - Proceedings of the 19th …, 2010 - dl.acm.org
The NP-hard Max-k-cover problem requires selecting k sets from a collection so as to
maximize the size of the union. This classic problem occurs commonly in many settings in …
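As a baseline for the Max-k-cover definition above, the classical sequential greedy algorithm repeatedly picks the set that covers the most not-yet-covered elements. It is known to achieve a (1 - 1/e) approximation. The sketch below is only that sequential greedy and not the MapReduce algorithm from the paper. The function name `greedy_max_k_cover` is hypothetical.

```python
from typing import Dict, Hashable, List, Set

def greedy_max_k_cover(sets: Dict[str, Set[Hashable]], k: int) -> List[str]:
    """Pick up to k set names, each time taking the set with the largest marginal gain."""
    chosen: List[str] = []
    covered: Set[Hashable] = set()
    remaining = dict(sets)
    for _ in range(min(k, len(remaining))):
        # Marginal gain of a set = number of its elements not yet covered.
        best = max(remaining, key=lambda name: len(remaining[name] - covered))
        if not remaining[best] - covered:
            break  # no remaining set adds a new element; stop early
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen

if __name__ == "__main__":
    sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
    print(greedy_max_k_cover(sets, 2))  # ['C', 'A']
```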

MapReduce indexing strategies: Studying scalability and efficiency

R McCreadie, C Macdonald, I Ounis - Information Processing & …, 2012 - Elsevier
In Information Retrieval (IR), the efficient indexing of terabyte-scale and larger corpora is still
a difficult problem. MapReduce has been proposed as a framework for distributing data …
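To make the MapReduce indexing setting concrete, a common textbook pattern has map emit `(term, (doc_id, tf))` for each distinct term in a document, the framework's shuffle group the pairs by term, and reduce write one sorted posting list per term. This in-process sketch simulates those three phases. It assumes plain whitespace tokenization with no stemming or stopword removal. It is not Terrier's implementation or any specific strategy studied in the paper, and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[str, int]  # (doc_id, term frequency)

def map_doc(doc_id: str, text: str) -> Iterator[Tuple[str, Posting]]:
    """Map: emit (term, (doc_id, tf)) once per distinct term in a document."""
    for term, tf in Counter(text.lower().split()).items():
        yield term, (doc_id, tf)

def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Shuffle: group intermediate values by key, as the framework would between phases."""
    groups: Dict[str, List[Posting]] = defaultdict(list)
    for term, posting in pairs:
        groups[term].append(posting)
    return groups

def reduce_term(term: str, postings: List[Posting]) -> Tuple[str, List[Posting]]:
    """Reduce: produce one posting list per term, sorted by doc_id."""
    return term, sorted(postings)

def build_index(docs: Dict[str, str]) -> Dict[str, List[Posting]]:
    intermediate = (pair for doc_id, text in docs.items() for pair in map_doc(doc_id, text))
    return dict(reduce_term(t, ps) for t, ps in shuffle(intermediate).items())

if __name__ == "__main__":
    index = build_index({"d1": "map reduce map", "d2": "reduce index"})
    print(index["map"])     # [('d1', 2)]
    print(index["reduce"])  # [('d1', 1), ('d2', 1)]
```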

An efficient data mining framework on Hadoop using Java Persistence API

Y Lai, S ZhongZhi - 2010 10th IEEE International Conference …, 2010 - ieeexplore.ieee.org
Data indexing is common in data mining when working with high-dimensional, large-scale
data sets. Hadoop, a cloud computing project using the MapReduce framework in Java, has …

University of Glasgow at TREC 2009: Experiments with Terrier.

R McCreadie, C Macdonald, I Ounis, J Peng… - TREC, 2009 - academia.edu
In TREC 2009, we extend our Voting Model for the faceted blog distillation, top stories
identification, and related entity finding tasks. Moreover, we experiment with our novel …

DH-TRIE frequent pattern mining on Hadoop using JPA

L Yang, Z Shi, LD Xu, F Liang… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
FP-growth is a well-known frequent pattern mining algorithm for high-dimensional, large-scale
data sets. It is also known for its high memory complexity for …
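For context on where FP-growth's memory goes, its first phase builds an FP-tree. One pass counts item supports. A second pass inserts each transaction's frequent items, ordered by descending support, into a shared prefix tree, so that common prefixes collapse into single counted paths. This sketch covers only that construction phase and omits mining of the conditional pattern bases. It is not the DH-TRIE structure or the JPA-based storage from the paper, and the names `FPNode` and `build_fp_tree` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

class FPNode:
    def __init__(self, item: Optional[str], parent: Optional["FPNode"]):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "FPNode"] = {}

def build_fp_tree(transactions: List[List[str]], min_support: int):
    """Pass 1 counts item supports; pass 2 inserts each transaction's frequent items into a prefix tree."""
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = FPNode(None, None)
    header: Dict[str, List[FPNode]] = {i: [] for i in frequent}
    for t in transactions:
        # Descending support (ties broken alphabetically) so shared prefixes merge.
        items = sorted((i for i in set(t) if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)  # node-link list that the mining phase would follow
            child.count += 1
            node = child
    return root, header, frequent

if __name__ == "__main__":
    txns = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b", "d"]]
    root, header, frequent = build_fp_tree(txns, min_support=2)
    print(root.children["b"].count)  # 4
```

Four transactions collapse into a single root path starting at `b`, which shows how the tree compresses shared prefixes. The tree can still grow large when transactions share few prefixes.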

BSP cost and scalability analysis for MapReduce operations

H Senger, V Gil‐Costa, L Arantes… - Concurrency and …, 2016 - Wiley Online Library
Data abundance poses the need for powerful and easy‐to‐use tools that support processing
large amounts of data. MapReduce has been increasingly adopted for over a decade by …

A new data mining algorithm based on MapReduce and Hadoop

H Xinxiang, X Henan - Int. J. Signal Proc. Image Process. Pattern …, 2014 - academia.edu
The goal of data mining is to discover hidden useful information in large databases. Mining
frequent patterns from transaction databases is an important problem in data mining. As the …

Comparing distributed indexing: To MapReduce or not?

R McCreadie, C Macdonald, I Ounis - Proc. LSDS-IR, 2009 - academia.edu
Information Retrieval (IR) systems require input corpora to be indexed. The advent of
terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we …

Indexing word sequences for ranked retrieval

S Huston, JS Culpepper, WB Croft - ACM Transactions on Information …, 2014 - dl.acm.org
Formulating and processing phrases and other term dependencies to improve query
effectiveness is an important problem in information retrieval. However, accessing word …
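As a simple illustration of indexing word sequences, one option is to map every fixed-length n-word sequence to the `(doc_id, start_position)` pairs where it occurs, so that a phrase of that length resolves with a single lookup. This sketch indexes all n-grams of one length. That assumption is deliberately naive. The paper's concern is which sequences are worth indexing, and this sketch does not reproduce its approach. The names `ngram_index` and `phrase_postings` are hypothetical, and `phrase_postings` only answers phrases of exactly n words.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Index = Dict[Tuple[str, ...], List[Tuple[str, int]]]

def ngram_index(docs: Dict[str, str], n: int = 2) -> Index:
    """Map each n-word sequence to the (doc_id, start_position) pairs where it occurs."""
    index: Index = defaultdict(list)
    for doc_id in sorted(docs):
        words = docs[doc_id].lower().split()
        for pos in range(len(words) - n + 1):
            index[tuple(words[pos:pos + n])].append((doc_id, pos))
    return dict(index)

def phrase_postings(index: Index, phrase: str) -> List[Tuple[str, int]]:
    """Look up a phrase whose length equals the indexed n; returns [] if absent."""
    return index.get(tuple(phrase.lower().split()), [])

if __name__ == "__main__":
    idx = ngram_index({"d1": "new york city", "d2": "york is new york"}, n=2)
    print(phrase_postings(idx, "New York"))  # [('d1', 0), ('d2', 2)]
```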