MapReduce scheduling algorithms in Hadoop: a systematic study

S Hedayati, N Maleki, T Olsson, F Ahlgren… - Journal of Cloud …, 2023 - Springer
Hadoop is a framework for storing and processing huge volumes of data on clusters. It uses
Hadoop Distributed File System (HDFS) for storing data and uses MapReduce to process …

Task scheduling in big data platforms: a systematic literature review

M Soualhia, F Khomh, S Tahar - Journal of Systems and Software, 2017 - Elsevier
Context: Hadoop, Spark, Storm, and Mesos are very well known frameworks in both
research and industrial communities that allow expressing and processing distributed …

A delay-based dynamic scheduling algorithm for bag-of-task workflows with stochastic task execution times in clouds

Z Cai, X Li, R Ruiz, Q Li - Future Generation Computer Systems, 2017 - Elsevier
Bag-of-Tasks (BoT) workflows are widespread in many big data analysis fields.
However, there are very few cloud resource provisioning and scheduling algorithms tailored …

Budget aware scheduling algorithm for workflow applications in IaaS clouds

K Kalyan Chakravarthi, L Shyamala, V Vaidehi - Cluster Computing, 2020 - Springer
Cloud computing, a novel and promising model of Service-oriented computing, provides a
pay-per-use framework to solve large-scale scientific and business workflow applications …

Elastic scheduling of scientific workflows under deadline constraints in cloud computing environments

N Anwar, H Deng - Future Internet, 2018 - mdpi.com
Scientific workflow applications are collections of several structured activities and fine-
grained computational tasks. Scientific workflow scheduling in cloud computing is a …

An intelligent clustering algorithm for high-dimensional multiview data in big data applications

Q Tao, C Gu, Z Wang, D Jiang - Neurocomputing, 2020 - Elsevier
There are many high-dimensional multiview data in various big data applications. It is very
difficult to deal with those high-dimensional multiview data for the classic clustering …

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …

SPO: a secure and performance-aware optimization for MapReduce scheduling

N Maleki, AM Rahmani, M Conti - Journal of Network and Computer …, 2021 - Elsevier
MapReduce is a common framework that effectively processes multi-petabyte data in a
distributed manner. Therefore, MapReduce is widely used in heterogeneous environments …

MapReduce: an infrastructure review and research insights

N Maleki, AM Rahmani, M Conti - The Journal of Supercomputing, 2019 - Springer
In the current decade, doing the search on massive data to find “hidden” and valuable
information within it is growing. This search can result in heavy processing on considerable …

Replica-aware task scheduling and load balanced cache placement for delay reduction in multi-cloud environment

C Li, J Zhang, H Tang - The Journal of Supercomputing, 2019 - Springer
With the development of content-sharing and collaborative computing services such as
online social networks, scientific workflow, there are huge amounts of data generated. To …