The state of the art of metadata managements in large-scale distributed file systems—scalability, performance and availability

H Dai, Y Wang, KB Kent, L Zeng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
File system metadata is the data in charge of maintaining namespace, permission semantics
and location of file data blocks. Operations on the metadata can account for up to 80% of …

Exploring the role of machine learning in scientific workflows: Opportunities and challenges

A Nouri, PE Davis, P Subedi, M Parashar - arXiv preprint arXiv …, 2021 - arxiv.org
In this survey, we discuss the challenges of executing scientific workflows as well as existing
Machine Learning (ML) techniques to alleviate those challenges. We provide the context …

Autoscaling tiered cloud storage in Anna

C Wu, V Sreekanti, JM Hellerstein - Proceedings of the VLDB …, 2019 - dl.acm.org
In this paper, we describe how we extended a distributed key-value store called Anna into
an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to …

Automating distributed tiered storage management in cluster computing

H Herodotou, E Kakoulli - arXiv preprint arXiv:1907.02394, 2019 - arxiv.org
Data-intensive platforms such as Hadoop and Spark are routinely used to process massive
amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and …

Mosaic: a budget-conscious storage engine for relational database systems

L Vogel, V Leis, A van Renen, T Neumann… - Proceedings of the …, 2020 - dl.acm.org
Relational database systems are purpose-built for a specific storage device class (e.g., HDD,
SSD, or DRAM). They do not cope well with the multitude of storage devices that are …

Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems

H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …

Multi-objective optimization of data placement in a storage-as-a-service federated cloud

A Chikhaoui, L Lemarchand, K Boukhalfa… - ACM Transactions on …, 2021 - dl.acm.org
Cloud federation enables service providers to collaborate to provide better services to
customers. For cloud storage services, optimizing customer object placement for a member …

Autoscaling tiered cloud storage in Anna

C Wu, V Sreekanti, JM Hellerstein - The VLDB Journal, 2021 - Springer
In this paper, we describe how we extended a distributed key-value store called Anna into
an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to …

CFS: A distributed file system for large scale container platforms

H Liu, W Ding, Y Chen, W Guo, S Liu, T Li… - Proceedings of the …, 2019 - dl.acm.org
We propose CFS, a distributed file system for large scale container platforms. CFS supports
both sequential and random file accesses with optimized storage for both large files and …

Netco: Cache and I/O management for analytics over disaggregated stores

V Jalaparti, C Douglas, M Ghosh, A Agrawal… - Proceedings of the …, 2018 - dl.acm.org
We consider a common setting where storage is disaggregated from the compute in
data-parallel systems. Colocating caching tiers with the compute machines can reduce load on …