The state of the art of metadata managements in large-scale distributed file systems—scalability, performance and availability
File system metadata is the data in charge of maintaining namespace, permission semantics
and location of file data blocks. Operations on the metadata can account for up to 80% of …
Exploring the role of machine learning in scientific workflows: Opportunities and challenges
In this survey, we discuss the challenges of executing scientific workflows as well as existing
Machine Learning (ML) techniques to alleviate those challenges. We provide the context …
Autoscaling tiered cloud storage in Anna
In this paper, we describe how we extended a distributed key-value store called Anna into
an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to …
Automating distributed tiered storage management in cluster computing
H Herodotou, E Kakoulli - arXiv preprint arXiv:1907.02394, 2019 - arxiv.org
Data-intensive platforms such as Hadoop and Spark are routinely used to process massive
amounts of data residing on distributed file systems like HDFS. Increasing memory sizes and …
Mosaic: a budget-conscious storage engine for relational database systems
Relational database systems are purpose-built for a specific storage device class (e.g., HDD,
SSD, or DRAM). They do not cope well with the multitude of storage devices that are …
Cost-based Data Prefetching and Scheduling in Big Data Platforms over Tiered Storage Systems
H Herodotou, E Kakoulli - ACM Transactions on Database Systems, 2023 - dl.acm.org
The use of storage tiering is becoming popular in data-intensive compute clusters due to the
recent advancements in storage technologies. The Hadoop Distributed File System, for …
Multi-objective optimization of data placement in a storage-as-a-service federated cloud
Cloud federation enables service providers to collaborate to provide better services to
customers. For cloud storage services, optimizing customer object placement for a member …
Autoscaling tiered cloud storage in Anna
In this paper, we describe how we extended a distributed key-value store called Anna into
an autoscaling, multi-tier service for the cloud. In its extended form, Anna is designed to …
CFS: A distributed file system for large scale container platforms
H Liu, W Ding, Y Chen, W Guo, S Liu, T Li… - Proceedings of the …, 2019 - dl.acm.org
We propose CFS, a distributed file system for large scale container platforms. CFS supports
both sequential and random file accesses with optimized storage for both large files and …
Netco: Cache and I/O management for analytics over disaggregated stores
We consider a common setting where storage is disaggregated from the compute in data-
parallel systems. Colocating caching tiers with the compute machines can reduce load on …