Multi-tenant cloud data services: state-of-the-art, challenges and opportunities

V Narasayya, S Chaudhuri - … of the 2022 International Conference on …, 2022 - dl.acm.org
Enterprises are moving their business-critical workloads to public clouds at an accelerating
pace. Multi-tenancy is a crucial tenet for cloud data service providers allowing them to …

Exploiting Cloud Object Storage for High-Performance Analytics

D Durner, V Leis, T Neumann - Proceedings of the VLDB Endowment, 2023 - dl.acm.org
Elasticity of compute and storage is crucial for analytical cloud database systems. All cloud
vendors provide disaggregated object stores, which can be used as storage backend for …

SiloD: A co-design of caching and scheduling for deep learning clusters

H Zhao, Z Han, Z Yang, Q Zhang, M Li, F Yang… - Proceedings of the …, 2023 - dl.acm.org
Deep learning training on cloud platforms usually follows the tradition of the separation of
storage and computing. The training executes on a compute cluster equipped with …

Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML

A Agrawal, R Chatterjee, C Curino, A Floratou… - arXiv preprint arXiv …, 2019 - arxiv.org
Machine learning (ML) has proven itself in high-value web applications such as search
ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios …

Peregrine: Workload optimization for cloud query engines

A Jindal, H Patel, A Roy, S Qiao, Z Yin, R Sen… - Proceedings of the …, 2019 - dl.acm.org
Database administrators (DBAs) were traditionally responsible for optimizing the on-premise
database workloads. However, with the rise of cloud data services, where cloud providers …

IS-HBase: An in-storage computing optimized HBase with I/O offloading and self-adaptive caching in compute-storage disaggregated infrastructure

Z Cao, H Dong, Y Wei, S Liu, DHC Du - ACM Transactions on Storage …, 2022 - dl.acm.org
Active storage devices and in-storage computing are proposed and developed in recent
years to effectively reduce the amount of required data traffic and to improve the overall …

DFMan: A graph-based optimization of dataflow scheduling on high-performance computing systems

F Chowdhury, F Di Natale, A Moody… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Scientific research and development campaigns are materialized by workflows of
applications executing on high-performance computing (HPC) systems. These applications …

DAG-aware harmonizing job scheduling and data caching for disaggregated analytics frameworks

Y Tong, J Liu, H Wang, M He, K Zhou, R He… - Future Generation …, 2024 - Elsevier
Modern data analytics frameworks often integrate with external storage services, which can
lead to storage bottlenecks. Existing caching and prefetching solutions utilize high-level …

HaRD: a heterogeneity-aware replica deletion for HDFS

HE Ciritoglu, J Murphy, C Thorpe - Journal of big data, 2019 - Springer
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets
reliably on clusters of commodity machines. The HDFS takes advantage of replication to …

A community cache with complete information

M Abdi, A Mosayyebzadeh, MH Hajkazemi… - … USENIX Conference on …, 2021 - usenix.org
Kariz is a new architecture for caching data from datalakes accessed, potentially
concurrently, by multiple analytic platforms. It integrates rich information from analytics …