Multi-tenant cloud data services: state-of-the-art, challenges and opportunities
V Narasayya, S Chaudhuri - … of the 2022 International Conference on …, 2022 - dl.acm.org
Enterprises are moving their business-critical workloads to public clouds at an accelerating
pace. Multi-tenancy is a crucial tenet for cloud data service providers allowing them to …
Exploiting Cloud Object Storage for High-Performance Analytics
Elasticity of compute and storage is crucial for analytical cloud database systems. All cloud
vendors provide disaggregated object stores, which can be used as storage backend for …
Silod: A co-design of caching and scheduling for deep learning clusters
Deep learning training on cloud platforms usually follows the tradition of the separation of
storage and computing. The training executes on a compute cluster equipped with …
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
Machine learning (ML) has proven itself in high-value web applications such as search
ranking and is emerging as a powerful tool in a much broader range of enterprise scenarios …
Peregrine: Workload optimization for cloud query engines
Database administrators (DBAs) were traditionally responsible for optimizing the on-premise
database workloads. However, with the rise of cloud data services, where cloud providers …
IS-HBase: An in-storage computing optimized HBase with I/O offloading and self-adaptive caching in compute-storage disaggregated infrastructure
Active storage devices and in-storage computing are proposed and developed in recent
years to effectively reduce the amount of required data traffic and to improve the overall …
DFMan: A graph-based optimization of dataflow scheduling on high-performance computing systems
F Chowdhury, F Di Natale, A Moody… - 2022 IEEE …, 2022 - ieeexplore.ieee.org
Scientific research and development campaigns are materialized by workflows of
applications executing on high-performance computing (HPC) systems. These applications …
DAG-aware harmonizing job scheduling and data caching for disaggregated analytics frameworks
Y Tong, J Liu, H Wang, M He, K Zhou, R He… - Future Generation …, 2024 - Elsevier
Modern data analytics frameworks often integrate with external storage services, which can
lead to storage bottlenecks. Existing caching and prefetching solutions utilize high-level …
Hard: a heterogeneity-aware replica deletion for HDFS
The Hadoop distributed file system (HDFS) is responsible for storing very large data-sets
reliably on clusters of commodity machines. The HDFS takes advantage of replication to …
A community cache with complete information
Kariz is a new architecture for caching data from datalakes accessed, potentially
concurrently, by multiple analytic platforms. It integrates rich information from analytics …