CXL-ANNS: Software-Hardware collaborative memory disaggregation and computation for Billion-Scale approximate nearest neighbor search
J Jang, H Choi, H Bae, S Lee, M Kwon… - 2023 USENIX Annual …, 2023 - usenix.org
We propose CXL-ANNS, a software-hardware collaborative approach to enable highly
scalable approximate nearest neighbor search (ANNS) services. To this end, we first …
FLASH: Fast model adaptation in ML-centric cloud platforms
The emergence of ML in various cloud system management tasks (e.g., workload autoscaling
and job scheduling) has become a core driver of ML-centric cloud platforms. However, there …
Baleen: ML Admission & Prefetching for Flash Caches
DLK Wong, H Wu, C Molder, S Gunasekar… - … USENIX Conference on …, 2024 - usenix.org
Flash caches are used to reduce peak backend load for throughput-constrained data center
services, reducing the total number of backend servers required. Bulk storage systems are a …
Cori: Dancing to the right beat of periodic data movements over hybrid memory systems
TD Doudali, D Zahka… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Emerging hybrid memory systems that comprise technologies such as Intel's Optane DC
Persistent Memory, exhibit disparities in the access speeds and capacity ratios of their …
On Modular Learning of Distributed Systems for Predicting End-to-End Latency
An emerging trend in cloud deployments is to adopt machine learning (ML) models to
characterize end-to-end system performance. Despite early success, such methods can …
BOAT: A bayesian optimization automl time-series framework for industrial applications
Driven by the increasing degree of automation, industrial production plants have become
very data reliant, which poses a great potential for machine learning applications. AutoML is …
Phronesis: Efficient performance modeling for high-dimensional configuration tuning
Y Li, BC Lee - ACM Transactions on Architecture and Code …, 2022 - dl.acm.org
We present Phronesis, a learning framework for efficiently modeling the performance of data
analytic workloads as a function of their high-dimensional software configuration …
Herti: A reinforcement learning-augmented system for efficient real-time inference on heterogeneous embedded systems
Real-time inference is the key technology that enables a variety of latency-critical intelligent
services such as autonomous driving and augmented reality. Heterogeneous embedded …
FSbrain: An intelligent I/O performance tuning system
To bridge the performance gap between network and storage, performance tuning of file
systems according to different workloads has become an important work on the Internet of …
Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search
J Jang, H Choi, H Bae, S Lee, M Kwon… - ACM Transactions on …, 2024 - dl.acm.org
We propose CXL-ANNS, a software-hardware collaborative approach to enable scalable
approximate nearest neighbor search (ANNS) services. To this end, we first disaggregate …