vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design
The most widely used machine learning frameworks require users to carefully tune their
memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU …
ObliVM: A programming framework for secure computation
We design and develop ObliVM, a programming framework for secure computation. ObliVM
offers a domain specific language designed for compilation of programs into efficient …
Compressing DMA engine: Leveraging activation sparsity for training deep neural networks
Popular deep learning frameworks require users to fine-tune their memory usage so that the
training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …
Pump up the volume: Processing large data on GPUs with fast interconnects
GPUs have long been discussed as accelerators for database query processing because of
their high processing power and memory bandwidth. However, two main challenges limit the …
Mosaic: a GPU memory manager with application-transparent support for multiple page sizes
R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …
Efficient address translation for architectures with multiple page sizes
G Cox, A Bhattacharjee - ACM SIGPLAN Notices, 2017 - dl.acm.org
Processors and operating systems (OSes) support multiple memory page sizes. Superpages
increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained …
The art of balance: a RateupDB™ experience of building a CPU/GPU hybrid database product
R Lee, M Zhou, C Li, S Hu, J Teng, D Li… - Proceedings of the VLDB …, 2021 - dl.acm.org
GPU-accelerated database systems have been studied for more than 10 years, ranging from
prototyping development to industry products serving in multiple domains of data …
Batch-aware unified memory management in GPUs for irregular workloads
While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …
A framework for memory oversubscription management in graphics processing units
C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …
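
Note (added for reference, not part of the cited abstracts): the two entries above both center on CUDA unified memory and demand paging. Below is a minimal sketch of that mechanism, assuming a demand-paging-capable GPU (Pascal or later) and using only the standard cudaMallocManaged / cudaMemPrefetchAsync runtime calls; the buffer size, kernel, and prefetch hint are illustrative choices, not anything taken from these papers.

// Minimal sketch (not from the cited papers): one managed buffer shared by
// CPU and GPU through CUDA unified memory; the driver migrates pages on demand.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;  // on demand-paging GPUs, a first touch here faults the page onto the device
}

int main() {
    const size_t n = 1ULL << 26;                // 256 MiB of floats; grow beyond GPU DRAM to oversubscribe
    float *buf = nullptr;
    cudaMallocManaged(&buf, n * sizeof(float)); // one pointer, valid on both CPU and GPU

    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;          // CPU writes populate pages in host memory

    int dev = 0;
    cudaGetDevice(&dev);
    cudaMemPrefetchAsync(buf, n * sizeof(float), dev, 0);  // optional hint: bulk-migrate instead of per-page faults

    scale<<<(unsigned)((n + 255) / 256), 256>>>(buf, n, 2.0f);
    cudaDeviceSynchronize();

    printf("buf[0] = %.1f\n", buf[0]);                     // CPU read migrates the touched page back
    cudaFree(buf);
    return 0;
}

Oversubscription in the sense of the last entry amounts to making such a managed allocation larger than the GPU's physical memory, after which the driver evicts and re-faults pages as kernels touch them; that paging traffic is the kind of overhead these entries are concerned with.
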
G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations
To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …