vDNN: Virtualized deep neural networks for scalable, memory-efficient neural network design

M Rhu, N Gimelshein, J Clemons… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
The most widely used machine learning frameworks require users to carefully tune their
memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU …

Oblivm: A programming framework for secure computation

C Liu, XS Wang, K Nayak, Y Huang… - 2015 IEEE Symposium …, 2015 - ieeexplore.ieee.org
We design and develop ObliVM, a programming framework for secure computation. ObliVM
offers a domain specific language designed for compilation of programs into efficient …

Compressing DMA engine: Leveraging activation sparsity for training deep neural networks

M Rhu, M O'Connor, N Chatterjee… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Popular deep learning frameworks require users to fine-tune their memory usage so that the
training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …

Pump up the volume: Processing large data on gpus with fast interconnects

C Lutz, S Breß, S Zeuch, T Rabl, V Markl - Proceedings of the 2020 ACM …, 2020 - dl.acm.org
GPUs have long been discussed as accelerators for database query processing because of
their high processing power and memory bandwidth. However, two main challenges limit the …

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes

R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org
Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …

Efficient address translation for architectures with multiple page sizes

G Cox, A Bhattacharjee - ACM SIGPLAN Notices, 2017 - dl.acm.org
Processors and operating systems (OSes) support multiple memory page sizes. Superpages
increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained …

The art of balance: a RateupDB™ experience of building a CPU/GPU hybrid database product

R Lee, M Zhou, C Li, S Hu, J Teng, D Li… - Proceedings of the VLDB …, 2021 - dl.acm.org
GPU-accelerated database systems have been studied for more than 10 years, ranging from
prototyping development to industry products serving in multiple domains of data …

Batch-aware unified memory management in GPUs for irregular workloads

H Kim, J Sim, P Gera, R Hadidi, H Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org
While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org
Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

G10: Enabling an efficient unified gpu memory and storage architecture with smart tensor migrations

H Zhang, Y Zhou, Y Xue, Y Liu, J Huang - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …