Towards high performance paged memory for GPUs

M Rhu, N Gimelshein, J Clemons… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org

The most widely used machine learning frameworks require users to carefully tune their
memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU …

Cited by 528 - Related articles - All 9 versions

ObliVM: A programming framework for secure computation

C Liu, XS Wang, K Nayak, Y Huang… - 2015 IEEE Symposium …, 2015 - ieeexplore.ieee.org

We design and develop ObliVM, a programming framework for secure computation. ObliVM
offers a domain specific language designed for compilation of programs into efficient …

Cited by 471 - Related articles - All 17 versions

Compressing DMA engine: Leveraging activation sparsity for training deep neural networks

M Rhu, M O'Connor, N Chatterjee… - … Symposium on High …, 2018 - ieeexplore.ieee.org

Popular deep learning frameworks require users to fine-tune their memory usage so that the
training data of a deep neural network (DNN) fits within the GPU physical memory. Prior …

Cited by 240 - Related articles - All 12 versions

Pump up the volume: Processing large data on GPUs with fast interconnects

C Lutz, S Breß, S Zeuch, T Rabl, V Markl - Proceedings of the 2020 ACM …, 2020 - dl.acm.org

GPUs have long been discussed as accelerators for database query processing because of
their high processing power and memory bandwidth. However, two main challenges limit the …

Cited by 113 - Related articles - All 10 versions

Mosaic: a GPU memory manager with application-transparent support for multiple page sizes

R Ausavarungnirun, J Landgraf, V Miller… - Proceedings of the 50th …, 2017 - dl.acm.org

Contemporary discrete GPUs support rich memory management features such as virtual
memory and demand paging. These features simplify GPU programming by providing a …

Cited by 160 - Related articles - All 26 versions

Efficient address translation for architectures with multiple page sizes

G Cox, A Bhattacharjee - ACM SIGPLAN Notices, 2017 - dl.acm.org

Processors and operating systems (OSes) support multiple memory page sizes. Superpages
increase Translation Lookaside Buffer (TLB) hits, while small pages provide fine-grained …

Cited by 145 - Related articles - All 10 versions

The art of balance: a RateupDB™ experience of building a CPU/GPU hybrid database product

R Lee, M Zhou, C Li, S Hu, J Teng, D Li… - Proceedings of the VLDB …, 2021 - dl.acm.org

GPU-accelerated database systems have been studied for more than 10 years, ranging from
prototyping development to industry products serving in multiple domains of data …

Cited by 47 - Related articles - All 4 versions

Batch-aware unified memory management in GPUs for irregular workloads

H Kim, J Sim, P Gera, R Hadidi, H Kim - Proceedings of the Twenty-Fifth …, 2020 - dl.acm.org

While unified virtual memory and demand paging in modern GPUs provide convenient
abstractions to programmers for working with large-scale applications, they come at a …

Cited by 81 - Related articles - All 3 versions

A framework for memory oversubscription management in graphics processing units

C Li, R Ausavarungnirun, CJ Rossbach… - Proceedings of the …, 2019 - dl.acm.org

Modern discrete GPUs support unified memory and demand paging. Automatic
management of data movement between CPU memory and GPU memory dramatically …

Cited by 99 - Related articles - All 10 versions

G10: Enabling an efficient unified GPU memory and storage architecture with smart tensor migrations

H Zhang, Y Zhou, Y Xue, Y Liu, J Huang - … of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org

To break the GPU memory wall for scaling deep learning workloads, a variety of architecture
and system techniques have been proposed recently. Their typical approaches include …

Cited by 16 - Related articles - All 9 versions