A unified optimization approach for CNN model inference on integrated GPUs

L Wang, Z Chen, Y Liu, Y Wang, L Zheng, M Li… - Proceedings of the 48th …, 2019 - dl.acm.org
Modern deep learning applications urge to push the model inference taking place at the
edge devices for multiple reasons such as achieving shorter latency, relieving the burden of …

Chopin: Scalable graphics rendering in multi-GPU systems via parallel image composition

X Ren, M Lis - 2021 IEEE International Symposium on High …, 2021 - ieeexplore.ieee.org
The appetite for higher and higher 3D graphics quality continues to drive GPU computing
requirements. To satisfy these demands, GPU vendors are moving towards new …

Emerald: Graphics modeling for SoC systems

AA Gubran, TM Aamodt - … of the 46th International Symposium on …, 2019 - dl.acm.org
Mobile systems-on-chips (SoCs) have become ubiquitous computing platforms, and, in
recent years, they have become increasingly heterogeneous and complex. A typical SoC …

A benchmarking framework for interactive 3D applications in the cloud

T Liu, S He, S Huang, D Tsang, L Tang… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
With the growing popularity of cloud gaming and cloud virtual reality (VR), interactive 3D
applications have become a major class of workloads for the cloud. However, despite their …

Wasp: Warp scheduling to mimic prefetching in graphics workloads

D Joseph, JL Aragón, JM Parcerisa… - arXiv preprint arXiv …, 2024 - arxiv.org
Contemporary GPUs are designed to handle long-latency operations effectively; however,
challenges such as core occupancy (number of warps in a core) and pipeline width can …

Omega-test: A predictive early-Z culling to improve the graphics pipeline energy-efficiency

D Corbalan-Navarro, JL Aragón… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
The most common task of GPUs is to render images in real time. When rendering a 3D
scene, a key step is to determine which parts of every object are visible in the final image …

Triangle dropping: an occluded-geometry predictor for energy-efficient mobile GPUs

D Corbalán-Navarro, JL Aragón, M Anglada… - ACM Transactions on …, 2022 - dl.acm.org
This article proposes a novel micro-architecture approach for mobile GPUs aimed at early
removing the occluded geometry in a scene by leveraging frame-to-frame coherence, thus …

Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs

D Joseph, JL Aragón, JM Parcerisa… - 2023 32nd …, 2023 - ieeexplore.ieee.org
Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore
replacement or bypassing policies. In this paper, however, we surpass this exploration by …

Mesh clustering and reordering based on normal locality for efficient rendering

S Kim, CH Lee - Symmetry, 2022 - mdpi.com
Recently, the size of models for real-time rendering has been significantly increasing for
realism, and many graphics applications are being developed in mobile devices with …

Improving Memory Access Efficiency for Real-time Rendering in Tile-based GPU Architectures

D Joseph - 2024 - personals.ac.upc.edu
In recent years, mobile devices have become an integral part of modern life and are here to
stay. Given that vision is one of the fastest and most intuitive ways of human perception, it …