A unified optimization approach for CNN model inference on integrated GPUs
Modern deep learning applications urge to push the model inference taking place at the
edge devices for multiple reasons such as achieving shorter latency, relieving the burden of …
Chopin: Scalable graphics rendering in multi-GPU systems via parallel image composition
The appetite for higher and higher 3D graphics quality continues to drive GPU computing
requirements. To satisfy these demands, GPU vendors are moving towards new …
Emerald: Graphics modeling for SoC systems
AA Gubran, TM Aamodt - … of the 46th International Symposium on …, 2019 - dl.acm.org
Mobile systems-on-chips (SoCs) have become ubiquitous computing platforms, and, in
recent years, they have become increasingly heterogeneous and complex. A typical SoC …
A benchmarking framework for interactive 3D applications in the cloud
With the growing popularity of cloud gaming and cloud virtual reality (VR), interactive 3D
applications have become a major class of workloads for the cloud. However, despite their …
Wasp: Warp scheduling to mimic prefetching in graphics workloads
Contemporary GPUs are designed to handle long-latency operations effectively; however,
challenges such as core occupancy (number of warps in a core) and pipeline width can …
Omega-test: A predictive early-z culling to improve the graphics pipeline energy-efficiency
D Corbalan-Navarro, JL Aragón… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
The most common task of GPUs is to render images in real time. When rendering a 3D
scene, a key step is to determine which parts of every object are visible in the final image …
Triangle dropping: an occluded-geometry predictor for energy-efficient mobile GPUs
This article proposes a novel micro-architecture approach for mobile GPUs aimed at early
removing the occluded geometry in a scene by leveraging frame-to-frame coherence, thus …
Boustrophedonic Frames: Quasi-Optimal L2 Caching for Textures in GPUs
Literature is plentiful in works exploiting cache locality for GPUs. A majority of them explore
replacement or bypassing policies. In this paper, however, we surpass this exploration by …
Mesh clustering and reordering based on normal locality for efficient rendering
S Kim, CH Lee - Symmetry, 2022 - mdpi.com
Recently, the size of models for real-time rendering has been significantly increasing for
realism, and many graphics applications are being developed in mobile devices with …
Improving Memory Access Efficiency for Real-time Rendering in Tile-based GPU Architectures
D Joseph - 2024 - personals.ac.upc.edu
In recent years, mobile devices have become an integral part of modern life and are here to
stay. Given that vision is one of the fastest and most intuitive ways of human perception, it …