A hardware unit for fast SAH-optimised BVH construction

MJ Doyle, C Fowler, M Manzke - ACM Transactions on Graphics (TOG), 2013 - dl.acm.org
Ray-tracing algorithms are known for producing highly realistic images, but at a significant computational cost. For this reason, a large body of research exists on techniques for accelerating these costly algorithms. One approach to achieving superior performance that has received comparatively little attention is the design of specialised ray-tracing hardware. The research that does exist on this topic has consistently demonstrated that significant performance and efficiency gains can be achieved with dedicated microarchitectures. However, previous work on hardware ray-tracing has focused almost entirely on the traversal and intersection stages of the pipeline. As a result, the critical task of managing and constructing acceleration data structures remains largely absent from the hardware literature.
We propose that a specialised microarchitecture for this purpose could achieve considerable performance and efficiency improvements over programmable platforms. To this end, we have developed the first dedicated microarchitecture for the construction of binned SAH BVHs. Cycle-accurate simulations show that, compared with manycore implementations, our design achieves significant improvements in raw performance, reduces the bandwidth required for construction, and delivers large efficiency gains in terms of performance per clock and die area. We conclude that such a design would be useful in the context of a heterogeneous graphics processor, and may help future graphics processor designs mitigate predicted technology-imposed utilisation limits.
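The abstract does not spell out the binned SAH algorithm the hardware accelerates. As background only, the following is a minimal Python sketch of one split-selection step of a generic binned SAH build. It is not the paper's microarchitecture or its exact procedure. The bin count of 16, the unit traversal and intersection costs (`c_trav`, `c_isect`), and the names `AABB` and `best_binned_split` are illustrative assumptions. Primitives are binned by centroid along each axis. A right-to-left sweep accumulates suffix bounds, and a left-to-right sweep then scores each bin boundary with the surface area heuristic.

```python
import math
from dataclasses import dataclass


@dataclass
class AABB:
    lo: tuple  # (x, y, z) minimum corner
    hi: tuple  # (x, y, z) maximum corner

    @staticmethod
    def empty():
        # Inverted box: growing it by any box yields that box.
        inf = math.inf
        return AABB((inf, inf, inf), (-inf, -inf, -inf))

    def grow(self, other):
        return AABB(tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
                    tuple(max(a, b) for a, b in zip(self.hi, other.hi)))

    def area(self):
        if self.lo[0] > self.hi[0]:
            return 0.0  # empty box
        dx, dy, dz = (h - l for h, l in zip(self.hi, self.lo))
        return 2.0 * (dx * dy + dy * dz + dz * dx)

    def centroid(self):
        return tuple(0.5 * (l + h) for l, h in zip(self.lo, self.hi))


def best_binned_split(boxes, num_bins=16, c_trav=1.0, c_isect=1.0):
    """Return (cost, axis, bin_index) of the cheapest binned SAH split, or None.

    Hypothetical helper: primitives with centroids in bins [0, bin_index) go
    left, the rest go right. Costs and bin count are illustrative defaults.
    """
    parent = AABB.empty()
    for b in boxes:
        parent = parent.grow(b)
    parent_area = parent.area()
    if parent_area == 0.0:
        return None
    centroids = [b.centroid() for b in boxes]
    best = None
    for axis in range(3):
        cmin = min(c[axis] for c in centroids)
        cmax = max(c[axis] for c in centroids)
        if cmax <= cmin:
            continue  # all centroids coincide on this axis
        scale = num_bins / (cmax - cmin)
        bin_boxes = [AABB.empty() for _ in range(num_bins)]
        bin_counts = [0] * num_bins
        for box, c in zip(boxes, centroids):
            i = min(int((c[axis] - cmin) * scale), num_bins - 1)
            bin_boxes[i] = bin_boxes[i].grow(box)
            bin_counts[i] += 1
        # Right-to-left sweep: bounds and counts of bins i..num_bins-1.
        right_area = [0.0] * num_bins
        right_count = [0] * num_bins
        acc, n = AABB.empty(), 0
        for i in range(num_bins - 1, 0, -1):
            acc, n = acc.grow(bin_boxes[i]), n + bin_counts[i]
            right_area[i], right_count[i] = acc.area(), n
        # Left-to-right sweep: evaluate the plane between bins i-1 and i.
        acc, n = AABB.empty(), 0
        for i in range(1, num_bins):
            acc, n = acc.grow(bin_boxes[i - 1]), n + bin_counts[i - 1]
            if n == 0 or right_count[i] == 0:
                continue  # one side empty: not a useful split
            cost = c_trav + c_isect * (acc.area() * n
                                       + right_area[i] * right_count[i]) / parent_area
            if best is None or cost < best[0]:
                best = (cost, axis, i)
    return best
```

For example, two well-separated pairs of unit boxes along x are split between the pairs. A complete builder would also compare the best split cost against the leaf cost `c_isect * len(boxes)` and recurse on each side. The two linear sweeps over a fixed number of bins are part of what makes binned SAH attractive for streaming, fixed-function implementations.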