Impact of data layouts on the efficiency of GPU-accelerated IDW interpolation
Abstract
This paper evaluates the impact of different data layouts on the computational efficiency of the GPU-accelerated Inverse Distance Weighting (IDW) interpolation algorithm. First, we redesign and improve our previous GPU implementation, which exploits CUDA dynamic parallelism (CDP). Then we implement three GPU versions, i.e., the naive version, the tiled version, and the improved CDP version, on top of five data layouts: the Structure of Arrays (SoA), the Array of Structures (AoS), the Array of aligned Structures (AoaS), the Structure of Arrays of aligned Structures (SoAoS), and the Hybrid layout. We also carry out several groups of experimental tests to evaluate the impact. The experimental results show that the layouts AoS and AoaS achieve better performance than the layout SoA for both the naive version and the tiled version, while the layout SoA is the best choice for the improved CDP version. We also observe that the two combined data layouts (SoAoS and Hybrid) yield no notable performance gains over the three basic layouts. For practical applications, we recommend the layout AoaS, since it performs well with the tiled version, which is the fastest of the three versions. The source code of all implementations is publicly available.
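To make the three basic layouts concrete, the following is a minimal host-side C++ sketch, not the paper's implementation. The type and function names (`PointAoS`, `PointAoaS`, `PointsSoA`, `idw_aoas`, `idw_soa`) are hypothetical, the power parameter of 2 is an assumed default, and `alignas(16)` stands in for the `__align__(16)` specifier that a CUDA version would use. The SoAoS and Hybrid layouts are omitted because the abstract does not define them.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// AoS: one plain struct per data point (3 floats, 12 bytes).
struct PointAoS { float x, y, z; };

// AoaS: the same fields, padded and aligned to 16 bytes.
// Assumption: alignas(16) plays the role of CUDA's __align__(16).
struct alignas(16) PointAoaS { float x, y, z; };

// SoA: one contiguous array per field.
struct PointsSoA { std::vector<float> x, y, z; };

static_assert(sizeof(PointAoS) == 12, "AoS element is unpadded");
static_assert(sizeof(PointAoaS) == 16, "AoaS element is padded to 16 bytes");

// Accumulates one data point into the weighted sums.
// Returns true when the query coincides with the data point.
inline bool accumulate(float qx, float qy, float px, float py, float pz,
                       float power, float& num, float& den) {
    const float dx = qx - px, dy = qy - py;
    const float d2 = dx * dx + dy * dy;          // squared distance
    if (d2 == 0.0f) return true;                 // exact hit: caller returns pz
    const float w = 1.0f / std::pow(d2, 0.5f * power);  // w = 1 / d^power
    num += w * pz;
    den += w;
    return false;
}

// IDW estimate at (qx, qy) reading data points from the AoaS layout.
inline float idw_aoas(const std::vector<PointAoaS>& pts,
                      float qx, float qy, float power = 2.0f) {
    assert(!pts.empty());
    float num = 0.0f, den = 0.0f;
    for (const PointAoaS& p : pts)
        if (accumulate(qx, qy, p.x, p.y, p.z, power, num, den)) return p.z;
    return num / den;
}

// The same estimate reading data points from the SoA layout.
inline float idw_soa(const PointsSoA& pts,
                     float qx, float qy, float power = 2.0f) {
    assert(!pts.x.empty());
    assert(pts.x.size() == pts.y.size() && pts.x.size() == pts.z.size());
    float num = 0.0f, den = 0.0f;
    for (std::size_t i = 0; i < pts.x.size(); ++i)
        if (accumulate(qx, qy, pts.x[i], pts.y[i], pts.z[i], power, num, den))
            return pts.z[i];
    return num / den;
}
```

Both functions compute the same weighted average; only the memory access pattern differs. In a naive GPU kernel, each thread would run this loop for one interpolated point, so the layout determines how the data-point coordinates are fetched from memory.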
Springer