A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads

S Che, JW Sheaffer, M Boyer… - IEEE International …, 2010 - ieeexplore.ieee.org
The recently released Rodinia benchmark suite enables users to evaluate heterogeneous
systems including both accelerators, such as GPUs, and multicore CPUs. As Rodinia sees …

Liszt: a domain specific language for building portable mesh-based PDE solvers

Z DeVito, N Joubert, F Palacios, S Oakley… - Proceedings of 2011 …, 2011 - dl.acm.org
Heterogeneous computers with processors and accelerators are becoming widespread in
scientific computing. However, it is difficult to program hybrid architectures and there is no …

[PDF] A fast double precision CFD code using CUDA

JM Cohen, MJ Molemaker - … of the Physical Society of Japan, 2009 - people.atmos.ucla.edu
We describe a second order double precision finite volume Boussinesq code designed to
run on the CUDA architecture. We perform detailed validation of the code on a variety of …

Adaptive page migration for irregular data-intensive applications under GPU memory oversubscription

D Ganguly, Z Zhang, J Yang… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Unified Memory in heterogeneous systems serves a wide range of applications. However,
limited capacity of the device memory becomes a first order performance bottleneck for data …

SPEC ACCEL: A standard application suite for measuring hardware accelerator performance

G Juckeland, W Brantley, S Chandrasekaran… - … and Simulation of High …, 2014 - Springer
Hybrid nodes with hardware accelerators are becoming very common in systems today.
Users often find it difficult to characterize and understand the performance advantage of …

2012 Freeman scholar lecture: computational fluid dynamics on graphics processing units

SP Vanka - Journal of fluids engineering, 2013 - asmedigitalcollection.asme.org
This paper discusses the various issues of using graphics processing units (GPU) for
computing fluid flows. GPUs, used primarily for processing graphics functions in a computer …

A simple yet effective balanced edge partition model for parallel computing

L Li, R Geda, AB Hayes, Y Chen, P Chaudhari… - Proceedings of the …, 2017 - dl.acm.org
Graph edge partition models have recently become an appealing alternative to graph vertex
partition models for distributed computing due to their flexibility in balancing loads and their …

[BOOK] High-order energy stable flux reconstruction schemes for fluid flow simulations on unstructured grids

P Castonguay - 2012 - search.proquest.com
Nowadays, most commercial CFD software relies exclusively on low-order methods
(methods for which the spatial order of accuracy is at most two) for the simulation of flows …

Under the hood of SYCL – an initial performance analysis with an unstructured-mesh CFD application

IZ Reguly, AMB Owenson, A Powell, SA Jarvis… - … Conference, ISC High …, 2021 - Springer
As the computing hardware landscape gets more diverse, and the complexity of hardware
grows, the need for a general purpose parallel programming model capable of developing …

Acceleration of a finite-difference WENO scheme for large-scale simulations on many-core architectures

A Antoniou, K Karantasis, E Polychronopoulos… - 48th AIAA Aerospace …, 2010 - arc.aiaa.org
Current trends on high performance computing are moving towards the deployment of
several cores on the same chip of modern processors in order to achieve substantial …