Enabling preemptive multiprogramming on GPUs

I Tanasic, I Gelado, J Cabezas, A Ramirez… - ACM SIGARCH …, 2014 - dl.acm.org
GPUs are being increasingly adopted as compute accelerators in many domains, spanning
environments from mobile systems to cloud computing. These systems are usually running …

SPIDER-based out-of-order execution scheme for Ht-MPSOC

R Karthick, M Sundararajan - International Journal of …, 2021 - inderscienceonline.com
In this work, the influence of the dynamic task scheduling process is examined. Out-of-order
(OoO) implementation processes exhibit remarkable guarantee for task-level parallelism in …

Adaptive, efficient, parallel execution of parallel programs

S Sridharan, G Gupta, GS Sohi - Proceedings of the 35th ACM SIGPLAN …, 2014 - dl.acm.org
Future multicore processors will be heterogeneous, be increasingly less reliable, and
operate in dynamically changing operating conditions. Such environments will result in a …

A ubiquitous machine learning accelerator with automatic parallelization on FPGA

C Wang, L Gong, X Li, X Zhou - IEEE Transactions on Parallel …, 2020 - ieeexplore.ieee.org
Machine learning has been widely applied in various emerging data-intensive applications,
and has to be optimized and accelerated by powerful engines to process very large scale …

Hybrid dataflow/von-Neumann architectures

F Yazdanpanah, C Alvarez-Martinez… - … on Parallel and …, 2013 - ieeexplore.ieee.org
General purpose hybrid dataflow/von-Neumann architectures are gaining attraction as
effective parallel platforms. Although different implementations differ in the way they merge …

A scalable architecture for reprioritizing ordered parallelism

G Posluns, Y Zhu, G Zhang, MC Jeffrey - Proceedings of the 49th Annual …, 2022 - dl.acm.org
Many algorithms schedule their work, or tasks, according to a priority order for correctness or
faster convergence. While priority schedulers commonly implement task enqueue and …

TERAFLUX: Harnessing dataflow in next generation teradevices

R Giorgi, RM Badia, F Bodin, A Cohen… - Microprocessors and …, 2014 - Elsevier
The improvements in semiconductor technologies are gradually enabling extreme-scale
systems such as teradevices (i.e., chips composed by 1000 billion of transistors), most likely …

Accelerating RTL Simulation with Hardware-Software Co-Design

F Elsabbagh, S Sheikhha, VA Ying… - Proceedings of the 56th …, 2023 - dl.acm.org
Fast simulation of digital circuits is crucial to build modern chips. But RTL
(Register-Transfer-Level) simulators are slow, as they cannot exploit multicores well. Slow simulation lengthens …

Dataflow execution of sequential imperative programs on multicore architectures

G Gupta, GS Sohi - Proceedings of the 44th annual IEEE/ACM …, 2011 - dl.acm.org
As multicore processors become the default, researchers are aggressively looking for
program execution models that make it easier to use the available resources. Multithreaded …

MUSA: a multi-level simulation approach for next-generation HPC machines

T Grass, C Allande, A Armejach, A Rico… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
The complexity of High Performance Computing (HPC) systems is increasing in the number
of components and their heterogeneity. Interactions between software and hardware involve …