Task superscalar: An out-of-order task pipeline

I Tanasic, I Gelado, J Cabezas, A Ramirez… - ACM SIGARCH …, 2014 - dl.acm.org

GPUs are being increasingly adopted as compute accelerators in many domains, spanning
environments from mobile systems to cloud computing. These systems are usually running …

Cited by 257 · All 15 versions

SPIDER-based out-of-order execution scheme for Ht-MPSOC

R Karthick, M Sundararajan - International Journal of …, 2021 - inderscienceonline.com

In this work, the influence of the dynamic task scheduling process is examined. Out-of-order
(OoO) implementation processes exhibit remarkable guarantee for task-level parallelism in …

Cited by 74 · All 3 versions

Adaptive, efficient, parallel execution of parallel programs

S Sridharan, G Gupta, GS Sohi - Proceedings of the 35th ACM SIGPLAN …, 2014 - dl.acm.org

Future multicore processors will be heterogeneous, be increasingly less reliable, and
operate in dynamically changing operating conditions. Such environments will result in a …

Cited by 93 · All 12 versions

A ubiquitous machine learning accelerator with automatic parallelization on FPGA

C Wang, L Gong, X Li, X Zhou - IEEE Transactions on Parallel …, 2020 - ieeexplore.ieee.org

Machine learning has been widely applied in various emerging data-intensive applications,
and has to be optimized and accelerated by powerful engines to process very large scale …

Cited by 34 · All 3 versions

Hybrid dataflow/von-Neumann architectures

F Yazdanpanah, C Alvarez-Martinez… - … on Parallel and …, 2013 - ieeexplore.ieee.org

General-purpose hybrid dataflow/von-Neumann architectures are gaining traction as
effective parallel platforms. Although different implementations differ in the way they merge …

Cited by 92 · All 6 versions

A scalable architecture for reprioritizing ordered parallelism

G Posluns, Y Zhu, G Zhang, MC Jeffrey - Proceedings of the 49th Annual …, 2022 - dl.acm.org

Many algorithms schedule their work, or tasks, according to a priority order for correctness or
faster convergence. While priority schedulers commonly implement task enqueue and …

Cited by 10 · All 6 versions

TERAFLUX: Harnessing dataflow in next generation teradevices

R Giorgi, RM Badia, F Bodin, A Cohen… - Microprocessors and …, 2014 - Elsevier

The improvements in semiconductor technologies are gradually enabling extreme-scale
systems such as teradevices (i.e., chips composed of 1000 billion transistors), most likely …

Cited by 91 · All 15 versions

Accelerating RTL Simulation with Hardware-Software Co-Design

F Elsabbagh, S Sheikhha, VA Ying… - Proceedings of the 56th …, 2023 - dl.acm.org

Fast simulation of digital circuits is crucial to build modern chips. But RTL (Register-Transfer-
Level) simulators are slow, as they cannot exploit multicores well. Slow simulation lengthens …

Cited by 2 · All 5 versions

Dataflow execution of sequential imperative programs on multicore architectures

G Gupta, GS Sohi - Proceedings of the 44th annual IEEE/ACM …, 2011 - dl.acm.org

As multicore processors become the default, researchers are aggressively looking for
program execution models that make it easier to use the available resources. Multithreaded …

Cited by 82 · All 11 versions

MUSA: a multi-level simulation approach for next-generation HPC machines

T Grass, C Allande, A Armejach, A Rico… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org

The complexity of High Performance Computing (HPC) systems is increasing in the number
of components and their heterogeneity. Interactions between software and hardware involve …

Cited by 47 · All 8 versions