Slipstream execution mode for CMP-based multiprocessors

A case for core-assisted bottleneck acceleration in GPUs: enabling flexible data compression with assist warps

N Vijaykumar, G Pekhimenko, A Jog… - ACM SIGARCH …, 2015 - dl.acm.org

Modern Graphics Processing Units (GPUs) are well provisioned to support the concurrent
execution of thousands of threads. Unfortunately, different bottlenecks during execution and …

Cited by 131 · Related articles · All 6 versions

Inter-core prefetching for multicore processors using migrating helper threads

M Kamruzzaman, S Swanson, DM Tullsen - Proceedings of the sixteenth …, 2011 - dl.acm.org

Multicore processors have become ubiquitous in today's systems, but exploiting the
parallelism they offer remains difficult, especially for legacy applications and applications with …

Cited by 126 · Related articles · All 15 versions

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

Cited by 12 · Related articles · All 6 versions

Towards more efficient execution: A decoupled access-execute approach

K Koukos, D Black-Schaffer, V Spiliopoulos… - Proceedings of the 27th …, 2013 - dl.acm.org

The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting
the energy savings of this technique. This paper evaluates how much we can increase the …

Cited by 54 · Related articles · All 13 versions

Highly concurrent latency-tolerant register files for GPUs

M Sadrosadati, A Mirhosseini, A Hajiabadi… - ACM Transactions on …, 2021 - dl.acm.org

Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

Cited by 7 · Related articles · All 2 versions

RLoad: Reputation-based load-balancing network selection strategy for heterogeneous wireless environments

T Bi, R Trestian, GM Muntean - 2013 21st IEEE International …, 2013 - ieeexplore.ieee.org

In the current telecommunication environment, network operators are trying to cope with a
significant increase in data traffic by adopting different solutions to expand their network …

Cited by 22 · Related articles · All 11 versions

Accelerating sequential applications on CMPs using core spilling

J Cong, G Han, A Jagannathan… - … on Parallel and …, 2007 - ieeexplore.ieee.org

Chip multiprocessors (CMPs) provide a scalable means of exploiting thread-level
parallelism for multitasking or multithreaded applications. However, single-threaded …

Cited by 26 · Related articles · All 5 versions

Architectural support for thread communications in multi-core processors

S Varoglu, S Jenks - Parallel Computing, 2011 - Elsevier

In the ongoing quest for greater computational power, efficiently exploiting parallelism is of
paramount importance. Architectural trends have shifted from improving single-threaded …

Cited by 17 · Related articles · All 4 versions

Future execution: A prefetching mechanism that uses multiple cores to speed up single threads

I Ganusov, M Burtscher - ACM Transactions on Architecture and Code …, 2006 - dl.acm.org

This paper describes future execution (FE), a simple hardware-only technique to accelerate
individual program threads running on multicore microprocessors. Our approach uses …

Cited by 22 · Related articles · All 11 versions

The case for domain-specialized branch predictors for graph-processing

A Samara, J Tuck - IEEE Computer Architecture Letters, 2020 - ieeexplore.ieee.org

Branch prediction is believed by many to be a solved problem, with state-of-the-art
predictors achieving near-perfect prediction for many programs. In this article, we conduct a …

Cited by 5 · Related articles · All 3 versions