Libra: Tailoring SIMD execution using heterogeneous hardware and dynamic configurability

T Nowatzki, V Gangadhar, N Ardalani… - Proceedings of the 44th …, 2017 - dl.acm.org

Demand for low-power data processing hardware continues to rise inexorably. Existing
programmable and "general purpose" solutions (e.g., SIMD, GPGPUs) are insufficient, as …

Cited by 216 · Related articles · All 20 versions

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org

The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Cited by 2 · Related articles · All 2 versions

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org

The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

Cited by 166 · Related articles · All 10 versions

Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign

T Nowatzki, N Ardalani, K Sankaralingam… - Proceedings of the 27th …, 2018 - dl.acm.org

Recent programmable accelerators are faster and more energy efficient than general
purpose processors, but expose complex hardware/software abstractions for compilers. A …

Cited by 56 · Related articles · All 4 versions

Exploring the potential of heterogeneous von Neumann/dataflow execution models

T Nowatzki, V Gangadhar… - Proceedings of the 42nd …, 2015 - dl.acm.org

General purpose processors (GPPs), from small in-order designs to many-issue out-of-order,
incur large power overheads which must be addressed for future technology generations …

Cited by 90 · Related articles · All 12 versions

Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org

Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g., vector instructions …

Cited by 51 · Related articles · All 4 versions

μC-States: Fine-grained GPU datapath power management

O Kayiran, A Jog, A Pattnaik… - Proceedings of the …, 2016 - dl.acm.org

To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing
core count, architects are recently adopting a scale-up approach: the peak throughput and …

Cited by 51 · Related articles · All 22 versions

Software transparent dynamic binary translation for coarse-grain reconfigurable architectures

MA Watkins, T Nowatzki, A Carno - 2016 IEEE International …, 2016 - ieeexplore.ieee.org

The end of Dennard Scaling has forced architects to focus on designing for execution
efficiency. Coarse-grained reconfigurable architectures (CGRAs) are a class of architectures …

Cited by 34 · Related articles · All 3 versions

Big.VLITTLE: On-demand data-parallel acceleration for mobile systems on chip

T Ta, K Al-Hawaj, N Cebry, Y Ou, E Hall… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org

Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and
high-efficiency solution to executing task-parallel workloads in mobile systems on chip …

Cited by 4 · Related articles · All 6 versions

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

V Govindaraju, T Nowatzki… - Proceedings of the …, 2013 - ieeexplore.ieee.org

Modern microprocessors exploit data-level parallelism through in-core data-parallel
accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON …

Cited by 35 · Related articles · All 7 versions