Stream-dataflow acceleration

T Nowatzki, V Gangadhar, N Ardalani… - Proceedings of the 44th …, 2017 - dl.acm.org
Demand for low-power data processing hardware continues to rise inexorably. Existing
programmable and "general purpose" solutions (e.g., SIMD, GPGPUs) are insufficient, as …

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-performance
computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Coordinated static and dynamic cache bypassing for GPUs

X Xie, Y Liang, Y Wang, G Sun… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …

Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign

T Nowatzki, N Ardalani, K Sankaralingam… - Proceedings of the 27th …, 2018 - dl.acm.org
Recent programmable accelerators are faster and more energy efficient than general
purpose processors, but expose complex hardware/software abstractions for compilers. A …

Exploring the potential of heterogeneous von neumann/dataflow execution models

T Nowatzki, V Gangadhar… - Proceedings of the 42nd …, 2015 - dl.acm.org
General purpose processors (GPPs), from small inorder designs to many-issue out-of-order,
incur large power overheads which must be addressed for future technology generations …

Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org
Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g., vector instructions …

μC-States: Fine-grained GPU datapath power management

O Kayiran, A Jog, A Pattnaik… - Proceedings of the …, 2016 - dl.acm.org
To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing
core count, architects are recently adopting a scale-up approach: the peak throughput and …

Software transparent dynamic binary translation for coarse-grain reconfigurable architectures

MA Watkins, T Nowatzki, A Carno - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
The end of Dennard Scaling has forced architects to focus on designing for execution
efficiency. Coarse-grained reconfigurable architectures (CGRAs) are a class of architectures …

Big.VLITTLE: On-demand data-parallel acceleration for mobile systems on chip

T Ta, K Al-Hawaj, N Cebry, Y Ou, E Hall… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and
high-efficiency solution to executing task-parallel workloads in mobile systems on chip …

Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG

V Govindaraju, T Nowatzki… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Modern microprocessors exploit data-level parallelism through in-core data-parallel
accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON …