Stream-dataflow acceleration
Demand for low-power data processing hardware continues to rise inexorably. Existing
programmable and "general purpose" solutions (e.g. SIMD, GPGPUs) are insufficient, as …
MIMD Programs Execution Support on SIMD Machines: A Holistic Survey
D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …
Coordinated static and dynamic cache bypassing for GPUs
The massive parallel architecture enables graphics processing units (GPUs) to boost
performance for a wide range of applications. Initially, GPUs only employ scratchpad …
Hybrid optimization/heuristic instruction scheduling for programmable accelerator codesign
Recent programmable accelerators are faster and more energy efficient than general
purpose processors, but expose complex hardware/software abstractions for compilers. A …
Exploring the potential of heterogeneous von Neumann/dataflow execution models
T Nowatzki, V Gangadhar… - Proceedings of the 42nd …, 2015 - dl.acm.org
General purpose processors (GPPs), from small inorder designs to many-issue out-of-order,
incur large power overheads which must be addressed for future technology generations …
Stream-based memory access specialization for general purpose processors
Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org
Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g. vector instructions …
μC-States: Fine-grained GPU datapath power management
To improve the performance of Graphics Processing Units (GPUs) beyond simply increasing
core count, architects are recently adopting a scale-up approach: the peak throughput and …
Software transparent dynamic binary translation for coarse-grain reconfigurable architectures
MA Watkins, T Nowatzki, A Carno - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
The end of Dennard Scaling has forced architects to focus on designing for execution
efficiency. Coarse-grained reconfigurable architectures (CGRAs) are a class of architectures …
big.VLITTLE: On-demand data-parallel acceleration for mobile systems on chip
Single-ISA heterogeneous multi-core architectures offer a compelling high-performance and
high-efficiency solution to executing task-parallel workloads in mobile systems on chip …
Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG
V Govindaraju, T Nowatzki… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Modern microprocessors exploit data level parallelism through in-core data-parallel
accelerators in the form of short vector ISA extensions such as SSE/AVX and NEON …