Mr. Wolf: An energy-precision scalable parallel ultra low power SoC for IoT edge processing

A Pullini, D Rossi, I Loi, G Tagliavini… - IEEE Journal of Solid …, 2019 - ieeexplore.ieee.org
This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC)
featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class …

A 4-transistor nMOS-only logic-compatible gain-cell embedded DRAM with over 1.6-ms retention time at 700 mV in 28-nm FD-SOI

R Giterman, A Fish, A Burg… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Gain-cell embedded DRAM (GC-eDRAM) is a possible alternative to traditional static
random access memories (SRAM). While GC-eDRAM provides high-density, low-leakage …

A dynamic timing enhanced DNN accelerator with compute-adaptive elastic clock chain technique

T Jia, Y Ju, J Gu - IEEE Journal of Solid-State Circuits, 2020 - ieeexplore.ieee.org
This article presents a deep neural network (DNN) accelerator using an adaptive clocking
technique (i.e., elastic clock chain) to exploit the dynamic timing margin for the 2-D …

An instruction-driven adaptive clock management through dynamic phase scaling and compiler assistance for a low power microprocessor

T Jia, R Joseph, J Gu - IEEE Journal of Solid-State Circuits, 2019 - ieeexplore.ieee.org
This paper presents an instruction-driven adaptive clock management scheme using a
dynamic phase scaling (DPS) operation and compiler-assisted cross-layer design …

19.4 An adaptive clock management scheme exploiting instruction-based dynamic timing slack for a general-purpose graphics processor unit with deep pipeline and …

T Jia, R Joseph, J Gu - 2019 IEEE International Solid-State …, 2019 - ieeexplore.ieee.org
Cycle-by-cycle dynamic timing slack (DTS), which represents extra timing margin from the
critical-path timing slack reported by the static timing analysis (STA), has been observed at …

Silicon evaluation of multimode dual mode logic for PVT-aware datapaths

I Stanger, N Shavit, R Taco… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This brief presents the unique capabilities of the multimode Dual Mode Logic (DML) design
technique to define run-time adaptive datapaths to overcome process and environmental (i.e., …

LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks

YC Lo, YC Tsai, RS Liu - IEEE Computer Architecture Letters, 2023 - ieeexplore.ieee.org
Computing latency is an important system metric for Deep Neural Networks (DNNs)
accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point …

Time squeezing for tiny devices

Y Fan, S Campanoni, R Joseph - … of the 46th International Symposium on …, 2019 - dl.acm.org
Dynamic timing slack has emerged as a compelling opportunity for eliminating inefficiency in
ultra-low power embedded systems. This slack arises when all the signals have propagated …

An instruction driven adaptive clock phase scaling with timing encoding and online instruction calibration for a low power microprocessor

T Jia, R Joseph, J Gu - … 2018-IEEE 44th European Solid State …, 2018 - ieeexplore.ieee.org
This paper presents an adaptive clock phase scaling operation based on the dynamic
instruction timing variation for a low power microprocessor. Through the use of instruction …

DBFS: Dynamic Bitwidth-Frequency Scaling for Efficient Software-defined SIMD

P Yu, F Ponzina, A Levisse, D Biswas… - 2024 IEEE Computer …, 2024 - ieeexplore.ieee.org
Machine learning algorithms such as Convolutional Neural Networks (CNNs) are
characterized by high robustness towards quantization, supporting small-bitwidth fixed-point …