Mr. Wolf: An energy-precision scalable parallel ultra low power SoC for IoT edge processing
This paper presents Mr. Wolf, a parallel ultra-low power (PULP) system on chip (SoC)
featuring a hierarchical architecture with a small (12 kgates) microcontroller (MCU) class …
A 4-transistor nMOS-only logic-compatible gain-cell embedded DRAM with over 1.6-ms retention time at 700 mV in 28-nm FD-SOI
Gain-cell embedded DRAM (GC-eDRAM) is a possible alternative to traditional static
random access memories (SRAM). While GC-eDRAM provides high-density, low-leakage …
A dynamic timing enhanced DNN accelerator with compute-adaptive elastic clock chain technique
This article presents a deep neural network (DNN) accelerator using an adaptive clocking
technique (i.e., elastic clock chain) to exploit the dynamic timing margin for the 2-D …
An instruction-driven adaptive clock management through dynamic phase scaling and compiler assistance for a low power microprocessor
This paper presents an instruction-driven adaptive clock management scheme using a
dynamic phase scaling (DPS) operation and compiler-assisted cross-layer design …
19.4 An adaptive clock management scheme exploiting instruction-based dynamic timing slack for a general-purpose graphics processor unit with deep pipeline and …
Cycle-by-cycle dynamic timing slack (DTS), which represents extra timing margin from the
critical-path timing slack reported by the static timing analysis (STA), has been observed at …
Silicon evaluation of multimode dual mode logic for PVT-aware datapaths
This brief presents the unique capabilities of the multimode Dual Mode Logic (DML) design
technique to define run-time adaptive datapaths to overcome process and environmental (i.e. …
LV: Latency-Versatile Floating-Point Engine for High-Performance Deep Neural Networks
Computing latency is an important system metric for Deep Neural Network (DNN)
accelerators. To reduce latency, this work proposes LV, a latency-versatile floating-point …
Time squeezing for tiny devices
Dynamic timing slack has emerged as a compelling opportunity for eliminating inefficiency in
ultra-low power embedded systems. This slack arises when all the signals have propagated …
An instruction driven adaptive clock phase scaling with timing encoding and online instruction calibration for a low power microprocessor
This paper presents an adaptive clock phase scaling operation based on the dynamic
instruction timing variation for a low power microprocessor. Through the use of instruction …
DBFS: Dynamic Bitwidth-Frequency Scaling for Efficient Software-defined SIMD
Machine learning algorithms such as Convolutional Neural Networks (CNNs) are
characterized by high robustness towards quantization, supporting small-bitwidth fixed-point …