Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product
Emerging applications such as deep neural network demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …
25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning …
YC Kwon, SH Lee, J Lee, SH Kwon… - … Solid-State Circuits …, 2021 - ieeexplore.ieee.org
In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into
application areas such as speech recognition, health care, and autonomous driving. To …
IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management
P Vivet, E Guthmuller, Y Thonnart… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
In the context of high-performance computing, the integration of more computing capabilities
with generic cores or dedicated accelerators for artificial intelligence (AI) application is …
A survey on deep learning hardware accelerators for heterogeneous HPC platforms
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …
iPIM: Programmable in-memory image processing accelerator using near-bank architecture
Image processing is becoming an increasingly important domain for many applications on
workstations and the datacenter that require accelerators for high performance and energy …
Neurostream: Scalable and energy efficient deep learning with smart memory cubes
High-performance computing systems are moving towards 2.5D and 3D memory
hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) to …
Review of bumpless build cube (BBCube) using wafer-on-wafer (WOW) and chip-on-wafer (COW) for tera-scale three-dimensional integration (3DI)
T Ohba, K Sakui, S Sugatani, H Ryoson, N Chujo - Electronics, 2022 - mdpi.com
Bumpless Build Cube (BBCube) using Wafer-on-Wafer (WOW) and Chip-on-Wafer (COW)
for Tera-Scale Three-Dimensional Integration (3DI) is discussed. Bumpless interconnects …
A 192-Gb 12-high 896-GB/s HBM3 DRAM with a TSV auto-calibration scheme and machine-learning-based layout optimization
This article introduces a 192-Gb 896-GB/s 12-high stacked third-generation high-bandwidth
memory (HBM3 DRAM) with low power consumption and high-reliability traits. New design …
A classification of memory-centric computing
HAD Nguyen, J Yu, MA Lebdeh, M Taouil… - ACM Journal on …, 2020 - dl.acm.org
Technological and architectural improvements have been constantly required to sustain the
demand of faster and cheaper computers. However, CMOS down-scaling is suffering from …
A survey on memory-centric computer architectures
A Gebregiorgis, HA Du Nguyen, J Yu… - ACM Journal on …, 2022 - dl.acm.org
Faster and cheaper computers have been constantly demanding technological and
architectural improvements. However, current technology is suffering from three technology …