Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product

S Lee, S Kang, J Lee, H Kim, E Lee… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Emerging applications such as deep neural network demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …

25.4 A 20nm 6Gb function-in-memory DRAM, based on HBM2 with a 1.2 TFLOPS programmable computing unit using bank-level parallelism, for machine learning …

YC Kwon, SH Lee, J Lee, SH Kwon… - … Solid-State Circuits …, 2021 - ieeexplore.ieee.org
In recent years, artificial intelligence (AI) technology has proliferated rapidly and widely into
application areas such as speech recognition, health care, and autonomous driving. To …

IntAct: A 96-core processor with six chiplets 3D-stacked on an active interposer with distributed interconnects and integrated power management

P Vivet, E Guthmuller, Y Thonnart… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
In the context of high-performance computing, the integration of more computing capabilities
with generic cores or dedicated accelerators for artificial intelligence (AI) application is …

A survey on deep learning hardware accelerators for heterogeneous HPC platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) imposed hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications such as …

iPIM: Programmable in-memory image processing accelerator using near-bank architecture

P Gu, X Xie, Y Ding, G Chen, W Zhang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Image processing is becoming an increasingly important domain for many applications on
workstations and the datacenter that require accelerators for high performance and energy …

Neurostream: Scalable and energy efficient deep learning with smart memory cubes

E Azarkhish, D Rossi, I Loi… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
High-performance computing systems are moving towards 2.5D and 3D memory
hierarchies, based on High Bandwidth Memory (HBM) and Hybrid Memory Cube (HMC) to …

Review of bumpless build cube (BBCube) using wafer-on-wafer (WOW) and chip-on-wafer (COW) for tera-scale three-dimensional integration (3DI)

T Ohba, K Sakui, S Sugatani, H Ryoson, N Chujo - Electronics, 2022 - mdpi.com
Bumpless Build Cube (BBCube) using Wafer-on-Wafer (WOW) and Chip-on-Wafer (COW)
for Tera-Scale Three-Dimensional Integration (3DI) is discussed. Bumpless interconnects …

A 192-Gb 12-high 896-GB/s HBM3 DRAM with a TSV auto-calibration scheme and machine-learning-based layout optimization

MJ Park, J Lee, K Cho, J Park, J Moon… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
This article introduces a 192-Gb 896-GB/s 12-high stacked third-generation high-bandwidth
memory (HBM3 DRAM) with low power consumption and high-reliability traits. New design …

A classification of memory-centric computing

HAD Nguyen, J Yu, MA Lebdeh, M Taouil… - ACM Journal on …, 2020 - dl.acm.org
Technological and architectural improvements have been constantly required to sustain the
demand of faster and cheaper computers. However, CMOS down-scaling is suffering from …

A survey on memory-centric computer architectures

A Gebregiorgis, HA Du Nguyen, J Yu… - ACM Journal on …, 2022 - dl.acm.org
Faster and cheaper computers have been constantly demanding technological and
architectural improvements. However, current technology is suffering from three technology …