Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models

Y Byun, S Moon, B Park, SJ Kwon… - Proceedings of …, 2023 - proceedings.mlsys.org
This paper presents a new algorithm-hardware co-optimization approach that maximizes
memory bandwidth utilization even for pruned deep neural network (DNN) models …

Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution

E McMilin - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Modern language modeling tasks are often underspecified: for a given token prediction,
many words may satisfy the user's intent of producing natural language at inference time …

A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression

H Lee, J Hong, S Kim, SY Lee… - 2023 60th ACM/IEEE …, 2023 - ieeexplore.ieee.org
Model compression is widely adopted for edge inference of neural networks (NNs) to
minimize both costly DRAM accesses and memory footprints. Recently, XOR-based model …