Sparsity-Aware Memory Interface Architecture using Stacked XORNet Compression for Accelerating Pruned-DNN Models
This paper presents a new algorithm-hardware co-optimization approach that maximizes
memory bandwidth utilization even for the pruned deep neural network (DNN) models …
Underspecification in Language Modeling Tasks: A Causality-Informed Study of Gendered Pronoun Resolution
E McMilin - Proceedings of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Modern language modeling tasks are often underspecified: for a given token prediction,
many words may satisfy the user's intent of producing natural language at inference time …
A Memory-Efficient Edge Inference Accelerator with XOR-based Model Compression
Model compression is widely adopted for edge inference of neural networks (NNs) to
minimize both costly DRAM accesses and memory footprints. Recently, XOR-based model …