Is 3D Convolution with 5D Tensors Really Necessary for Video Analysis?

H Hajimolahoseini, W Ahmed, A Wen, Y Liu - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a comprehensive study and propose several novel techniques for
implementing 3D convolutional blocks using 2D and/or 1D convolutions with only 4D and/or …
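The snippet describes factorizing 3D convolutions into 2D and/or 1D ones so that no 5D tensor is needed. Below is a minimal sketch of one common such factorization, a "(2+1)D"-style spatial-then-temporal split; the specific decompositions the paper proposes are not shown in the snippet, and the kernel sizes and folding scheme here are illustrative assumptions.

```python
# Sketch: replace Conv3d(k x k x k) with Conv2d(k x k) over space followed
# by Conv1d(k) over time, so intermediate tensors stay 4D/3D, never 5D.
import torch
import torch.nn as nn


class Factorized3DConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.temporal = nn.Conv1d(out_ch, out_ch, k, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width) -- a 5D video tensor
        b, c, t, h, w = x.shape
        # Fold time into the batch dim so the spatial conv sees 4D input.
        y = self.spatial(x.transpose(1, 2).reshape(b * t, c, h, w))
        _, c2, h2, w2 = y.shape
        # Fold space into the batch dim so the temporal conv sees 3D input.
        y = y.reshape(b, t, c2, h2 * w2).permute(0, 3, 2, 1)  # (b, hw, c2, t)
        y = self.temporal(y.reshape(b * h2 * w2, c2, t))
        return y.reshape(b, h2, w2, c2, t).permute(0, 3, 4, 1, 2)


x = torch.randn(2, 3, 8, 32, 32)            # (batch, ch, time, H, W)
print(Factorized3DConv(3, 16)(x).shape)     # torch.Size([2, 16, 8, 32, 32])
```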

SkipViT: Speeding Up Vision Transformers with a Token-Level Skip Connection

F Ataiefard, W Ahmed, H Hajimolahoseini… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision transformers are known to be more compute- and data-intensive than CNN
models. Transformer models such as ViT require all the input image tokens to learn …
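The snippet points at a token-level skip connection that lets unimportant tokens bypass heavy computation. Below is a minimal sketch of that general idea; the importance score (mean absolute token activation), the keep ratio, and where in the network the skip is applied are all illustrative assumptions, not SkipViT's actual design.

```python
# Sketch: less important tokens bypass attention via a skip connection;
# only the top-k tokens by a (hypothetical) importance score are processed.
import torch
import torch.nn as nn


class TokenSkipBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim)
        b, n, d = x.shape
        k = max(1, int(n * self.keep_ratio))
        scores = x.abs().mean(dim=-1)                   # (b, n) importance
        idx = scores.topk(k, dim=1).indices             # (b, k) kept tokens
        gather = idx.unsqueeze(-1).expand(-1, -1, d)
        kept = x.gather(1, gather)                      # (b, k, d)
        # Attention runs only on the kept tokens.
        h = self.norm(kept)
        attn_out, _ = self.attn(h, h, h)
        # Skipped tokens pass through unchanged (the skip connection);
        # processed tokens are scattered back into place.
        out = x.clone()
        out.scatter_(1, gather, kept + attn_out)
        return out


x = torch.randn(2, 16, 64)
print(TokenSkipBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```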

Beyond Uniform Query Distribution: Key-Driven Grouped Query Attention

Z Khan, M Khaquan, O Tafveez, AA Raza - arXiv preprint arXiv …, 2024 - arxiv.org
The Transformer architecture has revolutionized deep learning through its Self-Attention
mechanism, which effectively captures contextual information. However, the memory …
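For context on what this paper departs from, below is a minimal sketch of standard grouped-query attention (GQA) with a uniform query-to-group assignment; the key-driven grouping the paper proposes is not shown in the snippet, and the head counts here are arbitrary.

```python
# Sketch: uniform GQA, where several query heads share one key/value head,
# shrinking the KV cache by a factor of n_q_heads / n_groups.
import torch
import torch.nn.functional as F


def grouped_query_attention(q, k, v, n_groups: int):
    """q: (batch, n_q_heads, seq, d); k, v: (batch, n_groups, seq, d)."""
    b, hq, s, d = q.shape
    heads_per_group = hq // n_groups
    # Repeat each shared K/V head for every query head in its group.
    k = k.repeat_interleave(heads_per_group, dim=1)   # (b, hq, s, d)
    v = v.repeat_interleave(heads_per_group, dim=1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5       # (b, hq, s, s)
    return F.softmax(scores, dim=-1) @ v              # (b, hq, s, d)


q = torch.randn(2, 8, 10, 16)   # 8 query heads
k = torch.randn(2, 2, 10, 16)   # 2 shared KV heads -> groups of 4
v = torch.randn(2, 2, 10, 16)
print(grouped_query_attention(q, k, v, n_groups=2).shape)
```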