Elliptical Attention
Pairwise dot-product self-attention is key to the success of transformers that achieve state-of-the-art performance across a variety of applications in language and vision. This dot-product …
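The snippet refers to standard pairwise (scaled) dot-product self-attention. As a point of reference only, here is a minimal NumPy sketch of that baseline mechanism; the elliptical variant itself is not described in the truncated abstract, and the function and variable names below are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # pairwise token similarities
    weights = softmax(scores, axis=-1)              # each row sums to 1
    return weights @ V

# Toy usage: 4 tokens with 8-dimensional features; self-attention sets Q = K = V.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = dot_product_attention(x, x, x)
print(out.shape)  # (4, 8)
```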
Value Residual Learning For Alleviating Attention Concentration In Transformers
Z Zhou, T Wu, Z Jiang, Z Lan - arXiv preprint arXiv:2410.17897, 2024 - arxiv.org
Transformers can capture long-range dependencies using self-attention, allowing tokens to attend to all others directly. However, stacking multiple attention layers leads to attention …
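The snippet names the problem (attention concentration from stacked layers) but cuts off before describing the method. As a loose illustration of what a value-residual connection could look like, the sketch below blends each layer's value matrix with the first layer's values before attention; the mixing rule, the name `value_residual_attention`, and the coefficient `lam` are assumptions for illustration, not the paper's exact formulation (see arXiv:2410.17897 for that).

```python
import numpy as np

def value_residual_attention(Q, K, V, V_first, lam=0.5):
    """Attention with a hypothetical value residual: values are mixed with
    the first layer's values V_first so deeper layers retain early-layer
    information. lam is a fixed mixing coefficient, assumed for this sketch."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    V_mixed = lam * V + (1.0 - lam) * V_first  # the value-residual mix
    return weights @ V_mixed
```

In a full model, a coefficient like `lam` might be learned or varied per layer; it is a fixed constant here purely for demonstration.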
Transformer-based Graph Neural Networks for Battery Range Prediction in AIoT Battery-Swap Services
The concept of the sharing economy has gained broad recognition, and within this context, Sharing E-Bike Batteries (SEBs) have emerged as a focal point of societal interest. Despite the …