The dormant neuron phenomenon in deep reinforcement learning

G Sokar, R Agarwal, PS Castro… - … Conference on Machine …, 2023 - proceedings.mlr.press
In this work we identify the dormant neuron phenomenon in deep reinforcement learning,
where an agent's network suffers from an increasing number of inactive neurons, thereby …
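The snippet cuts off before the dormancy criterion. A commonly cited formulation (in the spirit of Sokar et al.) scores each neuron by its mean absolute activation normalized by the layer-wide mean and flags neurons below a threshold. A minimal PyTorch sketch; the threshold `tau` and the batch-based scoring below are illustrative assumptions, not the paper's exact recipe:

```python
import torch

@torch.no_grad()
def dormancy_scores(activations: torch.Tensor) -> torch.Tensor:
    """Score each neuron by its mean absolute activation over a batch,
    normalized by the layer-wide mean (so scores average to 1)."""
    per_neuron = activations.abs().mean(dim=0)        # (num_neurons,)
    return per_neuron / (per_neuron.mean() + 1e-8)

@torch.no_grad()
def dormant_mask(activations: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Neurons whose normalized score falls below tau are treated as dormant."""
    return dormancy_scores(activations) <= tau

# Example: activations collected from one hidden layer over a batch of inputs.
acts = torch.relu(torch.randn(256, 512))               # (batch, num_neurons)
print(f"dormant neurons: {dormant_mask(acts).sum().item()} / {acts.shape[1]}")
```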

Random teachers are good teachers

F Sarnthein, G Bachmann… - International …, 2023 - proceedings.mlr.press
In this work, we investigate the implicit regularization induced by teacher-student learning
dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we …
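The abstract is truncated before the experimental setup; the basic teacher-student setup with a random teacher can be sketched as training a student to match the outputs of a frozen, randomly initialized teacher on unlabeled inputs. A hedged illustration, where the architecture, loss, and optimizer are assumptions rather than the paper's protocol:

```python
import torch
import torch.nn as nn

def mlp():
    return nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 16))

teacher = mlp()
student = mlp()
for p in teacher.parameters():           # the teacher stays at its random init
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.1)
data = torch.randn(1024, 32)             # unlabeled inputs; no task labels needed

for _ in range(100):
    x = data[torch.randint(0, len(data), (64,))]
    loss = nn.functional.mse_loss(student(x), teacher(x))
    opt.zero_grad()
    loss.backward()
    opt.step()
```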

Robust Commonsense Reasoning Against Noisy Labels Using Adaptive Correction

X Yang, C Deng, K Wei, D Tao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Commonsense reasoning based on knowledge graphs (KGs) is a challenging task that
requires predicting complex questions over the described textual contexts and relevant …

Reset it and forget it: Relearning last-layer weights improves continual and transfer learning

L Frati, N Traft, J Clune, N Cheney - arXiv preprint arXiv:2310.07996, 2023 - arxiv.org
This work identifies a simple pre-training mechanism that leads to representations exhibiting
better continual and transfer learning. This mechanism--the repeated resetting of weights in …
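The snippet names the mechanism (repeated resetting of last-layer weights during pre-training) but not its schedule or initializer. A minimal sketch, assuming a classifier head that is re-initialized every `reset_every` steps while the backbone keeps training:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 256), nn.ReLU(),       # "backbone" kept across resets
    nn.Linear(256, 10),                  # last layer, periodically re-initialized
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
reset_every = 500                        # assumed schedule for illustration

for step in range(2000):
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % reset_every == 0:
        model[-1].reset_parameters()     # relearn the head from scratch
```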

Learn, unlearn and relearn: An online learning paradigm for deep neural networks

VRT Ramkumar, E Arani, B Zonooz - arXiv preprint arXiv:2303.10455, 2023 - arxiv.org
Deep neural networks (DNNs) are often trained on the premise that the complete training
data set is provided ahead of time. However, in real-world scenarios, data often arrive in …

CRAFT: Contextual Re-Activation of Filters for face recognition Training

A Bhatta, D Mery, H Wu, KW Bowyer - arXiv preprint arXiv:2312.00072, 2023 - arxiv.org
The first layer of a deep CNN backbone applies filters to an image to extract the basic
features available to later layers. During training, some filters may go inactive, meaning all …
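The snippet breaks off mid-sentence, but the stated problem (first-layer filters going inactive during training) can be illustrated with a simple detector that flags filters whose responses are near zero over a batch and re-initializes them. The assumed ReLU nonlinearity, threshold, and re-initialization scheme below are illustrative assumptions, not CRAFT's actual procedure:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reactivate_inactive_filters(conv: nn.Conv2d, images: torch.Tensor, eps: float = 1e-6) -> int:
    """Re-initialize first-layer filters whose mean response (after an assumed
    ReLU) is essentially zero across the batch."""
    responses = torch.relu(conv(images)).mean(dim=(0, 2, 3))   # (out_channels,)
    inactive = responses < eps
    if inactive.any():
        fresh = torch.empty_like(conv.weight)
        nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)             # default conv init
        conv.weight[inactive] = fresh[inactive]
        if conv.bias is not None:
            conv.bias[inactive] = 0.0
    return int(inactive.sum())

conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)
batch = torch.randn(16, 3, 112, 112)
print("re-activated filters:", reactivate_inactive_filters(conv1, batch))
```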

Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

H Lee, H Cho, H Kim, D Kim, D Min, J Choo… - arXiv preprint arXiv …, 2024 - arxiv.org
This study investigates the loss of generalization ability in neural networks, revisiting warm-
starting experiments from Ash & Adams. Our empirical analysis reveals that common …
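The snippet does not describe the two networks; the title suggests a fast ("hare") network paired with a slowly updated ("tortoise") copy. One common way to maintain such a pair is to keep the slow network as an exponential moving average of the fast one. A hedged sketch under that assumption, with the momentum value chosen for illustration:

```python
import copy
import torch
import torch.nn as nn

fast = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
slow = copy.deepcopy(fast)                # "tortoise": updated only via EMA
for p in slow.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(fast.parameters(), lr=0.05)
momentum = 0.999                          # assumed EMA coefficient

for step in range(1000):
    x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
    loss = nn.functional.cross_entropy(fast(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():                 # the tortoise trails the hare
        for ps, pf in zip(slow.parameters(), fast.parameters()):
            ps.mul_(momentum).add_(pf, alpha=1 - momentum)
```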

Diagnosing and Re-learning for Balanced Multimodal Learning

Y Wei, S Li, R Feng, D Hu - arXiv preprint arXiv:2407.09705, 2024 - arxiv.org
To overcome the imbalanced multimodal learning problem, where models favor training specific modalities, existing methods propose to control the training of uni-modal …

Shrink-Perturb Improves Architecture Mixing During Population Based Training for Neural Architecture Search

A Chebykin, A Dushatskiy, T Alderliesten, PAN Bosman - ECAI, 2023 - ebooks.iospress.nl
In this work, we show that simultaneously training and mixing neural networks is a promising
way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing …
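The shrink-perturb operator itself (from Ash & Adams' warm-starting work) is a one-line weight transform: scale existing parameters toward zero and add a small amount of a fresh random initialization. A minimal sketch, with `lam` and `sigma` as illustrative hyperparameters rather than the values used in this paper:

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def shrink_perturb(model: nn.Module, lam: float = 0.4, sigma: float = 0.1) -> nn.Module:
    """Return a copy whose weights are shrunk by lam and perturbed toward a fresh init."""
    shrunk = copy.deepcopy(model)
    fresh = copy.deepcopy(model)
    for m in fresh.modules():             # re-draw parameters from the default init
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    for p_old, p_new in zip(shrunk.parameters(), fresh.parameters()):
        p_old.mul_(lam).add_(p_new, alpha=sigma)
    return shrunk

net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
warm_started = shrink_perturb(net)        # reuse weights without freezing into them
```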

Novelty Not Found: Adaptive Fuzzer Restarts to Improve Input Space Coverage (Registered Report)

N Schiller, X Xu, L Bernhard, N Bars… - Proceedings of the 2nd …, 2023 - dl.acm.org
Feedback-driven greybox fuzzing is one of the cornerstones of modern bug detection
techniques. Its flexibility, automated nature, and effectiveness render it an indispensable tool …