The dormant neuron phenomenon in deep reinforcement learning
In this work we identify the dormant neuron phenomenon in deep reinforcement learning,
where an agent's network suffers from an increasing number of inactive neurons, thereby …
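The truncated abstract does not show the dormancy criterion; as a rough illustration, the sketch below scores each neuron by its mean absolute activation normalized by the layer-wide average and counts those falling below a threshold. The threshold value and the toy data are assumptions, not the paper's settings.

```python
import numpy as np

def dormant_fraction(activations: np.ndarray, tau: float = 0.025) -> float:
    """Fraction of dormant neurons in one layer.

    activations: (batch, num_neurons) post-activation values.
    Each neuron is scored by its mean absolute activation, normalized by
    the layer average; it counts as dormant if the score is <= tau.
    (tau = 0.025 is an illustrative choice, not the paper's setting.)
    """
    per_neuron = np.abs(activations).mean(axis=0)     # E|h_i| per neuron
    scores = per_neuron / (per_neuron.mean() + 1e-8)  # normalize by layer mean
    return float((scores <= tau).mean())

# Toy usage: a ReLU layer where some units never fire.
acts = np.maximum(np.random.randn(256, 64), 0.0)
acts[:, :8] = 0.0                                     # force 8 dead units
print(f"dormant fraction: {dormant_fraction(acts):.3f}")
```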
Random teachers are good teachers
F Sarnthein, G Bachmann… - International …, 2023 - proceedings.mlr.press
In this work, we investigate the implicit regularization induced by teacher-student learning
dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we …
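A minimal sketch of the setup the abstract hints at: a student is trained to match the outputs of a frozen, randomly initialized teacher of the same architecture, with no labels involved. The toy architecture, loss, and optimizer are assumptions, and details such as the student's initialization scheme are omitted.

```python
import torch
import torch.nn as nn

def make_net():
    return nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

teacher = make_net()
for p in teacher.parameters():
    p.requires_grad_(False)        # teacher stays at its random init

student = make_net()
opt = torch.optim.SGD(student.parameters(), lr=1e-2)

for step in range(100):
    x = torch.randn(128, 32)       # stand-in for unlabeled training inputs
    loss = ((student(x) - teacher(x)) ** 2).mean()  # match teacher outputs
    opt.zero_grad()
    loss.backward()
    opt.step()
```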
Robust Commonsense Reasoning Against Noisy Labels Using Adaptive Correction
Commonsense reasoning based on knowledge graphs (KGs) is a challenging task that
requires answering complex questions over the described textual contexts and relevant …
Reset it and forget it: Relearning last-layer weights improves continual and transfer learning
This work identifies a simple pre-training mechanism that leads to representations exhibiting
better continual and transfer learning. This mechanism--the repeated resetting of weights in …
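The mechanism named in the abstract, repeatedly resetting the last layer's weights during pre-training, can be sketched as below. The reset interval, toy model, and training loop are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
RESET_EVERY = 50  # hypothetical reset interval, in steps

for step in range(500):
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % RESET_EVERY == 0:
        model[-1].reset_parameters()  # forget-and-relearn the output head
```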
Learn, unlearn and relearn: An online learning paradigm for deep neural networks
Deep neural networks (DNNs) are often trained on the premise that the complete training
data set is provided ahead of time. However, in real-world scenarios, data often arrive in …
CRAFT: Contextual Re-Activation of Filters for face recognition Training
The first layer of a deep CNN backbone applies filters to an image to extract the basic
features available to later layers. During training, some filters may go inactive, meaning all …
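The abstract only states the problem (first-layer filters going inactive); a hedged sketch of the general detect-and-reinitialize idea follows. CRAFT's actual re-activation rule may differ, and the activity threshold is an assumption.

```python
import torch
import torch.nn as nn

def reactivate_dead_filters(conv: nn.Conv2d, x: torch.Tensor,
                            eps: float = 1e-6) -> int:
    """Re-initialize first-layer filters whose activations are ~all zero.

    Returns the number of filters that were re-initialized.
    """
    with torch.no_grad():
        out = torch.relu(conv(x))                 # (B, C, H, W)
        mean_act = out.abs().mean(dim=(0, 2, 3))  # per-filter activity
        dead = mean_act < eps
        if dead.any():
            fresh = torch.empty_like(conv.weight[dead])
            nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)  # Conv2d default init
            conv.weight[dead] = fresh
            if conv.bias is not None:
                conv.bias[dead] = 0.0
    return int(dead.sum())
```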
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
This study investigates the loss of generalization ability in neural networks, revisiting warm-starting
experiments from Ash & Adams. Our empirical analysis reveals that common …
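As a rough illustration of a fast/slow ("hare"/"tortoise") network pairing: the fast network is trained by gradient descent while the slow network tracks it as an exponential moving average. This is a sketch of the general pattern only; the paper's exact procedure, momentum value, and model are not given in the snippet.

```python
import copy
import torch
import torch.nn as nn

hare = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
tortoise = copy.deepcopy(hare)  # slow copy, updated without gradients
opt = torch.optim.SGD(hare.parameters(), lr=1e-2)
EMA = 0.999  # illustrative momentum

for step in range(200):
    x, y = torch.randn(64, 32), torch.randint(0, 10, (64,))
    loss = nn.functional.cross_entropy(hare(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():  # tortoise <- EMA * tortoise + (1 - EMA) * hare
        for p_t, p_h in zip(tortoise.parameters(), hare.parameters()):
            p_t.mul_(EMA).add_(p_h, alpha=1 - EMA)
```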
Diagnosing and Re-learning for Balanced Multimodal Learning
To overcome the imbalanced multimodal learning problem, where models prefer the training
of specific modalities, existing methods propose to control the training of uni-modal …
Shrink-Perturb Improves Architecture Mixing During Population Based Training for Neural Architecture Search
In this work, we show that simultaneously training and mixing neural networks is a promising
way to conduct Neural Architecture Search (NAS). For hyperparameter optimization, reusing …
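The shrink-perturb operation in the title follows the Ash & Adams recipe: shrink the weights toward zero and add a small fraction of a freshly drawn initialization, theta <- lam * theta + gamma * theta_init. A minimal sketch, with illustrative lam and gamma values:

```python
import copy
import torch
import torch.nn as nn

def shrink_perturb(model: nn.Module, lam: float = 0.4, gamma: float = 0.1):
    """In-place shrink-and-perturb: theta <- lam * theta + gamma * theta_init.

    lam and gamma are illustrative values, not tuned settings.
    """
    fresh = copy.deepcopy(model)
    for m in fresh.modules():            # re-draw a fresh initialization
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()
    with torch.no_grad():
        for p, p0 in zip(model.parameters(), fresh.parameters()):
            p.mul_(lam).add_(p0, alpha=gamma)

# Usage: apply between rounds, e.g. before training on newly arrived data.
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
shrink_perturb(net)
```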
Novelty Not Found: Adaptive Fuzzer Restarts to Improve Input Space Coverage (Registered Report)
N Schiller, X Xu, L Bernhard, N Bars… - Proceedings of the 2nd …, 2023 - dl.acm.org
Feedback-driven greybox fuzzing is one of the cornerstones of modern bug detection
techniques. Its flexibility, automated nature, and effectiveness render it an indispensable tool …
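The snippet does not describe the restart criterion; one plausible plateau-based rule is sketched below, restarting from the seed corpus when no new coverage has been observed for a fixed budget of executions. The budget, the mutate helper, and the run_one_input callback (assumed to return the set of covered edges) are all hypothetical, and the paper's actual adaptive criterion may differ.

```python
import random

PLATEAU_BUDGET = 10_000  # execs without new coverage before a restart

def mutate(data: bytes) -> bytes:
    """Toy byte-flip mutator."""
    if not data:
        return bytes([random.randrange(256)])
    i = random.randrange(len(data))
    return data[:i] + bytes([random.randrange(256)]) + data[i + 1:]

def fuzz_campaign(run_one_input, seed_corpus, total_execs=100_000):
    """Coverage-guided loop with restarts on coverage plateaus."""
    corpus, coverage = list(seed_corpus), set()
    since_new = 0
    for _ in range(total_execs):
        inp = mutate(random.choice(corpus))
        new_edges = run_one_input(inp) - coverage
        if new_edges:
            coverage |= new_edges
            corpus.append(inp)
            since_new = 0
        else:
            since_new += 1
        if since_new >= PLATEAU_BUDGET:  # coverage has plateaued
            corpus, since_new = list(seed_corpus), 0  # adaptive restart
    return coverage
```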