The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …

Quark: Controllable text generation with reinforced unlearning

X Lu, S Welleck, J Hessel, L Jiang… - Advances in neural …, 2022 - proceedings.neurips.cc
Large-scale language models often learn behaviors that are misaligned with user
expectations. Generated text may contain offensive or toxic language, contain significant …

Rethinking calibration of deep neural networks: Do not be afraid of overconfidence

DB Wang, L Feng, ML Zhang - Advances in Neural …, 2021 - proceedings.neurips.cc
Capturing accurate uncertainty quantification of the prediction from deep neural networks is
important in many real-world decision-making applications. A reliable predictor is expected …
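A minimal sketch, not taken from the paper, of the expected calibration error (ECE) metric that this line of calibration work commonly reports: predictions are binned by confidence and the gaps between per-bin accuracy and confidence are averaged. The bin count and variable names are illustrative.

```python
# Illustrative ECE sketch (standard metric, not the paper's method).
import numpy as np

def expected_calibration_error(confidences, predictions, labels, n_bins=15):
    """Bin predictions by confidence and average the |accuracy - confidence| gap."""
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

# Toy example: three predictions, two correct.
print(expected_calibration_error([0.9, 0.8, 0.6], [1, 0, 2], [1, 0, 1]))
```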

Locally typical sampling

C Meister, T Pimentel, G Wiher… - Transactions of the …, 2023 - direct.mit.edu
Today's probabilistic language generators fall short when it comes to producing coherent
and fluent text despite the fact that the underlying models perform well under standard …
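A hedged sketch of the core idea behind locally typical sampling: keep the next-token candidates whose surprisal is closest to the entropy of the predicted distribution, then renormalize and sample. The mass parameter `tau` and the toy distribution are illustrative, not the paper's exact formulation.

```python
# Sketch of typicality-based filtering of a next-token distribution.
import numpy as np

def typical_filter(probs, tau=0.95):
    probs = np.asarray(probs, dtype=float)
    surprisal = -np.log(probs + 1e-12)
    entropy = np.sum(probs * surprisal)              # H(p) = E[-log p]
    order = np.argsort(np.abs(surprisal - entropy))  # most "typical" tokens first
    keep = order[: np.searchsorted(np.cumsum(probs[order]), tau) + 1]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

rng = np.random.default_rng(0)
p = typical_filter([0.5, 0.3, 0.15, 0.05], tau=0.9)
print(p, rng.choice(len(p), p=p))
```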

Why do better loss functions lead to less transferable features?

S Kornblith, T Chen, H Lee… - Advances in Neural …, 2021 - proceedings.neurips.cc
Previous work has proposed many new loss functions and regularizers that improve test
accuracy on image classification tasks. However, it is not clear whether these loss functions …

When does data augmentation help with membership inference attacks?

Y Kaya, T Dumitras - International conference on machine …, 2021 - proceedings.mlr.press
Deep learning models often raise privacy concerns as they leak information about their
training data. This leakage enables membership inference attacks (MIA) that can identify …
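As a rough illustration of the attack family the snippet refers to (not this paper's specific attack), the common loss-threshold membership inference baseline flags an example as a training member when the model's loss on it is unusually low; the threshold is assumed to be calibrated on held-out data.

```python
# Illustrative loss-threshold membership inference baseline.
import numpy as np

def loss_threshold_mia(losses, threshold):
    """Predict 1 (member) when the per-example loss falls below the threshold."""
    return (np.asarray(losses) < threshold).astype(int)

member_losses = np.array([0.05, 0.10, 0.30])      # typically low: seen in training
nonmember_losses = np.array([0.90, 1.40, 0.25])   # typically higher: unseen data
preds = loss_threshold_mia(np.concatenate([member_losses, nonmember_losses]), 0.35)
print(preds)  # -> [1 1 1 0 0 1]: one non-member is misclassified
```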

Why do nearest neighbor language models work?

FF Xu, U Alon, G Neubig - International Conference on …, 2023 - proceedings.mlr.press
Language models (LMs) compute the probability of a text by sequentially computing
a representation of an already-seen context and using this representation to predict the next …
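A hedged sketch of the nearest-neighbour LM setup the paper analyses (in the style of kNN-LM): the parametric next-token distribution is interpolated with a distribution induced by retrieving similar contexts from a datastore. The names (`datastore_keys`, `lam`) and toy sizes are illustrative.

```python
# Sketch of kNN-LM-style interpolation, under assumed names and toy dimensions.
import numpy as np

def knn_lm_probs(context_vec, p_lm, datastore_keys, datastore_next_tokens,
                 vocab_size, k=4, lam=0.25):
    # Retrieve the k closest stored context vectors (L2 distance).
    d = np.linalg.norm(datastore_keys - context_vec, axis=1)
    nn = np.argsort(d)[:k]
    # Turn negative distances into a distribution over the neighbours...
    w = np.exp(-d[nn]); w /= w.sum()
    # ...and scatter that mass onto the tokens that followed those contexts.
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, datastore_next_tokens[nn], w)
    return lam * p_knn + (1.0 - lam) * p_lm

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 8))               # stored context representations
next_toks = rng.integers(0, 10, size=100)      # token that followed each context
p_lm = np.full(10, 0.1)                        # uniform parametric distribution
print(knn_lm_probs(rng.normal(size=8), p_lm, keys, next_toks, vocab_size=10))
```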

Fusemoe: Mixture-of-experts transformers for fleximodal fusion

X Han, H Nguyen, C Harris, N Ho, S Saria - arXiv preprint arXiv …, 2024 - arxiv.org
As machine learning models in critical fields increasingly grapple with multimodal data, they
face the dual challenges of handling a wide array of modalities, often incomplete due to …
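A generic sketch of the mixture-of-experts routing such fusion architectures build on (not FuseMoE's specific router): a gate scores the experts for each input and the output is a weighted sum of the top-k experts' responses. All shapes and names here are illustrative.

```python
# Generic top-k mixture-of-experts routing sketch, under assumed shapes.
import numpy as np

def moe_forward(x, gate_w, expert_ws, k=2):
    scores = x @ gate_w                            # one score per expert
    top = np.argsort(scores)[-k:]                  # route to the k best experts
    w = np.exp(scores[top] - scores[top].max())
    w /= w.sum()                                   # softmax over selected experts
    return sum(wi * (x @ expert_ws[i]) for wi, i in zip(w, top))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
gate_w = rng.normal(size=(16, 4))                  # 4 experts
expert_ws = rng.normal(size=(4, 16, 8))            # each expert maps 16 -> 8 dims
print(moe_forward(x, gate_w, expert_ws).shape)     # (8,)
```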

Tailoring self-rationalizers with multi-reward distillation

S Ramnath, B Joshi, S Hallinan, X Lu, LH Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LMs) are capable of generating free-text rationales to aid question
answering. However, prior work 1) suggests that useful self-rationalization is emergent only …

Sparsing and smoothing for the seq2seq models

S Zhao, Z Liang, J Wen, J Chen - IEEE Transactions on Artificial …, 2022 - ieeexplore.ieee.org
Current neural language models are trained to minimize cross-entropy and use softmax to
compute the locally normalized probabilities over the target. While this setup provides solid …
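A minimal sketch of the training setup the snippet describes: a softmax over target logits trained with cross-entropy, shown here with optional label smoothing as one common form of the smoothing the title alludes to. The parameter `eps` is illustrative, not the paper's exact formulation.

```python
# Softmax cross-entropy with optional label smoothing (illustrative sketch).
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, target, eps=0.0):
    p = softmax(np.asarray(logits, dtype=float))
    # Label smoothing: move `eps` of the target mass onto a uniform distribution.
    q = np.full(len(p), eps / len(p))
    q[target] += 1.0 - eps
    return -np.sum(q * np.log(p + 1e-12))

print(cross_entropy([2.0, 0.5, -1.0], target=0))            # standard CE
print(cross_entropy([2.0, 0.5, -1.0], target=0, eps=0.1))   # smoothed CE
```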