NoiER: an approach for training more reliable fine-tuned downstream task models
M. Jang, T. Lukasiewicz - IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022 - ieeexplore.ieee.org
The recent development of pretrained language models trained in a self-supervised fashion, such as BERT, is driving rapid progress in natural language processing. However, their impressive performance relies on leveraging syntactic artefacts of the training data rather than fully understanding the intrinsic meaning of language. This excessive exploitation of spurious artefacts leads to the distribution collapse problem: a model fine-tuned on a downstream task is unable to distinguish out-of-distribution sentences, yet still produces high-confidence scores for them. In this paper, we argue that distribution collapse is a prevalent issue in pretrained language models and propose noise entropy regularisation (NoiER), an efficient learning paradigm that addresses the problem without auxiliary models or additional data. The proposed approach improves traditional out-of-distribution detection evaluation metrics by 55% on average compared to the original fine-tuned models.
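The abstract does not spell out the loss, but the idea of an entropy regulariser on noise inputs can be sketched as a standard cross-entropy objective on in-distribution data plus a term that pushes the model's predictions on noise samples toward the uniform (maximum-entropy) distribution. The function names, the weighting parameter `lam`, and the exact combination below are illustrative assumptions, not the paper's precise formulation:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def noise_entropy_loss(id_logits, id_labels, noise_logits, lam=1.0):
    """Illustrative NoiER-style objective (assumed form, not the paper's
    exact loss): cross-entropy on the in-distribution (ID) batch plus a
    regulariser that maximises prediction entropy on noise inputs, so the
    model cannot be confidently wrong on out-of-distribution data."""
    p_id = softmax(id_logits)
    # Standard cross-entropy on labelled in-distribution examples.
    ce = -np.log(p_id[np.arange(len(id_labels)), id_labels] + 1e-12).mean()
    p_noise = softmax(noise_logits)
    # Mean negative entropy of noise predictions; minimising this term
    # maximises entropy, i.e. drives noise predictions toward uniform.
    neg_entropy = (p_noise * np.log(p_noise + 1e-12)).sum(axis=-1).mean()
    return ce + lam * neg_entropy

# A model that is uncertain on noise incurs a lower loss than one that
# is confidently (and spuriously) certain on the same noise input.
id_logits = np.array([[5.0, 0.0]])
id_labels = np.array([0])
loss_uniform = noise_entropy_loss(id_logits, id_labels,
                                  np.array([[0.0, 0.0]]))  # uniform on noise
loss_confident = noise_entropy_loss(id_logits, id_labels,
                                    np.array([[5.0, 0.0]]))  # confident on noise
```

Under this sketch, `loss_uniform < loss_confident`, which is exactly the behaviour needed to counter the distribution collapse described above: high confidence is reserved for in-distribution inputs.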