Interpretable deep learning: Interpretation, interpretability, trustworthiness, and beyond
Deep neural networks have been well-known for their superb handling of various machine
learning and artificial intelligence tasks. However, due to their over-parameterized black-box …
Data and its (dis) contents: A survey of dataset development and use in machine learning research
In this work, we survey a breadth of literature that has revealed the limitations of
predominant practices for dataset collection and use in the field of machine learning. We …
DataComp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Dynabench: Rethinking benchmarking in NLP
We introduce Dynabench, an open-source platform for dynamic dataset creation and model
benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the …
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …
Understanding Dataset Difficulty with V-Usable Information
K Ethayarajh, Y Choi… - … Conference on Machine …, 2022 - proceedings.mlr.press
Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to
humans; the bigger the performance gap, the harder the dataset is said to be. However, this …
Speak, memory: An archaeology of books known to ChatGPT/GPT-4
In this work, we carry out a data archaeology to infer books that are known to ChatGPT and
GPT-4 using a name cloze membership inference query. We find that OpenAI models have …
Active learning by acquiring contrastive examples
Common acquisition functions for active learning use either uncertainty or diversity
sampling, aiming to select difficult and diverse data points from the pool of unlabeled data …
Efficient methods for natural language processing: A survey
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …
A survey of active learning for natural language processing
In this work, we provide a survey of active learning (AL) for its applications in natural
language processing (NLP). In addition to a fine-grained categorization of query strategies …