Imbalance problems in object detection: A review

K Oksuz, BC Cam, S Kalkan… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
In this paper, we present a comprehensive review of the imbalance problems in object
detection. To analyze the problems in a systematic manner, we introduce a problem-based …

A survey on dataset quality in machine learning

Y Gong, G Liu, Y Xue, R Li, L Meng - Information and Software Technology, 2023 - Elsevier
With the rise of big data, the quality of datasets has become a crucial factor affecting the
performance of machine learning models. High-quality datasets are essential for the …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

Beyond neural scaling laws: beating power law scaling via data pruning

B Sorscher, R Geirhos, S Shekhar… - Advances in …, 2022 - proceedings.neurips.cc
Widely observed neural scaling laws, in which error falls off as a power of the training set
size, model size, or both, have driven substantial performance improvements in deep …

D4: Improving LLM pretraining via document de-duplication and diversification

K Tirumala, D Simig, A Aghajanyan… - Advances in Neural …, 2023 - proceedings.neurips.cc
Over recent years, an increasing amount of compute and data has been poured into training
large language models (LLMs), usually by doing one-pass learning on as many tokens as …

Adaptive second order coresets for data-efficient machine learning

O Pooladzandi, D Davini… - … on Machine Learning, 2022 - proceedings.mlr.press
Training machine learning models on massive datasets incurs substantial computational
costs. To alleviate such costs, there has been a sustained effort to develop data-efficient …

Compute-efficient deep learning: Algorithmic trends and opportunities

BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …

Towards sustainable learning: Coresets for data-efficient deep learning

Y Yang, H Kang… - … Conference on Machine …, 2023 - proceedings.mlr.press
To improve the efficiency and sustainability of learning deep models, we propose CREST,
the first scalable framework with rigorous theoretical guarantees to identify the most valuable …

Sequential graph convolutional network for active learning

R Caramalau, B Bhattarai… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We propose a novel pool-based Active Learning framework constructed on a sequential
Graph Convolution Network (GCN). Each image's feature from a pool of data represents a …

Towards free data selection with general-purpose models

Y Xie, M Ding, M Tomizuka… - Advances in Neural …, 2024 - proceedings.neurips.cc
A desirable data selection algorithm can efficiently choose the most informative samples to
maximize the utility of limited annotation budgets. However, current approaches …