Imbalance problems in object detection: A review
In this paper, we present a comprehensive review of the imbalance problems in object
detection. To analyze the problems in a systematic manner, we introduce a problem-based …
A survey on dataset quality in machine learning
Y Gong, G Liu, Y Xue, R Li, L Meng - Information and Software Technology, 2023 - Elsevier
With the rise of big data, the quality of datasets has become a crucial factor affecting the
performance of machine learning models. High-quality datasets are essential for the …
Datacomp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Beyond neural scaling laws: beating power law scaling via data pruning
Widely observed neural scaling laws, in which error falls off as a power of the training set
size, model size, or both, have driven substantial performance improvements in deep …
D4: Improving llm pretraining via document de-duplication and diversification
Over recent years, an increasing amount of compute and data has been poured into training
large language models (LLMs), usually by doing one-pass learning on as many tokens as …
Adaptive second order coresets for data-efficient machine learning
O Pooladzandi, D Davini… - … on Machine Learning, 2022 - proceedings.mlr.press
Training machine learning models on massive datasets incurs substantial computational
costs. To alleviate such costs, there has been a sustained effort to develop data-efficient …
Compute-efficient deep learning: Algorithmic trends and opportunities
BR Bartoldson, B Kailkhura, D Blalock - Journal of Machine Learning …, 2023 - jmlr.org
Although deep learning has made great progress in recent years, the exploding economic
and environmental costs of training neural networks are becoming unsustainable. To …
Towards sustainable learning: Coresets for data-efficient deep learning
To improve the efficiency and sustainability of learning deep models, we propose CREST,
the first scalable framework with rigorous theoretical guarantees to identify the most valuable …
Sequential graph convolutional network for active learning
R Caramalau, B Bhattarai… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We propose a novel pool-based Active Learning framework constructed on a sequential
Graph Convolution Network (GCN). Each image's feature from a pool of data represents a …
Towards free data selection with general-purpose models
A desirable data selection algorithm can efficiently choose the most informative samples to
maximize the utility of limited annotation budgets. However, current approaches …