Does knowledge distillation really work?

O Parraga, MD More, CM Oliveira, NS Gavenski… - ACM Computing …, 2023 - dl.acm.org

Despite being responsible for state-of-the-art results in several computer vision and natural
language processing tasks, neural networks have faced harsh criticism due to some of their …

Cited by 33

Weak-to-strong generalization: Eliciting strong capabilities with weak supervision

C Burns, P Izmailov, JH Kirchner, B Baker… - arXiv preprint arXiv …, 2023 - arxiv.org

Widely used alignment techniques, such as reinforcement learning from human feedback
(RLHF), rely on the ability of humans to supervise model behavior, for example, to evaluate …

Cited by 152

FlexiViT: One model for all patch sizes

L Beyer, P Izmailov, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision Transformers convert images to sequences by slicing them into patches. The size of
these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher …

Cited by 82

A survey on model compression for large language models

X Zhu, J Li, Y Liu, C Ma, W Wang - arXiv preprint arXiv:2308.07633, 2023 - arxiv.org

Large Language Models (LLMs) have revolutionized natural language processing tasks with
remarkable success. However, their formidable size and computational demands present …

Cited by 177

FedRolex: Model-heterogeneous federated learning with rolling sub-model extraction

S Alam, L Liu, M Yan, M Zhang - Advances in neural …, 2022 - proceedings.neurips.cc

Most cross-device federated learning (FL) studies focus on the model-homogeneous setting
where the global server model and local client models are identical. However, such …

Cited by 117

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu

Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Cited by 93

Learning generalizable models for vehicle routing problems via knowledge distillation

J Bi, Y Ma, J Wang, Z Cao, J Chen… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recent neural methods for vehicle routing problems always train and test the deep models
on the same instance distribution (i.e., uniform). To tackle the consequent cross-distribution …

Cited by 52

A survey on green deep learning

J Xu, W Zhou, Z Fu, H Zhou, L Li - arXiv preprint arXiv:2111.05193, 2021 - arxiv.org

In recent years, larger and deeper models are springing up and continuously pushing state-
of-the-art (SOTA) results across various fields like natural language processing (NLP) and …

Cited by 103

Generalizable heterogeneous federated cross-correlation and instance similarity learning

W Huang, M Ye, Z Shi, B Du - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org

Federated learning is an important privacy-preserving multi-party learning paradigm,
involving collaborative learning with others and local updating on private data. Model …

Cited by 32

HOICLIP: Efficient knowledge transfer for HOI detection with vision-language models

S Ning, L Qiu, Y Liu, X He - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Human-Object Interaction (HOI) detection aims to localize human-object pairs and
recognize their interactions. Recently, Contrastive Language-Image Pre-training (CLIP) has …

Cited by 57