DisWOT: Student architecture search for distillation without training

P Dong, L Li, Z Wei - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Knowledge distillation (KD) is an effective training strategy for improving
lightweight student models under the guidance of cumbersome teachers. However, the large …

Automated knowledge distillation via Monte Carlo tree search

L Li, P Dong, Z Wei, Y Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …

KD-Zero: Evolving knowledge distiller for any teacher-student pairs

L Li, P Dong, A Li, Z Wei… - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge distillation (KD) has emerged as an effective technique for compressing
models and enhancing lightweight students. Conventional KD methods propose various …

EMQ: Evolving training-free proxies for automated mixed-precision quantization

P Dong, L Li, Z Wei, X Niu, Z Tian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity
trade-off for models. Conventional training-based search methods require time-consuming …
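
The snippet above only hints at what a "training-free proxy" means in practice. As a rough, hypothetical illustration (not EMQ's evolved proxy), one can rank candidate bit-width assignments by the output distortion that fake-quantizing weights induces on a small calibration batch; the toy model, layer names, and bit plans below are assumptions made for this sketch.

import copy
import torch
import torch.nn as nn

def fake_quantize_(module: nn.Linear, bits: int) -> None:
    """Symmetric uniform fake-quantization of a layer's weights in place."""
    w = module.weight.data
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    module.weight.data = torch.round(w / scale).clamp(-qmax, qmax) * scale

def proxy_score(model: nn.Module, bit_plan: dict, calib: torch.Tensor) -> float:
    """Training-free proxy: negative output MSE between the full-precision model
    and a fake-quantized copy on calibration data (higher is better)."""
    quant = copy.deepcopy(model)
    for name, module in quant.named_modules():
        if isinstance(module, nn.Linear) and name in bit_plan:
            fake_quantize_(module, bit_plan[name])
    with torch.no_grad():
        err = torch.mean((model(calib) - quant(calib)) ** 2).item()
    return -err

# Toy usage: compare two candidate bit-width plans for a small MLP.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
calib = torch.randn(64, 16)
plans = [{"0": 8, "2": 4}, {"0": 4, "2": 8}]
best = max(plans, key=lambda p: proxy_score(model, p, calib))
print("preferred plan:", best)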

Auto-GAS: Automated proxy discovery for training-free generative architecture search

L Li, H Sun, S Li, P Dong, W Luo, W Xue, Q Liu… - … on Computer Vision, 2025 - Springer
In this paper, we introduce Auto-GAS, the first training-free Generative Architecture Search
(GAS) framework enabled by an auto-discovered proxy. Generative models like Generative …

AttnZero: Efficient attention discovery for vision transformers

L Li, Z Wei, P Dong, W Luo, W Xue, Q Liu… - European Conference on …, 2025 - Springer
In this paper, we present AttnZero, the first framework for automatically discovering efficient
attention modules tailored for Vision Transformers (ViTs). While traditional self-attention in …

Efficient search of comprehensively robust neural architectures via multi-fidelity evaluation

J Sun, W Yao, T Jiang, X Chen - Pattern Recognition, 2024 - Elsevier
Neural architecture search (NAS) has emerged as a successful technique for finding robust
deep neural network (DNN) architectures. However, most existing robustness evaluations in …
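
Since the entry above centers on multi-fidelity evaluation, a generic sketch may help: score many candidates cheaply at a low fidelity (e.g., very few training epochs) and promote only the top fraction to progressively more expensive evaluation, as in successive halving. The evaluate() stub and the fidelity schedule are placeholders for illustration, not the paper's robustness-evaluation pipeline.

import random

def evaluate(arch: int, epochs: int) -> float:
    """Stand-in for training `arch` for `epochs` epochs and returning a score.
    Higher fidelity (more epochs) gives a less noisy estimate of true quality."""
    true_quality = random.Random(arch).random()
    noise = random.gauss(0, 0.3 / epochs)
    return true_quality + noise

def successive_halving(candidates, fidelities=(1, 4, 16), keep=0.25):
    pool = list(candidates)
    for epochs in fidelities:
        scores = {arch: evaluate(arch, epochs) for arch in pool}
        pool = sorted(pool, key=scores.get, reverse=True)
        pool = pool[: max(1, int(len(pool) * keep))]
    return pool

# Toy usage: 64 candidate architectures, three fidelity levels.
print("surviving candidates:", successive_halving(range(64)))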

Pruning large language models via accuracy predictor

Y Ji, Y Cao, J Liu - arXiv preprint arXiv:2309.09507, 2023 - arxiv.org
Large language models (LLMs) containing tens of billions of parameters (or even more)
have demonstrated impressive capabilities in various NLP tasks. However, substantial …
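
The accuracy-predictor idea referenced above can be sketched generically: fit a cheap regressor from pruning configurations (e.g., per-layer sparsity ratios) to measured post-pruning accuracy, then use it to rank many unseen configurations without evaluating each one. The regressor choice, features, and synthetic data below are assumptions for the sketch, not the paper's setup.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
num_layers = 12

# Stand-in "measurements": each row is a per-layer sparsity vector, each target
# is the (here, synthetic) accuracy obtained after pruning and finetuning.
configs = rng.uniform(0.0, 0.8, size=(200, num_layers))
accuracy = 0.75 - 0.2 * configs.mean(axis=1) + rng.normal(0, 0.01, 200)

predictor = GradientBoostingRegressor().fit(configs, accuracy)

# Score a large pool of candidates cheaply and keep the one predicted to stay
# most accurate among those meeting a target overall sparsity.
candidates = rng.uniform(0.0, 0.8, size=(10_000, num_layers))
sparse_enough = candidates[candidates.mean(axis=1) > 0.45]
best = sparse_enough[np.argmax(predictor.predict(sparse_enough))]
print("selected per-layer sparsities:", np.round(best, 2))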

Leveraging logit uncertainty for better knowledge distillation

Z Guo, D Wang, Q He, P Zhang - Scientific Reports, 2024 - nature.com
Knowledge distillation improves student model performance. However, using a
larger teacher model does not necessarily result in better distillation gains due to significant …
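
A minimal sketch of the general idea of uncertainty-aware distillation, assuming per-sample weights derived from the entropy of the softened teacher distribution; the weighting scheme, temperature, and loss mixing below are illustrative assumptions, not the cited paper's formulation.

import torch
import torch.nn.functional as F

def uncertainty_weighted_kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    # Softened teacher distribution and its per-sample entropy (normalized to [0, 1]).
    p_t = F.softmax(teacher_logits / T, dim=1)
    entropy = -(p_t * torch.log(p_t + 1e-8)).sum(dim=1)
    weight = 1.0 - entropy / torch.log(torch.tensor(float(teacher_logits.size(1))))

    # Per-sample KL between softened teacher and student, down-weighted where
    # the teacher is uncertain, plus the usual cross-entropy on hard labels.
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kd = F.kl_div(log_p_s, p_t, reduction="none").sum(dim=1) * (T ** 2)
    ce = F.cross_entropy(student_logits, targets)
    return alpha * (weight * kd).mean() + (1 - alpha) * ce

# Toy usage with random logits for a 10-class problem.
s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(uncertainty_weighted_kd_loss(s, t, y).item())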

LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs

MH Rahman, P Chakraborty - arXiv preprint arXiv:2402.18443, 2024 - arxiv.org
Building efficient neural network architectures can be a time-consuming task requiring
extensive expert knowledge. This task becomes particularly challenging for edge devices …