DisWOT: Student architecture search for distillation without training
Knowledge distillation (KD) is an effective training strategy to improve
lightweight student models under the guidance of cumbersome teachers. However, the large …
Automated knowledge distillation via Monte Carlo tree search
In this paper, we present Auto-KD, the first automated search framework for optimal
knowledge distillation design. Traditional distillation techniques typically require handcrafted …
KD-Zero: Evolving knowledge distiller for any teacher-student pairs
Knowledge distillation (KD) has emerged as an effective technique for compressing
models and enhancing lightweight models. Conventional KD methods propose various …
EMQ: Evolving training-free proxies for automated mixed-precision quantization
Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity
trade-off for models. Conventional training-based search methods require time-consuming …
Auto-GAS: Automated proxy discovery for training-free generative architecture search
In this paper, we introduce Auto-GAS, the first training-free Generative Architecture Search
(GAS) framework enabled by an auto-discovered proxy. Generative models like Generative …
AttnZero: Efficient attention discovery for vision transformers
In this paper, we present AttnZero, the first framework for automatically discovering efficient
attention modules tailored for Vision Transformers (ViTs). While traditional self-attention in …
Efficient search of comprehensively robust neural architectures via multi-fidelity evaluation
J Sun, W Yao, T Jiang, X Chen - Pattern Recognition, 2024 - Elsevier
Neural architecture search (NAS) has emerged as a successful technique for finding robust
deep neural network (DNN) architectures. However, most existing robustness evaluations in …
Pruning large language models via accuracy predictor
Y Ji, Y Cao, J Liu - arXiv preprint arXiv:2309.09507, 2023 - arxiv.org
Large language models (LLMs) containing tens of billions of parameters (or even more)
have demonstrated impressive capabilities in various NLP tasks. However, substantial …
Leveraging logit uncertainty for better knowledge distillation
Z Guo, D Wang, Q He, P Zhang - Scientific Reports, 2024 - nature.com
Knowledge distillation improves student model performance. However, using a
larger teacher model does not necessarily result in better distillation gains due to significant …
LeMo-NADe: Multi-Parameter Neural Architecture Discovery with LLMs
MH Rahman, P Chakraborty - arXiv preprint arXiv:2402.18443, 2024 - arxiv.org
Building efficient neural network architectures can be a time-consuming task requiring
extensive expert knowledge. This task becomes particularly challenging for edge devices …