LLM-Pruner: On the structural pruning of large language models

X Ma, G Fang, X Wang - Advances in neural information …, 2023 - proceedings.neurips.cc
Large language models (LLMs) have shown remarkable capabilities in language
understanding and generation. However, such impressive capability typically comes with a …

Up to 100x faster data-free knowledge distillation

G Fang, K Mo, X Wang, J Song, S Bei… - Proceedings of the …, 2022 - ojs.aaai.org
Data-free knowledge distillation (DFKD) has recently been attracting increasing attention
from research communities, owing to its capability to compress a model using only …

Contrastive model inversion for data-free knowledge distillation

G Fang, J Song, X Wang, C Shen, X Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
Model inversion, whose goal is to recover training data from a pre-trained model, has
recently been proven feasible. However, existing inversion methods usually suffer from the mode …

Robust and resource-efficient data-free knowledge distillation by generative pseudo replay

K Binici, S Aggarwal, NT Pham, K Leman… - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Data-Free Knowledge Distillation (KD) allows knowledge transfer from a trained
neural network (teacher) to a more compact one (student) in the absence of original training …

Data-free knowledge transfer: A survey

Y Liu, W Zhang, J Wang, J Wang - arXiv preprint arXiv:2112.15278, 2021 - arxiv.org
In the last decade, many deep learning models have been well trained and have achieved great
success in various fields of machine intelligence, especially in computer vision and natural …

When gradient descent meets derivative-free optimization: A match made in black-box scenario

C Han, L Cui, R Zhu, J Wang, N Chen, Q Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large pre-trained language models (PLMs) have garnered significant attention for their
versatility and potential for solving a wide spectrum of natural language processing (NLP) …

Prompting to distill: Boosting data-free knowledge distillation via reinforced prompt

X Ma, X Wang, G Fang, Y Shen, W Lu - arXiv preprint arXiv:2205.07523, 2022 - arxiv.org
Data-free knowledge distillation (DFKD) conducts knowledge distillation by eliminating the
dependence on original training data, and has recently achieved impressive results in …

Data-Free Distillation of Language Model by Text-to-Text Transfer

Z Bai, X Liu, H Hu, T Guo, Q Zhang, Y Wang - arXiv preprint arXiv …, 2023 - arxiv.org
Data-Free Knowledge Distillation (DFKD) plays a vital role in compressing a model when the
original training data is unavailable. Previous work on DFKD in NLP has mainly focused on …

Feature-rich audio model inversion for data-free knowledge distillation towards general sound classification

Z Kang, Y He, J Wang, J Peng, X Qu… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Data-Free Knowledge Distillation (DFKD) has recently attracted growing attention in the
academic community, especially with major breakthroughs in computer vision. Despite …

Narrowing the language gap: domain adaptation guided cross-lingual passage re-ranking

D Chen, X Zhang, S Zhang - Neural Computing and Applications, 2023 - Springer
For a given query, the objective of Cross-lingual Passage Re-ranking (XPR) is to rank a list
of candidate passages in multiple languages, where only a portion of the passages are in …