A lightweight residual network based on improved knowledge transfer and quantized distillation for cross-domain fault diagnosis of rolling bearings

W Guo, X Li, Z Shen - Expert Systems with Applications, 2024 - Elsevier
Predictive maintenance advocates the use of artificial intelligence to analyze big data and
provides support for monitoring health conditions and planning maintenance activities in …

Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs

N Boizard, K El-Haddad, C Hudelot… - arXiv preprint arXiv …, 2024 - arxiv.org
Deploying large language models (LLMs) of several billion parameters can be impractical in
most industrial use cases due to constraints such as cost, latency limitations, and hardware …

Knowledge distillation with insufficient training data for regression

M Kang, S Kang - Engineering Applications of Artificial Intelligence, 2024 - Elsevier
Abstract Knowledge distillation has been widely used to compress a large teacher network
into a smaller student network. Conventional approaches require the training dataset that …
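As a quick illustration of the teacher-student setup this entry builds on, below is a minimal sketch of distillation for a regression task, assuming a PyTorch-style training loop where `teacher` and `student` are regression models; `alpha` weights imitation of the teacher against the ground-truth labels. The paper's actual strategy for coping with insufficient training data is not reflected here.

# Minimal sketch of knowledge distillation for regression (PyTorch-style).
# Assumes `teacher` and `student` map inputs to scalar/vector predictions and
# that (inputs, targets) come from an ordinary data loader.
import torch
import torch.nn.functional as F

def distill_regression_step(student, teacher, optimizer, inputs, targets, alpha=0.5):
    teacher.eval()
    with torch.no_grad():
        teacher_pred = teacher(inputs)               # soft targets from the teacher
    student_pred = student(inputs)
    loss_gt = F.mse_loss(student_pred, targets)      # fit the ground-truth labels
    loss_kd = F.mse_loss(student_pred, teacher_pred) # imitate the teacher's outputs
    loss = (1 - alpha) * loss_gt + alpha * loss_kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()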

Knowledge Distillation in Image Classification: The Impact of Datasets
Knowledge Distillation in Image Classification: The Impact of Datasets

AG Belinga, CS Tekouabou Koumetio, M El Haziti… - Computers, 2024 - mdpi.com
As the demand for efficient and lightweight models in image classification grows, knowledge
distillation has emerged as a promising technique to transfer expertise from complex teacher …
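For context, this is a hedged sketch of the classic soft-target distillation loss for classification, assuming teacher and student produce logits over the same set of classes; `T` is the softening temperature and `alpha` trades the hard-label cross-entropy off against the teacher's soft targets. The dataset-dependent effects studied in this paper are not modelled here.

# Classic soft-target distillation loss for image classification (PyTorch-style).
# Assumes student_logits and teacher_logits share the same class dimension.
import torch
import torch.nn.functional as F

def kd_classification_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Hard-label term: ordinary cross-entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return (1 - alpha) * ce + alpha * kl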

A data-driven target-oriented robust optimization framework: bridging machine learning and optimization under uncertainty

JL San Juan, C Sy - Journal of Industrial and Production …, 2024 - Taylor & Francis
The target-oriented robust optimization (TORO) approach converts the original objectives to
system targets and instead maximizes an uncertainty budget or robustness index. Machine …
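A hedged sketch of the general target-oriented robust formulation the snippet alludes to: instead of minimizing a cost f(x, u) directly, a target \tau on the original objective is fixed and the largest uncertainty budget \Gamma under which the target is still met is sought. The notation below is illustrative and not taken from the paper.

\begin{aligned}
\max_{x \in X,\ \Gamma \ge 0} \quad & \Gamma \\
\text{s.t.} \quad & f(x, u) \le \tau \quad \forall\, u \in U(\Gamma),
\end{aligned}

where U(\Gamma) is an uncertainty set that grows with the budget \Gamma, so the optimal \Gamma measures how much uncertainty the system can absorb while still hitting the target.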

Task‐oriented feature hallucination for few‐shot image classification

S Wu, X Gao, X Hu - IET Image Processing, 2023 - Wiley Online Library
Data hallucination generates additional training examples for novel classes to alleviate the
data scarcity problem in few‐shot learning (FSL). Existing hallucination‐based FSL methods …
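As background for this entry, here is a minimal sketch of a generic feature hallucinator for few-shot learning, assuming features have already been extracted by a backbone network; it maps a seed support feature plus Gaussian noise to additional synthetic features. The task-oriented conditioning proposed in this paper is not modelled here.

# Generic feature-hallucinator sketch for few-shot learning (PyTorch-style).
# Given one support feature from a novel class, it generates extra synthetic
# features by passing the feature concatenated with noise through a small MLP.
import torch
import torch.nn as nn

class FeatureHallucinator(nn.Module):
    def __init__(self, feat_dim=512, noise_dim=64, hidden_dim=512):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(feat_dim + noise_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, support_feat, n_samples=5):
        # Repeat the seed feature and pair each copy with a fresh noise vector.
        seed = support_feat.unsqueeze(0).expand(n_samples, -1)
        noise = torch.randn(n_samples, self.noise_dim, device=support_feat.device)
        return self.net(torch.cat([seed, noise], dim=-1))  # (n_samples, feat_dim)

Usage: FeatureHallucinator()(torch.randn(512)) returns five hallucinated features that can be appended to the support set before fitting the few-shot classifier.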
