Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning

Z Xu, Z Shi, J Wei, F Mu, Y Li, Y Liang - arXiv preprint arXiv:2402.15017, 2024 - arxiv.org
Foundation models have emerged as a powerful tool for many AI problems. Despite the
tremendous success of foundation models, effective adaptation to new tasks, particularly …

Differential privacy mechanisms in neural tangent kernel regression

J Gu, Y Liang, Z Sha, Z Shi, Z Song - arXiv preprint arXiv:2407.13621, 2024 - arxiv.org
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI)
applications, such as face recognition, recommendation systems, language generation, and …

Labeled data selection for category discovery

B Zhao, N Lang, S Belongie, OM Aodha - European Conference on …, 2025 - Springer
Visual category discovery methods aim to find novel categories in unlabeled visual data. At
training time, a set of labeled and unlabeled images are provided, where the labels …

Exploring the frontiers of softmax: Provable optimization, applications in diffusion model, and beyond

J Gu, C Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2405.03251, 2024 - arxiv.org
The softmax activation function plays a crucial role in the success of large language models
(LLMs), particularly in the self-attention mechanism of the widely adopted Transformer …

SPTNet: An efficient alternative framework for generalized category discovery with spatial prompt tuning

H Wang, S Vaze, K Han - arXiv preprint arXiv:2403.13684, 2024 - arxiv.org
Generalized Category Discovery (GCD) aims to classify unlabelled images from
both 'seen' and 'unseen' classes by transferring knowledge from a set of labelled 'seen' class …

Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond

Y Ke, Y Liang, Z Shi, Z Song, C Yang - arXiv preprint arXiv:2412.06061, 2024 - arxiv.org
The application of transformer-based models on time series forecasting (TSF) tasks has long
been popular to study. However, many of these works fail to beat the simple linear residual …

A Novel Category Discovery Method for SAR Images Based on an Improved UNO Framework

M Chen, T Liu, L Liu - IEEE Journal of Selected Topics in …, 2024 - ieeexplore.ieee.org
In recent years, synthetic aperture radar automatic target recognition (SAR ATR) has been
widely researched for its ability to achieve high-performance target classification through …

Attention is Naturally Sparse with Gaussian Distributed Input

Y Deng, Z Song, C Yang - arXiv preprint arXiv:2404.02690, 2024 - arxiv.org
The computational intensity of Large Language Models (LLMs) is a critical bottleneck,
primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer …

Learning to discover anomalous spatiotemporal trajectory via Open-world State Space model

Q Gao, C Liu, L Huang, G Trajcevski, Q Guo… - Knowledge-Based …, 2024 - Elsevier
Identifying anomalous trajectories that deviate from usual driving patterns in an open-world
context has recently become a critical and urgent task in location-aware systems. In contrast …

Dara: distribution-aware representation alignment for semi-supervised domain adaptation in image classification

H Wu, Z Zheng, L Lv, C Zhang, D Bardou, S Niu… - The Journal of …, 2025 - Springer
Semi-supervised domain adaptation (SSDA) aims to adapt a model trained on an annotated
source domain to a related, but different, target domain with limited labeled and abundant …