VL-Adapter: Parameter-efficient transfer learning for vision-and-language tasks

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 163 · Related articles · All 7 versions

A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations

Z Zhao, L Alzubaidi, J Zhang, Y Duan, Y Gu - Expert Systems with …, 2023 - Elsevier

Deep learning has emerged as a powerful tool in various domains, revolutionising machine
learning research. However, one persistent challenge is the scarcity of labelled training …

Cited by 51 · Related articles · All 4 versions

Adding conditional control to text-to-image diffusion models

L Zhang, A Rao, M Agrawala - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

We present ControlNet, a neural network architecture to add spatial conditioning controls to
large, pretrained text-to-image diffusion models. ControlNet locks the production-ready large …

Cited by 2413 · Related articles · All 6 versions

LLaMA-Adapter V2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org

How to efficiently transform large language models (LLMs) into instruction followers is
recently a popular research direction, while training LLM for multi-modal reasoning remains …

Cited by 419 · Related articles · All 3 versions

Side adapter network for open-vocabulary semantic segmentation

M Xu, Z Zhang, F Wei, H Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper presents a new framework for open-vocabulary semantic segmentation with the
pre-trained vision-language model, named SAN. Our approach models the semantic …

Cited by 188 · Related articles · All 6 versions

Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

Cited by 458 · Related articles · All 9 versions

CodeT5+: Open code large language models for code understanding and generation

Y Wang, H Le, AD Gotmare, NDQ Bui, J Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) pretrained on vast source code have achieved prominent
progress in code intelligence. However, existing code LLMs have two main limitations in …

Cited by 299 · Related articles · All 4 versions

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org

This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

Cited by 481 · Related articles · All 3 versions

CLIP-Adapter: Better vision-language models with feature adapters

P Gao, S Geng, R Zhang, T Ma, R Fang… - International Journal of …, 2024 - Springer

Large-scale contrastive vision-language pretraining has shown significant progress in visual
representation learning. Unlike traditional visual systems trained by a fixed set of discrete …

Cited by 742 · Related articles · All 10 versions

ST-Adapter: Parameter-efficient image-to-video transfer learning

J Pan, Z Lin, X Zhu, J Shao, H Li - Advances in Neural …, 2022 - proceedings.neurips.cc

Capitalizing on large pre-trained models for various downstream tasks of interest have
recently emerged with promising performance. Due to the ever-growing model size, the …

Cited by 191 · Related articles · All 7 versions