A systematic review on affective computing: Emotion models, databases, and recent advances

Y Wang, W Song, W Tao, A Liotta, D Yang, X Li, S Gao… - Information …, 2022 - Elsevier
Affective computing conjoins the research topics of emotion recognition and sentiment
analysis, and can be realized with unimodal or multimodal data, consisting primarily of …

Emotion recognition and artificial intelligence: A systematic review (2014–2023) and research recommendations

SK Khare, V Blanes-Vidal, ES Nadimi, UR Acharya - Information Fusion, 2023 - Elsevier
Emotion recognition is the ability to precisely infer human emotions from numerous sources
and modalities using questionnaires, physical signals, and physiological signals. Recently …

EVA-CLIP: Improved training techniques for CLIP at scale

Q Sun, Y Fang, L Wu, X Wang, Y Cao - arXiv preprint arXiv:2303.15389, 2023 - arxiv.org
Contrastive language-image pre-training, CLIP for short, has gained increasing attention for
its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for deep neural network
(DNN) training, and they usually train a DNN for each single visual recognition task …

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

SLIP: Self-supervision meets language-image pre-training

N Mu, A Kirillov, D Wagner, S Xie - European conference on computer …, 2022 - Springer
Recent work has shown that self-supervised pre-training leads to improvements over
supervised learning on challenging visual recognition tasks. CLIP, an exciting new …

InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

Federated learning for secure IoMT-applications in smart healthcare systems: A comprehensive review

S Rani, A Kataria, S Kumar, P Tiwari - Knowledge-based systems, 2023 - Elsevier
Recent developments in the Internet of Things (IoT) and various communication
technologies have reshaped numerous application areas. Nowadays, IoT is assimilated into …

Learn from all: Erasing attention consistency for noisy label facial expression recognition

Y Zhang, C Wang, X Ling, W Deng - European Conference on Computer …, 2022 - Springer
Noisy label Facial Expression Recognition (FER) is more challenging than
traditional noisy label classification tasks due to the inter-class similarity and the annotation …