A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu, S Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …

MetaFormer baselines for vision

W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …

A generalization of ViT/MLP-Mixer to graphs

X He, B Hooi, T Laurent, A Perold… - International …, 2023 - proceedings.mlr.press
Abstract Graph Neural Networks (GNNs) have shown great potential in the field of graph
representation learning. Standard GNNs define a local message-passing mechanism which …

ClusterFormer: clustering as a universal visual learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in neural …, 2024 - proceedings.neurips.cc
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

Generative AI for visualization: State of the art and future directions

Y Ye, J Hao, Y Hou, Z Wang, S Xiao, Y Luo, W Zeng - Visual Informatics, 2024 - Elsevier
Generative AI (GenAI) has witnessed remarkable progress in recent years and
demonstrated impressive performance in various generation tasks in different domains such …

Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection

Y Wang, K Yu, C Chen, X Hu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
With the rapid emergence of face synthesis techniques, there is a pressing need to develop
powerful face forgery detection methods due to security concerns. Some existing methods …

RIFormer: Keep your vision backbone effective but removing token mixer

J Wang, S Zhang, Y Liu, T Wu, Y Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …

Image as set of points

X Ma, Y Zhou, H Wang, C Qin, B Sun, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
What is an image, and how should latent features be extracted? Convolutional Networks (ConvNets)
treat an image as pixels organized in a rectangular grid and extract features via …

Mobilevig: Graph-based sparse attention for mobile vision applications

M Munir, W Avery… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have
dominated computer vision. However, recently proposed vision graph neural networks (ViG) …

Contrastive cross-scale graph knowledge synergy

Y Zhang, Y Chen, Z Song, I King - Proceedings of the 29th ACM SIGKDD …, 2023 - dl.acm.org
Graph Contrastive Learning (GCL) for graph representation learning has drawn considerable
attention recently. Efforts are mainly focused on gathering more global information via …