A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …
MetaFormer baselines for vision
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant
role in achieving competitive performance. In this paper, we further explore the capacity of …
A generalization of ViT/MLP-Mixer to graphs
Abstract Graph Neural Networks (GNNs) have shown great potential in the field of graph
representation learning. Standard GNNs define a local message-passing mechanism which …
ClusterFormer: clustering as a universal visual learner
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …
Generative AI for visualization: State of the art and future directions
Generative AI (GenAI) has witnessed remarkable progress in recent years and
demonstrated impressive performance in various generation tasks in different domains such …
Dynamic graph learning with content-guided spatial-frequency relation reasoning for deepfake detection
With the springing up of face synthesis techniques, it is prominent in need to develop
powerful face forgery detection methods due to security concerns. Some existing methods …
RIFormer: Keep your vision backbone effective but removing token mixer
This paper studies how to keep a vision backbone effective while removing token mixers in
its basic building blocks. Token mixers, as self-attention for vision transformers (ViTs), are …
Image as set of points
What is an image and how to extract latent features? Convolutional Networks (ConvNets)
consider an image as organized pixels in a rectangular shape and extract features via …
MobileViG: Graph-based sparse attention for mobile vision applications
Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have
dominated computer vision. However, recently proposed vision graph neural networks (ViG) …
Contrastive cross-scale graph knowledge synergy
Graph representation learning via Contrastive Learning (GCL) has drawn considerable
attention recently. Efforts are mainly focused on gathering more global information via …