Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …
Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning?
The success of deep learning heavily relies on large-scale data with comprehensive labels,
which is more expensive and time-consuming to collect in 3D than for 2D images or …
Learning hierarchical time series data augmentation invariances via contrastive supervision for human activity recognition
Human activity recognition (HAR) using wearable sensors has long been a research hotspot in
ubiquitous computing scenarios, in which feature learning has played a crucial role. Recent …
Cross contrasting feature perturbation for domain generalization
Domain generalization (DG) aims to learn a robust model from source domains that
generalizes well on unseen target domains. Recent studies focus on generating novel …
DeepMIM: Deep supervision for masked image modeling
Deep supervision, which adds extra supervision to the intermediate features of a neural
network, was widely used in image classification in the early deep learning era since it …
StageInteractor: Query-based object detector with cross-stage interaction
Previous object detectors make predictions based on dense grid points or numerous preset
anchors. Most of these detectors are trained with one-to-many label assignment strategies …
Cross-modality pyramid alignment for visual intention understanding
Visual intention understanding is the task of exploring the potential and underlying meaning
expressed in images. Simply modeling the objects or backgrounds within the image content …
DreamBench++: A human-aligned benchmark for personalized image generation
Personalized image generation holds great promise in assisting humans in everyday work
and life due to its impressive function in creatively generating personalized content …
Multispectral Semantic Segmentation for Land Cover Classification: An Overview
Land cover classification (LCC) is a process used to categorize the Earth's surface into
distinct land types. This classification is vital for environmental conservation, urban planning …
DDAE: Towards Deep Dynamic Vision BERT Pretraining
Recently, masked image modeling (MIM) has demonstrated promising prospects in self-supervised
representation learning. However, existing MIM frameworks recover all masked …