Contrast with reconstruct: Contrastive 3D representation learning guided by generative pretraining

Z Qi, R Dong, G Fan, Z Ge, X Zhang… - … on Machine Learning, 2023 - proceedings.mlr.press
Mainstream 3D representation learning approaches are built upon contrastive or generative
modeling pretext tasks, where great improvements in performance on various downstream …

Autoencoders as cross-modal teachers: Can pretrained 2D image transformers help 3D representation learning?

R Dong, Z Qi, L Zhang, J Zhang, J Sun, Z Ge… - arXiv preprint arXiv …, 2022 - arxiv.org
The success of deep learning heavily relies on large-scale data with comprehensive labels,
which is more expensive and time-consuming to fetch in 3D compared to 2D images or …

Learning hierarchical time series data augmentation invariances via contrastive supervision for human activity recognition

D Cheng, L Zhang, C Bu, H Wu, A Song - Knowledge-Based Systems, 2023 - Elsevier
Human activity recognition (HAR) using wearable sensors is always a research hotspot in
ubiquitous computing scenario, in which feature learning has played a crucial role. Recent …

Cross contrasting feature perturbation for domain generalization

C Li, D Zhang, W Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Domain generalization (DG) aims to learn a robust model from source domains that
generalize well on unseen target domains. Recent studies focus on generating novel …

DeepMIM: Deep supervision for masked image modeling

S Ren, F Wei, S Albanie, Z Zhang, H Hu - arXiv preprint arXiv:2303.08817, 2023 - arxiv.org
Deep supervision, which involves extra supervisions to the intermediate features of a neural
network, was widely used in image classification in the early deep learning era since it …

StageInteractor: Query-based object detector with cross-stage interaction

Y Teng, H Liu, S Guo, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Previous object detectors make predictions based on dense grid points or numerous preset
anchors. Most of these detectors are trained with one-to-many label assignment strategies …

Cross-modality pyramid alignment for visual intention understanding

M Ye, Q Shi, K Su, B Du - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
Visual intention understanding is the task of exploring the potential and underlying meaning
expressed in images. Simply modeling the objects or backgrounds within the image content …

DreamBench++: A human-aligned benchmark for personalized image generation

Y Peng, Y Cui, H Tang, Z Qi, R Dong, J Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
Personalized image generation holds great promise in assisting humans in everyday work
and life due to its impressive function in creatively generating personalized content …

Multispectral Semantic Segmentation for Land Cover Classification: An Overview

L Ramos, AD Sappa - IEEE Journal of Selected Topics in …, 2024 - ieeexplore.ieee.org
Land cover classification (LCC) is a process used to categorize the earth's surface into
distinct land types. This classification is vital for environmental conservation, urban planning …

DDAE: Towards Deep Dynamic Vision BERT Pretraining

H Chen, X Kong, X Zhang, X Zhao… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Recently, masked image modeling (MIM) has demonstrated promising prospects in
self-supervised representation learning. However, existing MIM frameworks recover all masked …