Patch-level contrasting without patch correspondence for accurate and dense contrastive represent...

Video-text as game players: Hierarchical banzhaf interaction for cross-modal representation learning

P Jin, J Huang, P Xiong, S Tian, C Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Contrastive learning-based video-language representation learning approaches, e.g., CLIP,
have achieved outstanding performance, which pursue semantic interaction upon pre …

Cited by 63 | Related articles | All 6 versions

Diffusionret: Generative text-video retrieval with diffusion model

P Jin, H Li, Z Cheng, K Li, X Ji, C Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Existing text-video retrieval solutions are, in essence, discriminant models focused on
maximizing the conditional likelihood, i.e., p(candidates | query). While straightforward, this …

Cited by 53 | Related articles | All 5 versions

Time does tell: Self-supervised time-tuning of dense image representations

M Salehi, E Gavves, CGM Snoek… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatially dense self-supervised learning is a rapidly growing problem domain with
promising applications for unsupervised segmentation and pretraining for dense …

Cited by 17 | Related articles | All 7 versions

Text-video retrieval with disentangled conceptualization and set-to-set alignment

P Jin, H Li, Z Cheng, J Huang, Z Wang, L Yuan… - arXiv preprint arXiv …, 2023 - arxiv.org

Text-video retrieval is a challenging cross-modal task, which aims to align visual entities with
natural language descriptions. Current methods either fail to leverage the local details or are …

Cited by 33 | Related articles | All 4 versions

Patch-level contrastive learning via positional query for visual pre-training

S Zhang, Q Zhou, Z Wang, F Wang… - … on Machine Learning, 2023 - proceedings.mlr.press

Dense contrastive learning (DCL) has been recently explored for learning localized
information for dense prediction tasks (e.g., detection and segmentation). It still suffers the …

Cited by 12 | Related articles | All 6 versions

pnnclr: Stochastic pseudo neighborhoods for contrastive learning based unsupervised representation learning problems

M Biswas, H Buckchash, DK Prasad - Neurocomputing, 2024 - Elsevier

Nearest neighbor (NN) sampling provides more semantic variations than predefined
transformations for self-supervised learning (SSL) based image recognition problems …

Cited by 7 | Related articles | All 4 versions

SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training

S Wu, H Tan, Z Tian, Y Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vision-language pre-training (VLP) aims to learn joint representations of vision and
language modalities. The contrastive paradigm is currently dominant in this field. However …

Cited by 2 | Related articles

Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability Composability and Decomposability from Anatomy via Self Supervision

MRH Taher, MB Gotway… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Humans effortlessly interpret images by parsing them into part-whole hierarchies; deep
learning excels in learning multi-level feature spaces but they often lack explicit coding of …

Cited by 1 | Related articles | All 2 versions

QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition

X Li, J Wang, X Xu, X Peng, R Singh… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audiovisual segmentation (AVS) is a challenging task that aims to segment visual objects in
videos according to their associated acoustic cues. With multiple sound sources and …

Cited by 3 | Related articles | All 2 versions

Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation

J Wang, Z Sun, Z Tan, X Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Vanilla text-to-image diffusion models struggle with generating accurate human images,
commonly resulting in imperfect anatomies such as unnatural postures or disproportionate …

Cited by 2 | Related articles | All 3 versions