Dat++: Spatially dynamic vision transformer with deformable attention

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Cited by 33 · Related articles · All 3 versions

[PDF] arxiv.org

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2025 - Springer

This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

Cited by 7 · Related articles · All 7 versions

[PDF] arxiv.org

Gra: Detecting oriented objects through group-wise rotating and attention

J Wang, Y Pu, Y Han, J Guo, Y Wang, X Li… - European Conference on …, 2025 - Springer

Oriented object detection, an emerging task in recent years, aims to identify and locate
objects across varied orientations. This requires the detector to accurately capture the …

Cited by 6 · Related articles · All 2 versions

[PDF] arxiv.org

Open panoramic segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - … on Computer Vision, 2025 - Springer

Abstract Panoramic images, capturing a 360° field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …

Cited by 4 · Related articles · All 6 versions

[PDF] arxiv.org

Vssd: Vision mamba with non-causal state space duality

Y Shi, M Dong, M Li, C Xu - arXiv preprint arXiv:2407.18559, 2024 - arxiv.org

Vision transformers have significantly advanced the field of computer vision, offering robust
modeling capabilities and global receptive field. However, their high computational …

Cited by 6 · Related articles · All 3 versions

[PDF] dlr.de

Decoupling common and unique representations for multimodal self-supervised learning

Y Wang, CM Albrecht, NAA Braham, C Liu… - … on Computer Vision …, 2024 - Springer

The increasing availability of multi-sensor data sparks wide interest in multimodal self-
supervised learning. However, most existing approaches learn only common …

Cited by 3 · Related articles · All 6 versions

[PDF] ieee.org

Research Advances in Deep Learning for Image Semantic Segmentation Techniques

ZG Xiao, TF Chai, NF Li, XF Shen, T Guan, J Tian… - IEEE …, 2024 - ieeexplore.ieee.org

Image semantic segmentation represents a significant area of research within the field of
computer vision. With the advent of deep learning, image semantic segmentation techniques …

Bud-YOLOv8s: A Potato Bud-Eye-Detection Algorithm Based on Improved YOLOv8s

W Liu, Z Li, S Zhang, T Qin, J Zhao - Electronics, 2024 - mdpi.com

The key to intelligent seed potato cutting technology lies in the accurate and rapid
identification of potato bud eyes. Existing detection algorithms suffer from low recognition …

Cited by 3 · Related articles · All 3 versions

Wavelet Tree Transformer: Multi-Head Attention with Frequency Selective Representation and Interaction for Remote Sensing Object Detection

J Pan, C He, W Huang, J Cao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Vision Transformer has achieved remarkable success in image recognition tasks owing to its
global modeling ability. However, the quadratic computational complexity becomes a …

[HTML] CabbageNet: Deep Learning for High-Precision Cabbage Segmentation in Complex Settings for Autonomous Harvesting Robotics

Y Tian, X Cao, T Zhang, H Wu, C Zhao, Y Zhao - Sensors, 2024 - mdpi.com

Reducing damage and missed harvest rates is essential for improving efficiency in
unmanned cabbage harvesting. Accurate real-time segmentation of cabbage heads can …