GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2025 - Springer
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

GRA: Detecting oriented objects through group-wise rotating and attention

J Wang, Y Pu, Y Han, J Guo, Y Wang, X Li… - European Conference on …, 2025 - Springer
Oriented object detection, an emerging task in recent years, aims to identify and locate
objects across varied orientations. This requires the detector to accurately capture the …

Open panoramic segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - … on Computer Vision, 2025 - Springer
Panoramic images, capturing a 360° field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …

VSSD: Vision Mamba with non-causal state space duality

Y Shi, M Dong, M Li, C Xu - arXiv preprint arXiv:2407.18559, 2024 - arxiv.org
Vision transformers have significantly advanced the field of computer vision, offering robust
modeling capabilities and global receptive field. However, their high computational …

Decoupling common and unique representations for multimodal self-supervised learning

Y Wang, CM Albrecht, NAA Braham, C Liu… - … on Computer Vision …, 2024 - Springer
The increasing availability of multi-sensor data sparks wide interest in multimodal self-
supervised learning. However, most existing approaches learn only common …

Research Advances in Deep Learning for Image Semantic Segmentation Techniques

ZG Xiao, TF Chai, NF Li, XF Shen, T Guan, J Tian… - IEEE …, 2024 - ieeexplore.ieee.org
Image semantic segmentation represents a significant area of research within the field of
computer vision. With the advent of deep learning, image semantic segmentation techniques …

Bud-YOLOv8s: A Potato Bud-Eye-Detection Algorithm Based on Improved YOLOv8s

W Liu, Z Li, S Zhang, T Qin, J Zhao - Electronics, 2024 - mdpi.com
The key to intelligent seed potato cutting technology lies in the accurate and rapid
identification of potato bud eyes. Existing detection algorithms suffer from low recognition …

Wavelet Tree Transformer: Multi-Head Attention with Frequency Selective Representation and Interaction for Remote Sensing Object Detection

J Pan, C He, W Huang, J Cao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformer has achieved remarkable success in image recognition tasks owing to its
global modeling ability. However, the quadratic computational complexity becomes a …

CabbageNet: Deep Learning for High-Precision Cabbage Segmentation in Complex Settings for Autonomous Harvesting Robotics

Y Tian, X Cao, T Zhang, H Wu, C Zhao, Y Zhao - Sensors, 2024 - mdpi.com
Reducing damage and missed harvest rates is essential for improving efficiency in
unmanned cabbage harvesting. Accurate real-time segmentation of cabbage heads can …