GSVA: Generalized segmentation via multimodal large language models
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …
Efficient diffusion transformer with step-wise dynamic attention mediators
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
GRA: Detecting oriented objects through group-wise rotating and attention
Oriented object detection, an emerging task in recent years, aims to identify and locate
objects across varied orientations. This requires the detector to accurately capture the …
Open panoramic segmentation
Panoramic images, capturing a 360\(^\circ\) field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …
VSSD: Vision mamba with non-causal state space duality
Vision transformers have significantly advanced the field of computer vision, offering robust
modeling capabilities and global receptive field. However, their high computational …
Decoupling common and unique representations for multimodal self-supervised learning
Y Wang, CM Albrecht, NAA Braham, C Liu… - … on Computer Vision …, 2024 - Springer
The increasing availability of multi-sensor data sparks wide interest in multimodal self-
supervised learning. However, most existing approaches learn only common …
Research Advances in Deep Learning for Image Semantic Segmentation Techniques
ZG Xiao, TF Chai, NF Li, XF Shen, T Guan, J Tian… - IEEE …, 2024 - ieeexplore.ieee.org
Image semantic segmentation represents a significant area of research within the field of
computer vision. With the advent of deep learning, image semantic segmentation techniques …
Bud-YOLOv8s: A Potato Bud-Eye-Detection Algorithm Based on Improved YOLOv8s
W Liu, Z Li, S Zhang, T Qin, J Zhao - Electronics, 2024 - mdpi.com
The key to intelligent seed potato cutting technology lies in the accurate and rapid
identification of potato bud eyes. Existing detection algorithms suffer from low recognition …
Wavelet Tree Transformer: Multi-Head Attention with Frequency Selective Representation and Interaction for Remote Sensing Object Detection
J Pan, C He, W Huang, J Cao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformer has achieved remarkable success in image recognition tasks owing to its
global modeling ability. However, the quadratic computational complexity becomes a …
CabbageNet: Deep Learning for High-Precision Cabbage Segmentation in Complex Settings for Autonomous Harvesting Robotics
Y Tian, X Cao, T Zhang, H Wu, C Zhao, Y Zhao - Sensors, 2024 - mdpi.com
Reducing damage and missed harvest rates is essential for improving efficiency in
unmanned cabbage harvesting. Accurate real-time segmentation of cabbage heads can …