Deep multimodal fusion for semantic image segmentation: A survey

Y Zhang, D Sidibé, O Morel, F Mériaudeau - Image and Vision Computing, 2021 - Elsevier
Recent advances in deep learning have shown excellent performance in various scene
understanding tasks. However, in some complex environments or under challenging …

Recent advances on loss functions in deep learning for computer vision

Y Tian, D Su, S Lauria, X Liu - Neurocomputing, 2022 - Elsevier
The loss function, also known as cost function, is used for training a neural network or other
machine learning models. Over the past decade, researchers have designed many loss …

MST++: Multi-stage spectral-wise transformer for efficient spectral reconstruction

Y Cai, J Lin, Z Lin, H Wang, Y Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Existing leading methods for spectral reconstruction (SR) focus on designing deeper or
wider convolutional neural networks (CNNs) to learn the end-to-end mapping from the RGB …

Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers

S Zheng, J Lu, H Zhao, X Zhu, Z Luo… - Proceedings of the …, 2021 - openaccess.thecvf.com
Most recent semantic segmentation methods adopt a fully-convolutional network (FCN) with
an encoder-decoder architecture. The encoder progressively reduces the spatial resolution …

A survey on vision transformer

K Han, Y Wang, H Chen, X Chen, J Guo… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

A survey on visual transformer

K Han, Y Wang, H Chen, X Chen, J Guo, Z Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Transformer, first applied to the field of natural language processing, is a type of deep neural
network mainly based on the self-attention mechanism. Thanks to its strong representation …

Strip pooling: Rethinking spatial pooling for scene parsing

Q Hou, L Zhang, MM Cheng… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Spatial pooling has been proven highly effective to capture long-range contextual
information for pixel-wise prediction tasks, such as scene parsing. In this paper, beyond …

Object-contextual representations for semantic segmentation

Y Yuan, X Chen, J Wang - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
In this paper, we study the context aggregation problem in semantic segmentation.
Motivated by that the label of a pixel is the category of the object that the pixel belongs to, we …

Bi-directional cross-modality feature propagation with separation-and-aggregation gate for RGB-D semantic segmentation

X Chen, KY Lin, J Wang, W Wu, C Qian, H Li… - European conference on …, 2020 - Springer
Depth information has proven to be a useful cue in the semantic segmentation of RGB-D
images for providing a geometric counterpart to the RGB representation. Most existing works …

Improving semantic segmentation via decoupled body and edge supervision

X Li, X Li, L Zhang, G Cheng, J Shi, Z Lin, S Tan… - Computer Vision–ECCV …, 2020 - Springer
Existing semantic segmentation approaches either aim to improve the object's inner
consistency by modeling the global context, or refine objects detail along their boundaries …