KITTI-360: A novel dataset and benchmarks for urban scene understanding in 2D and 3D
For the last few decades, several major subfields of artificial intelligence including computer
vision, graphics, and robotics have progressed largely independently from each other …
Transformer-based visual segmentation: A survey
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …
A comprehensive review of modern object segmentation approaches
Image segmentation is the task of associating pixels in an image with their respective object
class labels. It has a wide range of applications in many industries including healthcare …
Semantic flow for fast and accurate scene parsing
In this paper, we focus on designing an effective method for fast and accurate scene parsing. A
common practice to improve the performance is to attain high resolution feature maps with …
Self-supervised learning of audio-visual objects from video
Our objective is to transform a video into a set of discrete audio-visual objects using self-
supervised learning. To this end, we introduce a model that uses attention to localize and …
PSANet: Point-wise spatial attention network for scene parsing
We notice information flow in convolutional neural networks is restricted inside local
neighborhood regions due to the physical design of convolutional filters, which limits the …
Softmax splatting for video frame interpolation
Differentiable image sampling in the form of backward warping has seen broad adoption in
tasks like depth estimation and optical flow prediction. In contrast, how to perform forward …
The ApolloScape dataset for autonomous driving
Scene parsing aims to assign a class (semantic) label for each pixel in an image. It is a
comprehensive analysis of an image. Given the rise of autonomous driving, pixel-accurate …
Improving semantic segmentation via video propagation and label relaxation
Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate
models. In this paper, we present a video prediction-based methodology to scale up training …
Large-scale video panoptic segmentation in the wild: A benchmark
In this paper, we present a new large-scale dataset for the video panoptic segmentation
task, which aims to assign semantic classes and track identities to all pixels in a video. As …