Learning visual reasoning without strong priors

Semantic image synthesis with spatially-adaptive normalization

T Park, MY Liu, TC Wang… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

We propose spatially-adaptive normalization, a simple but effective layer for synthesizing
photorealistic images given an input semantic layout. Previous methods directly feed the …

Cited by 3278 Related articles All 25 versions

TADAM: Task dependent adaptive metric for improved few-shot learning

B Oreshkin, P Rodríguez López… - Advances in neural …, 2018 - proceedings.neurips.cc

Few-shot learning has become essential for producing models that generalize from few
examples. In this work, we identify that metric scaling and metric task conditioning are …

Cited by 1572 Related articles All 8 versions

Recovering realistic texture in image super-resolution by deep spatial feature transform

X Wang, K Yu, C Dong, CC Loy - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

Despite that convolutional neural networks (CNN) have recently demonstrated high-quality
reconstruction for single-image super-resolution (SR), recovering natural and realistic …

Cited by 1163 Related articles All 11 versions

Slimmable neural networks

J Yu, L Yang, N Xu, J Yang, T Huang - arXiv preprint arXiv:1812.08928, 2018 - arxiv.org

We present a simple and general method to train a single neural network executable at
different widths (number of channels in a layer), permitting instant and adaptive accuracy …

Cited by 657 Related articles All 6 versions

FiLM: Visual reasoning with a general conditioning layer

E Perez, F Strub, H De Vries, V Dumoulin… - Proceedings of the …, 2018 - ojs.aaai.org

We introduce a general-purpose conditioning method for neural networks called FiLM:
Feature-wise Linear Modulation. FiLM layers influence neural network computation via a …

Cited by 2078 Related articles All 16 versions

Efficient video object segmentation via network modulation

L Yang, Y Wang, X Xiong, J Yang… - Proceedings of the …, 2018 - openaccess.thecvf.com

Video object segmentation targets segmenting a specific object throughout a video
sequence when given only an annotated first frame. Recent deep learning based …

Cited by 433 Related articles All 11 versions

Reversible architectures for arbitrarily deep residual neural networks

B Chang, L Meng, E Haber, L Ruthotto… - Proceedings of the …, 2018 - ojs.aaai.org

Recently, deep residual networks have been successfully applied in many computer vision
and natural language processing tasks, pushing the state-of-the-art performance with …

Cited by 312 Related articles All 8 versions

Semantics disentangling for text-to-image generation

G Yin, B Liu, L Sheng, N Yu… - Proceedings of the …, 2019 - openaccess.thecvf.com

Synthesizing photo-realistic images from text descriptions is a challenging problem.
Previous studies have shown remarkable progresses on visual quality of the generated …

Cited by 232 Related articles All 7 versions

Long-term cloth-changing person re-identification

X Qian, W Wang, L Zhang, F Zhu, Y Fu… - Proceedings of the …, 2020 - openaccess.thecvf.com

Person re-identification (Re-ID) aims to match a target person across camera views at
different locations and times. Existing Re-ID studies focus on the short-term cloth-consistent …

Cited by 174 Related articles All 9 versions

Transparency by design: Closing the gap between performance and interpretability in visual reasoning

D Mascharka, P Tran, R Soklaski… - Proceedings of the …, 2018 - openaccess.thecvf.com

Visual question answering requires high-order reasoning about an image, which is a
fundamental capability needed by machine systems to follow complex directives. Recently …

Cited by 247 Related articles All 7 versions