- Academic resource search

Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Cited by 111 - Related articles - All 2 versions

Automated diagnosis of cardiovascular diseases from cardiac magnetic resonance imaging using deep learning models: A review

M Jafari, A Shoeibi, M Khodatars, N Ghassemi… - Computers in Biology …, 2023 - Elsevier

In recent years, cardiovascular diseases (CVDs) have become one of the leading causes of
mortality globally. At early stages, CVDs appear with minor symptoms and progressively get …

Cited by 40 - Related articles - All 13 versions

Joint feature learning and relation modeling for tracking: A one-stream framework

B Ye, H Chang, B Ma, S Shan, X Chen - European Conference on …, 2022 - Springer

The current popular two-stream, two-stage tracking framework extracts the template and the
search region features separately and then performs relation modeling, thus the extracted …

Cited by 384 - Related articles - All 5 versions

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Cited by 204 - Related articles - All 6 versions

Scaling open-vocabulary object detection

M Minderer, A Gritsenko… - Advances in Neural …, 2024 - proceedings.neurips.cc

Open-vocabulary object detection has benefited greatly from pretrained vision-language
models, but is still limited by the amount of available detection training data. While detection …

Cited by 88 - Related articles - All 6 versions

Patch n' Pack: NaViT, a vision transformer for any aspect ratio and resolution

M Dehghani, B Mustafa, J Djolonga… - Advances in …, 2024 - proceedings.neurips.cc

The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution
before processing them with computer vision models has not yet been successfully …

Cited by 41 - Related articles - All 5 versions

Confident adaptive language modeling

T Schuster, A Fisch, J Gupta… - Advances in …, 2022 - proceedings.neurips.cc

Recent advances in Transformer-based large language models (LLMs) have led to
significant performance improvements across many tasks. These gains come with a drastic …

Cited by 134 - Related articles - All 8 versions

FlexiViT: One model for all patch sizes

L Beyer, P Izmailov, A Kolesnikov… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision Transformers convert images to sequences by slicing them into patches. The size of
these patches controls a speed/accuracy tradeoff, with smaller patches leading to higher …

Cited by 77 - Related articles - All 5 versions

ProPainter: Improving propagation and transformer for video inpainting

S Zhou, C Li, KCK Chan… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Flow-based propagation and spatiotemporal Transformer are two mainstream mechanisms
in video inpainting (VI). Despite the effectiveness of these components, they still suffer from …

Cited by 39 - Related articles - All 5 versions

Global context vision transformers

A Hatamizadeh, H Yin, G Heinrich… - International …, 2023 - proceedings.mlr.press

We propose global context vision transformer (GC ViT), a novel architecture that enhances
parameter and compute utilization for computer vision. Our method leverages global context …

Cited by 105 - Related articles - All 11 versions