Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
Dinov2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
Scaling vision transformers to 22 billion parameters
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …
Patch n' Pack: NaViT, a vision transformer for any aspect ratio and resolution
The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution
before processing them with computer vision models has not yet been successfully …
Rangevit: Towards vision transformers for 3d semantic segmentation in autonomous driving
Casting semantic segmentation of outdoor LiDAR point clouds as a 2D problem, e.g., via
range projection, is an effective and popular approach. These projection-based methods …
Which tokens to use? investigating token reduction in vision transformers
Since the introduction of the Vision Transformer (ViT), researchers have sought to make ViTs
more efficient by removing redundant information in the processed tokens. While different …
Getting vit in shape: Scaling laws for compute-optimal model design
IM Alabdulmohsin, X Zhai… - Advances in Neural …, 2024 - proceedings.neurips.cc
Scaling laws have been recently employed to derive compute-optimal model size (number
of parameters) for a given compute duration. We advance and refine such methods to infer …
Plainmamba: Improving non-hierarchical mamba in visual recognition
We present PlainMamba: a simple non-hierarchical state space model (SSM) designed for
general visual recognition. The recent Mamba model has shown how SSMs can be highly …
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
In this paper we address the challenge of image resolution variation for the Segment
Anything Model (SAM). SAM, known for its zero-shot generalizability, exhibits a performance …
A novel day-ahead regional and probabilistic wind power forecasting framework using deep CNNs and conformalized regression forests
Regional forecasting is crucial for a balanced energy delivery system and for achieving the
global transition to clean energy. However, regional wind forecasting is challenging due to …