CogVideoX: Text-to-Video Diffusion Models with an Expert Transformer
We introduce CogVideoX, a large-scale diffusion transformer model designed for generating
videos based on text prompts. To efficiently model video data, we propose to leverage a 3D …
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Text-to-video (T2V) models like Sora have made significant strides in visualizing complex
prompts, which is increasingly viewed as a promising path towards constructing the …
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
Incremental object detection (IOD) is challenged by background shift, where background
categories in sequential data may include previously learned or future classes. Inspired by …
IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis
The multi-step sampling mechanism, a key feature of visual diffusion models, has significant
potential to replicate the success of OpenAI's Strawberry in enhancing performance by …
OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial
preceding component of Latent Video Diffusion Models (LVDMs). With the same …
BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices
Y Xu, Y Lee, G Yi, B Liu, Y Chen, P Liu, J Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep neural networks (DNNs) are powerful for cognitive tasks such as image classification,
object detection, and scene segmentation. One drawback however is the significant high …