Motion-I2V: Consistent and controllable image-to-video generation with explicit motion modeling
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …
OneTracker: Unifying visual object tracking with foundation models and efficient tuning
Visual object tracking aims to localize the target object of each frame based on its initial
appearance in the first frame. Depending on the input modality, tracking tasks can be divided …
Referred by multi-modality: A unified temporal transformer for video object segmentation
Recently, video object segmentation (VOS) referred by multi-modal signals, e.g., language
and audio, has evoked increasing attention in both industry and academia. It is challenging …
PanoVOS: Bridging non-panoramic and panoramic views with transformer for video segmentation
Panoramic videos contain richer spatial information and have attracted tremendous amounts
of attention due to their exceptional experience in some fields such as autonomous driving …
General Compression Framework for Efficient Transformer Object Tracking
Transformer-based trackers have established a dominant role in the field of visual object
tracking. While these trackers exhibit promising performance, their deployment on resource …
EE-MLLM: A Data-Efficient and Compute-Efficient Multimodal Large Language Model
In the realm of multimodal research, numerous studies leverage substantial image-text pairs
to conduct modal alignment learning, transforming Large Language Models (LLMs) into …