A review of modern recommender systems using generative models (gen-recsys)
Traditional recommender systems typically use user-item rating histories as their main data
source. However, deep generative models now have the capability to model and sample …
source. However, deep generative models now have the capability to model and sample …
Leveraging temporal contextualization for video action recognition
We propose a novel framework for video understanding, called Temporally Contextualized
CLIP (TC-CLIP), which leverages essential temporal information through global interactions …
CLIP (TC-CLIP), which leverages essential temporal information through global interactions …
Rethinking clip-based video learners in cross-domain open-vocabulary action recognition
Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining),
recent pioneer works have proposed to adapt the powerful CLIP to video data, leading to …
recent pioneer works have proposed to adapt the powerful CLIP to video data, leading to …
Recommendation with generative models
Generative models are a class of AI models capable of creating new instances of data by
learning and sampling from their statistical distributions. In recent years, these models have …
learning and sampling from their statistical distributions. In recent years, these models have …
Awt: Transferring vision-language models via augmentation, weighting, and transportation
Pre-trained vision-language models (VLMs) have shown impressive results in various visual
classification tasks. However, we often fail to fully unleash their potential when adapting …
classification tasks. However, we often fail to fully unleash their potential when adapting …
Tabpedia: Towards comprehensive visual table understanding with concept synergy
Tables contain factual and quantitative data accompanied by various structures and
contents that pose challenges for machine comprehension. Previous methods generally …
contents that pose challenges for machine comprehension. Previous methods generally …
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Transferring visual-language knowledge from large-scale foundation models for video
recognition has proved to be effective. To bridge the domain gap, additional parametric …
recognition has proved to be effective. To bridge the domain gap, additional parametric …
Multi-modal Generative Models in Recommendation System
Many recommendation systems limit user inputs to text strings or behavior signals such as
clicks and purchases, and system outputs to a list of products sorted by relevance. With the …
clicks and purchases, and system outputs to a list of products sorted by relevance. With the …
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
In rapidly evolving field of vision-language models (VLMs), contrastive language-image pre-
training (CLIP) has made significant strides, becoming foundation for various downstream …
training (CLIP) has made significant strides, becoming foundation for various downstream …
LLAVIDAL: Benchmarking Large Language Vision Models for Daily Activities of Living
Large Language Vision Models (LLVMs) have demonstrated effectiveness in processing
internet videos, yet they struggle with the visually perplexing dynamics present in Activities …
internet videos, yet they struggle with the visually perplexing dynamics present in Activities …