Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer
Benefiting from strong generalization ability, pre-trained vision-language models (VLMs), e.g.,
CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition …
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos
S Majumder, T Nagarajan, Z Al-Halah… - arXiv preprint arXiv …, 2024 - arxiv.org
Given a multi-view video, which viewpoint is most informative for a human observer?
Existing methods rely on heuristics or expensive "best-view" supervision to answer this …
Learning Visual Hierarchies with Hyperbolic Embeddings
Structuring latent representations in a hierarchical manner enables models to learn patterns
at multiple levels of abstraction. However, most prevalent image understanding models …
Adversarial Attacks on Hyperbolic Networks
As hyperbolic deep learning grows in popularity, so does the need for adversarial
robustness in the context of such a non-Euclidean geometry. To this end, this paper …