Seeing Beyond Classes: Zero-Shot Grounded Situation Recognition via Language Explainer

J Lei, L Li, C Wang, J Xiao, L Chen - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Benefiting from strong generalization ability, pre-trained vision-language models (VLMs), e.g.,
CLIP, have been widely utilized in zero-shot scene understanding. Unlike simple recognition …

Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Videos

S Majumder, T Nagarajan, Z Al-Halah… - arXiv preprint arXiv …, 2024 - arxiv.org
Given a multi-view video, which viewpoint is most informative for a human observer?
Existing methods rely on heuristics or expensive "best-view" supervision to answer this …

Learning Visual Hierarchies with Hyperbolic Embeddings

Z Wang, S Ramasinghe, C Xu, J Monteil… - arXiv preprint arXiv …, 2024 - arxiv.org
Structuring latent representations in a hierarchical manner enables models to learn patterns
at multiple levels of abstraction. However, most prevalent image understanding models …

Adversarial Attacks on Hyperbolic Networks

M van Spengler, J Zahálka, P Mettes - arXiv preprint arXiv:2412.01495, 2024 - arxiv.org
As hyperbolic deep learning grows in popularity, so does the need for adversarial
robustness in the context of such a non-Euclidean geometry. To this end, this paper …