Unibench: Visual reasoning requires rethinking vision-language beyond scaling
Significant research efforts have been made to scale and improve vision-language model
(VLM) training approaches. Yet, with an ever-growing number of benchmarks, researchers …
AI safety in generative AI large language models: A survey
Large Language Models (LLMs) such as ChatGPT that exhibit generative AI capabilities are
facing accelerated adoption and innovation. The increased presence of Generative AI (GAI) …
Vision language model for interpretable and fine-grained detection of safety compliance in diverse workplaces
Workplace accidents due to personal protective equipment (PPE) non-compliance raise
serious safety concerns and lead to legal liabilities, financial penalties, and reputational …
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Recently, newly developed Vision-Language Models (VLMs), such as OpenAI's GPT-4o,
have emerged, seemingly demonstrating advanced reasoning capabilities across text and …
Evaluation and comparison of visual language models for transportation engineering problems
S Prajapati, T Singh, C Hegde… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in vision language models (VLM) have shown great potential for
diverse applications related to image understanding. In this study, we have explored state-of …
OmnixR: Evaluating omni-modality language models on reasoning across modalities
We introduce OmnixR, an evaluation suite designed to benchmark SoTA Omni-modality
Language Models, such as GPT-4o and Gemini. Evaluating OLMs, which integrate multiple …
TuneVLSeg: Prompt Tuning Benchmark for Vision-Language Segmentation Models
Vision-Language Models (VLMs) have shown impressive performance in vision
tasks, but adapting them to new domains often requires expensive fine-tuning. Prompt …
Egocentric perception of walking environments using an interactive vision-language system
H Tan, A Mihailidis, B Laschowski - bioRxiv, 2024 - biorxiv.org
Large language models can provide a more detailed contextual understanding of a scene
beyond what computer vision alone can provide, which has implications for robotics and …
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
We introduce INQUIRE, a text-to-image retrieval benchmark designed to challenge
multimodal vision-language models on expert-level queries. INQUIRE includes iNaturalist …
Enabling Data-Driven and Empathetic Interactions: A Context-Aware 3D Virtual Agent in Mixed Reality for Enhanced Financial Customer Experience
C Xu, M Chen, P Deshpande, E Azanli… - … on Mixed and …, 2024 - ieeexplore.ieee.org
In this paper, we introduce a novel system designed to enhance customer service in the
financial and retail sectors through a context-aware 3D virtual agent, utilizing Mixed Reality …