A comprehensive overview of large language models
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …
natural language processing tasks and beyond. This success of LLMs has led to a large …
From google gemini to openai q*(q-star): A survey of reshaping the generative artificial intelligence (ai) research landscape
This comprehensive survey explored the evolving landscape of generative Artificial
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
Intelligence (AI), with a specific focus on the transformative impacts of Mixture of Experts …
A survey of large language models
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …
Minigpt-4: Enhancing vision-language understanding with advanced large language models
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …
generating websites from handwritten text and identifying humorous elements within …
Improved baselines with visual instruction tuning
Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …
instruction tuning. In this paper we present the first systematic study to investigate the design …
Lisa: Reasoning segmentation via large language model
Although perception systems have made remarkable advancements in recent years they still
rely on explicit human instruction or pre-defined categories to identify the target objects …
rely on explicit human instruction or pre-defined categories to identify the target objects …
Qwen technical report
Large language models (LLMs) have revolutionized the field of artificial intelligence,
enabling natural language processing tasks that were previously thought to be exclusive to …
enabling natural language processing tasks that were previously thought to be exclusive to …
Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …
Qwen-vl: A frontier large vision-language model with versatile abilities
In this work, we introduce the Qwen-VL series, a set of large-scale vision-language models
(LVLMs) designed to perceive and understand both texts and images. Starting from the …
(LVLMs) designed to perceive and understand both texts and images. Starting from the …
Video-llama: An instruction-tuned audio-visual language model for video understanding
We present Video-LLaMA, a multi-modal framework that empowers Large Language Models
(LLMs) with the capability of understanding both visual and auditory content in the video …
(LLMs) with the capability of understanding both visual and auditory content in the video …