How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
In this report, we introduce InternVL 1.5, an open-source multimodal large language model
(MLLM) to bridge the capability gap between open-source and proprietary commercial …
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its
progression has been hindered by challenges in comprehending fine-grained visual content …
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs)
to address complex problems and enhance robustness and interpretability. Despite the flux …
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers
The rapid development of Large Language Models (LLMs) demonstrates remarkable
multilingual capabilities in natural language processing, attracting global attention in both …
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Text-centric visual question answering (VQA) has made great strides with the development
of Multimodal Large Language Models (MLLMs), yet open-source models still fall short of …
Ada-LEval: Evaluating Long-Context LLMs with Length-Adaptable Benchmarks
Recently, the large language model (LLM) community has shown increasing interest in
enhancing LLMs' capability to handle extremely long documents. As various long-text …
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters
Exploiting activation sparsity is a promising approach to significantly accelerating the
inference process of large language models (LLMs) without compromising performance …
Needle In A Multimodal Haystack
With the rapid advancement of multimodal large language models (MLLMs), their evaluation
has become increasingly comprehensive. However, understanding long multimodal content …
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that
unifies visual perception, understanding, and generation within a single framework. Unlike …
D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Continual Pre-Training (CPT) on Large Language Models (LLMs) has been widely used to
expand the model's fundamental understanding of specific downstream domains (e.g., math …