Are We on the Right Way for Evaluating Large Vision-Language Models?
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can
significantly reduce barriers to accessing relational databases and support various …
Augmenting math word problems via iterative question composing
H Liu, ACC Yao - arXiv preprint arXiv:2401.09003, 2024 - arxiv.org
Despite recent progress in improving the mathematical reasoning ability of large language
models (LLMs), solving competition-level math problems without the use of external tools …
HDLdebugger: Streamlining HDL debugging with large language models
In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role.
However, due to the complex syntax of HDLs and the limited availability of online resources …
Mechanistic design and scaling of hybrid architectures
The development of deep learning architectures is a resource-demanding process, due to a
vast design space, long prototyping times, and high compute costs associated with at-scale …
Codemind: A framework to challenge large language models for code reasoning
C Liu, SD Zhang, R Jabbarvand - arXiv preprint arXiv:2402.09664, 2024 - arxiv.org
Solely relying on test passing to evaluate Large Language Models (LLMs) for code
synthesis may result in unfair assessment or the promotion of models with data leakage. As an …
An empirical evaluation of llms for solving offensive security challenges
M Shao, B Chen, S Jancheska, B Dolan-Gavitt… - arXiv preprint arXiv …, 2024 - arxiv.org
Capture The Flag (CTF) challenges are puzzles related to computer security scenarios. With
the advent of large language models (LLMs), more and more CTF participants are using …
A critical evaluation of ai feedback for aligning large language models
Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the
instruction-following abilities of powerful pre-trained language models. RLAIF first performs …
CLLMs: Consistency large language models
Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM
inference, as they break the sequential nature of the LLM decoding process and transform it …