Are We on the Right Way for Evaluating Large Vision-Language Models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

X Liu, S Shen, B Li, P Ma, R Jiang, Y Luo… - arXiv preprint arXiv …, 2024 - arxiv.org
Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can
significantly reduce barriers to accessing relational databases and support various …

Augmenting math word problems via iterative question composing

H Liu, ACC Yao - arXiv preprint arXiv:2401.09003, 2024 - arxiv.org
Despite recent progress in improving the mathematical reasoning ability of large language
models (LLMs), solving competition-level math problems without the use of external tools …

A survey on mixture of experts

W Cai, J Jiang, F Wang, J Tang, S Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have garnered unprecedented advancements across
diverse fields, ranging from natural language processing to computer vision and beyond …

HDLdebugger: Streamlining HDL debugging with large language models

X Yao, H Li, TH Chan, W Xiao, M Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
In the domain of chip design, Hardware Description Languages (HDLs) play a pivotal role.
However, due to the complex syntax of HDLs and the limited availability of online resources …

Mechanistic design and scaling of hybrid architectures

M Poli, AW Thomas, E Nguyen, P Ponnusamy… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of deep learning architectures is a resource-demanding process, due to a
vast design space, long prototyping times, and high compute costs associated with at-scale …

CodeMind: A framework to challenge large language models for code reasoning

C Liu, SD Zhang, R Jabbarvand - arXiv preprint arXiv:2402.09664, 2024 - arxiv.org
Solely relying on test passing to evaluate Large Language Models (LLMs) for code
synthesis may result in unfair assessment or promoting models with data leakage. As an …

An empirical evaluation of LLMs for solving offensive security challenges

M Shao, B Chen, S Jancheska, B Dolan-Gavitt… - arXiv preprint arXiv …, 2024 - arxiv.org
Capture The Flag (CTF) challenges are puzzles related to computer security scenarios. With
the advent of large language models (LLMs), more and more CTF participants are using …

A critical evaluation of AI feedback for aligning large language models

A Sharma, S Keh, E Mitchell, C Finn, K Arora… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning with AI feedback (RLAIF) is a popular paradigm for improving the
instruction-following abilities of powerful pre-trained language models. RLAIF first performs …

CLLMs: Consistency large language models

S Kou, L Hu, Z He, Z Deng, H Zhang - arXiv preprint arXiv:2403.00835, 2024 - arxiv.org
Parallel decoding methods such as Jacobi decoding show promise for more efficient LLM
inference, as they break the sequential nature of the LLM decoding process and transform it …