A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations

MTR Laskar, S Alqahtani, MS Bari… - Proceedings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …

A3-CodGen: A Repository-Level Code Generation Framework for Code Reuse with Local-Aware, Global-Aware, and Third-Party-Library-Aware

D Liao, S Pan, X Sun, X Ren, Q Huang… - IEEE Transactions on …, 2024 - computer.org
LLM-based code generation tools are essential to help developers in the software
development process. Existing tools often disconnect with the working context, i.e., the code …

Morescient GAI for software engineering

M Kessel, C Atkinson - ACM Transactions on Software Engineering and …, 2024 - dl.acm.org
The ability of Generative AI (GAI) technology to automatically check, synthesize and modify
software engineering artifacts promises to revolutionize all aspects of software engineering …

CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs

DN Manh, TP Chau, NL Hai, TT Doan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Code Large Language Models (CodeLLMs) have predominantly
focused on open-ended code generation tasks, often neglecting the critical aspect of code …

Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts

N Gritsch, Q Zhang, A Locatelli, S Hooker… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficiency, specialization, and adaptability to new data distributions are qualities that are
hard to combine in current Large Language Models. The Mixture of Experts (MoE) …

Outcome-Refining Process Supervision for Code Generation

Z Yu, W Gu, Y Wang, Z Zeng, J Wang, W Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models have demonstrated remarkable capabilities in code generation, yet
they often struggle with complex programming tasks that require deep algorithmic …

RepairBench: Leaderboard of Frontier Models for Program Repair

A Silva, M Monperrus - arXiv preprint arXiv:2409.18952, 2024 - arxiv.org
AI-driven program repair uses AI models to repair buggy software by producing patches.
Rapid advancements in AI surely impact state-of-the-art performance of program repair. Yet …

CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit

J Cao, S Chen, W Zhang, HC Lo… - arXiv preprint arXiv …, 2024 - arxiv.org
Data contamination presents a critical barrier preventing widespread industrial adoption of
advanced software engineering techniques that leverage code language models (CLMs) …

If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs

M Khalifa, YC Tan, A Ahmadian, T Hosking… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging has shown great promise at combining expert models, but the benefit of
merging is unclear when merging "generalist" models trained on many tasks. We explore …