Large language models for software engineering: A systematic literature review

X Hou, Y Zhao, Y Liu, Z Yang, K Wang, L Li… - ACM Transactions on …, 2023 - dl.acm.org
Large Language Models (LLMs) have significantly impacted numerous domains, including
Software Engineering (SE). Many recent publications have explored LLMs applied to …

Software testing with large language models: Survey, landscape, and vision

J Wang, Y Huang, C Chen, Z Liu… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Pre-trained large language models (LLMs) have recently emerged as a breakthrough
technology in natural language processing and artificial intelligence, with the ability to …

Large language models for data annotation: A survey

Z Tan, D Li, S Wang, A Beigi, B Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Data annotation generally refers to the labeling or generating of raw data with relevant
information, which could be used for improving the efficacy of machine learning models. The …

The robots are here: Navigating the generative AI revolution in computing education

J Prather, P Denny, J Leinonen, BA Becker… - Proceedings of the …, 2023 - dl.acm.org
Recent advancements in artificial intelligence (AI) and specifically generative AI (GenAI) are
threatening to fundamentally reshape computing and society. Largely driven by large …

LaMPilot: An open benchmark dataset for autonomous driving with language model programs

Y Ma, C Cui, X Cao, W Ye, P Liu, J Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Autonomous driving (AD) has made significant strides in recent years. However existing
frameworks struggle to interpret and execute spontaneous user instructions such as "…

PanGu-Coder2: Boosting large language models for code with ranking feedback

B Shen, J Zhang, T Chen, D Zan, B Geng, A Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models for Code (Code LLM) are flourishing. New and powerful models
are released on a weekly basis, demonstrating remarkable performance on the code …

Evaluating instruction-tuned large language models on code comprehension and generation

Z Yuan, J Liu, Q Zi, M Liu, X Peng, Y Lou - arXiv preprint arXiv:2308.01240, 2023 - arxiv.org
In this work, we evaluate 10 open-source instructed LLMs on four representative code
comprehension and generation tasks. We have the following main findings. First, for the zero …

ClassEval: A manually-crafted benchmark for evaluating LLMs on class-level code generation

X Du, M Liu, K Wang, H Wang, J Liu, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we make the first attempt to evaluate LLMs in a more challenging code
generation scenario, i.e., class-level code generation. We first manually construct the first …

A survey of large language models for code: Evolution, benchmarking, and future trends

Z Zheng, K Ning, Y Wang, J Zhang, D Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
General large language models (LLMs), represented by ChatGPT, have demonstrated
significant potential in tasks such as code generation in software engineering. This has led …

CRUXEval: A benchmark for code reasoning, understanding and execution

A Gu, B Rozière, H Leather, A Solar-Lezama… - arXiv preprint arXiv …, 2024 - arxiv.org
We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a
benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an …