Large language models for software engineering: A systematic literature review

X Hou, Y Zhao, Y Liu, Z Yang, K Wang, L Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have significantly impacted numerous domains, notably
including Software Engineering (SE). Nevertheless, a well-rounded understanding of the …

A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Scaling data-constrained language models

N Muennighoff, A Rush, B Barak… - Advances in …, 2024 - proceedings.neurips.cc
The current trend of scaling language models involves increasing both parameter count and
training dataset size. Extrapolating this trend suggests that training dataset size may soon be …

The programmer's assistant: Conversational interaction with a large language model for software development

SI Ross, F Martinez, S Houde, M Muller… - Proceedings of the 28th …, 2023 - dl.acm.org
Large language models (LLMs) have recently been applied in software engineering to
perform tasks such as translating code between programming languages, generating code …

Graph neural networks: foundation, frontiers and applications

L Wu, P Cui, J Pei, L Zhao, X Guo - … of the 28th ACM SIGKDD Conference …, 2022 - dl.acm.org
The field of graph neural networks (GNNs) has seen rapid and incredible strides over the
recent years. Graph neural networks, also known as deep learning on graphs, graph …

Unified pre-training for program understanding and generation

WU Ahmad, S Chakraborty, B Ray… - arXiv preprint arXiv …, 2021 - arxiv.org
Code summarization and generation empower conversion between programming language
(PL) and natural language (NL), while code translation avails the migration of legacy code …

Codexglue: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …

Graph neural networks for natural language processing: A survey

L Wu, Y Chen, K Shen, X Guo, H Gao… - … and Trends® in …, 2023 - nowpublishers.com
Deep learning has become the dominant approach in addressing various tasks in Natural
Language Processing (NLP). Although text inputs are typically represented as a sequence …

Octopack: Instruction tuning code large language models

N Muennighoff, Q Liu, A Zebaze, Q Zheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Finetuning large language models (LLMs) on instructions leads to vast performance
improvements on natural language tasks. We apply instruction tuning using code …

Codebert: A pre-trained model for programming and natural languages

Z Feng, D Guo, D Tang, N Duan, X Feng… - arXiv preprint arXiv …, 2020 - arxiv.org
We present CodeBERT, a bimodal pre-trained model for programming language (PL) and
natural language (NL). CodeBERT learns general-purpose representations that support …