A survey on deep learning for software engineering

Y Yang, X Xia, D Lo, J Grundy - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In 2006, Geoffrey Hinton proposed the concept of training “Deep Neural Networks (DNNs)”
and an improved model training method to break the bottleneck of neural network …

Natural language generation and understanding of big code for AI-assisted programming: A review

MF Wong, S Guo, CN Hang, SW Ho, CW Tan - Entropy, 2023 - mdpi.com
This paper provides a comprehensive review of the literature concerning the utilization of
Natural Language Processing (NLP) techniques, with a particular focus on transformer …

Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models

P Vaithilingam, T Zhang, EL Glassman - CHI conference on human …, 2022 - dl.acm.org
Recent advances in Large Language Models (LLM) have made automatic code generation
possible for real-world programming tasks in general-purpose programming languages …

CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation

Y Wang, W Wang, S Joty, SCH Hoi - arXiv preprint arXiv:2109.00859, 2021 - arxiv.org
Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently
shown to transfer well to Programming Languages (PL) and largely benefit a broad set of …

Large language models for software engineering: Survey and open problems

A Fan, B Gokkaya, M Harman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
This paper provides a survey of the emerging area of Large Language Models (LLMs) for
Software Engineering (SE). It also sets out open research challenges for the application of …

Unified pre-training for program understanding and generation

WU Ahmad, S Chakraborty, B Ray… - arXiv preprint arXiv …, 2021 - arxiv.org
Code summarization and generation empower conversion between programming language
(PL) and natural language (NL), while code translation avails the migration of legacy code …

CodeXGLUE: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …

Examining zero-shot vulnerability repair with large language models

H Pearce, B Tan, B Ahmad, R Karri… - … IEEE Symposium on …, 2023 - ieeexplore.ieee.org
Human developers can produce code with cybersecurity bugs. Can emerging 'smart' code
completion tools help repair those bugs? In this work, we examine the use of large language …

GraphCodeBERT: Pre-training code representations with data flow

D Guo, S Ren, S Lu, Z Feng, D Tang, S Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Pre-trained models for programming language have achieved dramatic empirical
improvements on a variety of code-related tasks such as code search, code completion …

CURE: Code-aware neural machine translation for automatic program repair

N Jiang, T Lutellier, L Tan - 2021 IEEE/ACM 43rd International …, 2021 - ieeexplore.ieee.org
Automatic program repair (APR) is crucial to improve software reliability. Recently, neural
machine translation (NMT) techniques have been used to automatically fix software bugs …