A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Genetic improvement of software: a comprehensive survey

J Petke, SO Haraldsson, M Harman… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Genetic improvement (GI) uses automated search to find improved versions of existing
software. We present a comprehensive survey of this nascent field of research with a focus …

Large language models for software engineering: Survey and open problems

A Fan, B Gokkaya, M Harman… - 2023 IEEE/ACM …, 2023 - ieeexplore.ieee.org
This paper provides a survey of the emerging area of Large Language Models (LLMs) for
Software Engineering (SE). It also sets out open research challenges for the application of …

Studying the usage of Text-To-Text Transfer Transformer to support code-related tasks

A Mastropaolo, S Scalabrino, N Cooper… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Deep learning (DL) techniques are gaining more and more attention in the software
engineering community. They have been used to support several code-related tasks, such …

An empirical study on learning bug-fixing patches in the wild via neural machine translation

M Tufano, C Watson, G Bavota, MD Penta… - ACM Transactions on …, 2019 - dl.acm.org
Millions of open source projects with numerous bug fixes are available in code repositories.
This proliferation of software development histories can be leveraged to learn how to fix …

Big code != big vocabulary: Open-vocabulary models for source code

RM Karampatsis, H Babii, R Robbes, C Sutton… - Proceedings of the …, 2020 - dl.acm.org
Statistical language modeling techniques have successfully been applied to large source
code corpora, yielding a variety of new software development tools, such as tools for code …

Deep learning code fragments for code clone detection

M White, M Tufano, C Vendome… - Proceedings of the 31st …, 2016 - dl.acm.org
Code clone detection is an important problem for software maintenance and evolution. Many
approaches consider either structure or identifiers, but none of the existing detection …

Context-aware patch generation for better automated program repair

M Wen, J Chen, R Wu, D Hao, SC Cheung - Proceedings of the 40th …, 2018 - dl.acm.org
The effectiveness of search-based automated program repair is limited in the number of
correct patches that can be successfully generated. There are two causes of such limitation …

DeepSim: deep learning code functional similarity

G Zhao, J Huang - Proceedings of the 2018 26th ACM joint meeting on …, 2018 - dl.acm.org
Measuring code similarity is fundamental for many software engineering tasks, e.g., code
search, refactoring and reuse. However, most existing techniques focus on code syntactical …

Learning to mine aligned code and natural language pairs from Stack Overflow

P Yin, B Deng, E Chen, B Vasilescu… - Proceedings of the 15th …, 2018 - dl.acm.org
For tasks like code synthesis from natural language, code retrieval, and code
summarization, data-driven models have shown great promise. However, creating these …