A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Natural language generation and understanding of big code for AI-assisted programming: A review

MF Wong, S Guo, CN Hang, SW Ho, CW Tan - Entropy, 2023 - mdpi.com
This paper provides a comprehensive review of the literature concerning the utilization of
Natural Language Processing (NLP) techniques, with a particular focus on transformer …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

Unsupervised translation of programming languages

B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …

code2vec: Learning distributed representations of code

U Alon, M Zilberstein, O Levy, E Yahav - Proceedings of the ACM on …, 2019 - dl.acm.org
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …

On the robustness of code generation techniques: An empirical study on GitHub Copilot

A Mastropaolo, L Pascarella… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Software engineering research has always been concerned with the improvement of code
completion approaches, which suggest the next tokens a developer will likely type while …

Deep code comment generation

X Hu, G Li, X Xia, D Lo, Z Jin - Proceedings of the 26th conference on …, 2018 - dl.acm.org
During software maintenance, code comments help developers comprehend programs and
reduce additional time spent on reading and navigating source code. Unfortunately, these …

Learning to represent programs with graphs

M Allamanis, M Brockschmidt, M Khademi - arXiv preprint arXiv …, 2017 - arxiv.org
Learning tasks on source code (i.e., formal languages) have been considered recently, but
most work has tried to transfer natural language methods and does not capitalize on the …

Big code != big vocabulary: Open-vocabulary models for source code

RM Karampatsis, H Babii, R Robbes, C Sutton… - Proceedings of the …, 2020 - dl.acm.org
Statistical language modeling techniques have successfully been applied to large source
code corpora, yielding a variety of new software development tools, such as tools for code …

Deep learning code fragments for code clone detection

M White, M Tufano, C Vendome… - Proceedings of the 31st …, 2016 - dl.acm.org
Code clone detection is an important problem for software maintenance and evolution. Many
approaches consider either structure or identifiers, but none of the existing detection …