A survey of machine learning for big code and naturalness
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …
Natural language generation and understanding of big code for AI-assisted programming: A review
MF Wong, S Guo, CN Hang, SW Ho, CW Tan - Entropy, 2023 - mdpi.com
This paper provides a comprehensive review of the literature concerning the utilization of
Natural Language Processing (NLP) techniques, with a particular focus on transformer …
Program synthesis with large language models
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …
Unsupervised translation of programming languages
B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …
code2vec: Learning distributed representations of code
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …
On the robustness of code generation techniques: An empirical study on GitHub Copilot
A Mastropaolo, L Pascarella… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Software engineering research has always been concerned with the improvement of code
completion approaches, which suggest the next tokens a developer will likely type while …
Deep code comment generation
During software maintenance, code comments help developers comprehend programs and
reduce additional time spent on reading and navigating source code. Unfortunately, these …
Learning to represent programs with graphs
Learning tasks on source code (i.e., formal languages) have been considered recently, but
most work has tried to transfer natural language methods and does not capitalize on the …
Big code != big vocabulary: Open-vocabulary models for source code
Statistical language modeling techniques have successfully been applied to large source
code corpora, yielding a variety of new software development tools, such as tools for code …
Deep learning code fragments for code clone detection
Code clone detection is an important problem for software maintenance and evolution. Many
approaches consider either structure or identifiers, but none of the existing detection …