A survey of machine learning for big code and naturalness
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …
Deep learning for source code modeling and generation: Models, applications, and challenges
Deep Learning (DL) techniques for Natural Language Processing have been evolving
remarkably fast. Recently, the DL advances in language modeling, machine translation, and …
CodeBLEU: a method for automatic evaluation of code synthesis
Evaluation metrics play a vital role in the growth of an area as it defines the standard of
distinguishing between good and bad models. In the area of code synthesis, the commonly …
Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-
SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 …
Unsupervised translation of programming languages
B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …
code2seq: Generating sequences from structured representations of code
The ability to generate natural language sequences from source code snippets has a variety
of applications such as code summarization, documentation, and retrieval. Sequence-to …
code2vec: Learning distributed representations of code
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …
Deep code comment generation
During software maintenance, code comments help developers comprehend programs and
reduce additional time spent on reading and navigating source code. Unfortunately, these …
Improving automatic source code summarization via deep reinforcement learning
Code summarization provides a high level natural language description of the function
performed by code, as it can benefit the software maintenance, code categorization and …
A syntactic neural model for general-purpose code generation
We consider the problem of parsing natural language descriptions into source code written
in a general-purpose programming language like Python. Existing data-driven methods treat …