A survey of machine learning for big code and naturalness

M Allamanis, ET Barr, P Devanbu… - ACM Computing Surveys …, 2018 - dl.acm.org
Research at the intersection of machine learning, programming languages, and software
engineering has recently taken important steps in proposing learnable probabilistic models …

Deep learning for source code modeling and generation: Models, applications, and challenges

THM Le, H Chen, MA Babar - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Deep Learning (DL) techniques for Natural Language Processing have been evolving
remarkably fast. Recently, the DL advances in language modeling, machine translation, and …

CodeBLEU: a method for automatic evaluation of code synthesis

S Ren, D Guo, S Lu, L Zhou, S Liu, D Tang… - arXiv preprint arXiv …, 2020 - arxiv.org
Evaluation metrics play a vital role in the growth of an area as it defines the standard of
distinguishing between good and bad models. In the area of code synthesis, the commonly …

Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task

T Yu, R Zhang, K Yang, M Yasunaga, D Wang… - arXiv preprint arXiv …, 2018 - arxiv.org
We present Spider, a large-scale, complex and cross-domain semantic parsing and text-to-
SQL dataset annotated by 11 college students. It consists of 10,181 questions and 5,693 …

Unsupervised translation of programming languages

B Roziere, MA Lachaux… - Advances in neural …, 2020 - proceedings.neurips.cc
A transcompiler, also known as source-to-source translator, is a system that converts source
code from a high-level programming language (such as C++ or Python) to another …

code2seq: Generating sequences from structured representations of code

U Alon, S Brody, O Levy, E Yahav - arXiv preprint arXiv:1808.01400, 2018 - arxiv.org
The ability to generate natural language sequences from source code snippets has a variety
of applications such as code summarization, documentation, and retrieval. Sequence-to …

code2vec: Learning distributed representations of code

U Alon, M Zilberstein, O Levy, E Yahav - Proceedings of the ACM on …, 2019 - dl.acm.org
We present a neural model for representing snippets of code as continuous distributed
vectors ("code embeddings"). The main idea is to represent a code snippet as a single fixed …

Deep code comment generation

X Hu, G Li, X Xia, D Lo, Z Jin - Proceedings of the 26th conference on …, 2018 - dl.acm.org
During software maintenance, code comments help developers comprehend programs and
reduce additional time spent on reading and navigating source code. Unfortunately, these …

Improving automatic source code summarization via deep reinforcement learning

Y Wan, Z Zhao, M Yang, G Xu, H Ying, J Wu… - Proceedings of the 33rd …, 2018 - dl.acm.org
Code summarization provides a high level natural language description of the function
performed by code, as it can benefit the software maintenance, code categorization and …

A syntactic neural model for general-purpose code generation

P Yin, G Neubig - arXiv preprint arXiv:1704.01696, 2017 - arxiv.org
We consider the problem of parsing natural language descriptions into source code written
in a general-purpose programming language like Python. Existing data-driven methods treat …