Natural language generation and understanding of big code for AI-assisted programming: A review

MF Wong, S Guo, CN Hang, SW Ho, CW Tan - Entropy, 2023 - mdpi.com
This paper provides a comprehensive review of the literature concerning the utilization of
Natural Language Processing (NLP) techniques, with a particular focus on transformer …

Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation

J Liu, CS Xia, Y Wang, L Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
Program synthesis has been long studied with recent approaches focused on directly using
the power of Large Language Models (LLMs) to generate code. Programming benchmarks …

CodeGen: An open large language model for code with multi-turn program synthesis

E Nijkamp, B Pang, H Hayashi, L Tu, H Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Program synthesis strives to generate a computer program as a solution to a given problem
specification, expressed with input-output examples or natural language descriptions. The …

UniXcoder: Unified cross-modal pre-training for code representation

D Guo, S Lu, N Duan, Y Wang, M Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained models for programming languages have recently demonstrated great success
on code intelligence. To support both code-related understanding and generation tasks …

CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation

Y Wang, W Wang, S Joty, SCH Hoi - arXiv preprint arXiv:2109.00859, 2021 - arxiv.org
Pre-trained models for Natural Languages (NL) like BERT and GPT have been recently
shown to transfer well to Programming Languages (PL) and largely benefit a broad set of …

Program synthesis with large language models

J Austin, A Odena, M Nye, M Bosma… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper explores the limits of the current generation of large language models for
program synthesis in general purpose programming languages. We evaluate a collection of …

SantaCoder: don't reach for the stars!

LB Allal, R Li, D Kocetkov, C Mou, C Akiki… - arXiv preprint arXiv …, 2023 - arxiv.org
The BigCode project is an open-scientific collaboration working on the responsible
development of large language models for code. This tech report describes the progress of …

Unified pre-training for program understanding and generation

WU Ahmad, S Chakraborty, B Ray… - arXiv preprint arXiv …, 2021 - arxiv.org
Code summarization and generation empower conversion between programming language
(PL) and natural language (NL), while code translation avails the migration of legacy code …

CodeXGLUE: A machine learning benchmark dataset for code understanding and generation

S Lu, D Guo, S Ren, J Huang, A Svyatkovskiy… - arXiv preprint arXiv …, 2021 - arxiv.org
Benchmark datasets have a significant impact on accelerating research in programming
language tasks. In this paper, we introduce CodeXGLUE, a benchmark dataset to foster …

Measuring coding challenge competence with APPS

D Hendrycks, S Basart, S Kadavath, M Mazeika… - arXiv preprint arXiv …, 2021 - arxiv.org
While programming is one of the most broadly applicable skills in modern society, modern
machine learning models still cannot code solutions to basic problems. Despite its …