NatGen: generative pre-training by “naturalizing” source code

S Chakraborty, T Ahmed, Y Ding, PT Devanbu… - Proceedings of the 30th …, 2022 - dl.acm.org
Pre-trained Generative Language models (e.g., PLBART, CodeT5, SPT-Code) for source
code yielded strong results on several tasks in the past few years, including code generation …

Convergent representations of computer programs in human and artificial neural networks

S Srikant, B Lipkin, A Ivanova… - Advances in …, 2022 - proceedings.neurips.cc
What aspects of computer programs are represented by the human brain during
comprehension? We leverage brain recordings derived from functional magnetic resonance …

Dependency-aware code naturalness

C Yang, J Chen, J Jiang, Y Huang - Proceedings of the ACM on …, 2024 - dl.acm.org
Code naturalness, which captures repetitiveness and predictability in programming
languages, has proven valuable for various code-related tasks in software engineering …

CONCORD: clone-aware contrastive learning for source code

Y Ding, S Chakraborty, L Buratti, S Pujar… - Proceedings of the …, 2023 - dl.acm.org
Deep Learning (DL) models to analyze source code have shown immense promise during
the past few years. More recently, self-supervised pre-training has gained traction for …

Naturally!: How breakthroughs in natural language processing can dramatically help developers

AA Sawant, P Devanbu - IEEE Software, 2021 - ieeexplore.ieee.org
Taking advantage of the naturalness hypothesis for code, recent development and research
has focused on applying machine learning (ML) techniques originally developed for natural …

Towards Understanding What Code Language Models Learned

T Ahmed, D Yu, C Huang, C Wang, P Devanbu… - arXiv preprint arXiv …, 2023 - arxiv.org
Pre-trained language models are effective in a variety of natural language tasks, but it has
been argued their capabilities fall short of fully learning meaning or understanding …

CodeScholar: Growing Idiomatic Code Examples

M Shetty, K Sen, I Stoica - arXiv preprint arXiv:2312.15157, 2023 - arxiv.org
Programmers often search for usage examples for API methods. A tool that could generate
realistic, idiomatic, and contextual usage examples for one or more APIs would be …

Demystifying and Assessing Code Understandability in Java Decompilation

R Qin, Y Xiong, Y Lu, M Pan - arXiv preprint arXiv:2409.20343, 2024 - arxiv.org
Decompilation, the process of converting machine-level code into readable source code,
plays a critical role in reverse engineering. Given that the main purpose of decompilation is …

Do Developers Present Proficient Code Snippets in Their README Files? An Analysis of PyPI Libraries in GitHub

S Sitthithanasakul, B Chinthanet, RG Kula… - Journal of Information …, 2023 - jstage.jst.go.jp
A README file plays an essential role as the face of a software project and the initial point of
contact for developers in Open Source Software (OSS) projects. The code snippet ranks …

On the naturalness of fuzzer-generated code

RH Kambhamettu, J Billos, T Oluwaseun-Apo… - Proceedings of the 19th …, 2022 - dl.acm.org
Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly
sampling programs from a generative model. The success of these tools is often attributed to …