StarCoder 2 and The Stack v2: The Next Generation
The BigCode project, an open-scientific collaboration focused on the responsible
development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In …
Robustness, security, privacy, explainability, efficiency, and usability of large language models for code
Large language models for code (LLM4Code), which demonstrate strong performance (e.g.,
high accuracy) in processing source code, have significantly transformed software …
Exploring ChatGPT's Capabilities on Vulnerability Management
Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works
show that ChatGPT has the capabilities of processing foundational code analysis tasks …
Code Membership Inference for Detecting Unauthorized Data Use in Code Pre-trained Language Models
S Zhang, H Li - arXiv preprint arXiv:2312.07200, 2023 - arxiv.org
Code pre-trained language models (CPLMs) have received great attention since they can
benefit various tasks that facilitate software development and maintenance. However …
Trained Without My Consent: Detecting Code Inclusion In Language Models Trained on Code
Code auditing ensures that the developed code adheres to standards, regulations, and
copyright protection by verifying that it does not contain code from protected sources. The …
Does Your Neural Code Completion Model Use My Code? A Membership Inference Approach
Recent years have witnessed significant progress in developing deep learning-based
models for automated code completion. Although using source code in GitHub has been a …
An Exploratory Investigation into Code License Infringements in Large Language Model Training Datasets
Does the training of large language models potentially infringe upon code licenses?
Furthermore, are there any datasets available that can be safely used for training these …