The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only
Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical …
A survey of reasoning with foundation models
Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …
Distilled GPT for source code summarization
CY Su, C McMillan - Automated Software Engineering, 2024 - Springer
A code summary is a brief natural language description of source code. Summaries are
usually only a single sentence long, and yet form the backbone of developer documentation …
Source code summarization in the era of large language models
To support software developers in understanding and maintaining programs, various
automatic (source) code summarization techniques have been proposed to generate a …
A survey of neural code intelligence: Paradigms, advances and beyond
Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …
Concerned with Data Contamination? Assessing Countermeasures in Code Language Model
Various techniques have been proposed to leverage the capabilities of code language
models (CLMs) for SE tasks. While these techniques typically evaluate their effectiveness …
A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection
Large Language Models (LLMs) have demonstrated great potential for code generation and
other software engineering tasks. Vulnerability detection is of crucial importance to …
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
The impressive performance of large language models (LLMs) on code-related tasks has
shown the potential of fully automated software development. In light of this, we introduce a …
Text2analysis: A benchmark of table question answering with advanced data analysis and unclear queries
Tabular data analysis is crucial in various fields, and large language models show promise
in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL …
On the effectiveness of large language models for github workflows
X Zhang, S Muralee, S Cherupattamoolayil… - Proceedings of the 19th …, 2024 - dl.acm.org
GitHub workflows or GitHub CI is a popular continuous integration platform that enables
developers to automate various software engineering tasks by specifying them as workflows …