The RefinedWeb dataset for Falcon LLM: Outperforming curated corpora with web data only

G Penedo, Q Malartic, D Hesslow… - Advances in …, 2023 - proceedings.neurips.cc

Large language models are commonly trained on a mixture of filtered web data and
curated "high-quality" corpora, such as social media conversations, books, or technical …

Cited by 65 · Related articles · All 4 versions

[PDF] arxiv.org

A survey of reasoning with foundation models

J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Reasoning, a crucial ability for complex problem-solving, plays a pivotal role in various real-
world settings such as negotiation, medical diagnosis, and criminal investigation. It serves …

Cited by 32 · Related articles · All 2 versions

[PDF] arxiv.org

Distilled GPT for source code summarization

CY Su, C McMillan - Automated Software Engineering, 2024 - Springer

A code summary is a brief natural language description of source code. Summaries are
usually only a single sentence long, and yet form the backbone of developer documentation …

Cited by 22 · Related articles · All 5 versions

[PDF] arxiv.org

Source code summarization in the era of large language models

W Sun, Y Miao, Y Li, H Zhang, C Fang, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

To support software developers in understanding and maintaining programs, various
automatic (source) code summarization techniques have been proposed to generate a …

Cited by 8 · Related articles · All 5 versions

[PDF] arxiv.org

A survey of neural code intelligence: Paradigms, advances and beyond

Q Sun, Z Chen, F Xu, K Cheng, C Ma, Z Yin… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural Code Intelligence--leveraging deep learning to understand, generate, and optimize
code--holds immense potential for transformative impacts on the whole society. Bridging the …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

Concerned with Data Contamination? Assessing Countermeasures in Code Language Model

J Cao, W Zhang, SC Cheung - arXiv preprint arXiv:2403.16898, 2024 - arxiv.org

Various techniques have been proposed to leverage the capabilities of code language
models (CLMs) for SE tasks. While these techniques typically evaluate their effectiveness …

Cited by 14 · Related articles · All 3 versions

[PDF] arxiv.org

A Comprehensive Study of the Capabilities of Large Language Models for Vulnerability Detection

B Steenhoek, MM Rahman, MK Roy, MS Alam… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have demonstrated great potential for code generation and
other software engineering tasks. Vulnerability detection is of crucial importance to …

Cited by 19 · Related articles · All 2 versions

[PDF] arxiv.org

CodeS: Natural Language to Code Repository via Multi-Layer Sketch

D Zan, A Yu, W Liu, D Chen, B Shen, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org

The impressive performance of large language models (LLMs) on code-related tasks has
shown the potential of fully automated software development. In light of this, we introduce a …

Cited by 6 · Related articles · All 2 versions

[PDF] aaai.org

Text2analysis: A benchmark of table question answering with advanced data analysis and unclear queries

X He, M Zhou, X Xu, X Ma, R Ding, L Du… - Proceedings of the …, 2024 - ojs.aaai.org

Tabular data analysis is crucial in various fields, and large language models show promise
in this area. However, current research mostly focuses on rudimentary tasks like Text2SQL …

Cited by 14 · Related articles · All 5 versions

[PDF] acm.org

On the effectiveness of large language models for GitHub workflows

X Zhang, S Muralee, S Cherupattamoolayil… - Proceedings of the 19th …, 2024 - dl.acm.org

GitHub workflows or GitHub CI is a popular continuous integration platform that enables
developers to automate various software engineering tasks by specifying them as workflows …

Cited by 2 · Related articles · All 2 versions