CODE-MVP: Learning to represent source code from multiple views with contrastive pre-training

X Wang, Y Wang, Y Wan, J Wang, P Zhou, L Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent years have witnessed increasing interest in code representation learning, which
aims to represent the semantics of source code into distributed vectors. Currently, various …

Models are codes: Towards measuring malicious code poisoning attacks on pre-trained model hubs

J Zhao, S Wang, Y Zhao, X Hou, K Wang… - 2024 39th IEEE/ACM …, 2024 - ieeexplore.ieee.org
The proliferation of pre-trained models (PTMs) and datasets has led to the emergence of
centralized model hubs like Hugging Face, which facilitate collaborative development and …

Enhancing comprehension and navigation in Jupyter notebooks with static analysis

APS Venkatesh, J Wang, L Li… - 2023 IEEE international …, 2023 - ieeexplore.ieee.org
Jupyter notebooks enable developers to interleave code snippets with rich-text and in-line
visualizations. Data scientists use Jupyter notebooks as the de facto standard for creating …

PeaTMOSS: A dataset and initial analysis of pre-trained models in open-source software

W Jiang, J Yasmin, J Jones, N Synovic… - 2024 IEEE/ACM 21st …, 2024 - ieeexplore.ieee.org
The development and training of deep learning models have become increasingly costly
and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for …

Data leakage in notebooks: Static detection and better processes

C Yang, RA Brower-Sinning, G Lewis… - Proceedings of the 37th …, 2022 - dl.acm.org
Data science pipelines to train and evaluate models with machine learning may contain
bugs just like any other code. Leakage between training and test data can lead to …

Static analysis driven enhancements for comprehension in machine learning notebooks

APS Venkatesh, S Sabu, M Chekkapalli… - Empirical Software …, 2024 - Springer
Jupyter notebooks have emerged as the predominant tool for data scientists to develop and
share machine learning solutions, primarily using Python as the programming language …

Investigating and Detecting Silent Bugs in PyTorch Programs

S Hong, H Sun, X Gao, SH Tan - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Deep Learning (DL) has been widely applied in various fields. Unlike traditional software,
DL programs possess the “black box” characteristic that can make it challenging for …

Exploring Hyperparameter Usage and Tuning in Machine Learning Research

S Simon, N Kolyada, C Akiki, M Potthast… - 2023 IEEE/ACM 2nd …, 2023 - ieeexplore.ieee.org
The success of machine learning (ML) models depends on careful experimentation and
optimization of their hyperparameters. Tuning can affect the reliability and accuracy of a …

Hard to Read and Understand Pythonic Idioms? DeIdiom and Explain Them in Non-Idiomatic Equivalent Code

Z Zhang, Z Xing, D Zhao, Q Lu, X Xu… - Proceedings of the IEEE …, 2024 - dl.acm.org
The Python community strives to design pythonic idioms so that Python users can achieve
their intent in a more concise and efficient way. According to our analysis of 154 questions …

Complex Python features in the wild

Y Yang, A Milanova, M Hirzel - … of the 19th International Conference on …, 2022 - dl.acm.org
While Python is increasingly popular, program analysis tooling for Python is lagging. This is
due, in part, to complex features of the Python language: features with difficult to …