[BOOK][B] Statistical foundations of data science

J Fan, R Li, CH Zhang, H Zou - 2020 - taylorfrancis.com
Statistical Foundations of Data Science gives a thorough introduction to commonly used
statistical models, contemporary statistical machine learning techniques and algorithms …

[HTML] Projected principal component analysis in factor models

J Fan, Y Liao, W Wang - Annals of statistics, 2016 - ncbi.nlm.nih.gov
This paper introduces a Projected Principal Component Analysis (Projected-PCA), which
employs principal component analysis to the projected (smoothed) data matrix onto a …

Statistical analysis of big data on pharmacogenomics

J Fan, H Liu - Advanced drug delivery reviews, 2013 - Elsevier
This paper discusses statistical methods for estimating complex correlation structure from
large pharmacogenomic datasets. We selectively review several prominent statistical …

[HTML] Confounder adjustment in multiple hypothesis testing

J Wang, Q Zhao, T Hastie, AB Owen - Annals of statistics, 2017 - ncbi.nlm.nih.gov
We consider large-scale studies in which thousands of significance tests are performed
simultaneously. In some of these studies, the multiple testing procedure can be severely …

[PDF] Removing unwanted variation from high dimensional data with negative controls

JA Gagnon-Bartsch, L Jacob, TP Speed - … Tech Reports from Dep Stat Univ …, 2013 - Citeseer
High dimensional data suffer from unwanted variation, such as the batch effects common in
microarray data. Unwanted variation complicates the analysis of high dimensional data …

[HTML] A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing

WX Zhou, K Bose, J Fan, H Liu - Annals of statistics, 2018 - ncbi.nlm.nih.gov
Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled
by a single grossly outlying observation. As argued in the seminal work of Peter Huber in …

Estimation of the false discovery proportion with unknown dependence

J Fan, X Han - Journal of the Royal Statistical Society Series B …, 2017 - academic.oup.com
Large-scale multiple testing with correlated test statistics arises frequently in much scientific
research. Incorporating correlation information in approximating the false discovery …

[HTML] Robust high dimensional factor models with applications to statistical machine learning

J Fan, K Wang, Y Zhong, Z Zhu - … science: a review journal of the …, 2021 - ncbi.nlm.nih.gov
Factor models are a class of powerful statistical models that have been widely used to deal
with dependent measurements that arise frequently from various applications from genomics …

FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control

J Fan, Y Ke, Q Sun, WX Zhou - Journal of the American Statistical …, 2019 - Taylor & Francis
Large-scale multiple testing with correlated and heavy-tailed data arises in a wide range of
research areas from genomics, medical imaging to finance. Conventional methods for …

Community network auto-regression for high-dimensional time series

EY Chen, J Fan, X Zhu - Journal of Econometrics, 2023 - Elsevier
Modeling responses on the nodes of a large-scale network is an important task that arises
commonly in practice. This paper proposes a community network vector autoregressive …