[BOOK][B] Statistical foundations of data science
Statistical Foundations of Data Science gives a thorough introduction to commonly used
statistical models, contemporary statistical machine learning techniques and algorithms …
[HTML][HTML] Projected principal component analysis in factor models
This paper introduces a Projected Principal Component Analysis (Projected-PCA), which
employs principal component analysis on the projected (smoothed) data matrix onto a …
Statistical analysis of big data on pharmacogenomics
This paper discusses statistical methods for estimating complex correlation structure from
large pharmacogenomic datasets. We selectively review several prominent statistical …
[HTML][HTML] Confounder adjustment in multiple hypothesis testing
We consider large-scale studies in which thousands of significance tests are performed
simultaneously. In some of these studies, the multiple testing procedure can be severely …
[PDF][PDF] Removing unwanted variation from high dimensional data with negative controls
High dimensional data suffer from unwanted variation, such as the batch effects common in
microarray data. Unwanted variation complicates the analysis of high dimensional data …
[HTML][HTML] A new perspective on robust M-estimation: Finite sample theory and applications to dependence-adjusted multiple testing
Heavy-tailed errors impair the accuracy of the least squares estimate, which can be spoiled
by a single grossly outlying observation. As argued in the seminal work of Peter Huber in …
Estimation of the false discovery proportion with unknown dependence
J Fan, X Han - Journal of the Royal Statistical Society Series B …, 2017 - academic.oup.com
Large-scale multiple testing with correlated test statistics arises frequently in much scientific
research. Incorporating correlation information in approximating the false discovery …
[HTML][HTML] Robust high dimensional factor models with applications to statistical machine learning
Factor models are a class of powerful statistical models that have been widely used to deal
with dependent measurements that arise frequently from various applications from genomics …
FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control
Large-scale multiple testing with correlated and heavy-tailed data arises in a wide range of
research areas from genomics, medical imaging to finance. Conventional methods for …
Community network auto-regression for high-dimensional time series
Modeling responses on the nodes of a large-scale network is an important task that arises
commonly in practice. This paper proposes a community network vector autoregressive …