A review on design inspired subsampling for big data
J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …
Feature screening with conditional rank utility for big-data classification
X Li, C Xu - Journal of the American Statistical Association, 2024 - Taylor & Francis
Feature screening is a commonly used strategy to eliminate irrelevant features in high-
dimensional classification. When one encounters big datasets with both high dimensionality …
Subsampling and jackknifing: a practically convenient solution for large data analysis with limited computational resources
Modern statistical analysis often encounters datasets with large sizes. For these datasets,
conventional estimation methods can hardly be used immediately because practitioners …
A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques
This paper presents a selective review of statistical computation methods for massive data
analysis. A huge amount of statistical methods for massive data computation have been …
Combining random forest and multicollinearity modeling for index tracking
Y Cao, H Li, Y Yang - Communications in Statistics-Simulation and …, 2024 - Taylor & Francis
This paper studies the combination of random forest (RF) and classical statistical modeling.
We propose two algorithms: RF cluster + ridge and RF regression + ridge, in which the RF …
Supervised Stratified Subsampling for Predictive Analytics
MC Chang - Journal of Computational and Graphical Statistics, 2024 - Taylor & Francis
Predictive analytics involves the use of statistical models to make predictions; however, the
power of these techniques is hindered by ever-increasing quantities of data. The richness …
Distributed Conditional Feature Screening via Pearson Partial Correlation with FDR Control
N Pang, X Xia - arXiv preprint arXiv:2403.05792, 2024 - arxiv.org
This paper studies the distributed conditional feature screening for massive data with
ultrahigh-dimensional features. Specifically, three distributed partial correlation feature …
On the asymptotic properties of a bagging estimator with a massive dataset
Bagging is a useful method for large‐scale statistical analysis, especially when the
computing resources are very limited. We study here the asymptotic properties of bagging …