A review on design inspired subsampling for big data

J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …

Feature screening with conditional rank utility for big-data classification

X Li, C Xu - Journal of the American Statistical Association, 2024 - Taylor & Francis
Feature screening is a commonly used strategy to eliminate irrelevant features in
high-dimensional classification. When one encounters big datasets with both high dimensionality …

Subsampling and jackknifing: a practically convenient solution for large data analysis with limited computational resources

S Wu, X Zhu, H Wang - arXiv preprint arXiv:2304.06231, 2023 - arxiv.org
Modern statistical analysis often encounters datasets with large sizes. For these datasets,
conventional estimation methods can hardly be used immediately because practitioners …

A selective review on statistical methods for massive data computation: distributed computing, subsampling, and minibatch techniques

X Li, Y Gao, H Chang, D Huang, Y Ma… - Statistical Theory and …, 2024 - Taylor & Francis
This paper presents a selective review of statistical computation methods for massive data
analysis. A huge amount of statistical methods for massive data computation have been …

Combining random forest and multicollinearity modeling for index tracking

Y Cao, H Li, Y Yang - Communications in Statistics-Simulation and …, 2024 - Taylor & Francis
This paper studies the combination of random forest (RF) and classical statistical modeling.
We propose two algorithms: RF cluster + ridge and RF regression + ridge, in which the RF …

Supervised Stratified Subsampling for Predictive Analytics

MC Chang - Journal of Computational and Graphical Statistics, 2024 - Taylor & Francis
Predictive analytics involves the use of statistical models to make predictions; however, the
power of these techniques is hindered by ever-increasing quantities of data. The richness …

Distributed Conditional Feature Screening via Pearson Partial Correlation with FDR Control

N Pang, X Xia - arXiv preprint arXiv:2403.05792, 2024 - arxiv.org
This paper studies the distributed conditional feature screening for massive data with
ultrahigh-dimensional features. Specifically, three distributed partial correlation feature …

On the asymptotic properties of a bagging estimator with a massive dataset

Y Gao, R Zhang, H Wang - Stat, 2022 - Wiley Online Library
Bagging is a useful method for large‐scale statistical analysis, especially when the
computing resources are very limited. We study here the asymptotic properties of bagging …