A review on design inspired subsampling for big data
J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …
Lowcon: A design-based subsampling approach in a misspecified linear model
We consider a measurement constrained supervised learning problem, that is, (i) full sample
of the predictors are given; (ii) the response observations are unavailable and expensive to …
Information-based optimal subdata selection for non-linear models
Subdata selection methods provide flexible tradeoffs between computational complexity and
statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for …
An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation
Subsampling methods aim to select a subsample as a surrogate for the observed sample.
Such methods have been used pervasively in large-scale data analytics, active learning …
Smoothing splines approximation using Hilbert curve basis selection
Smoothing splines have been used pervasively in nonparametric regressions. However, the
computational burden of smoothing splines is significant when the sample size n is large …
Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models
Hierarchical data analysis is crucial in various fields for making discoveries. The linear
mixed model is often used for training hierarchical data, but its parameter estimation is …
Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples
Data subsampling has become widely recognized as a tool to overcome computational and
economic bottlenecks in analyzing massive datasets. We contribute to the development of …
Optimal decorrelated score subsampling for generalized linear models with massive data
In this paper, we consider the unified optimal subsampling estimation and inference on the
low-dimensional parameter of main interest in the presence of the nuisance parameter for …
Residual projection for quantile regression in vertically partitioned big data
Standard regression techniques model only the mean of the response variable. Quantile
regression (QR) is more powerful in that it depicts a comprehensive relationship between …
Optimal subsampling for linear quantile regression models
Y Fan, Y Liu, L Zhu - Canadian Journal of Statistics, 2021 - Wiley Online Library
Subsampling techniques are efficient methods for handling big data. Quite a few optimal
sampling methods have been developed for parametric models in which the loss functions …