A review on design inspired subsampling for big data
J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …
Projection‐based techniques for high‐dimensional optimal transport problems
Optimal transport (OT) methods seek a transformation map (or plan) between two probability
measures, such that the transformation has the minimum transportation cost. Such a …
Subdata selection algorithm for linear model discrimination
A statistical method is likely to be sub-optimal if the assumed model does not reflect the
structure of the data at hand. For this reason, it is important to perform model selection …
Information-based optimal subdata selection for non-linear models
Subdata selection methods provide flexible tradeoffs between computational complexity and
statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for …
Model-robust subdata selection for big data
Subdata selection is necessary because of challenges arising from statistical analysis of big
data using limited computing resources. The existing work on subdata selection relies …
Smoothing splines approximation using Hilbert curve basis selection
Smoothing splines have been used pervasively in nonparametric regressions. However, the
computational burden of smoothing splines is significant when the sample size n is large …
Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models
Hierarchical data analysis is crucial in various fields for making discoveries. The linear
mixed model is often used for training hierarchical data, but its parameter estimation is …
Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples
Data subsampling has become widely recognized as a tool to overcome computational and
economic bottlenecks in analyzing massive datasets. We contribute to the development of …
Model-free subsampling method based on uniform designs
M Zhang, Y Zhou, Z Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Subsampling or subdata selection is a useful approach in large-scale statistical learning.
Most existing studies focus on model-based subsampling methods which significantly …
Optimal sampling designs for multidimensional streaming time series with application to power grid sensor data
The Annals of Applied Statistics 2023, Vol. 17, No. 4, 3195–3215 …