A review on design inspired subsampling for big data

J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …

Projection‐based techniques for high‐dimensional optimal transport problems

J Zhang, P Ma, W Zhong, C Meng - Wiley Interdisciplinary …, 2023 - Wiley Online Library
Optimal transport (OT) methods seek a transformation map (or plan) between two probability
measures, such that the transformation has the minimum transportation cost. Such a …

Subdata selection algorithm for linear model discrimination

J Yu, HY Wang - Statistical Papers, 2022 - Springer
A statistical method is likely to be sub-optimal if the assumed model does not reflect the
structure of the data at hand. For this reason, it is important to perform model selection …

Information-based optimal subdata selection for non-linear models

J Yu, J Liu, HY Wang - Statistical Papers, 2023 - Springer
Subdata selection methods provide flexible tradeoffs between computational complexity and
statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for …

Model-robust subdata selection for big data

C Shi, B Tang - Journal of Statistical Theory and Practice, 2021 - Springer
Subdata selection is necessary because of challenges arising from statistical analysis of big
data using limited computing resources. The existing work on subdata selection relies …

Smoothing splines approximation using Hilbert curve basis selection

C Meng, J Yu, Y Chen, W Zhong… - Journal of Computational …, 2022 - Taylor & Francis
Smoothing splines have been used pervasively in nonparametric regressions. However, the
computational burden of smoothing splines is significant when the sample size n is large …

Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models

J Zhu, L Wang, F Sun - Journal of Computational and Graphical …, 2024 - Taylor & Francis
Hierarchical data analysis is crucial in various fields for making discoveries. The linear
mixed model is often used for training hierarchical data, but its parameter estimation is …

Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples

H Imberg, X Yang, C Flannagan, J Bärgman - Technometrics, 2024 - Taylor & Francis
Data subsampling has become widely recognized as a tool to overcome computational and
economic bottlenecks in analyzing massive datasets. We contribute to the development of …

Model-free subsampling method based on uniform designs

M Zhang, Y Zhou, Z Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Subsampling or subdata selection is a useful approach in large-scale statistical learning.
Most existing studies focus on model-based subsampling methods which significantly …

Optimal sampling designs for multidimensional streaming time series with application to power grid sensor data

R Xie, S Bai, P Ma - The Annals of Applied Statistics, 2023 - projecteuclid.org
The Annals of Applied Statistics 2023, Vol. 17, No. 4, 3195–3215 …