A review on design inspired subsampling for big data

J Yu, M Ai, Z Ye - Statistical Papers, 2024 - Springer
Subsampling focuses on selecting a subsample that can efficiently sketch the information of
the original data in terms of statistical inference. It provides a powerful tool in big data …

Lowcon: A design-based subsampling approach in a misspecified linear model

C Meng, R Xie, A Mandal, X Zhang… - … of Computational and …, 2021 - Taylor & Francis
We consider a measurement constrained supervised learning problem, that is, (i) full sample
of the predictors are given; (ii) the response observations are unavailable and expensive to …

Information-based optimal subdata selection for non-linear models

J Yu, J Liu, HY Wang - Statistical Papers, 2023 - Springer
Subdata selection methods provide flexible tradeoffs between computational complexity and
statistical efficiency in analyzing big data. In this work, we investigate a new algorithm for …

An optimal transport approach for selecting a representative subsample with application in efficient kernel density estimation

J Zhang, C Meng, J Yu, M Zhang… - … of Computational and …, 2023 - Taylor & Francis
Subsampling methods aim to select a subsample as a surrogate for the observed sample.
Such methods have been used pervasively in large-scale data analytics, active learning …

Smoothing splines approximation using Hilbert curve basis selection

C Meng, J Yu, Y Chen, W Zhong… - Journal of Computational …, 2022 - Taylor & Francis
Smoothing splines have been used pervasively in nonparametric regressions. However, the
computational burden of smoothing splines is significant when the sample size n is large …

Group-Orthogonal Subsampling for Hierarchical Data Based on Linear Mixed Models

J Zhu, L Wang, F Sun - Journal of Computational and Graphical …, 2024 - Taylor & Francis
Hierarchical data analysis is crucial in various fields for making discoveries. The linear
mixed model is often used for training hierarchical data, but its parameter estimation is …

Active sampling: A machine-learning-assisted framework for finite population inference with optimal subsamples

H Imberg, X Yang, C Flannagan, J Bärgman - Technometrics, 2024 - Taylor & Francis
Data subsampling has become widely recognized as a tool to overcome computational and
economic bottlenecks in analyzing massive datasets. We contribute to the development of …

Optimal decorrelated score subsampling for generalized linear models with massive data

J Gao, L Wang, H Lian - Science China Mathematics, 2024 - Springer
In this paper, we consider the unified optimal subsampling estimation and inference on the
low-dimensional parameter of main interest in the presence of the nuisance parameter for …

Residual projection for quantile regression in vertically partitioned big data

Y Fan, JS Li, N Lin - Data Mining and Knowledge Discovery, 2023 - Springer
Standard regression techniques model only the mean of the response variable. Quantile
regression (QR) is more powerful in that it depicts a comprehensive relationship between …

Optimal subsampling for linear quantile regression models

Y Fan, Y Liu, L Zhu - Canadian Journal of Statistics, 2021 - Wiley Online Library
Subsampling techniques are efficient methods for handling big data. Quite a few optimal
sampling methods have been developed for parametric models in which the loss functions …