On the relevance of data science for flight delay research: a systematic review

L Carvalho, A Sternberg, L Maia Goncalves… - Transport …, 2021 - Taylor & Francis
Flight delays are a significant problem for society as they evenly impair airlines, transport
companies, air traffic controllers, facility managers, and passengers. Studying prior flight …

Making recursive Bayesian inference accessible

MB Hooten, DS Johnson, BM Brost - The American Statistician, 2021 - Taylor & Francis
Bayesian models provide recursive inference naturally because they can formally reconcile
new data and existing scientific information. However, popular use of Bayesian methods …
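The recursive idea in the snippet above is that the posterior from earlier data serves as the prior for new data. Its simplest form is a conjugate Beta-binomial model. The sketch below assumes that toy model and uses hypothetical batch counts. It is not the procedure proposed by Hooten et al. It checks that updating batch by batch gives the same posterior as analyzing all the data at once.

```python
# Recursive Bayesian updating in the simplest conjugate setting (Beta prior, binomial data).
# Assumption: this toy model is for illustration only; it is not the method of the paper above.

def update_beta(alpha, beta, successes, failures):
    """Return Beta posterior parameters after observing binomial data."""
    return alpha + successes, beta + failures

prior = (1.0, 1.0)                    # uniform Beta(1, 1) prior
batches = [(3, 7), (5, 5), (8, 2)]    # hypothetical (successes, failures) per batch

# Recursive: each batch's posterior becomes the prior for the next batch.
posterior = prior
for successes, failures in batches:
    posterior = update_beta(*posterior, successes, failures)

# Non-recursive: analyze all data at once.
total_s = sum(s for s, _ in batches)
total_f = sum(f for _, f in batches)
posterior_all = update_beta(*prior, total_s, total_f)

print(posterior, posterior_all)  # (17.0, 15.0) (17.0, 15.0)
```

Conjugacy makes the recursion exact here. Non-conjugate models need approximations (for example, reusing posterior samples), which is where the practical difficulty lies.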

Divide-and-conquer methods for big data analysis

X Chen, JQ Cheng, M Xie - arXiv preprint arXiv:2102.10771, 2021 - arxiv.org
In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-
step process: first splitting a data set into several smaller ones; then analyzing each set …
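The split, analyze, and combine steps described above can be sketched for ordinary least squares. This is a minimal sketch under several assumptions. The data are simulated, and the blocks are combined by simple averaging. That is one common combination rule, not necessarily the one studied by Chen, Cheng and Xie. `K` and the other variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 10_000, 3, 10                 # hypothetical sample size, dimension, number of blocks
beta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Step 1: split the data into K disjoint blocks.
X_blocks = np.array_split(X, K)
y_blocks = np.array_split(y, K)

# Step 2: analyze each block separately (ordinary least squares per block).
block_estimates = [
    np.linalg.lstsq(Xk, yk, rcond=None)[0] for Xk, yk in zip(X_blocks, y_blocks)
]

# Step 3: combine the block results, here by simple averaging.
beta_dc = np.mean(block_estimates, axis=0)

# Full-data estimate, for comparison only.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.round(beta_dc, 3), np.round(beta_full, 3))
```

Averaging is only approximately equal to the full-data fit. Its accuracy depends on each block being large enough for its own estimate to be reliable.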

Renewable quantile regression for streaming data sets

R Jiang, K Yu - Neurocomputing, 2022 - Elsevier
Online updating is an important statistical method for the analysis of big data arriving in
streams due to its ability to break the storage barrier and the computational barrier under …
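Online updating avoids storing past batches by carrying forward a fixed-size summary. For linear least squares that summary can be the accumulated X'X and X'y, which reproduces the full-data estimate exactly. This sketch assumes linear regression with simulated batches. It is not the renewable quantile regression of Jiang and Yu, because the quantile check loss does not reduce to a fixed-size summary in this way. The class name `OnlineLeastSquares` is hypothetical.

```python
import numpy as np

class OnlineLeastSquares:
    """Online least squares via accumulated sufficient statistics X'X and X'y (illustrative)."""

    def __init__(self, p):
        self.xtx = np.zeros((p, p))
        self.xty = np.zeros(p)

    def update(self, X_batch, y_batch):
        # Only the p x p and length-p summaries are kept; the batch itself can be discarded.
        self.xtx += X_batch.T @ X_batch
        self.xty += X_batch.T @ y_batch

    def estimate(self):
        return np.linalg.solve(self.xtx, self.xty)

rng = np.random.default_rng(1)
beta_true = np.array([2.0, 0.0, -1.0])
model = OnlineLeastSquares(p=3)
X_seen, y_seen = [], []   # stored here only to verify the result below
for _ in range(20):       # 20 hypothetical batches arriving in a stream
    Xb = rng.normal(size=(500, 3))
    yb = Xb @ beta_true + rng.normal(size=500)
    model.update(Xb, yb)
    X_seen.append(Xb)
    y_seen.append(yb)

beta_online = model.estimate()
beta_full = np.linalg.lstsq(np.vstack(X_seen), np.concatenate(y_seen), rcond=None)[0]
print(np.allclose(beta_online, beta_full))  # True
```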

Distributed subdata selection for big data via sampling-based approach

H Zhang, HY Wang - Computational Statistics & Data Analysis, 2021 - Elsevier
With the development of modern technologies, it is possible to gather an extraordinarily
large number of observations. Due to the storage or transmission burden, big data are …

Online updating of survival analysis

J Wu, MH Chen, ED Schifano, J Yan - Journal of Computational …, 2021 - Taylor & Francis
When large amounts of survival data arrive in streams, conventional estimation methods
become computationally infeasible since they require access to all observations at each …

Least squares model averaging for distributed data

H Zhang, Z Liu, G Zou - Journal of Machine Learning Research, 2023 - jmlr.org
Divide and conquer algorithm is a common strategy applied in big data. Model averaging
has the natural divide-and-conquer feature, but its theory has not been developed in big …

Optimal subsampling for parametric accelerated failure time models with massive survival data

Z Yang, HY Wang, J Yan - Statistics in Medicine, 2022 - Wiley Online Library
With increasing availability of massive survival data, researchers need valid statistical
inferences for survival modeling whose computation is not limited by computer memories …

Fast optimal subsampling probability approximation for generalized linear models

JC Lee, ED Schifano, HY Wang - Econometrics and Statistics, 2021 - Elsevier
For massive data, subsampling techniques are popular to mitigate computational burden by
reducing the data size. In a subsampling approach, subsampling probabilities for each data …
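Non-uniform subsampling has three basic steps. First, assign each observation a probability. Next, draw a subsample. Finally, reweight by the inverse probabilities when fitting. These mechanics can be sketched for linear regression. The probabilities here are proportional to the row norm ||x_i||, a hypothetical choice for illustration. They are neither the optimal probabilities nor the fast approximation studied by Lee, Schifano and Wang. The sizes `n` and `r` are likewise assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, r = 50_000, 3, 1_000              # hypothetical full-data size, dimension, subsample size
beta_true = np.array([1.0, 1.0, -1.0])
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Hypothetical subsampling probabilities: proportional to the row norm ||x_i||.
norms = np.linalg.norm(X, axis=1)
probs = norms / norms.sum()

# Draw r indices with replacement according to probs.
idx = rng.choice(n, size=r, replace=True, p=probs)

# Weighted least squares with inverse-probability weights 1 / (r * pi_i),
# implemented by scaling rows with the square root of each weight.
w = 1.0 / (r * probs[idx])
sw = np.sqrt(w)
beta_sub = np.linalg.lstsq(X[idx] * sw[:, None], y[idx] * sw, rcond=None)[0]
print(np.round(beta_sub, 2))
```

The fit uses only `r` rows, but computing `probs` still requires one pass over all `n` rows. Reducing the cost of that pass is the motivation for approximating the subsampling probabilities.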

Online two‐way estimation and inference via linear mixed‐effects models

L Luo, L Li - Statistics in Medicine, 2022 - Wiley Online Library
In this article, we tackle the estimation and inference problem of analyzing distributed
streaming data that is collected continuously over multiple data sites. We propose an online …