Challenges of big data analysis

J Fan, F Han, H Liu - National Science Review, 2014 - academic.oup.com
Big Data bring new opportunities to modern society and challenges to data scientists. On the
one hand, Big Data hold great promises for discovering subtle population patterns and …

High-dimensional statistics with a view toward applications in biology

P Bühlmann, M Kalisch, L Meier - Annual Review of Statistics …, 2014 - annualreviews.org
We review statistical methods for high-dimensional data analysis and pay particular
attention to recent developments for assessing uncertainties in terms of controlling false …

Predicting returns with text data

ZT Ke, BT Kelly, D Xiu - 2019 - nber.org
We introduce a new text-mining methodology that extracts sentiment information from news
articles to predict asset returns. Unlike more common sentiment scores used for stock return …

Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China

B Zhai, J Chen - Science of the Total Environment, 2018 - Elsevier
A stacked ensemble model is developed for forecasting and analyzing the daily average
concentrations of fine particulate matter (PM2.5) in Beijing, China. Special feature extraction …

Non-negative least squares for high-dimensional linear models: Consistency and sparse recovery without regularization

M Slawski, M Hein - 2013 - projecteuclid.org
Least squares fitting is in general not useful for high-dimensional linear models, in which the
number of predictors is of the same or even larger order of magnitude than the number of …

Higher criticism for large-scale inference, especially for rare and weak effects

D Donoho, J Jin - 2015 - projecteuclid.org
In modern high-throughput data analysis, researchers perform a large number of statistical
tests, expecting to find perhaps a small fraction of significant effects against a predominantly …

Exact post model selection inference for marginal screening

JD Lee, JE Taylor - Advances in neural information …, 2014 - proceedings.neurips.cc
We develop a framework for post model selection inference, via marginal screening, in
linear regression. At the core of this framework is a result that characterizes the exact …

Matrix factorization techniques in machine learning, signal processing, and statistics

KL Du, MNS Swamy, ZQ Wang, WH Mow - Mathematics, 2023 - mdpi.com
Compressed sensing is an alternative to Shannon/Nyquist sampling for acquiring sparse or
compressible signals. Sparse coding represents a signal as a sparse linear combination of …

Ordered weighted l1 regularized regression with strongly correlated covariates: Theoretical aspects

M Figueiredo, R Nowak - Artificial Intelligence and Statistics, 2016 - proceedings.mlr.press
This paper studies the ordered weighted L1 (OWL) family of regularizers for sparse linear
regression with strongly correlated covariates. We prove sufficient conditions for clustering …

Maximin effects in inhomogeneous large-scale data

N Meinshausen, P Bühlmann - 2015 - projecteuclid.org
Large-scale data are often characterized by some degree of inhomogeneity as data are
either recorded in different time regimes or taken from multiple sources. We look at …