Deep neural networks and tabular data: A survey

V Borisov, T Leemann, K Seßler, J Haug… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Heterogeneous tabular data are the most commonly used form of data and are essential for
numerous critical and computationally demanding applications. On homogeneous datasets …

Learning parameter distributions to detect concept drift in data streams

J Haug, G Kasneci - 2020 25th international conference on …, 2021 - ieeexplore.ieee.org
Data distributions in streaming environments are usually not stationary. In order to maintain
a high predictive quality at all times, online learning models need to adapt to distributional …

On baselines for local feature attributions

J Haug, S Zürn, P El-Jiz, G Kasneci - arXiv preprint arXiv:2101.00905, 2021 - arxiv.org
High-performing predictive models, such as neural nets, usually operate as black boxes,
which raises serious concerns about their interpretability. Local feature attribution methods …

Change detection for local explainability in evolving data streams

J Haug, A Braun, S Zürn, G Kasneci - Proceedings of the 31st ACM …, 2022 - dl.acm.org
As complex machine learning models are increasingly used in sensitive applications like
banking, trading or credit scoring, there is a growing demand for reliable explanation …

Dynamic model tree for interpretable data stream learning

J Haug, K Broelemann… - 2022 IEEE 38th …, 2022 - ieeexplore.ieee.org
Data streams are ubiquitous in modern business and society. In practice, data streams may
evolve over time and cannot be stored indefinitely. Effective and transparent machine …

Standardized Evaluation of Machine Learning Methods for Evolving Data Streams

J Haug, E Tramountani, G Kasneci - arXiv preprint arXiv:2204.13625, 2022 - arxiv.org
Due to the unspecified and dynamic nature of data streams, online machine learning
requires powerful and flexible solutions. However, evaluating online machine learning …

Online feature screening for data streams with concept drift

M Wang, A Barbu - IEEE Transactions on Knowledge and Data …, 2022 - ieeexplore.ieee.org
Screening feature selection methods are often used as a preprocessing step for reducing
the number of variables before training a model. Traditional screening methods only focus …

Employing Two-Dimensional Word Embedding for Difficult Tabular Data Stream Classification

P Zyblewski - arXiv preprint arXiv:2404.15836, 2024 - arxiv.org
Rapid technological advances are inherently linked to the increased amount of data, a
substantial portion of which can be interpreted as a data stream, capable of exhibiting the …

Towards Reliable Machine Learning in Evolving Data Streams

JC Haug - 2022 - tobias-lib.ub.uni-tuebingen.de
Data streams are ubiquitous in many areas of modern life. For example, applications in
healthcare, education, finance, or advertising often deal with large-scale and evolving data …

Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems

B Škrlj, N Ki-Tov, L Edelist, N Silberstein… - arXiv preprint arXiv …, 2023 - arxiv.org
Real-world production systems often grapple with maintaining data quality in large-scale,
dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature …