Towards communication-efficient vertical federated learning training via cache-enabled local updates

F Fu, X Miao, J Jiang, H Xue, B Cui - arXiv preprint arXiv:2207.14628, 2022 - arxiv.org
Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g.,
organizations or enterprises) to collaboratively build machine learning models with privacy …

Auto-differentiation of relational computations for very large scale machine learning

Y Tang, Z Ding, D Jankov, B Yuan… - International …, 2023 - proceedings.mlr.press
The relational data model was designed to facilitate large-scale data management and
analytics. We consider the problem of how to differentiate computations expressed …

Database native model selection: Harnessing deep neural networks in database systems

N Xing, S Cai, G Chen, Z Luo, BC Ooi… - Proceedings of the VLDB …, 2024 - dl.acm.org
The growing demand for advanced analytics beyond statistical aggregation calls for
database systems that support effective model selection of deep neural networks (DNNs) …

Powering in-database dynamic model slicing for structured data analytics

L Zeng, N Xing, S Cai, G Chen, BC Ooi, J Pei… - arXiv preprint arXiv …, 2024 - arxiv.org
Relational database management systems (RDBMS) are widely used for the storage and
retrieval of structured data. To derive insights beyond statistical aggregation, we typically …

Stochastic gradient descent without full data shuffle: with applications to in-database machine learning and deep learning systems

L Xu, S Qiu, B Yuan, J Jiang, C Renggli, S Gan… - The VLDB Journal, 2024 - Springer
Modern machine learning (ML) systems commonly use stochastic gradient descent (SGD) to
train ML models. However, SGD relies on random data order to converge, which usually …

A Selective Preprocessing Offloading Framework for Reducing Data Traffic in DL Training

M Wang, G Waldspurger, S Sundararaman - Proceedings of the 16th …, 2024 - dl.acm.org
Deep learning (DL) training is data-intensive and often bottlenecked by fetching data from
remote storage. Recognizing that many samples' sizes diminish during data preprocessing …

Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training

Y Yang, M Jones, MC Mozer, M Ren - arXiv preprint arXiv:2403.09613, 2024 - arxiv.org
We explore the training dynamics of neural networks in a structured non-IID setting where
documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer …

In-database query optimization on SQL with ML predicates

Y Guo, G Li, R Hu, Y Wang - The VLDB Journal, 2025 - Springer
Extended SQL with machine learning (ML) predicates, commonly referred to as SQL+ML,
integrates ML abilities into traditional SQL processing in databases. When processing SQL+ …

NeurDB: On the Design and Implementation of an AI-powered Autonomous Database

Z Zhao, S Cai, H Gao, H Pan, S Xiang, N Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
Databases are increasingly embracing AI to provide autonomous system optimization and
intelligent in-database analytics, aiming to relieve end-user burdens across various industry …

moduli: A Disaggregated Data Management Architecture for Data-Intensive Workflows

P Ceravolo, T Catarci, M Console… - ACM SIGWEB …, 2024 - dl.acm.org
As companies store, process, and analyse bigger and bigger volumes of highly
heterogeneous data, novel research and technological challenges are emerging. Traditional …