Dataset discovery and exploration: A survey

NW Paton, J Chen, Z Wu - ACM Computing Surveys, 2023 - dl.acm.org
Data scientists are tasked with obtaining insights from data. However, suitable data is often
not immediately at hand, and there may be many potentially relevant datasets in a data lake …

Deep transfer learning & beyond: Transformer language models in information systems research

R Gruetzemacher, D Paradice - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
AI is widely thought to be poised to transform business, yet current perceptions of the scope
of this transformation may be myopic. Recent progress in natural language processing …

PASTA: table-operations aware fact verification via sentence-table cloze pre-training

Z Gu, J Fan, N Tang, P Nakov, X Zhao, X Du - arXiv preprint arXiv …, 2022 - arxiv.org
Fact verification has attracted a lot of research attention recently, e.g., in journalism,
marketing, and policymaking, as misinformation and disinformation online can sway one's …

Pretrained generalized autoregressive model with adaptive probabilistic label clusters for extreme multi-label text classification

H Ye, Z Chen, DH Wang… - … Conference on Machine …, 2020 - proceedings.mlr.press
Extreme multi-label text classification (XMTC) is a task for tagging a given text with the most
relevant labels from an extremely large label set. We propose a novel deep learning method …

DeepJoin: Joinable table discovery with pre-trained language models

Y Dong, C Xiao, T Nozawa, M Enomoto… - arXiv preprint arXiv …, 2022 - arxiv.org
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery
has become an important operation in data lake management. Existing approaches target …

StruBERT: Structure-aware BERT for table search and matching

M Trabelsi, Z Chen, S Zhang, BD Davison… - Proceedings of the ACM …, 2022 - dl.acm.org
A table is composed of data values that are organized in rows and columns providing
implicit structural information. A table is usually accompanied by secondary information such …

Retrieving complex tables with multi-granular graph representation learning

F Wang, K Sun, M Chen, J Pujara… - Proceedings of the 44th …, 2021 - dl.acm.org
The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant
tables based on natural language queries. Existing learning systems for this task often treat …

Neural ranking models for document retrieval

M Trabelsi, Z Chen, BD Davison, J Heflin - Information Retrieval Journal, 2021 - Springer
Ranking models are the main components of information retrieval systems. Several
approaches to ranking are based on traditional machine learning algorithms using a set of …

Is table retrieval a solved problem? Exploring join-aware multi-table retrieval

PB Chen, Y Zhang, D Roth - … of the 62nd Annual Meeting of the …, 2024 - aclanthology.org
Retrieving relevant tables containing the necessary information to accurately answer a given
question over tables is critical to open-domain question-answering (QA) systems. Previous …

Mixed-modality representation learning and pre-training for joint table-and-text retrieval in OpenQA

J Huang, W Zhong, Q Liu, M Gong, D Jiang… - arXiv preprint arXiv …, 2022 - arxiv.org
Retrieving evidences from tabular and textual resources is essential for open-domain
question answering (OpenQA), which provides more comprehensive information. However …