Data wrangling for big data: Challenges and opportunities

T Furche, G Gottlob, L Libkin, G Orsi… - … on Extending Database …, 2016 - research.ed.ac.uk
Data wrangling is the process by which the data required by an application is identified,
extracted, cleaned and integrated, to yield a data set that is suitable for exploration and …

MAVE: A product dataset for multi-source attribute value extraction

L Yang, Q Wang, Z Yu, A Kulkarni, S Sanghai… - Proceedings of the …, 2022 - dl.acm.org
Attribute value extraction refers to the task of identifying values of an attribute of interest from
product information. Product attribute values are essential in many e-commerce scenarios …

Measurement extraction with natural language processing: a review

J Göpfert, P Kuckertz, J Weinand… - Findings of the …, 2022 - aclanthology.org
Quantitative data is important in many domains. Information extraction methods draw
structured data from documents. However, the extraction of quantities and their contexts has …

AutoKnow: Self-driving knowledge collection for products of thousands of types

XL Dong, X He, A Kan, X Li, Y Liang, J Ma… - Proceedings of the 26th …, 2020 - dl.acm.org
Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs
have firmly established themselves as valuable sources of information for search and …

Learning to extract attribute value from product via question answering: A multi-task approach

Q Wang, L Yang, B Kanagal, S Sanghai… - Proceedings of the 26th …, 2020 - dl.acm.org
Attribute value extraction refers to the task of identifying values of an attribute of interest from
product information. It is an important research topic which has been widely studied in e …

Scaling up open tagging from tens to thousands: Comprehension empowered attribute value extraction from product title

H Xu, W Wang, X Mao, X Jiang… - Proceedings of the 57th …, 2019 - aclanthology.org
Supplementing product information by extracting attribute values from title is a crucial task in
e-Commerce domain. Previous studies treat each attribute only as an entity type and build …

The WDC training dataset and gold standard for large-scale product matching

A Primpeli, R Peeters, C Bizer - … Proceedings of The 2019 World Wide …, 2019 - dl.acm.org
A current research question in the area of entity resolution (also called link discovery or
duplicate detection) is whether and in which cases embeddings and deep neural network …

Construction and applications of billion-scale pre-trained multimodal business knowledge graph

S Deng, C Wang, Z Li, N Zhang, Z Dai… - 2023 IEEE 39th …, 2023 - ieeexplore.ieee.org
Business Knowledge Graphs (KGs) are important to many enterprises today, providing
factual knowledge and structured data that steer many products and make them more …

A machine learning approach for product matching and categorization

P Ristoski, P Petrovski, P Mika, H Paulheim - Semantic Web, 2018 - content.iospress.com
Consumers today have the option to purchase products from thousands of e-shops.
However, the completeness of the product specifications and the taxonomies used for …

Deep neural networks for web page information extraction

T Gogar, O Hubacek, J Sedivy - … and Innovations: 12th IFIP WG 12.5 …, 2016 - Springer
Web wrappers are systems for extracting structured information from web pages. Currently,
wrappers need to be adapted to a particular website template before they can start the …