Data wrangling for big data: Challenges and opportunities
Data wrangling is the process by which the data required by an application is identified,
extracted, cleaned and integrated, to yield a data set that is suitable for exploration and …
Mave: A product dataset for multi-source attribute value extraction
Attribute value extraction refers to the task of identifying values of an attribute of interest from
product information. Product attribute values are essential in many e-commerce scenarios …
Measurement extraction with natural language processing: a review
Quantitative data is important in many domains. Information extraction methods draw
structured data from documents. However, the extraction of quantities and their contexts has …
Autoknow: Self-driving knowledge collection for products of thousands of types
Can one build a knowledge graph (KG) for all products in the world? Knowledge graphs
have firmly established themselves as valuable sources of information for search and …
Learning to extract attribute value from product via question answering: A multi-task approach
Attribute value extraction refers to the task of identifying values of an attribute of interest from
product information. It is an important research topic which has been widely studied in e …
Scaling up open tagging from tens to thousands: Comprehension empowered attribute value extraction from product title
Supplementing product information by extracting attribute values from title is a crucial task in
e-Commerce domain. Previous studies treat each attribute only as an entity type and build …
The WDC training dataset and gold standard for large-scale product matching
A current research question in the area of entity resolution (also called link discovery or
duplicate detection) is whether and in which cases embeddings and deep neural network …
Construction and applications of billion-scale pre-trained multimodal business knowledge graph
Business Knowledge Graphs (KGs) are important to many enterprises today, providing
factual knowledge and structured data that steer many products and make them more …
A machine learning approach for product matching and categorization
Consumers today have the option to purchase products from thousands of e-shops.
However, the completeness of the product specifications and the taxonomies used for …
Deep neural networks for web page information extraction
Web wrappers are systems for extracting structured information from web pages. Currently,
wrappers need to be adapted to a particular website template before they can start the …