The MADlib analytics library or MAD skills, the SQL
MADlib is a free, open source library of in-database analytic methods. It provides an
evolving suite of SQL-based algorithms for machine learning, data mining and statistics that …
Towards a unified architecture for in-RDBMS analytics
The increasing use of statistical data analysis in enterprise applications has created an arms
race among database vendors to offer ever more sophisticated in-database analytics. One …
Efficient query answering in probabilistic RDF graphs
In this paper, we tackle the problem of efficiently answering queries on probabilistic RDF
data graphs. Specifically, we model RDF data by probabilistic graphs, and an RDF query is …
In-RDBMS hardware acceleration of advanced analytics
The data revolution is fueled by advances in machine learning, databases, and hardware
design. Programmable accelerators are making their way into each of these areas …
Hybrid in-database inference for declarative information extraction
In the database community, work on information extraction (IE) has centered on two themes:
how to effectively manage IE tasks, and how to manage the uncertainties that arise in the IE …
Efficient and effective similarity search over probabilistic data based on earth mover's distance
Advances in geographical tracking, multimedia processing, information extraction, and
sensor networks have created a deluge of probabilistic data. While similarity search is an …
Automatic knowledge base construction using probabilistic extraction, deductive reasoning, and human feedback
We envision an automatic knowledge base construction system consisting of three
interrelated components. MADDEN is a knowledge extraction system applying statistical text …
Asynchronous complex analytics in a distributed dataflow architecture
Scalable distributed dataflow systems have recently experienced widespread adoption, with
commodity dataflow engines such as Hadoop and Spark, and even commodity SQL engines …
Optimizing statistical information extraction programs over evolving text
Statistical information extraction (IE) programs are increasingly used to build real-world IE
systems such as Alibaba, CiteSeer, Kylin, and YAGO. Current statistical IE approaches …