Probabilistic Databases

Author: Dan Suciu

Publisher: Morgan & Claypool Publishers

Published: 2011

Total Pages: 183

ISBN-10: 1608456803

Probabilistic databases are databases in which the value of some attributes or the presence of some records is uncertain and known only with some probability. Applications in many areas, such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment, produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. It then discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and therefore processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called the lineage expression: every relational query can be evaluated this way, but the data complexity depends dramatically on the query being evaluated and can be #P-hard. The book also discusses some advanced topics in probabilistic data management, such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases. Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques
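
To make the idea of extensional evaluation concrete, here is a minimal sketch, not taken from the book. It assumes two hypothetical tuple-independent tables, R(x) and S(x, y), with made-up probabilities, and the query "there exist x and y such that R(x) and S(x, y)", a standard example of a safe query. The first function computes the probability using only products over independent events; the second checks it by enumerating every possible world, which takes time exponential in the number of tuples.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal
# probability of being present. Names and numbers are illustrative only.
R = {"a": 0.5, "b": 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.9}


def extensional_probability(R, S):
    """P(exists x, y: R(x) and S(x, y)) via independent-and / independent-or."""
    p_no_x = 1.0  # probability that no value of x satisfies the query
    for x, p_r in R.items():
        p_no_y = 1.0  # probability that no S(x, y) tuple is present for this x
        for (sx, _y), p_s in S.items():
            if sx == x:
                p_no_y *= 1.0 - p_s
        p_x = p_r * (1.0 - p_no_y)  # R(x) present and some S(x, y) present
        p_no_x *= 1.0 - p_x
    return 1.0 - p_no_x


def possible_worlds_probability(R, S):
    """Same probability by brute-force enumeration of all possible worlds."""
    tuples = [("R", k, p) for k, p in R.items()]
    tuples += [("S", k, p) for k, p in S.items()]
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        weight = 1.0
        r_world, s_world = set(), set()
        for (rel, key, p), keep in zip(tuples, present):
            weight *= p if keep else 1.0 - p
            if keep:
                (r_world if rel == "R" else s_world).add(key)
        # The query holds if some S tuple joins with a present R tuple.
        if any(x in r_world for (x, _y) in s_world):
            total += weight
    return total


if __name__ == "__main__":
    print(extensional_probability(R, S))
    print(possible_worlds_probability(R, S))
```

The two functions agree on this input, but only the first scales: it makes a single pass over the tables, which is the kind of computation a database engine can run as an ordinary aggregate query.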


Query Processing over Uncertain Databases

Author: Lei Chen

Publisher: Morgan & Claypool Publishers

Published: 2012-12-01

Total Pages: 103

ISBN-10: 1608458938

Due to measurement errors, transmission loss, or noise injected for privacy protection, uncertainty exists in the data of many real applications. However, query processing techniques for deterministic data cannot be directly applied to uncertain data because they have no mechanisms to handle data uncertainty. Therefore, efficient and effective manipulation of uncertain data is a practical yet challenging research topic. In this book, we start from the data models for imprecise and uncertain data, move on to defining different semantics for queries on uncertain data, and finally discuss the advanced query processing techniques for various probabilistic queries in uncertain databases. The book serves as a comprehensive guide to query processing over uncertain databases. Table of Contents: Introduction / Uncertain Data Models / Spatial Query Semantics over Uncertain Data Models / Spatial Query Processing over Uncertain Databases / Conclusion


Advanced Query Processing

Author: Barbara Catania

Publisher: Springer Science & Business Media

Published: 2012-07-28

Total Pages: 355

ISBN-10: 3642283233

This research book presents key developments, directions, and challenges concerning advanced query processing for both traditional and non-traditional data. Special emphasis is given to approximation and adaptivity issues as well as to the integration of heterogeneous data sources. The book will prove useful as a reference for senior undergraduate or graduate courses on advanced data management issues that have a special focus on query processing and data integration. It is aimed at technologists, managers, and developers who want to know more about emerging trends in advanced query processing.


Extracting and Querying Probabilistic Information in BayesStore

Author: Zhe Wang

Publisher:

Published: 2011

Total Pages: 310

ISBN-13:

During the past few years, the number of applications that need to process large-scale data has grown remarkably. The data driving these applications are often uncertain, as is the analysis, which often involves probabilistic models and statistical inference. Examples include sensor-based monitoring, information extraction, and online advertising. Such applications require probabilistic data analysis (PDA), which is a family of queries over data, uncertainties, and probabilistic models that involve relational operators from the database literature, as well as inference operators from the statistical machine learning (SML) literature. Prior to our work, probabilistic database research advocated an approach in which uncertainty is modeled by attaching probabilities to data items. However, such systems do not and cannot take advantage of the wealth of SML research, because they are unable to represent and reason about the pervasive probabilistic correlations in the data. In this thesis, we propose, build, and evaluate BayesStore, a probabilistic database system that natively supports SML models and various inference algorithms to perform advanced data analysis. This marriage of database and SML technologies creates a declarative and efficient probabilistic processing framework for applications dealing with large-scale uncertain data. We use sensor-based monitoring and information extraction over text as the two driving applications. Sensor network applications generate noisy sensor readings, on top of which a first-order Bayesian network model is used to capture the probability distribution. Information extraction applications generate uncertain entities from text using linear-chain conditional random fields.
We explore a variety of research challenges, including extending the relational data model with probabilistic data and statistical models, efficiently implementing statistical inference algorithms in a database, defining relational operators (e.g., select, project, join) over probabilistic data and models, developing joint optimization of inference operators and the relational algebra, and devising novel query execution plans. The experimental results show: (1) statistical inference algorithms over probabilistic models can be efficiently implemented in the set-oriented programming framework in databases; (2) optimizations for query-driven SML inference lead to orders-of-magnitude speed-up on large corpora; and (3) using in-database SML methods to extract and query probabilistic information can significantly improve answer quality.


Query Processing over Uncertain Databases

Author: Lei Chen

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 91

ISBN-10: 3031018966

Due to measurement errors, transmission loss, or noise injected for privacy protection, uncertainty exists in the data of many real applications. However, query processing techniques for deterministic data cannot be directly applied to uncertain data because they have no mechanisms to handle data uncertainty. Therefore, efficient and effective manipulation of uncertain data is a practical yet challenging research topic. In this book, we start from the data models for imprecise and uncertain data, move on to defining different semantics for queries on uncertain data, and finally discuss the advanced query processing techniques for various probabilistic queries in uncertain databases. The book serves as a comprehensive guide to query processing over uncertain databases. Table of Contents: Introduction / Uncertain Data Models / Spatial Query Semantics over Uncertain Data Models / Spatial Query Processing over Uncertain Databases / Conclusion


Database Systems for Advanced Applications

Author: Sang-goo Lee

Publisher: Springer Science & Business Media

Published: 2012-03-27

Total Pages: 355

ISBN-10: 3642290345

This two-volume set, LNCS 7238 and LNCS 7239, constitutes the refereed proceedings of the 17th International Conference on Database Systems for Advanced Applications, DASFAA 2012, held in Busan, South Korea, in April 2012. The 44 revised full papers and 8 short papers presented together with 2 invited keynote papers, 8 industrial papers, 8 demo presentations, 4 tutorials and 1 panel paper were carefully reviewed and selected from a total of 159 submissions. The topics covered are query processing and optimization, data semantics, XML and semi-structured data, data mining and knowledge discovery, privacy and anonymity, data management in the Web, graphs and data mining applications, temporal and spatial data, top-k and skyline query processing, information retrieval and recommendation, indexing and search systems, cloud computing and scalability, memory-based query processing, semantic and decision support systems, social data, data mining.


Query Processing over Incomplete Databases

Author: Yunjun Gao

Publisher: Springer Nature

Published: 2022-06-01

Total Pages: 106

ISBN-10: 303101863X

Incomplete data is part of life and of almost all areas of scientific study. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive questions on surveys; sensors fail, resulting in the loss of certain readings; publicly viewable satellite map services have missing data in many mobile applications; and in privacy-preserving applications, the data is deliberately incomplete in order to preserve the sensitivity of some attribute values. Query processing is a fundamental problem in computer science and is useful in a variety of applications. In this book, we focus mostly on query processing over incomplete databases, which involves finding a set of qualified objects from a specified incomplete dataset in order to support a wide spectrum of real-life applications. We first elaborate on the three general kinds of methods for handling incomplete data: (i) discarding the data with missing values, (ii) imputing the missing values, and (iii) depending only on the observed data values. For the third kind of method, we introduce the semantics of k-nearest neighbor (kNN) search, skyline queries, and top-k dominating queries on incomplete data. For these three representative queries over incomplete data, we investigate some advanced techniques for processing them, including indexing, pruning, and crowdsourcing techniques.
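
As a rough illustration of the third approach (relying only on observed values), the sketch below runs a kNN search in which missing attributes are marked with `None`. It is not taken from the book: the distance function, its rescaling by the fraction of shared dimensions, and the sample objects are all assumptions chosen for illustration.

```python
import math


def observed_distance(q, o):
    """Euclidean distance over the dimensions observed in both q and o.

    The squared distance is rescaled by d / |common| so that objects with
    few shared dimensions are not favored; this rescaling is an assumption
    of the sketch, not a definition from the book.
    """
    common = [(a, b) for a, b in zip(q, o) if a is not None and b is not None]
    if not common:
        return math.inf  # nothing to compare on
    squared = sum((a - b) ** 2 for a, b in common)
    return math.sqrt(squared * len(q) / len(common))


def knn(query, objects, k):
    """Return the names of the k objects closest to the query."""
    ranked = sorted(objects, key=lambda name: observed_distance(query, objects[name]))
    return ranked[:k]


# Hypothetical incomplete dataset: None marks a missing attribute value.
objects = {
    "o1": (1.0, None, 3.0),
    "o2": (None, 2.5, None),
    "o3": (5.0, 5.0, 5.0),
}
query = (1.0, 2.0, 3.0)

if __name__ == "__main__":
    print(knn(query, objects, 2))
```

No object is discarded and no value is imputed; each comparison simply uses whatever dimensions the query and the object have in common.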


Advances in Probabilistic Databases for Uncertain Information Management

Author: Zongmin Ma

Publisher: Springer

Published: 2013-03-30

Total Pages: 167

ISBN-10: 364237509X

This book covers a fast-growing topic in great depth and focuses on the technologies and applications of probabilistic data management. It aims to provide a single account of current studies in probabilistic data management. The objective of the book is to provide state-of-the-art information to researchers, practitioners, and graduate students in information technology and intelligent information processing, while also serving information technology professionals faced with non-traditional applications that make conventional approaches difficult or impossible to apply.