Query Processing on Probabilistic Data
Author: Guy van den Broeck
Publisher:
Published: 2015
Total Pages: 0
ISBN-13:
DOWNLOAD EBOOKRead and Download eBook Full
Author: Guy van den Broeck
Publisher:
Published: 2015
Total Pages: 0
ISBN-13:
DOWNLOAD EBOOKAuthor: Dan Suciu
Publisher: Morgan & Claypool Publishers
Published: 2011
Total Pages: 183
ISBN-13: 1608456803
DOWNLOAD EBOOKProbabilistic databases are databases where the value of some attributes or the presence of some records are uncertain and known only with some probability. Applications in many areas such as information extraction, RFID and scientific data management, data cleaning, data integration, and financial risk assessment produce large volumes of uncertain data, which are best modeled and processed by a probabilistic database. This book presents the state of the art in representation formalisms and query processing techniques for probabilistic data. It starts by discussing the basic principles for representing large probabilistic databases, by decomposing them into tuple-independent tables, block-independent-disjoint tables, or U-databases. Then it discusses two classes of techniques for query evaluation on probabilistic databases. In extensional query evaluation, the entire probabilistic inference can be pushed into the database engine and, therefore, processed as effectively as the evaluation of standard SQL queries. The relational queries that can be evaluated this way are called safe queries. In intensional query evaluation, the probabilistic inference is performed over a propositional formula called lineage expression: every relational query can be evaluated this way, but the data complexity dramatically depends on the query being evaluated, and can be #P-hard. The book also discusses some advanced topics in probabilistic data management such as top-k query processing, sequential probabilistic databases, indexing and materialized views, and Monte Carlo databases. Table of Contents: Overview / Data and Query Model / The Query Evaluation Problem / Extensional Query Evaluation / Intensional Query Evaluation / Advanced Techniques
Author: Lei Chen
Publisher: Morgan & Claypool Publishers
Published: 2012-12-01
Total Pages: 103
ISBN-13: 1608458938
DOWNLOAD EBOOKDue to measurement errors, transmission lost, or injected noise for privacy protection, uncertainty exists in the data of many real applications. However, query processing techniques for deterministic data cannot be directly applied to uncertain data because they do not have mechanisms to handle data uncertainty. Therefore, efficient and effective manipulation of uncertain data is a practical yet challenging research topic. In this book, we start from the data models for imprecise and uncertain data, move on to defining different semantics for queries on uncertain data, and finally discuss the advanced query processing techniques for various probabilistic queries in uncertain databases. The book serves as a comprehensive guideline for query processing over uncertain databases. Table of Contents: Introduction / Uncertain Data Models / Spatial Query Semantics over Uncertain Data Models / Spatial Query Processing over Uncertain Databases / Conclusion
Author: Barbara Catania
Publisher: Springer Science & Business Media
Published: 2012-07-28
Total Pages: 355
ISBN-13: 3642283233
DOWNLOAD EBOOKThis research book presents key developments, directions, and challenges concerning advanced query processing for both traditional and non-traditional data. A special emphasis is devoted to approximation and adaptivity issues as well as to the integration of heterogeneous data sources. The book will prove useful as a reference book for senior undergraduate or graduate courses on advanced data management issues, which have a special focus on query processing and data integration. It is aimed for technologists, managers, and developers who want to know more about emerging trends in advanced query processing.
Author: Zhe Wang
Publisher:
Published: 2011
Total Pages: 310
ISBN-13:
DOWNLOAD EBOOKDuring the past few years, the number of applications that need to process large-scale data has grown remarkably. The data driving these applications are often uncertain, as is the analysis, which often involves probabilistic models and statistical inference. Examples include sensor-based monitoring, information extraction, and online advertising. Such applications require probabilistic data analysis (PDA), which is a family of queries over data, uncertainties, and probabilistic models that involve relational operators from database literature, as well as inference operators from statistical machine learning (SML) literature. Prior to our work, probabilistic database research advocated an approach in which uncertainty is modeled by attaching probabilities to data items. However, such systems do not and cannot take advantage of the wealth of SML research, because they are unable to represent and reason the pervasive probabilistic correlations in the data. In this thesis, we propose, build, and evaluate BayesStore, a probabilistic database system that natively supports SML models and various inference algorithms to perform advanced data analysis. This marriage of database and SML technologies creates a declarative and efficient probabilistic processing framework for applications dealing with large-scale uncertain data. We use sensor-based monitoring and information extraction over text as the two driving applications. Sensor network applications generate noisy sensor readings, on top of which a first-order Bayesian network model is used to capture the probability distribution. Information extraction applications generate uncertain entities from text using linear-chain conditional random fields. We explore a variety of research challenges, including extending the relational data model with probabilistic data and statistical models, efficiently implementing statistical inference algorithms in a database, defining relational operators (e.g., select, project, join) over probabilistic data and models, developing joint optimization of inference operators and the relational algebra, and devising novel query execution plans. The experimental results show: (1) statistical inference algorithms over probabilistic models can be efficiently implemented in the set-oriented programming framework in databases; (2) optimizations for query-driven SML inference lead to orders-of-magnitude speed-up on large corpora; and (3) using in-database SML methods to extract and query probabilistic information can significantly improve answer quality.
Author: Lei Chen
Publisher: Springer Nature
Published: 2022-05-31
Total Pages: 91
ISBN-13: 3031018966
DOWNLOAD EBOOKDue to measurement errors, transmission lost, or injected noise for privacy protection, uncertainty exists in the data of many real applications. However, query processing techniques for deterministic data cannot be directly applied to uncertain data because they do not have mechanisms to handle data uncertainty. Therefore, efficient and effective manipulation of uncertain data is a practical yet challenging research topic. In this book, we start from the data models for imprecise and uncertain data, move on to defining different semantics for queries on uncertain data, and finally discuss the advanced query processing techniques for various probabilistic queries in uncertain databases. The book serves as a comprehensive guideline for query processing over uncertain databases. Table of Contents: Introduction / Uncertain Data Models / Spatial Query Semantics over Uncertain Data Models / Spatial Query Processing over Uncertain Databases / Conclusion
Author: Sang-goo Lee
Publisher: Springer Science & Business Media
Published: 2012-03-27
Total Pages: 355
ISBN-13: 3642290345
DOWNLOAD EBOOKThis two volume set LNCS 7238 and LNCS 7239 constitutes the refereed proceedings of the 17th International Conference on Database Systems for Advanced Applications, DASFAA 2012, held in Busan, South Korea, in April 2012. The 44 revised full papers and 8 short papers presented together with 2 invited keynote papers, 8 industrial papers, 8 demo presentations, 4 tutorials and 1 panel paper were carefully reviewed and selected from a total of 159 submissions. The topics covered are query processing and optimization, data semantics, XML and semi-structured data, data mining and knowledge discovery, privacy and anonymity, data management in the Web, graphs and data mining applications, temporal and spatial data, top-k and skyline query processing, information retrieval and recommendation, indexing and search systems, cloud computing and scalability, memory-based query processing, semantic and decision support systems, social data, data mining.
Author: Yunjun Gao
Publisher: Springer Nature
Published: 2022-06-01
Total Pages: 106
ISBN-13: 303101863X
DOWNLOAD EBOOKIncomplete data is part of life and almost all areas of scientific studies. Users tend to skip certain fields when they fill out online forms; participants choose to ignore sensitive questions on surveys; sensors fail, resulting in the loss of certain readings; publicly viewable satellite map services have missing data in many mobile applications; and in privacy-preserving applications, the data is incomplete deliberately in order to preserve the sensitivity of some attribute values. Query processing is a fundamental problem in computer science, and is useful in a variety of applications. In this book, we mostly focus on the query processing over incomplete databases, which involves finding a set of qualified objects from a specified incomplete dataset in order to support a wide spectrum of real-life applications. We first elaborate the three general kinds of methods of handling incomplete data, including (i) discarding the data with missing values, (ii) imputation for the missing values, and (iii) just depending on the observed data values. For the third method type, we introduce the semantics of k-nearest neighbor (kNN) search, skyline query, and top-k dominating query on incomplete data, respectively. In terms of the three representative queries over incomplete data, we investigate some advanced techniques to process incomplete data queries, including indexing, pruning as well as crowdsourcing techniques.
Author: S. Seshadri
Publisher:
Published: 1992
Total Pages: 272
ISBN-13:
DOWNLOAD EBOOKAuthor: Zongmin Ma
Publisher: Springer
Published: 2013-03-30
Total Pages: 167
ISBN-13: 364237509X
DOWNLOAD EBOOKThis book covers a fast-growing topic in great depth and focuses on the technologies and applications of probabilistic data management. It aims to provide a single account of current studies in probabilistic data management. The objective of the book is to provide the state of the art information to researchers, practitioners, and graduate students of information technology of intelligent information processing, and at the same time serving the information technology professional faced with non-traditional applications that make the application of conventional approaches difficult or impossible.