High-Dimensional Indexing

Author: Cui Yu

Publisher: Springer

Published: 2003-08-01

Total Pages: 159

ISBN-13: 3540457704

In this monograph, we study the problem of high-dimensional indexing and systematically introduce two efficient index structures: one for range queries and the other for similarity queries. Extensive experiments and comparison studies are conducted to demonstrate the superiority of the proposed indexing methods. Many new database applications, such as multimedia databases or stock price information systems, transform important features or properties of data objects into high-dimensional points. Searching for objects based on these features is thus a search of points in this feature space. To support efficient retrieval in such high-dimensional databases, indexes are required to prune the search space. Indexes for low-dimensional databases are well studied, but most of these application-specific indexes do not scale with the number of dimensions, and they are not designed to support similarity searches and high-dimensional joins.


High-dimensional Data Indexing with Applications

Author: Michael Arthur Schuh

Publisher:

Published: 2015

Total Pages: 131

ISBN-13:

The indexing of high-dimensional data remains a challenging task amidst an active and storied area of computer science research that impacts many far-reaching applications. At the crossroads of databases and machine learning, modern data indexing enables information retrieval capabilities that would otherwise be impractical or near impossible to attain and apply. One such useful retrieval task in our increasingly data-driven world is the k-nearest neighbor (k-NN) search, which returns the k items in a dataset most similar to the search query provided. While the k-NN concept was popularized in everyday use through the sorted (ranked) results of online text-based search engines like Google, multimedia applications are rapidly becoming the new frontier of research. This dissertation advances the current state of high-dimensional data indexing with the creation of a novel index named ID* ("ID Star"). Based on extensive theoretical and empirical analyses, we discuss important challenges associated with high-dimensional data and identify several shortcomings of existing indexing approaches and methodologies. By further mitigating the negative effects of the curse of dimensionality, we are able to push the boundary of effective k-NN retrieval to a higher number of dimensions over much larger volumes of data. As the foundation of the ID* index, we developed an open-source and extensible distance-based indexing framework predicated on the basic concepts of the popular iDistance index, which utilizes an internal B+-tree for efficient one-dimensional data indexing. Through the addition of several new heuristic-guided algorithmic improvements and hybrid indexing extensions, we show that our new ID* index can perform significantly better than several other popular alternative indexing techniques over a wide variety of synthetic and real-world data.
In addition, we present applications of our ID* index through the use of k-NN queries in Content-Based Image Retrieval (CBIR) systems and machine learning classification. An emphasis is placed on the NASA sponsored interdisciplinary research goal of developing a CBIR system for large-scale solar image repositories. Since such applications rely on fast and effective k-NN queries over increasingly large-scale and high-dimensional datasets, it is imperative to utilize an efficient data indexing strategy such as the ID* index.
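To make the k-NN query described in the abstract above concrete, here is a minimal brute-force sketch in Python. It is not the ID* or iDistance index, whose details the abstract does not give; it only shows the query semantics (return the k points closest to a query) using a linear scan. The function name `knn_brute_force`, the choice of Euclidean distance, and the sample points are all assumptions made for illustration.

```python
import heapq
import math


def knn_brute_force(points, query, k):
    """Return the k points closest to `query` (Euclidean distance, linear scan).

    Hypothetical helper for illustration only: an index such as a
    distance-based structure exists precisely to avoid this full scan.
    """
    # nsmallest keeps only the k best candidates while scanning every point.
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, query))


# Made-up sample data, not taken from the dissertation.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (5.0, 5.0)]
nearest = knn_brute_force(points, (0.9, 1.2), k=2)
print(nearest)
```

The scan costs one distance computation per stored point for every query, which is the cost that high-dimensional indexes aim to reduce by pruning the search space.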


High-Dimensional Probability

Author: Roman Vershynin

Publisher: Cambridge University Press

Published: 2018-09-27

Total Pages: 299

ISBN-13: 1108415199

An integrated package of powerful probabilistic tools and key applications in modern mathematical data science.


Innovative Data Communication Technologies and Application

Author: Jennifer S. Raj

Publisher: Springer Nature

Published: 2022-02-24

Total Pages: 1033

ISBN-13: 9811671672

This book presents the latest research in the fields of computational intelligence, ubiquitous computing models, communication intelligence, communication security, machine learning, informatics, mobile computing, cloud computing, and big data analytics. The best selected papers, presented at the International Conference on Innovative Data Communication Technologies and Application (ICIDCA 2021), are included in the book. The book focuses on the theory, design, analysis, implementation, and application of distributed systems and networks.


Data Analytics: Principles, Tools, and Practices

Author: Gaurav Aroraa

Publisher: BPB Publications

Published: 2022-01-24

Total Pages: 481

ISBN-13: 9388511956

A Complete Data Analytics Guide for Learners and Professionals.

KEY FEATURES
● Learn Big Data, Hadoop Architecture, HBase, Hive and NoSQL Database.
● Dive into Machine Learning, its tools, and applications.
● Coverage of applications of Big Data, Data Analysis, and Business Intelligence.

DESCRIPTION
These days critical problem solving related to data and data sciences is in demand. Professionals who can solve real data science problems using data science tools are in demand. The book “Data Analytics: Principles, Tools, and Practices” can be considered a handbook or a guide for professionals who want to start their journey in the field of data science. The journey starts with the introduction of DBMS, RDBMS, NoSQL, and DocumentDB. The book introduces the essentials of data science and the modern ecosystem, including the important steps such as data ingestion, data munging, and visualization. The book covers the different types of analysis, different Hadoop ecosystem tools like Apache Spark, Apache Hive, R, MapReduce, and NoSQL Database. It also includes the different machine learning techniques that are useful for data analytics and how to visualize data with different graphs and charts. The book discusses useful tools and approaches for data analytics, supported by concrete code examples. After reading this book, you will be motivated to explore real data analytics and make use of the acquired knowledge on databases, BI/DW, data visualization, Big Data tools, and statistical science.

WHAT YOU WILL LEARN
● Familiarize yourself with Apache Spark, Apache Hive, R, MapReduce, and NoSQL Database.
● Learn to manage data warehousing with real time transaction processing.
● Explore various machine learning techniques that apply to data analytics.
● Learn how to visualize data using a variety of graphs and charts using real-world examples from the industry.
● Acquaint yourself with Big Data tools and statistical techniques for machine learning.

WHO THIS BOOK IS FOR
IT graduates, data engineers and entry-level professionals who have a basic understanding of the tools and techniques but want to learn more about how they fit into a broader context are encouraged to read this book.

TABLE OF CONTENTS
1. Database Management System
2. Online Transaction Processing and Data Warehouse
3. Business Intelligence and its deeper dynamics
4. Introduction to Data Visualization
5. Advanced Data Visualization
6. Introduction to Big Data and Hadoop
7. Application of Big Data Real Use Cases
8. Application of Big Data
9. Introduction to Machine Learning
10. Advanced Concepts to Machine Learning
11. Application of Machine Learning


Proceedings of International Conference on Machine Intelligence and Data Science Applications

Author: Manish Prateek

Publisher: Springer Nature

Published: 2021

Total Pages: 813

ISBN-13: 9813340878

This book is a compilation of peer-reviewed papers presented at the International Conference on Machine Intelligence and Data Science Applications, organized by the School of Computer Science, University of Petroleum & Energy Studies, Dehradun, on September 4 and 5, 2020. The book starts by addressing the algorithmic aspect of machine intelligence, which includes the framework and optimization of various states of algorithms. It then includes a variety of papers on applications in fields such as image processing, natural language processing, computer vision, sentiment analysis, and speech and gesture analysis. The book concludes with interdisciplinary applications in areas such as law, health care, smart society, cyber-physical systems, and smart agriculture. The book is a good reference for computer science engineers, lecturers and researchers in the machine intelligence discipline, and engineering graduates.


Statistical Analysis for High-Dimensional Data

Author: Arnoldo Frigessi

Publisher: Springer

Published: 2016-02-16

Total Pages: 313

ISBN-13: 3319270990

This book features research contributions from The Abel Symposium on Statistical Analysis for High Dimensional Data, held in Nyvågar, Lofoten, Norway, in May 2014. The focus of the symposium was on statistical and machine learning methodologies specifically developed for inference in “big data” situations, with particular reference to genomic applications. The contributors, who are among the most prominent researchers on the theory of statistics for high dimensional inference, present new theories and methods, as well as challenging applications and computational solutions. Specific themes include, among others, variable selection and screening, penalised regression, sparsity, thresholding, low dimensional structures, computational challenges, non-convex situations, learning graphical models, sparse covariance and precision matrices, semi- and non-parametric formulations, multiple testing, classification, factor models, clustering, and preselection. Highlighting cutting-edge research and casting light on future research directions, the contributions will benefit graduate students and researchers in computational biology, statistics and the machine learning community.


Applied Machine Learning for Smart Data Analysis

Author: Nilanjan Dey

Publisher: CRC Press

Published: 2019-05-20

Total Pages: 214

ISBN-13: 0429804563

The book focuses on how machine learning and the Internet of Things (IoT) have empowered the advancement of information-driven arrangements, including key concepts and advancements. Ontologies used in heterogeneous IoT environments are discussed, including interpretation, context awareness, analysis of various data sources, machine learning algorithms, and intelligent services and applications. Further, it includes unsupervised and semi-supervised machine learning techniques, with a study of semantic analysis and a thorough analysis of reviews. The book is divided into sections such as machine learning, security, IoT, and data mining, and the concepts are explained with practical implementations, including results.

Key Features
● Follows an algorithmic approach for data analysis in machine learning
● Introduces machine learning methods in applications
● Addresses emerging issues in computing such as deep learning, machine learning, Internet of Things and data analytics
● Focuses on unsupervised and semi-supervised machine learning techniques for unseen and seen data sets
● Covers case studies relating to human health, transportation and Internet applications


Soft Computing: Theories and Applications

Author: Tarun K. Sharma

Publisher: Springer Nature

Published: 2021-06-26

Total Pages: 572

ISBN-13: 9811616965

This book focuses on soft computing and how it can be applied to solve real-world problems arising in various domains, ranging from medicine and healthcare, to supply chain management, image processing and cryptanalysis. It gathers high-quality papers presented at the International Conference on Soft Computing: Theories and Applications (SoCTA 2020), organized online. The book is divided into two volumes and offers valuable insights into soft computing for teachers and researchers alike; the book will inspire further research in this dynamic field.