High-Dimensional Indexing

Author: Cui Yu

Publisher: Springer

Published: 2003-08-01

Total Pages: 159

ISBN-13: 3540457704

In this monograph, we study the problem of high-dimensional indexing and systematically introduce two efficient index structures: one for range queries and the other for similarity queries. Extensive experiments and comparison studies are conducted to demonstrate the superiority of the proposed indexing methods. Many new database applications, such as multimedia databases or stock price information systems, transform important features or properties of data objects into high-dimensional points. Searching for objects based on these features is thus a search of points in this feature space. To support efficient retrieval in such high-dimensional databases, indexes are required to prune the search space. Indexes for low-dimensional databases are well studied, whereas most of these application-specific indexes are not scalable with the number of dimensions, and they are not designed to support similarity searches and high-dimensional joins.


High-dimensional Data Indexing with Applications

Author: Michael Arthur Schuh

Publisher:

Published: 2015

Total Pages: 131

ISBN-13:

The indexing of high-dimensional data remains a challenging task amidst an active and storied area of computer science research that impacts many far-reaching applications. At the crossroads of databases and machine learning, modern data indexing enables information retrieval capabilities that would otherwise be impractical or near impossible to attain and apply. One such useful retrieval task in our increasingly data-driven world is the k-nearest neighbor (k-NN) search, which returns the k most similar items in a dataset to the search query provided. While the k-NN concept was popularized in everyday use through the sorted (ranked) results of online text-based search engines like Google, multimedia applications are rapidly becoming the new frontier of research. This dissertation advances the current state of high-dimensional data indexing with the creation of a novel index named ID* ("ID Star"). Based on extensive theoretical and empirical analyses, we discuss important challenges associated with high-dimensional data and identify several shortcomings of existing indexing approaches and methodologies. By further mitigating the negative effects of the curse of dimensionality, we are able to push the boundary of effective k-NN retrieval to a higher number of dimensions over much larger volumes of data. As the foundations of the ID* index, we developed an open-source and extensible distance-based indexing framework predicated on the basic concepts of the popular iDistance index, which utilizes an internal B+-tree for efficient one-dimensional data indexing. Through the addition of several new heuristic-guided algorithmic improvements and hybrid indexing extensions, we show that our new ID* index can perform significantly better than several other popular alternative indexing techniques over a wide variety of synthetic and real-world data.
In addition, we present applications of our ID* index through the use of k-NN queries in Content-Based Image Retrieval (CBIR) systems and machine learning classification. An emphasis is placed on the NASA-sponsored interdisciplinary research goal of developing a CBIR system for large-scale solar image repositories. Since such applications rely on fast and effective k-NN queries over increasingly large-scale and high-dimensional datasets, it is imperative to utilize an efficient data indexing strategy such as the ID* index.
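
The distance-based mapping that the abstract attributes to iDistance can be illustrated with a minimal sketch. It is not the ID* index and not the dissertation's code: the class name IDistanceSketch, the choice of reference points, and the sorted Python list (standing in for the B+-tree) are all assumptions made for illustration. Each point is assigned to its nearest reference point and stored under the one-dimensional key i * c + distance, and a range query scans only the key intervals that the triangle inequality cannot rule out.

```python
import bisect
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class IDistanceSketch:
    """Illustrative iDistance-style index; a sorted list replaces the B+-tree."""

    def __init__(self, points, refs):
        self.refs = refs
        # Assumption: c is chosen larger than any point-to-reference distance,
        # so the key ranges of different partitions never overlap.
        self.c = 1.0 + max(dist(p, o) for p in points for o in refs)
        self.radius = [0.0] * len(refs)  # farthest member of each partition
        entries = []
        for p in points:
            i = min(range(len(refs)), key=lambda j: dist(p, refs[j]))
            d = dist(p, refs[i])
            self.radius[i] = max(self.radius[i], d)
            entries.append((i * self.c + d, p))
        entries.sort(key=lambda e: e[0])
        self.keys = [k for k, _ in entries]
        self.points = [p for _, p in entries]

    def range_query(self, q, r):
        """Return all indexed points within distance r of q."""
        out = []
        for i, o in enumerate(self.refs):
            dq = dist(q, o)
            if dq - r > self.radius[i]:
                continue  # the query ball cannot reach this partition
            lo = i * self.c + max(0.0, dq - r)
            hi = i * self.c + min(self.radius[i], dq + r)
            a = bisect.bisect_left(self.keys, lo)
            b = bisect.bisect_right(self.keys, hi)
            # Candidates from the key interval are verified by true distance.
            out.extend(p for p in self.points[a:b] if dist(p, q) <= r)
        return out


random.seed(0)
data = [tuple(random.random() for _ in range(8)) for _ in range(300)]
index = IDistanceSketch(data, refs=data[:4])
query = tuple(random.random() for _ in range(8))
print(len(index.range_query(query, 0.8)), "points within radius 0.8")
```

A k-NN search can be layered on top of such a range query by growing r until at least k points are returned and keeping the k closest; that loop is omitted here.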


High Dimensional Spatial Indexing Using Space-Filling Curves

Author: Ankush Chauhan

Publisher: Grin Publishing

Published: 2016-07-21

Total Pages: 16

ISBN-13: 9783668260122

Scientific Essay from the year 2015 in the subject Mathematics - Miscellaneous, language: English, abstract: Representation of two-dimensional objects in one-dimensional space is simple and efficient when using a two-coordinate system imposed upon a grid. However, when the two dimensions are expanded far beyond visual and sometimes mental understanding, techniques are used to quantify and simplify the representation of such objects. These techniques center around spatial interpretations by means of a space-filling curve. Since the late 1800s, mathematicians and computer scientists have succeeded with algorithms that express high-dimensional geometries. However, very few implementations of the algorithms beyond three dimensions for computing these geometries exist. We propose using the basic spatial computations developed by pioneers in the field like G. Peano, D. Hilbert, E. H. Moore, and others in a working model. The algorithms in this paper are fully implemented in high-level programming languages utilizing a relational database management system. We show the execution speeds of the algorithms using a space-filling curve index for searching compared to brute force searching. Finally, we contrast three space-filling curve algorithms: Moore, Hilbert, and Morton, in execution time of searching for high-dimensional data in point queries and range queries.
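
Of the three curves named in the abstract, the Morton (Z-order) curve is the simplest to sketch: the one-dimensional key is obtained by interleaving the bits of the grid coordinates. The functions below are an illustrative assumption, not the essay's implementation. The names morton_encode and morton_decode and the bit layout (bit b of coordinate j goes to position b * d + j) are chosen here for the example. The Hilbert and Moore curves need additional rotation and reflection steps and are not shown.

```python
def morton_encode(coords, bits):
    """Interleave the low `bits` bits of each non-negative integer coordinate."""
    d = len(coords)
    code = 0
    for b in range(bits):
        for j, c in enumerate(coords):
            # Bit b of coordinate j lands at position b * d + j of the key.
            code |= ((c >> b) & 1) << (b * d + j)
    return code


def morton_decode(code, dims, bits):
    """Invert morton_encode for the same `dims` and `bits`."""
    coords = [0] * dims
    for b in range(bits):
        for j in range(dims):
            coords[j] |= ((code >> (b * dims + j)) & 1) << b
    return tuple(coords)


# A 5-dimensional grid point mapped to a single integer key and back.
point = (3, 0, 7, 1, 4)
key = morton_encode(point, bits=3)
print(key, morton_decode(key, dims=5, bits=3))
```

Because the key is an ordinary integer, it can be stored in any ordered one-dimensional index, such as a B-tree column in a relational database, which is the kind of setup the abstract describes.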


Database Theory - ICDT 2001

Author: Jan Van den Bussche

Publisher: Springer Science & Business Media

Published: 2001-02-08

Total Pages: 460

ISBN-13: 3540414568

This book constitutes the refereed proceedings of the 8th International Conference on Database Theory, ICDT 2001, held in London, UK, in January 2001. The 26 revised full papers presented together with two invited papers were carefully reviewed and selected from 75 submissions. All current issues on database theory and the foundations of database systems are addressed. Among the topics covered are database queries, SQL, information retrieval, database logic, database mining, constraint databases, transactions, algorithmic aspects, semi-structured data, data engineering, XML, term rewriting, clustering, etc.


High-Dimensional Statistics

Author: Martin J. Wainwright

Publisher: Cambridge University Press

Published: 2019-02-21

Total Pages: 571

ISBN-13: 1108498027

A coherent introductory text from a groundbreaking researcher, focusing on clarity and motivation to build intuition and understanding.


Fundamentals of Database Indexing and Searching

Author: Arnab Bhattacharya

Publisher: CRC Press

Published: 2014-12-02

Total Pages: 287

ISBN-13: 1466582545

Fundamentals of Database Indexing and Searching presents well-known database searching and indexing techniques. It focuses on similarity search queries, showing how to use distance functions to measure the notion of dissimilarity. After defining database queries and similarity search queries, the book organizes the most common and representative index structures according to their characteristics. The author first describes low-dimensional index structures, memory-based index structures, and hierarchical disk-based index structures. He then outlines useful distance measures and index structures that use the distance information to efficiently solve similarity search queries. Focusing on the difficult dimensionality phenomenon, he also presents several indexing methods that specifically deal with high-dimensional spaces. In addition, the book covers data reduction techniques, including embedding, various data transforms, and histograms. Through numerous real-world examples, this book explores how to effectively index and search for information in large collections of data. Requiring only a basic computer science background, it is accessible to practitioners and advanced undergraduate students.


Locality Sensitive Indexing for Efficient High-dimensional Query Answering in the Presence of Excluded Regions

Author: Aneesha Bhat

Publisher:

Published: 2016

Total Pages: 75

ISBN-13:

Similarity search in high-dimensional spaces is popular for applications like image processing, time series, and genome data. In higher dimensions, the phenomenon of curse of dimensionality kills the effectiveness of most of the index structures, giving way to approximate methods like Locality Sensitive Hashing (LSH), to answer similarity searches. In addition to range searches and k-nearest neighbor searches, there is a need to answer negative queries formed by excluded regions, in high-dimensional data. Though there have been a slew of variants of LSH to improve efficiency, reduce storage, and provide better accuracies, none of the techniques are capable of answering queries in the presence of excluded regions. This thesis provides a novel approach to handle such negative queries. This is achieved by creating a prefix based hierarchical index structure. First, the higher dimensional space is projected to a lower dimension space. Then, a one-dimensional ordering is developed, while retaining the hierarchical traits. The algorithm intelligently prunes the irrelevant candidates while answering queries in the presence of excluded regions. While naive LSH would need to filter out the negative query results from the main results, the new algorithm minimizes the need to fetch the redundant results in the first place. Experiment results show that this reduces post-processing cost thereby reducing the query processing time.
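
The baseline that the abstract calls naive LSH, where candidates are fetched first and the excluded region is filtered afterwards, might look like the sketch below. It is not the thesis's prefix-based hierarchical index. Random-hyperplane hashing is picked here as one common LSH family, the excluded region is assumed to be a Euclidean ball, and all function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(v, planes):
    """One bit per hyperplane: which side of the plane v falls on."""
    return tuple(
        1 if sum(a * b for a, b in zip(v, h)) >= 0 else 0 for h in planes
    )


def build_table(points, planes):
    """Group points into buckets keyed by their bit signature."""
    table = defaultdict(list)
    for p in points:
        table[signature(p, planes)].append(p)
    return table


def query_with_exclusion(q, table, planes, excluded_center, excluded_radius):
    """Naive approach: fetch q's bucket, then drop points in the excluded ball."""
    candidates = table.get(signature(q, planes), [])
    return [
        p for p in candidates
        if math.dist(p, excluded_center) > excluded_radius
    ]


rng = random.Random(42)
data = [tuple(rng.uniform(-1, 1) for _ in range(16)) for _ in range(1000)]
planes = make_hyperplanes(dim=16, n_bits=6, seed=7)
table = build_table(data, planes)
hits = query_with_exclusion(data[0], table, planes, data[1], 0.5)
print(len(hits), "candidates remain after filtering")
```

Every candidate in the bucket is fetched before the filter runs, which is the post-processing cost the abstract says the proposed index reduces.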