Advancing the Discovery of Unique Column Combinations

Author: Ziawasch Abedjan

Publisher: Universitätsverlag Potsdam

Published: 2011

Total Pages: 30

ISBN-13: 3869561483

Unique column combinations of a relational database table are sets of columns that contain only unique values. Discovering such combinations is a fundamental research problem and has many different data management and knowledge discovery applications. Existing discovery algorithms are either brute-force or have a high memory load and can thus be applied only to small datasets or samples. In this paper, the well-known GORDIAN algorithm and "Apriori-based" algorithms are compared and analyzed for further optimization. We greatly improve the Apriori algorithms through efficient candidate generation and statistics-based pruning methods. A hybrid solution HCAGORDIAN combines the advantages of GORDIAN and our new algorithm HCA, and it significantly outperforms all previous work in many situations.
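To make the notion of a unique column combination concrete, here is a minimal level-wise sketch in Python. It is not GORDIAN, HCA, or HCAGORDIAN: it only illustrates the generic bottom-up idea of testing small combinations first and skipping supersets of combinations already known to be unique. The function names and the sample rows are assumptions made for illustration.

```python
from itertools import combinations


def is_unique(rows, cols):
    """Return True if projecting rows onto cols yields no duplicate tuples."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uniques(rows, columns):
    """Bottom-up search for minimal unique column combinations.

    Illustrative only: candidates are enumerated by size, and any
    candidate that contains an already found unique is skipped,
    because it cannot be minimal.
    """
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(u) <= set(combo) for u in found):
                continue  # superset of a known unique
            if is_unique(rows, combo):
                found.append(combo)
    return found


# Hypothetical sample table.
rows = [
    {"first": "Ann", "last": "Lee", "zip": "10115"},
    {"first": "Ann", "last": "Kim", "zip": "10115"},
    {"first": "Bob", "last": "Lee", "zip": "14482"},
]
result = minimal_uniques(rows, ["first", "last", "zip"])
print(result)
```

In this sample no single column is unique, so the search only succeeds at size two. The exhaustive enumeration is exponential in the number of columns, which is the kind of cost the pruning methods described in the abstract aim to reduce.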


Perspectives in Business Informatics Research

Author: Václav Řepa

Publisher: Springer

Published: 2016-09-07

Total Pages: 360

ISBN-13: 3319453211

This book constitutes the proceedings of the 15th International Conference on Perspectives in Business Informatics Research, BIR 2016, held in Prague, Czech Republic, in September 2016. Overall, 61 submissions from 16 countries were rigorously reviewed by 42 members of the program committee representing 21 countries. The selected 21 full papers and 3 short papers are included in this volume together with 2 abstracts of invited talks. This year again, the papers presented at the conference cover many important aspects of the development, use, and application of management information systems. The papers have been organized in topical sections on Business Processes and Enterprise Modeling; Information Systems Development; Information Systems Management; Learning and Capability; and Data Analysis.


Cache Conscious Column Organization in In-memory Column Stores

Author: David Schwalb

Publisher: Universitätsverlag Potsdam

Published: 2013

Total Pages: 100

ISBN-13: 3869562285

Cost models are an essential part of database systems, as they are the basis of query performance optimization. Based on predictions made by cost models, the fastest query execution plan can be chosen and executed or algorithms can be tuned and optimised. In-memory databases shift the focus from disk to main memory accesses and CPU costs, compared to disk-based systems where input and output costs dominate the overall costs and other processing costs are often neglected. However, modelling memory accesses is fundamentally different and common models no longer apply. This work presents a detailed parameter evaluation for the plan operators scan with equality selection, scan with range selection, positional lookup and insert in in-memory column stores. Based on this evaluation, a cost model based on cache misses is developed for estimating the runtime of the considered plan operators using different data structures. The data structures considered are uncompressed columns, bit-compressed columns, and dictionary-encoded columns with sorted and unsorted dictionaries. Furthermore, tree indices on the columns and dictionaries are discussed. Finally, partitioned columns consisting of one partition with a sorted and one with an unsorted dictionary are investigated. New values are inserted in the unsorted dictionary partition and moved periodically by a merge process to the sorted partition. An efficient attribute merge algorithm is described, supporting the update performance required to run enterprise applications on read-optimised databases. Further, a memory-traffic-based cost model for the merge process is provided.
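The partitioned layout described above can be sketched as follows. This is a simplified illustration, assuming a hypothetical `PartitionedColumn` class: inserts go to a partition with an unsorted dictionary, and a merge rebuilds the partition with a sorted dictionary. The merge shown here is a naive full rebuild, not the efficient attribute merge algorithm of the report, and no cache-miss cost model is included.

```python
from bisect import bisect_left


class PartitionedColumn:
    """Dictionary-encoded column with a sorted main and an unsorted delta part."""

    def __init__(self):
        self.main_dict = []    # sorted distinct values
        self.main_ids = []     # per-row value ids into main_dict
        self.delta_dict = []   # distinct values in insertion order
        self.delta_index = {}  # value -> id in delta_dict
        self.delta_ids = []    # per-row value ids into delta_dict

    def insert(self, value):
        """Append a row to the delta partition (unsorted dictionary)."""
        vid = self.delta_index.get(value)
        if vid is None:
            vid = len(self.delta_dict)
            self.delta_dict.append(value)
            self.delta_index[value] = vid
        self.delta_ids.append(vid)

    def merge(self):
        """Naive merge: decode all rows, rebuild one sorted dictionary."""
        values = [self.main_dict[i] for i in self.main_ids]
        values += [self.delta_dict[i] for i in self.delta_ids]
        self.main_dict = sorted(set(values))
        self.main_ids = [bisect_left(self.main_dict, v) for v in values]
        self.delta_dict, self.delta_index, self.delta_ids = [], {}, []

    def scan_equal(self, value):
        """Row positions equal to value; main rows are numbered first."""
        hits = []
        pos = bisect_left(self.main_dict, value)  # binary search, sorted dict
        if pos < len(self.main_dict) and self.main_dict[pos] == value:
            hits += [r for r, vid in enumerate(self.main_ids) if vid == pos]
        vid = self.delta_index.get(value)         # hash lookup, unsorted dict
        if vid is not None:
            offset = len(self.main_ids)
            hits += [offset + r for r, d in enumerate(self.delta_ids) if d == vid]
        return hits


col = PartitionedColumn()
for v in ["pear", "apple", "pear"]:
    col.insert(v)
col.merge()
col.insert("apple")
col.insert("fig")
print(col.scan_equal("apple"))
```

The sorted dictionary allows binary search and keeps value ids order-preserving, which is what makes range selections on the main partition cheap. The unsorted dictionary avoids re-encoding existing rows on every insert.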


Business Process Management Workshops

Author: Marcello La Rosa

Publisher: Springer

Published: 2013-01-26

Total Pages: 837

ISBN-13: 3642362850

This book constitutes the refereed proceedings of 12 international workshops held in Tallinn, Estonia, in conjunction with the 10th International Conference on Business Process Management, BPM 2012, in September 2012. The 12 workshops comprised Adaptive Case Management and Other Non-Workflow Approaches to BPM (ACM 2012), Business Process Design (BPD 2012), Business Process Intelligence (BPI 2012), Business Process Management and Social Software (BPMS2 2012), Data- and Artifact-Centric BPM (DAB 2012), Event-Driven Business Process Management (edBPM 2012), Empirical Research in Business Process Management (ER-BPM 2012), Process Model Collections (PMC 2012), Process-Aware Logistics Systems (PALS 2012), Reuse in Business Process Management (rBPM 2012), Security in Business Processes (SBP 2012), and Theory and Applications of Process Visualization (TAProViz 2012). The 56 revised full papers presented were carefully reviewed and selected from 141 submissions.


Understanding Cryptic Schemata in Large Extract-transform-load Systems

Author: Alexander Albrecht

Publisher: Universitätsverlag Potsdam

Published: 2013

Total Pages: 28

ISBN-13: 3869562013

Extract-Transform-Load (ETL) tools are used for the creation, maintenance, and evolution of data warehouses, data marts, and operational data stores. ETL workflows populate those systems with data from various data sources by specifying and executing a DAG of transformations. Over time, hundreds of individual workflows evolve as new sources and new requirements are integrated into the system. The maintenance and evolution of large-scale ETL systems require much time and manual effort. A key problem is to understand the meaning of unfamiliar attribute labels in source and target databases and ETL transformations. Hard-to-understand attribute labels lead to frustration and to extra time spent developing and understanding ETL workflows. We present a schema decryption technique to support ETL developers in understanding cryptic schemata of sources, targets, and ETL transformations. For a given ETL system, our recommender-like approach leverages the large number of mapped attribute labels in existing ETL workflows to produce good and meaningful decryptions. In this way we are able to decrypt attribute labels consisting of a number of unfamiliar few-letter abbreviations, such as UNP_PEN_INT, which we can decrypt to UNPAID_PENALTY_INTEREST. We evaluate our schema decryption approach on three real-world repositories of ETL workflows and show that our approach is able to suggest high-quality decryptions for cryptic attribute labels in a given schema.
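As a rough illustration of how mapped attribute labels could drive a decryption such as UNP_PEN_INT to UNPAID_PENALTY_INTEREST, here is a minimal Python sketch. It is not the authors' technique: it assumes labels are split on underscores, aligns tokens by position when both labels have the same token count, and picks the most frequently seen expansion. The function names and the training pairs are hypothetical.

```python
from collections import Counter, defaultdict


def learn_expansions(label_pairs):
    """Count token-level expansions from (cryptic, readable) label pairs.

    Assumption: tokens are aligned by position, and pairs with
    different token counts are ignored.
    """
    votes = defaultdict(Counter)
    for cryptic, readable in label_pairs:
        short_tokens = cryptic.split("_")
        long_tokens = readable.split("_")
        if len(short_tokens) != len(long_tokens):
            continue
        for short, long_form in zip(short_tokens, long_tokens):
            votes[short][long_form] += 1
    return votes


def decrypt(label, votes):
    """Replace each known abbreviation with its most frequent expansion."""
    out = []
    for token in label.split("_"):
        if token in votes:
            out.append(votes[token].most_common(1)[0][0])
        else:
            out.append(token)  # unknown tokens are kept as-is
    return "_".join(out)


# Hypothetical mapped labels, as they might appear in existing workflows.
pairs = [
    ("UNP_AMT", "UNPAID_AMOUNT"),
    ("PEN_RATE", "PENALTY_RATE"),
    ("INT_RATE", "INTEREST_RATE"),
    ("INT_ID", "INTERNAL_ID"),
    ("ACCR_INT", "ACCRUED_INTEREST"),
]
votes = learn_expansions(pairs)
decoded = decrypt("UNP_PEN_INT", votes)
print(decoded)
```

Here INT is ambiguous (INTEREST versus INTERNAL), and the frequency vote resolves it. A real recommender would presumably also use context, which this sketch omits.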


Transactions on Large-Scale Data- and Knowledge-Centered Systems L

Author: Abdelkader Hameurlain

Publisher: Springer Nature

Published: 2021-12-02

Total Pages: 124

ISBN-13: 366264553X

The LNCS journal Transactions on Large-Scale Data and Knowledge-Centered Systems focuses on data management, knowledge discovery, and knowledge processing, which are core and hot topics in computer science. Since the 1990s, the Internet has become the main driving force behind application development in all domains. An increase in the demand for resource sharing (e.g., computing resources, services, metadata, data sources) across different sites connected through networks has led to an evolution of data- and knowledge-management systems from centralized systems to decentralized systems enabling large-scale distributed applications providing high scalability. This, the 50th issue of Transactions on Large-Scale Data and Knowledge-Centered Systems, contains five fully revised selected regular papers. Topics covered include data anonymization, quasi-identifier discovery methods, symbolic time series representation, detection of anomalies in time series, data quality management in biobanks, and the use of multi-agent technology in the design of intelligent systems for maritime transport.


Web-based Development in the Lively Kernel

Author: Jens Lincke

Publisher: Universitätsverlag Potsdam

Published: 2012

Total Pages: 70

ISBN-13: 3869561602

The World Wide Web is becoming increasingly important as an application platform. However, the development of Web applications is often more complex than that of desktop applications. Web-based development environments like Lively Webwerkstatt can mitigate this problem by making the development process more interactive and direct. By moving the development environment into the Web, applications can be developed collaboratively in a Wiki-like manner. This report documents the results of the project seminar on Web-based Development Environments 2010. In this seminar, participants extended the Web-based development environment Lively Webwerkstatt. They worked in small teams on current research topics from the field of Web development and tool support for programmers and implemented their results in the Webwerkstatt environment.


An Abstraction for Version Control Systems

Author: Matthias Kleine

Publisher: Universitätsverlag Potsdam

Published: 2012

Total Pages: 88

ISBN-13: 3869561580

Version Control Systems (VCS) allow developers to manage changes to software artifacts. Developers interact with VCSs through a variety of client programs, such as graphical front-ends or command line tools. It is desirable to use the same version control client program against different VCSs. Unfortunately, no established abstraction over VCS concepts exists. Instead, VCS client programs implement ad-hoc solutions to support interaction with multiple VCSs. This thesis presents Pur, an abstraction over version control concepts that allows building rich client programs that can interact with multiple VCSs. We provide an implementation of this abstraction and validate it by implementing a client application.


Scalable Compatibility for Embedded Real-time Components Via Language Progressive Timed Automata

Author: Stefan Neumann

Publisher: Universitätsverlag Potsdam

Published: 2013

Total Pages: 82

ISBN-13: 3869562269

The proper composition of independently developed components of an embedded real-time system is complicated because not only the functional behavior but also the non-functional properties, in particular the timing, have to be compatible. Nowadays, related compatibility problems have to be addressed in a cumbersome integration and configuration phase at the end of the development process, which in the worst case may fail. Therefore, a number of formal approaches have been developed that try to guide the upfront decomposition of the embedded real-time system into components such that integration problems related to timing properties can be excluded and suitable configurations can be found. However, the proposed solutions require a number of strong assumptions that can hardly be fulfilled, or the required analysis does not scale well. In this paper, we present an approach based on timed automata that can provide the required guarantees for the later integration without strong assumptions that are difficult to match in practice. The approach provides a modular reasoning scheme that makes it possible to establish the required guarantees for the integration using only local checks, and it therefore also scales. It is also possible to determine potential configuration settings by means of timed game synthesis.