This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks
This volume takes stock of current research in contrastive lexical studies. It reflects the growing interest in corpus-based approaches to the study of lexis, in particular the use of multilingual corpora, shared by researchers working in widely differing fields - contrastive linguistics, lexicology, lexicography, terminology, computational linguistics and machine translation. The articles in the volume, which cover a wide diversity of languages, are divided into four main sections: the exploration of cross-linguistic equivalence, contrastive lexical semantics, corpus-based multilingual lexicography, and translation and parallel concordancing. The volume also contains a lengthy introduction to recent trends in contrastive lexical studies written by the editors of the volume, Bengt Altenberg and Sylviane Granger.
From the contents: Stig JOHANSSON: Towards a multilingual corpus for contrastive analysis and translation studies. - Anna SAGVALL HEIN: The PLUG project: parallel corpora in Linkoping, Uppsala, Goteborg: aims and achievements. - Raphael SALKIE: How can linguists profit from parallel corpora? - Trond TROSTERUD: Parallel corpora as tools for investigating and developing minority languages."
Simulations are frequently used techniques for training, performance assessment, and prediction of future outcomes. In this thesis, the term “human-centered simulation” is used to refer to any simulation in which humans and human cognition are integral to the simulation’s function and purpose (e.g., simulation-based training). A general problem for human-centered simulations is to capture the cognitive processes and activities of the target situation (i.e., the real world task) and recreate them accurately in the simulation. The prevalent view within the simulation research community is that cognition is internal, decontextualized computational processes of individuals. However, contemporary theories of cognition emphasize the importance of the external environment, use of tools, as well as social and cultural factors in cognitive practice. Consequently, there is a need for research on how such contemporary perspectives can be used to describe human-centered simulations, re-interpret theoretical constructs of such simulations, and direct how simulations should be modeled, designed, and evaluated. This thesis adopts distributed cognition as a framework for studying human-centered simulations. Training and assessment of emergency medical management in a Swedish context using the Emergo Train System (ETS) simulator was adopted as a case study. ETS simulations were studied and analyzed using the distributed cognition for teamwork (DiCoT) methodology with the goal of understanding, evaluating, and testing the validity of the ETS simulator. Moreover, to explore distributed cognition as a basis for simulator design, a digital re-design of ETS (DIGEMERGO) was developed based on the DiCoT analysis. The aim of the DIGEMERGO system was to retain core distributed cognitive features of ETS, to increase validity, outcome reliability, and to provide a digital platform for emergency medical studies. DIGEMERGO was evaluated in three separate studies; first, a usefulness, usability, and facevalidation study that involved subject-matter-experts; second, a comparative validation study using an expert-novice group comparison; and finally, a transfer of training study based on self-efficacy and management performance. Overall, the results showed that DIGEMERGO was perceived as a useful, immersive, and promising simulator – with mixed evidence for validity – that demonstrated increased general self-efficacy and management performance following simulation exercises. This thesis demonstrates that distributed cognition, using DiCoT, is a useful framework for understanding, designing and evaluating simulated environments. In addition, the thesis conceptualizes and re-interprets central constructs of human-centered simulation in terms of distributed cognition. In doing so, the thesis shows how distributed cognitive processes relate to validity, fidelity, functionality, and usefulness of human-centered simulations. This thesis thus provides a new understanding of human-centered simulations that is grounded in distributed cognition theory.
With the rising importance of multilingualism in language industries, brought about by global markets and world-wide information exchange, parallel corpora, i.e. corpora of texts accompanied by their translation, have become key resources in the development of natural language processing tools. The applications based upon parallel corpora are numerous and growing in number: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc. The book's chapters have been commissioned from major figures in the field of parallel corpus building and exploitation, with the aim of showing the state of the art in parallel text alignment and use ten to fifteen years after the first parallel-text alignment techniques were developed. Within the book, the following broad themes are addressed: (i) techniques for the alignment of parallel texts at various levels such as sentence, clause, and word; (ii) the use of parallel texts in fields as diverse as translation, lexicography, and information retrieval; (iii) available corpus resources and the evaluation of alignment methods. The book will be of interest to researchers and advanced students of computational linguistics, terminology, lexicography and translation, both in academia and industry.
Equation-based object-oriented (EOO) modeling languages such as Modelica provide a convenient, declarative method for describing models of cyber-physical systems. Because of the ease of use of EOO languages, large and complex models can be built with limited effort. However, current state-of-the-art tools do not provide the user with enough information when errors appear or simulation results are wrong. It is of paramount importance that such tools should give the user enough information to correct errors or understand where the problems that lead to wrong simulation results are located. However, understanding the model translation process of an EOO compiler is a daunting task that not only requires knowledge of the numerical algorithms that the tool executes during simulation, but also the complex symbolic transformations being performed. As part of this work, methods have been developed and explored where the EOO tool, an enhanced Modelica compiler, records the transformations during the translation process in order to provide better diagnostics, explanations, and analysis. This information is used to generate better error-messages during translation. It is also used to provide better debugging for a simulation that produces unexpected results or where numerical methods fail. Meeting deadlines is particularly important for real-time applications. It is usually essential to identify possible bottlenecks and either simplify the model or give hints to the compiler that enable it to generate faster code. When profiling and measuring execution times of parts of the model the recorded information can also be used to find out why a particular system model executes slowly. Combined with debugging information, it is possible to find out why this system of equations is slow to solve, which helps understanding what can be done to simplify the model. A tool with a graphical user interface has been developed to make debugging and performance profiling easier. Both debugging and profiling have been combined into a single view so that performance metrics are mapped to equations, which are mapped to debugging information. The algorithmic part of Modelica was extended with meta-modeling constructs (MetaModelica) for language modeling. In this context a quite general approach to debugging and compilation from (extended) Modelica to C code was developed. That makes it possible to use the same executable format for simulation executables as for compiler bootstrapping when the compiler written in MetaModelica compiles itself. Finally, a method and tool prototype suitable for speeding up simulations has been developed. It works by partitioning the model at appropriate places and compiling a simulation executable for a suitable parallel platform.
Bayesian networks have grown to become a dominant type of model within the domain of probabilistic graphical models. Not only do they empower users with a graphical means for describing the relationships among random variables, but they also allow for (potentially) fewer parameters to estimate, and enable more efficient inference. The random variables and the relationships among them decide the structure of the directed acyclic graph that represents the Bayesian network. It is the stasis over time of these two components that we question in this thesis. By introducing a new type of probabilistic graphical model, which we call gated Bayesian networks, we allow for the variables that we include in our model, and the relationships among them, to change overtime. We introduce algorithms that can learn gated Bayesian networks that use different variables at different times, required due to the process which we are modelling going through distinct phases. We evaluate the efficacy of these algorithms within the domain of algorithmic trading, showing how the learnt gated Bayesian networks can improve upon a passive approach to trading. We also introduce algorithms that detect changes in the relationships among the random variables, allowing us to create a model that consists of several Bayesian networks, thereby revealing changes and the structure by which these changes occur. The resulting models can be used to detect the currently most appropriate Bayesian network, and we show their use in real-world examples from both the domain of sports analytics and finance.
One major problem for the designer of electronic systems is the presence of uncertainty, which is due to phenomena such as process and workload variation. Very often, uncertainty is inherent and inevitable. If ignored, it can lead to degradation of the quality of service in the best case and to severe faults or burnt silicon in the worst case. Thus, it is crucial to analyze uncertainty and to mitigate its damaging consequences by designing electronic systems in such a way that they effectively and efficiently take uncertainty into account. We begin by considering techniques for deterministic system-level analysis and design of certain aspects of electronic systems. These techniques do not take uncertainty into account, but they serve as a solid foundation for those that do. Our attention revolves primarily around power and temperature, as they are of central importance for attaining robustness and energy efficiency. We develop a novel approach to dynamic steady-state temperature analysis of electronic systems and apply it in the context of reliability optimization. We then proceed to develop techniques that address uncertainty. The first technique is designed to quantify the variability of process parameters, which is induced by process variation, across silicon wafers based on indirect and potentially incomplete and noisy measurements. The second technique is designed to study diverse system-level characteristics with respect to the variability originating from process variation. In particular, it allows for analyzing transient temperature profiles as well as dynamic steady-state temperature profiles of electronic systems. This is illustrated by considering a problem of design-space exploration with probabilistic constraints related to reliability. The third technique that we develop is designed to efficiently tackle the case of sources of uncertainty that are less regular than process variation, such as workload variation. This technique is exemplified by analyzing the effect that workload units with uncertain processing times have on the timing-, power-, and temperature-related characteristics of the system under consideration. We also address the issue of runtime management of electronic systems that are subject to uncertainty. In this context, we perform an early investigation of the utility of advanced prediction techniques for the purpose of finegrained long-range forecasting of resource usage in large computer systems. All the proposed techniques are assessed by extensive experimental evaluations, which demonstrate the superior performance of our approaches to analysis and design of electronic systems compared to existing techniques.
The World Wide Web contains large amounts of data, and in most cases this data has no explicit structure. The lack of structure makes it difficult for automated agents to understand and use such data. A step towards a more structured World Wide Web is the Semantic Web, which aims at introducing semantics to data on the World Wide Web. One of the key technologies in this endeavour are ontologies, which provide a means for modeling a domain of interest and are used for search and integration of data. In recent years many ontologies have been developed. To be able to use multiple ontologies it is necessary to align them, i.e., find inter-ontology relationships. However, developing and aligning ontologies is not an easy task and it is often the case that ontologies and their alignments are incorrect and incomplete. This can be a problem for semantically-enabled applications. Incorrect and incomplete ontologies and alignments directly influence the quality of the results of such applications, as wrong results can be returned and correct results can be missed. This thesis focuses on the problem of completing ontologies and ontology networks. The contributions of the thesis are threefold. First, we address the issue of completing the is-a structure and alignment in ontologies and ontology networks. We have formalized the problem of completing the is-a structure in ontologies as an abductive reasoning problem and developed algorithms as well as systems for dealing with the problem. With respect to the completion of alignments, we have studied system performance in the Ontology Alignment Evaluation Initiative, a yearly evaluation campaign for ontology alignment systems. We have also addressed the scalability of ontology matching, which is one of the current challenges, by developing an approach for reducing the search space when generating the alignment.Second, high quality completion requires user involvement. As users' time and effort are a limited resource we address the issue of limiting and facilitating user interaction in the completion process. We have conducted a broad study of state-of-the-art ontology alignment systems and identified different issues related to the process. We have also conducted experiments to assess the impact of user errors in the completion process. While the completion of ontologies and ontology networks can be done at any point in the life-cycle of ontologies and ontology networks, some of the issues can be addressed already in the development phase. The third contribution of the thesis addresses this by introducing ontology completion and ontology alignment into an existing ontology development methodology.
This thesis addresses the need to balance the use of facial recognition systems with the need to protect personal privacy in machine learning and biometric identification. As advances in deep learning accelerate their evolution, facial recognition systems enhance security capabilities, but also risk invading personal privacy. Our research identifies and addresses critical vulnerabilities inherent in facial recognition systems, and proposes innovative privacy-enhancing technologies that anonymize facial data while maintaining its utility for legitimate applications. Our investigation centers on the development of methodologies and frameworks that achieve k-anonymity in facial datasets; leverage identity disentanglement to facilitate anonymization; exploit the vulnerabilities of facial recognition systems to underscore their limitations; and implement practical defenses against unauthorized recognition systems. We introduce novel contributions such as AnonFACES, StyleID, IdDecoder, StyleAdv, and DiffPrivate, each designed to protect facial privacy through advanced adversarial machine learning techniques and generative models. These solutions not only demonstrate the feasibility of protecting facial privacy in an increasingly surveilled world, but also highlight the ongoing need for robust countermeasures against the ever-evolving capabilities of facial recognition technology. Continuous innovation in privacy-enhancing technologies is required to safeguard individuals from the pervasive reach of digital surveillance and protect their fundamental right to privacy. By providing open-source, publicly available tools, and frameworks, this thesis contributes to the collective effort to ensure that advancements in facial recognition serve the public good without compromising individual rights. Our multi-disciplinary approach bridges the gap between biometric systems, adversarial machine learning, and generative modeling to pave the way for future research in the domain and support AI innovation where technological advancement and privacy are balanced.