Parallel Corpora for Contrastive and Translation Studies

Author: Irene Doval

Publisher: John Benjamins Publishing Company

Published: 2019-03-20

Total Pages: 313

ISBN-10: 9027262845

This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances both in the development of parallel corpora (with some particular reference to comparable corpora as well) and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. The second part offers an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.


Parallel Text Processing

Author: Jean Véronis

Publisher: Springer Science & Business Media

Published: 2000-09-30

Total Pages: 442

ISBN-13: 9780792365464

With the rising importance of multilingualism in language industries, brought about by global markets and world-wide information exchange, parallel corpora, i.e. corpora of texts accompanied by their translation, have become key resources in the development of natural language processing tools. The applications based upon parallel corpora are numerous and growing in number: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc. The book's chapters have been commissioned from major figures in the field of parallel corpus building and exploitation, with the aim of showing the state of the art in parallel text alignment and use ten to fifteen years after the first parallel-text alignment techniques were developed. Within the book, the following broad themes are addressed: (i) techniques for the alignment of parallel texts at various levels such as sentence, clause, and word; (ii) the use of parallel texts in fields as diverse as translation, lexicography, and information retrieval; (iii) available corpus resources and the evaluation of alignment methods. The book will be of interest to researchers and advanced students of computational linguistics, terminology, lexicography and translation, both in academia and industry.


Parallel corpora, parallel worlds

Author:

Publisher: BRILL

Published: 2016-09-12

Total Pages: 227

ISBN-10: 9004334297

From the contents: Stig JOHANSSON: Towards a multilingual corpus for contrastive analysis and translation studies. - Anna SÅGVALL HEIN: The PLUG project: parallel corpora in Linköping, Uppsala, Göteborg: aims and achievements. - Raphael SALKIE: How can linguists profit from parallel corpora? - Trond TROSTERUD: Parallel corpora as tools for investigating and developing minority languages.


Language Corpora Annotation and Processing

Author: Niladri Sekhar Dash

Publisher: Springer Nature

Published: 2021

Total Pages:

ISBN-10: 9811629609

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.


Annotation, exploitation and evaluation of parallel corpora: TC3 I

Author: Silvia Hansen-Schirra

Publisher: Language Science Press

Published: 2017-02-27

Total Pages: 165

ISBN-10: 3946234852

Exchange between the translation studies and computational linguistics communities has traditionally not been very intense. This is reflected, for instance, in their different views on parallel corpora. While computational linguistics does not always pay strict attention to translation direction (e.g. when translation rules are extracted from (sub)corpora that actually consist only of translations), translation studies is concerned, among other things, with the exact comparison of source and target texts (e.g. to draw conclusions about interference and standardization effects). Recently, however, there has been more exchange between the two fields, especially when it comes to the annotation of parallel corpora. This special issue brings together these different research perspectives, and its contributions show, from both sides, how the two communities have come to interact in recent years.


Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author: Serge Sharoff

Publisher: Springer Nature

Published: 2023-08-23

Total Pages: 138

ISBN-10: 3031313844

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison with parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.


Parallel Text Processing

Author: Jean Véronis

Publisher: Springer Science & Business Media

Published: 2013-03-14

Total Pages: 417

ISBN-10: 9401725357

This book evolved from the ARCADE evaluation exercise that started in 1995. The project's goal is to evaluate alignment systems for parallel texts, i.e., texts accompanied by their translation. Thirteen teams from various places around the world have participated so far and, for the first time, some ten to fifteen years after the first alignment techniques were designed, the community has been able to get a clear picture of the behaviour of alignment systems. Several chapters in this book describe the details of competing systems, and the last chapter is devoted to the description of the evaluation protocol and results. The remaining chapters were especially commissioned from researchers who have been major figures in the field in recent years, in an attempt to address a wide range of topics that describe the state of the art in parallel text processing and use. As I recalled in the introduction, the Rosetta Stone won eternal fame as the prototype of parallel texts, but such texts are probably almost as old as the invention of writing. Nowadays, parallel texts are electronic, and they are becoming an increasingly important resource for building the natural language processing tools needed in the "multilingual information society" that is currently emerging at an incredible speed. Applications are numerous, and they are expanding every day: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc.


The Web as a Parallel Corpus

Author:

Publisher:

Published: 2002

Total Pages: 31

ISBN-13:

Parallel corpora have become an essential resource for work in multi-lingual natural language processing. In this report, we describe our work using the STRAND system for mining parallel text on the World Wide Web, first reviewing the original algorithm and results and then presenting a set of significant enhancements. These enhancements include the use of supervised learning based on structural features of documents to improve classification performance, a new content-based measure of translational equivalence, and adaptation of the system to take advantage of the Internet Archive for mining parallel text from the Web on a large scale. Finally, the value of these techniques is demonstrated in the construction of a significant parallel corpus for a low-density language pair.


Developing Linguistic Corpora

Author: Martin Wynne

Publisher: Oxbow Books Limited

Published: 2005

Total Pages: 100

ISBN-13:

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.