[PDF] Full The Web As A Parallel Corpus Download eBook

Machine Translation and the Information Soup

Author: David Farwell

Publisher: Springer

Published: 2003-06-29

Total Pages: 551

ISBN-13: 3540494782

Machine Translation and the Information Soup! Over the past fty years, machine translation has grown from a tantalizing dream to a respectable and stable scienti c-linguistic enterprise, with users, c- mercial systems, university research, and government participation. But until very recently, MT has been performed as a relatively distinct operation, so- what isolated from other text processing. Today, this situation is changing rapidly. The explosive growth of the Web has brought multilingual text into the reach of nearly everyone with a computer. We live in a soup of information, an increasingly multilingual bouillabaisse. And to partake of this soup, we can use MT systems together with more and more tools and language processing technologies|information retrieval engines, - tomated text summarizers, and multimodal and multilingual displays. Though some of them may still be rather experimental, and though they may not quite t together well yet, it is clear that the future will o er text manipulation systems that contain all these functions, seamlessly interconnected in various ways.

Parallel Corpora for Contrastive and Translation Studies

Author: Irene Doval

Publisher: John Benjamins Publishing Company

Published: 2019-03-20

Total Pages: 313

ISBN-13: 9027262845

DOWNLOAD EBOOK

This volume assesses the state of the art of parallel corpus research as a whole, reporting on advances in both recent developments of parallel corpora – with some particular references to comparable corpora as well– and in ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.

Advances in Corpus Linguistics

Author: Karin Aijmer

Publisher: Rodopi

Published: 2004

Total Pages: 430

ISBN-13: 9789042017412

DOWNLOAD EBOOK

This book provides an up-to-date survey of current issues and approaches in corpus linguistics in the form of twenty-two recent research articles. The articles cover a wide range of topics illustrating the diversity of research that is characteristic of corpus linguistics today. Central themes are the relationship between theory, intuition and corpus data and the role of corpora in linguistic research. The majority of the articles are empirical studies of specific aspects of English, ranging from lexis and grammar to discourse and pragmatics. Other areas explored are language variation, language change and development, language learning, cross-linguistic comparisons of English and other languages, and the development of linguistic software tools. The contributors to the volume include some of the leading figures in the field such as M.A.K. Halliday, John Sinclair, Geoffrey Leech and Michael Hoey. The theoretical and methodological issues addressed in the volume demonstrate clearly the steady advance of an expanding discipline inspired by an empirical, usage-based approach to the study of language. The volume is essential reading for researchers and students interested in the use of computer corpora in linguistic research.

Crosslingual Implementation of Linguistic Taggers Using Parallel Corpora

Author: Hani Safadi

Publisher: Lulu.com

Published: 2010-04-27

Total Pages: 74

ISBN-13: 0557448093

DOWNLOAD EBOOK

This book addresses the problem of creating linguistic taggers for resource-poor languages using existing taggers in resource rich languages. Linguistic taggers are classifiers that map individual words or phrases from a sentence to a set of tags. Linguistic taggers are usually trained using supervised learning algorithms.The proposed approach does not require that the input sentence be translated into the source language. Instead, projection of linguistic tags is accomplished through the use of a parallel corpus, which is a collection of texts that are available in a source language and a target language. The correspondence between words of the source and target language allows to project tags from source to target language words.A parallel corpus of the source and target languages might not be readily available for many language pairs. To deal with this problem, we describe a system for automatic acquisition of aligned, bilingual corpora from pre-specified domains on the World Wide Web.

Bitext Alignment

Author: Jörg Tiedemann

Publisher: Morgan & Claypool Publishers

Published: 2011

Total Pages: 168

ISBN-13: 1608455106

DOWNLOAD EBOOK

This book provides an overview of various techniques for the alignment of bitexts. It describes general concepts and strategies that can be applied to map corresponding parts in parallel documents on various levels of granularity. Bitexts are valuable linguistic resources for many different research fields and practical applications. The most predominant application is machine translation, in particular, statistical machine translation. However, there are various other threads that can be followed which may be supported by the rich linguistic knowledge implicitly stored in parallel resources. Bitexts have been explored in lexicography, word sense disambiguation, terminology extraction, computer-aided language learning and translation studies to name just a few. The book covers the essential tasks that have to be carried out when building parallel corpora starting from the collection of translated documents up to sub-sentential alignments. In particular, it describes various approaches to document alignment, sentence alignment, word alignment and tree structure alignment. It also includes a list of resources and a comprehensive review of the literature on alignment techniques. Table of Contents: Introduction / Basic Concepts and Terminology / Building Parallel Corpora / Sentence Alignment / Word Alignment / Phrase and Tree Alignment / Concluding Remarks

Web Corpus Construction

Author: Roland Schäfer

Publisher: Morgan & Claypool Publishers

Published: 2013-07-01

Total Pages: 197

ISBN-13: 1627053123

DOWNLOAD EBOOK

The World Wide Web constitutes the largest existing source of texts written in a great variety of languages. A feasible and sound way of exploiting this data for linguistic research is to compile a static corpus for a given language. There are several adavantages of this approach: (i) Working with such corpora obviates the problems encountered when using Internet search engines in quantitative linguistic research (such as non-transparent ranking algorithms). (ii) Creating a corpus from web data is virtually free. (iii) The size of corpora compiled from the WWW may exceed by several orders of magnitudes the size of language resources offered elsewhere. (iv) The data is locally available to the user, and it can be linguistically post-processed and queried with the tools preferred by her/him. This book addresses the main practical tasks in the creation of web corpora up to giga-token size. Among these tasks are the sampling process (i.e., web crawling) and the usual cleanups including boilerplate removal and removal of duplicated content. Linguistic processing and problems with linguistic processing coming from the different kinds of noise in web corpora are also covered. Finally, the authors show how web corpora can be evaluated and compared to other corpora (such as traditionally compiled corpora).

Neural Machine Translation

Author: Philipp Koehn

Publisher: Cambridge University Press

Published: 2020-06-18

Total Pages: 409

ISBN-13: 1108497322

DOWNLOAD EBOOK

Learn how to build machine translation systems with deep learning from the ground up, from basic concepts to cutting-edge research.

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications

Author:

Publisher: Elsevier

Published: 2018-08-27

Total Pages: 540

ISBN-13: 0444640436

DOWNLOAD EBOOK

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications, Volume 38, the latest release in this monograph that provides a cohesive and integrated exposition of these advances and associated applications, includes new chapters on Linguistics: Core Concepts and Principles, Grammars, Open-Source Libraries, Application Frameworks, Workflow Systems, Mathematical Essentials, Probability, Inference and Prediction Methods, Random Processes, Bayesian Methods, Machine Learning, Artificial Neural Networks for Natural Language Processing, Information Retrieval, Language Core Tasks, Language Understanding Applications, and more. The synergistic confluence of linguistics, statistics, big data, and high-performance computing is the underlying force for the recent and dramatic advances in analyzing and understanding natural languages, hence making this series all the more important. - Provides a thorough treatment of open-source libraries, application frameworks and workflow systems for natural language analysis and understanding - Presents new chapters on Linguistics: Core Concepts and Principles, Grammars, Open-Source Libraries, Application Frameworks, Workflow Systems, Mathematical Essentials, Probability, and more

Using Comparable Corpora for Under-resourced Areas of Machine Translation

Author: Inguna Skadina

Publisher:

Published: 2019

Total Pages: 323

ISBN-13: 9783319990057

DOWNLOAD EBOOK

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.

Corpus-based Language Studies

Author: Tony McEnery

Publisher: Taylor & Francis

Published: 2006

Total Pages: 412

ISBN-13: 9780415286237

DOWNLOAD EBOOK

Covering the major approaches to the use of corpus data, this work gathers together influential readings from leading names in the discipline, including Biber, Widdowson, Sinclair, Carter and McCarthy.

Posts

Machine Translation and the Information Soup

Parallel Corpora for Contrastive and Translation Studies

Advances in Corpus Linguistics

Crosslingual Implementation of Linguistic Taggers Using Parallel Corpora

Bitext Alignment

Web Corpus Construction

Neural Machine Translation

Computational Analysis and Understanding of Natural Languages: Principles, Methods and Applications

Using Comparable Corpora for Under-resourced Areas of Machine Translation

Corpus-based Language Studies

Popular eBook

Recent Posts