Building and Using Comparable Corpora

Author: Serge Sharoff

Publisher: Springer Science & Business Media

Published: 2013-12-13

Total Pages: 333

ISBN-10: 3642201288

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.


Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author: Serge Sharoff

Publisher: Springer Nature

Published: 2023-08-23

Total Pages: 138

ISBN-10: 3031313844

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.


Using Comparable Corpora for Under-Resourced Areas of Machine Translation

Author: Inguna Skadiņa

Publisher: Springer

Published: 2019-02-06

Total Pages: 326

ISBN-10: 3319990047

This book provides an overview of how comparable corpora can be used to overcome the lack of parallel resources when building machine translation systems for under-resourced languages and domains. It presents a wealth of methods and open tools for building comparable corpora from the Web, evaluating comparability and extracting parallel data that can be used for the machine translation task. It is divided into several sections, each covering a specific task such as building, processing, and using comparable corpora, focusing particularly on under-resourced language pairs and domains. The book is intended for anyone interested in data-driven machine translation for under-resourced languages and domains, especially for developers of machine translation systems, computational linguists and language workers. It offers a valuable resource for specialists and students in natural language processing, machine translation, corpus linguistics and computer-assisted translation, and promotes the broader use of comparable corpora in natural language processing and computational linguistics.


Corpus Analysis for Language Studies at the University Level

Author: Giedrė Valūnaitė Oleškevičienė

Publisher: Cambridge Scholars Publishing

Published: 2021-02-08

Total Pages: 176

ISBN-10: 1527565947

This book highlights the use of corpora in teaching foreign languages in university education. It will appeal to both academics and practitioners interested in the process of teaching foreign languages at more advanced levels while applying corpus analysis and building tools for corpus annotation. It provides a detailed case study of analyzing the terminology of constitutional law in both English and Lithuanian as an example to illustrate the possibility of integrating corpus analysis tools into the process of teaching foreign languages in university education. The book reveals that initial linguistic knowledge is essential when teaching and learning foreign languages at more advanced levels while applying corpus annotation. In addition, it shows that, even though the use of new corpus software is perceived positively, there are still certain issues to be solved in this regard, such as the constant renewal of public computers in universities and the technical and methodological support for teachers while using corpus tools.


Investigating Wikipedia

Author: Céline Poudat

Publisher: John Benjamins Publishing Company

Published: 2024-11-15

Total Pages: 272

ISBN-10: 9027246467

The present volume is intended as a reference book on Wikipedia corpus studies, from corpus construction to exploration and analysis. Wikipedia is a complex object, difficult to manipulate for linguists and corpus researchers. In addition to the encyclopedic articles consulted by millions of users, it contains vast spaces of written discussions, aka talk pages, where Wikipedia authors negotiate the collaborative editing of articles, make evaluations, or discuss related topics. The proposed volume covers Wikipedia articles, their revision histories, and discussions, with a focus on discussions, which have not been studied extensively so far and have also been neglected in previous corpus building efforts. Wikipedia discussions are instances of computer-mediated communication (CMC), thus constituting a completely different, interaction-oriented linguistic genre. Sophisticated tools and methods of linguistic annotation and corpus exploration are needed to exploit the huge and valuable corpus resources that can be constructed from the Wikipedia discussions. The present volume aims at encouraging and facilitating Wikipedia corpus studies, providing standards, recommendations, and innovative methods to build and explore Wikipedia corpora, and presenting corpus studies that make the most of the peculiarities of Wikipedia.


Machine Translation

Author: Jinsong Su

Publisher: Springer Nature

Published: 2021-10-29

Total Pages: 137

ISBN-10: 9811675120

This book constitutes the refereed proceedings of the 17th China Conference on Machine Translation, CCMT 2021, held in Xining, China, in October 2021. The 10 papers presented in this volume were carefully reviewed and selected from 25 submissions and focus on all aspects of machine translation, including preprocessing, neural machine translation models, hybrid models, evaluation methods, and post-editing.


Document Analysis and Recognition – ICDAR 2023 Workshops

Author: Mickael Coustaty

Publisher: Springer Nature

Published: 2023-08-14

Total Pages: 344

ISBN-10: 3031414985

This two-volume set LNCS 14193-14194 constitutes the proceedings of the International Workshops co-located with the 17th International Conference on Document Analysis and Recognition, ICDAR 2023, held in San José, CA, USA, during August 21–26, 2023. A total of 43 regular papers presented in this book were carefully selected from 60 submissions. Part I contains 22 regular papers that stem from the following workshops: ICDAR 2023 Workshop on Computational Paleography (IWCP); ICDAR 2023 Workshop on Camera-Based Document Analysis and Recognition (CBDAR); ICDAR 2023 International Workshop on Graphics Recognition (GREC); ICDAR 2023 Workshop on Automatically Domain-Adapted and Personalized Document Analysis (ADAPDA). Part II contains 21 regular papers that stem from the following workshops: ICDAR 2023 Workshop on Machine Vision and NLP for Document Analysis (VINALDO); ICDAR 2023 International Workshop on Machine Learning (WML).


Unsolved!

Author: Craig P. Bauer

Publisher: Princeton University Press

Published: 2017-05-22

Total Pages: 624

ISBN-10: 1400884799

In 1953, a man was found dead from cyanide poisoning near the Philadelphia airport with a picture of a Nazi aircraft in his wallet. Taped to his abdomen was an enciphered message. In 1912, a book dealer named Wilfrid Voynich came into possession of an illuminated cipher manuscript once belonging to Emperor Rudolf II, who was obsessed with alchemy and the occult. Wartime codebreakers tried—and failed—to unlock the book's secrets, and it remains an enigma to this day. In this lively and entertaining book, Craig Bauer examines these and other vexing ciphers yet to be cracked. Some may reveal the identity of a spy or serial killer, provide the location of buried treasure, or expose a secret society—while others may be elaborate hoaxes. Unsolved! begins by explaining the basics of cryptology, and then explores the history behind an array of unsolved ciphers. It looks at ancient ciphers, ciphers created by artists and composers, ciphers left by killers and victims, Cold War ciphers, and many others. Some are infamous, like the ciphers in the Zodiac letters, while others were created purely as intellectual challenges by figures such as Nobel Prize–winning physicist Richard P. Feynman. Bauer lays out the evidence surrounding each cipher, describes the efforts of geniuses and eccentrics—in some cases both—to decipher it, and invites readers to try their hand at puzzles that have stymied so many others. Unsolved! takes readers from the ancient world to the digital age, providing an amazing tour of many of history's greatest unsolved ciphers.


Healthcare Data Analytics

Author: Chandan K. Reddy

Publisher: CRC Press

Published: 2015-06-23

Total Pages: 756

ISBN-10: 148223212X

At the intersection of computer science and healthcare, data analytics has emerged as a promising tool for solving problems across many healthcare-related disciplines. Supplying a comprehensive overview of recent healthcare analytics research, Healthcare Data Analytics provides a clear understanding of the analytical techniques currently available


Database Systems for Advanced Applications

Author: An Liu

Publisher: Springer

Published: 2015-07-29

Total Pages: 336

ISBN-10: 3319223240

DASFAA is an annual international database conference, located in the Asia-Pacific region, which showcases state-of-the-art R&D activities in database systems and their applications. It provides a forum for technical presentations and discussions among database researchers, developers and users from academia, business and industry. DASFAA 2015, the 20th in the series, was held during April 20-23, 2015 in Hanoi, Vietnam. This year, we carefully selected two workshops, each focusing on specific research issues that contribute to the main themes of the DASFAA conference. This volume contains the final versions of papers accepted for the two workshops: Second International Workshop on Semantic Computing and Personalization (SeCoP 2015); Second International Workshop on Big Data Management and Service (BDMS 2015); and a Poster Session. All the workshops were selected via a public call-for-proposals process. The workshop organizers put a tremendous amount of effort into soliciting and selecting papers with a balance of high quality, new ideas and new applications. We asked all workshops to follow a rigid paper selection process, including the procedure to ensure that any Program Committee members are excluded from the paper review process of any paper they are involved with. A requirement about the overall paper acceptance rate of no more than 50% was also imposed on all the workshops.