Building and Using Comparable Corpora for Multilingual Natural Language Processing

Author: Serge Sharoff

Publisher: Springer Nature

Published: 2023-08-23

Total Pages: 138

ISBN-13: 3031313844

This book provides a comprehensive overview of methods to build comparable corpora and of their applications, including machine translation, cross-lingual transfer, and various kinds of multilingual natural language processing. The authors begin with a brief history of the topic, followed by a comparison to parallel resources and an explanation of why comparable corpora have become more widely used. In particular, comparable corpora provide the basis for the multilingual capabilities of pre-trained models, such as BERT or GPT. The book then focuses on building comparable corpora, aligning their sentences to create a database of suitable translations, and using these sentence translations to produce dictionaries and term banks. Finally, it explains how comparable corpora can be used to build machine translation engines and to develop a wide variety of multilingual applications.


Building and Using Comparable Corpora

Author: Serge Sharoff

Publisher: Springer Science & Business Media

Published: 2013-12-13

Total Pages: 333

ISBN-13: 3642201288

The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.


Corpus Analysis for Language Studies at the University Level

Author: Giedrė Valūnaitė Oleškevičienė

Publisher: Cambridge Scholars Publishing

Published: 2021-02-08

Total Pages: 176

ISBN-13: 1527565947

This book highlights corpora use in teaching foreign languages in university education. It will appeal to both academics and practitioners interested in the process of teaching foreign languages at more advanced levels while applying corpus analysis and building tools for corpus annotation. It provides a detailed case study of analyzing the terminology of constitutional law in both English and Lithuanian as an example to illustrate the possibility of integrating corpus analysis tools into the process of teaching foreign languages in university education. The book reveals that initial linguistic knowledge is essential when teaching and learning foreign languages at more advanced levels while applying corpus annotation. In addition, it shows that, even though the use of new corpus software is perceived as a positive, there are still certain issues to be solved in this regard, such as the constant renewal of public computers in universities and the technical and methodological support for teachers while using corpora tools.


Data Analytics and Management in Data Intensive Domains

Author: Alexander Sychev

Publisher: Springer Nature

Published: 2021-07-15

Total Pages: 231

ISBN-13: 3030812006

This book constitutes the post-conference proceedings of the 22nd International Conference on Data Analytics and Management in Data Intensive Domains, DAMDID/RCDL 2020, held in Voronezh, Russia, in October 2020*. The 16 revised full papers and two keynotes were carefully reviewed and selected from 60 submissions. The papers are organized in the following topical sections: data integration, conceptual models and ontologies; data management in semantic web; data analysis in medicine; data analysis in astronomy; information extraction from text. * The conference was held virtually due to the COVID-19 pandemic.


Computational Phraseology

Author: Gloria Corpas Pastor

Publisher: John Benjamins Publishing Company

Published: 2020-05-15

Total Pages: 341

ISBN-13: 9027261393

Whether you wish to deliver on a promise, take a walk down memory lane or even on the wild side, phraseological units (also often referred to as phrasemes or multiword expressions) are present in most communicative situations and in all the world’s languages. Phraseology, the study of phraseological units, has therefore become a rare unifying theme across linguistic theories. In recent years, an increasing number of studies have been concerned with the computational treatment of multiword expressions: these pertain among others to their automatic identification, extraction or translation, and to the role they play in various Natural Language Processing applications. Computational Phraseology is a comparatively new field where better understanding and more advances are urgently needed. This book aims to address this pressing need, by bringing together contributions focusing on different perspectives of this promising interdisciplinary field.


CLARIN

Author: Darja Fišer

Publisher: Walter de Gruyter GmbH & Co KG

Published: 2022-10-24

Total Pages: 820

ISBN-13: 3110767376

CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU). Watch our talk with the editors Darja Fišer and Andreas Witt here: https://youtu.be/ZOoiGbmMbxI


Recent Advances in Computational Terminology

Author: Didier Bourigault

Publisher: John Benjamins Publishing

Published: 2001-06-15

Total Pages: 400

ISBN-13: 9027298165

This first collection of selected articles from researchers in automatic analysis, storage, and use of terminology, and specialists in applied linguistics, computational linguistics, information retrieval, and artificial intelligence offers new insights on computational terminology. The recent needs for intelligent information access, automatic query translation, cross-lingual information retrieval, knowledge management, and document handling have led practitioners and engineers to focus on automated term handling. This book offers new perspectives on their expectations. It will be of interest to terminologists, translators, language or knowledge engineers, librarians and all others dependent on the automation of terminology processing in professional practices. The articles cover themes such as automatic thesaurus construction, automatic term acquisition, automatic term translation, automatic indexing and abstracting, and computer-aided knowledge acquisition. The high academic standing of the contributors together with their experience in terminology management results in a set of contributions that tackle original and unique scientific issues in correlation with genuine applications of terminology processing.


New Advances in Translation Technology

Author: Yuhong Peng

Publisher: Springer Nature

Published: 2024

Total Pages: 282

ISBN-13: 9819729580

From using machine learning to shave seconds off translations, to using natural language processing for accurate real-time translation services, this book covers all the aspects. The world of translation technology is ever-evolving, making the task of staying up to date with the most advanced methods a daunting yet rewarding undertaking. That is why we have edited this book to provide readers with an up-to-date guide to the new advances in translation technology. In this book, readers can expect to find a comprehensive overview of all the latest developments in the field of translation technology. Not only that, the authors dive into the exciting possibilities of artificial intelligence in translation, exploring its potential to revolutionize the way languages are translated and understood. The authors also explore aspects of the teaching of translation technology. Teaching translation technology to students is essential in ensuring the future of this field. With advances in technology such as machine learning, natural language processing, and artificial intelligence, it is important to equip students with the skills to keep up with the latest developments in the field. This book is the definitive guide to translation technology and all of its associated potential. With chapters written by leading translation technology experts and thought leaders, this book is an essential point of reference for anyone looking to understand the breathtaking potential of translation technology.


Handbook of Linguistic Annotation

Author: Nancy Ide

Publisher: Springer

Published: 2017-06-16

Total Pages: 1440

ISBN-13: 9402408819

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling, creating an annotation language, building a corpus and evaluating it for correctness. Essential reading for both computer scientists and linguistic researchers. Linguistic annotation is an increasingly important activity in the field of computational linguistics because of its critical role in the development of language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format through both the manual and automatic annotation process, evaluation, and iterative improvement of annotation accuracy. The second part of the book includes case studies of annotation projects across the spectrum of linguistic annotation types, including morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time and event and spatial analyses, and discourse level analyses including discourse structure, co-reference, etc. Each case study addresses the various phases and processes discussed in the chapters of part one.


Computational Collective Intelligence

Author: Ngoc Thanh Nguyen

Publisher: Springer Nature

Published: 2020-11-23

Total Pages: 908

ISBN-13: 3030630072

This volume constitutes the refereed proceedings of the 12th International Conference on Computational Collective Intelligence, ICCCI 2020, held in Da Nang, Vietnam, in November 2020.* The 70 full papers presented were carefully reviewed and selected from 314 submissions. The papers are grouped in topical sections on: knowledge engineering and semantic web; social networks and recommender systems; collective decision-making; applications of collective intelligence; data mining methods and applications; machine learning methods; deep learning and applications for industry 4.0; computer vision techniques; biosensors and biometric techniques; innovations in intelligent systems; natural language processing; low resource languages processing; computational collective intelligence and natural language processing; computational intelligence for multimedia understanding; and intelligent processing of multimedia in web systems. *The conference was held virtually due to the COVID-19 pandemic.