History, Features, and Typology of Language Corpora

Author: Niladri Sekhar Dash

Publisher: Springer

Published: 2018-02-01

Total Pages: 311

ISBN-13: 9811074585

This book discusses key issues of corpus linguistics, such as the definition of a corpus, the primary features of a corpus, and the utilization and limitations of corpora. It presents a unique classification scheme of language corpora to show how they can be studied from the perspective of genre, nature, text type, purpose, and application. Any discussion of corpus generation must refer to parallel translation corpora, which the authors address thoroughly here, with a focus on Indian language corpora and English. Web-text corpora, a new development in corpus linguistics, are also discussed with elaborate reference to Indian web text corpora. The book also presents a short history of corpus generation and provides scenarios before and after the advent of computer-generated digital corpora. This book has several important features: it discusses many technical issues of the field in a lucid manner; contains extensive new diagrams and charts for easy comprehension; and presents discussions in simplified English to cater to the needs of non-native English readers. This is an important resource authored by academics who have many years of experience teaching and researching corpus linguistics. Its focus on Indian languages and on English corpora makes it applicable to students of graduate and postgraduate courses in applied linguistics, computational linguistics and language processing in South Asia and across countries where English is spoken as a first or second language.


Developing Linguistic Corpora

Author: Martin Wynne

Publisher: Oxbow Books Limited

Published: 2005

Total Pages: 100

ISBN-13:

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.


Understanding Corpus Linguistics

Author: Danielle Barth

Publisher: Routledge

Published: 2021-11-18

Total Pages: 276

ISBN-13: 1000466752

This textbook introduces the fundamental concepts and methods of corpus linguistics for students approaching this topic for the first time, putting specific emphasis on the enormous linguistic diversity represented by approximately 7,000 human languages and broadening the scope of current concerns in general corpus linguistics. Including a basic toolkit to help the reader investigate language in different usage contexts, this book: shows the relevance of corpora to a range of linguistic areas from phonology to sociolinguistics and discourse; covers recent developments in the application of corpus linguistics to the study of understudied languages and linguistic typology; features exercises, short problems, and questions; and includes examples from real studies in over 15 languages plus multilingual corpora. Providing the necessary corpus linguistics skills to critically evaluate and replicate studies, this book is essential reading for anyone studying corpus linguistics.


Corpus-based Perspectives in Linguistics

Author: Yuji Kawaguchi

Publisher: John Benjamins Publishing

Published: 2007

Total Pages: 464

ISBN-13: 9789027233189

UBLI has conducted field surveys since 2002 and built spoken language corpora for French, Spanish, Italian (Salentino dialect), Russian, Malaysian, Turkish, Japanese, and Canadian multilinguals. This volume features new research presented at the second UBLI workshop on Corpus Linguistics – Research Domain, which was held on September 14, 2006. The first part, consisting of eleven presentations from this workshop, covers a wide range of subjects within the area of corpus-based research, such as dictionaries, linguistic atlases, dialects, translation, ancient texts, non-standard texts, sociolinguistics, second language acquisition, and natural language processing. The second part of this volume comprises ten additional contributions on both written and spoken corpora by the members and research assistants of UBLI.


Cross-Linguistic Corpora for the Study of Translations

Author: Silvia Hansen-Schirra

Publisher: Walter de Gruyter

Published: 2012-12-06

Total Pages: 320

ISBN-13: 3110260328

The book specifies a corpus architecture, including annotation and querying techniques, and its implementation. The corpus architecture is developed for empirical studies of translations and, beyond those, for the study of texts which are inter-lingually comparable, particularly texts of similar registers. The compiled corpus, CroCo, is a resource for research and is, with some copyright restrictions, accessible to other research projects. Most of the research was undertaken as part of a DFG project on linguistic properties of translations. Fundamentally, this research project was a corpus-based investigation into the language pair English-German. The long-term goal is a contribution to the study of translation as a contact variety, and beyond this to language comparison and language contact more generally, with the language pair English-German as our object languages. This goal implies a thorough interest in possible specific properties of translations, and beyond this in an empirical translation theory. The methodology developed is not restricted to the traditional exclusively system-based comparison of earlier days, where real-text excerpts or constructed examples are used as mere illustrations of assumptions and claims, but instead implements an empirical research strategy involving structured data (the sub-corpora and their relationships to each other, annotated and aligned on various theoretically motivated levels of representation), the formation of hypotheses and their operationalizations, statistics on the data, critical examinations of their significance, and interpretation against the background of system-based comparisons and other independent sources of explanation for the phenomena observed. Further applications of the resource developed in computational linguistics are outlined and evaluated.


Language Corpora Annotation and Processing

Author: Niladri Sekhar Dash

Publisher: Springer Nature

Published: 2021

Total Pages:

ISBN-13: 9811629609

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.


Utility and Application of Language Corpora

Author: Niladri Sekhar Dash

Publisher: Springer

Published: 2018-08-13

Total Pages: 308

ISBN-13: 9811318018

This book discusses some of the basic issues relating to corpus generation and the methods normally used to generate a corpus. Since corpus-related research goes beyond corpus generation, the book also addresses other major topics connected with the use and application of language corpora, namely, corpus readiness in the context of corpus sanitation and pre-editing of corpus texts; the application of statistical methods; and various text processing techniques. Importantly, it explores how corpora can be used as a primary or secondary resource in English language teaching, in creating dictionaries, in word sense disambiguation, in various language technologies, and in other branches of linguistics. Lastly, the book sheds light on the status quo of corpus generation in Indian languages and identifies current and future needs. Discussing various technical issues in the field in a lucid manner, providing extensive new diagrams and charts for easy comprehension, and using simplified English, the book is an ideal resource for non-native English readers. Written by academics with many years of experience teaching and researching corpus linguistics, its focus on Indian languages and on English corpora makes it applicable to graduate and postgraduate students of applied linguistics, computational linguistics and language processing in South Asia and across countries where English is spoken as a first or second language.


Corpus Linguistics: An Introduction

Author: Dash, Niladri Sekhar

Publisher: Pearson Education India

Published: 2008

Total Pages: 208

ISBN-13: 8131752623

Corpus Linguistics: An Introduction will appeal to a wide spectrum of scholars, researchers, and particularly to students of linguistics. It offers guidelines for the creation and usage of corpora in the form of empirical language databases with direct functional and theoretical interpretation of a natural language. Drawn from original research and written in an accessible language and style, this book will create avenues for further advancements in mainstream and applied linguistics and language technology.


Routledge Encyclopedia of Technology and the Humanities

Author: Chan Sin-wai

Publisher: Taylor & Francis

Published: 2024-04-29

Total Pages: 389

ISBN-13: 1040005829

Routledge Encyclopedia of Technology and the Humanities is a pioneering attempt to introduce a wide range of disciplines in the emerging field of techno-humanities to the English-reading world. This book covers topics such as archaeology, cultural heritage, design, fashion, linguistics, music, philosophy, and translation. It has 20 chapters, contributed by 26 local and international scholars. Each chapter has its own theme and addresses issues of significant interest in the respective disciplines. References are provided at the end of each chapter for further exploration of the literature in the relevant areas. For ease of reading, the chapters are arranged in alphabetical order of the topics covered. This Encyclopedia will appeal to researchers and professionals in the field of technology and the humanities, and can be used by undergraduate and graduate students studying the humanities.


Text Types and the History of English

Author: Manfred Görlach

Publisher: Walter de Gruyter

Published: 2008-08-22

Total Pages: 353

ISBN-13: 3110197162

The history of modern European languages has been largely determined by the range of functions they have acquired, particularly after 1500. This development necessitated a notable expansion of their syntax and lexis, but is most characteristically reflected in the conventionalization of text types. Starting from the German concept of Textsorte as developed from the 1960s onwards, the present account is a first comprehensive attempt at charting the field for the history and present-day situation of the English language. In text types, a designation is linked with a more or less stable form which guides the writer's production as well as the reader's expectation, permitting one to recognize straightforward uses as well as deliberate misuses. Some two thousand such designations are listed here with minimal definitions and dates for first occurrences. The discussion then concentrates on selected types, which are seen as especially illustrative for English: book dedications, cooking recipes, advertisements, church hymns, lexical entries, and jokes. Their functions and development over time are treated in correlation with their specific linguistic characteristics and adaptations to different period styles and social changes in the readership. The functional range of text types in traditions outside England and the consequences of the export of English categories are exemplified by the history of Scots/Scottish English and of English in India. The arguments are accompanied by a lavish supply of textual excerpts and more than fifty pages of facsimiles, which are especially relevant for insights derived from typographical features. A full bibliography and indices are provided at the end. The book will prove useful for decisions on the constitution of representative text corpora and stimulate research into a greater number of individual text types as well as contrastive analyses, at least among European languages.