Programming for Corpus Linguistics

Author: Oliver Mason

Publisher: Edinburgh Textbooks in Empiric

Published: 2000

ISBN-13: 9780748614073

Specialised linguistic research needs can no longer be met by available software. This book enables the researcher to write programs for text and corpus processing, using the popular and easy-to-learn Java language.


Quantitative Corpus Linguistics with R

Author: Stefan Th. Gries

Publisher: Routledge

Published: 2009-03-04

Total Pages: 257

ISBN-13: 1135895600

The first textbook of its kind, Quantitative Corpus Linguistics with R demonstrates how to use the open source programming language R for corpus linguistic analyses. Computational and corpus linguists doing corpus work will find that R provides an enormous range of functions that currently require several programs to achieve – searching and processing corpora, arranging and outputting the results of corpus searches, statistical evaluation, and graphing.


Corpus Linguistics

Author: Douglas Biber

Publisher: Cambridge University Press

Published: 1998-04-23

Total Pages: 324

ISBN-13: 9780521499576

An investigation into the way people use language in speech and writing, this volume introduces the corpus-based approach, which is based on analysis of large databases of real language examples stored on computer.


A Practical Handbook of Corpus Linguistics

Author: Magali Paquot

Publisher: Springer Nature

Published: 2021-05-04

Total Pages: 686

ISBN-13: 3030462161

This handbook is a comprehensive practical resource on corpus linguistics. It features a range of basic and advanced approaches, methods and techniques in corpus linguistics, from corpus compilation principles to quantitative data analyses. The Handbook is organized in six Parts. Parts I to III feature chapters that discuss key issues and the know-how related to various topics around corpus design, methods and corpus types. Parts IV-V aim to offer a user-friendly introduction to the quantitative analysis of corpus data: for each statistical technique discussed, chapters provide a practical guide with R and come with supplementary online material. Part VI focuses on how to write a corpus linguistic paper and how to meta-analyze corpus linguistic research. The volume can serve as a course book as well as for individual study. It will be essential reading for students of corpus linguistics as well as experienced researchers who want to expand their knowledge of the field.


Programming for Corpus Linguistics with Python and Dataframes

Author: Daniel Keller

Publisher: Cambridge University Press

Published: 2024-06-30

Total Pages: 226

ISBN-13: 1108916384

This Element offers intermediate or experienced programmers algorithms for Corpus Linguistic (CL) programming in the Python language using dataframes that provide a fast, efficient, intuitive set of methods for working with large, complex datasets such as corpora. This Element demonstrates principles of dataframe programming applied to CL analyses, as well as complete algorithms for creating concordances; producing lists of collocates, keywords, and lexical bundles; and performing key feature analysis. An additional algorithm for creating dataframe corpora is presented including methods for tokenizing, part-of-speech tagging, and lemmatizing using spaCy. This Element provides a set of core skills that can be applied to a range of CL research questions, as well as to original analyses not possible with existing corpus software.


Practical Corpus Linguistics

Author: Martin Weisser

Publisher: John Wiley & Sons

Published: 2016-02-16

Total Pages: 306

ISBN-13: 1118831888

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed. Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed. Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux. Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools. Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding/analyzing corpus data.


Natural Language Processing for Corpus Linguistics

Author: Jonathan Dunn

Publisher: Cambridge University Press

Published: 2022-03-31

Total Pages: 149

ISBN-13: 1009083740

Corpus analysis can be expanded and scaled up by incorporating computational methods from natural language processing. This Element shows how text classification and text similarity models can extend our ability to undertake corpus linguistics across very large corpora. These computational methods are becoming increasingly important as corpora grow too large for more traditional types of linguistic analysis. We draw on five case studies to show how and why to use computational methods, ranging from usage-based grammar to authorship analysis to using social media for corpus-based sociolinguistics. Each section is accompanied by an interactive code notebook that shows how to implement the analysis in Python. A stand-alone Python package is also available to help readers use these methods with their own data. Because large-scale analysis introduces new ethical problems, this Element pairs each new methodology with a discussion of potential ethical implications.


Essential Python for Corpus Linguistics

Author: Mark Johnson

Publisher: Wiley-Blackwell

Published: 2008

Total Pages: 208

ISBN-13: 9781405145640

Linguistic research increasingly relies on large electronic corpora for its primary data. While off-the-shelf programs can perform a set of standard searches, specialized questions usually require a custom-written program to find their answers. Essential Python for Corpus Linguistics uses the programming language Python to explain how to write simple programs that extract linguistically useful information, such as the frequency of a given utterance in a particular context within a corpus, or instances of certain phrasal structures in a Treebank. Assuming no prior programming background, the book provides numerous example programs that search for phonological, morphological and syntactic constructions in corpora, and the associated web site provides sample data and programs, which make it easy to start working independently. This book is a valuable resource for linguists who use corpus methods but have no programming training.


Corpus Linguistics and Statistics with R

Author: Guillaume Desagulier

Publisher: Springer

Published: 2017-11-17

Total Pages: 359

ISBN-13: 3319645722

This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequency data, and clustering methods. Case studies illustrate each chapter with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.


Developing Linguistic Corpora

Author: Martin Wynne

Publisher: Oxbow Books Limited

Published: 2005

Total Pages: 100

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.