The linguistic community tends to regard statistical methods, or more generally quantitative techniques, with a certain amount of fear and suspicion. There is a feeling that statistics falls in the province of science and mathematics and that such methods may destroy the magic of the literary text. This book seeks to make quantitative methods and statistical techniques less forbidding and to show how they can contribute to linguistic analysis and research. It presents some mathematical and statistical properties of natural languages and introduces some of the quantitative methods which are of the most value in working empirically with texts and corpora. The various issues are illustrated with helpful examples, from the most basic descriptive techniques to decision-taking techniques and to more sophisticated multivariate statistical language models.
This book presents a wide variety of linguistic examples to demonstrate the use of statistics in summarizing data appropriately. The range of techniques introduced will help readers to evaluate and use literature employing statistical analysis, and to apply statistics in their own research.
Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.
This book provides a linguist with a statistical toolkit for exploration and analysis of linguistic data. It employs R, a free software environment for statistical computing, which is increasingly popular among linguists. How to do Linguistics with R: Data exploration and statistical analysis is unique in its scope, as it covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees, as well as specific linguistic approaches, among which are Behavioural Profiles, Vector Space Models and various measures of association between words and constructions. The statistical topics are presented comprehensively, but without too much technical detail, and illustrated with linguistic case studies that answer non-trivial research questions. The book also demonstrates how to visualize linguistic data with the help of attractive informative graphs, including the popular ggplot2 system and Google visualization tools. This book has a companion website: http://doi.org/10.1075/z.195.website
Statistics in Language Research gives a non-technical but more or less complete treatment of Analysis of Variance (ANOVA) for language researchers. ANOVA is the most frequently used technique when handling the outcomes of research designs with more than two treatments or groups. This technique is used in all parts of linguistics which deal with observations obtained in survey studies and in (quasi-)experimental research, like applied linguistics, psycholinguistics, sociolinguistics, language and speech pathology and phonetics. Most statistical textbooks in the social sciences take examples typical of their own field and, in addition, omit subjects which are particularly relevant for language researchers, like power analysis, quasi F, F1, F2 and minF'. This book offers a thorough introduction to the basic principles of analysis of variance, based on examples taken from language research, and goes beyond the conventional topics treated in introductory textbooks, as it covers topics like 'violations of assumptions', 'missing data', 'problems in repeated measures designs', 'alternatives to analysis of variance' (such as randomization tests and multilevel analysis). Each chapter consists of four sections: treatment of the subject under discussion, a summary of relevant terms and concepts, a section devoted to reporting statistics, and finally an exercise section. After the first introductory chapter, in which fundamental concepts like 'variables', 'cases' and SPSS data formats are presented, the book continues with two 'refreshment' chapters, in which the principles of statistical testing are revised, focusing on the well-known t test. These chapters also deal with the essential, but often neglected concepts of 'statistical power' and 'sample size'. In every chapter examples of SPSS input and output are given.
Traditional approaches focused on significance tests have often been difficult for linguistics researchers to visualise. Statistics in Corpus Linguistics Research: A New Approach breaks these significance tests down for researchers in corpus linguistics and linguistic analysis, promoting a visual approach to understanding the performance of tests with real data, and demonstrating how to derive new intervals and tests. The book discusses the ‘why’ behind the statistical model, giving readers greater facility in choosing their own methodologies. Accessibly written for those with little to no mathematical or statistical background, it explains the mathematical fundamentals of simple significance tests by relating them to confidence intervals. With sample datasets and easy-to-read visuals, this book focuses on practical issues, such as how to: • pose research questions in terms of choice and constraint; • employ confidence intervals correctly (including in graph plots); • select optimal significance tests (and what results mean); • measure the size of the effect of one variable on another; • estimate the similarity of distribution patterns; and • evaluate whether the results of two experiments significantly differ. Appropriate for anyone from the student just beginning their career to the seasoned researcher, this book is both a practical overview and a valuable resource.
This textbook examines empirical linguistics from a theoretical linguist’s perspective. It provides both a theoretical discussion of what quantitative corpus linguistics entails and detailed, hands-on, step-by-step instructions to implement the techniques in the field. The statistical methodology and R-based coding from this book teach readers the basic and then more advanced skills to work with large data sets in their linguistics research and studies. Massive data sets are now more than ever the basis for work that ranges from usage-based linguistics to the far reaches of applied linguistics. This book presents much of the methodology in a corpus-based approach. However, the corpus-based methods in this book are also essential components of recent developments in sociolinguistics, historical linguistics, computational linguistics, and psycholinguistics. Material from the book will also be appealing to researchers in digital humanities and the many non-linguistic fields that use textual data analysis and text-based sensorimetrics. Chapters cover topics including corpus processing, frequencing data, and clustering methods. Case studies illustrate each chapter with accompanying data sets, R code, and exercises for use by readers. This book may be used in advanced undergraduate courses, graduate courses, and self-study.
This book in the Edinburgh Textbooks in Empirical Linguistics series is a comprehensive introduction to the statistics currently used in corpus linguistics. Statistical techniques and corpus applications - whether oriented towards linguistics or language engineering - often go hand in glove, and corpus linguists have used an increasingly wide variety of statistics, drawing on techniques developed in a great many fields. This is the first one-volume introduction to the subject.