Unlike other forms of adaptive testing, multistage testing (MST) is particularly well suited to measuring educational achievement because it can be adapted to both educational surveys and student testing. This volume provides the first unified source of information on the design, psychometrics, implementation, and operational use of MST. It shows how to apply theoretical statistical tools to testing in novel and useful ways, and it explains how to tie the assumptions made by each model explicitly to observable (or at least inferable) data conditions.
The goal of this guide and manual is to provide a brief, practical overview of the theory of computerized adaptive testing (CAT) and multistage testing (MST) and to illustrate the methodologies and applications using the open-source language R and several data examples. Implementation relies on the R packages catR and mstR, developed (and still being extended) by the first author and his team, which include some of the newest research algorithms on the topic. The book covers many topics alongside the R code: the basics of R, a theoretical overview of CAT and MST, CAT designs, CAT assembly methodologies, CAT simulations, the catR package, CAT applications, MST designs, IRT-based MST methodologies, tree-based MST methodologies, the mstR package, and MST applications. CAT has been used in many large-scale assessments over recent decades, and MST has become very popular in recent years. R has also become one of the most useful tools for applications in almost all fields, including business and education. R has a steep learning curve, however, and because CAT and MST are complex to implement, users have found it difficult to simulate or deploy either design. Until this manual, there has been no book showing users how to design and use CAT and MST easily and without expense, that is, using the free R software. All examples and illustrations are generated with predefined R scripts, available for free download from the book's website.
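As a flavor of what the packages support, here is a minimal sketch of a single simulated CAT run with catR (mstR provides the analogous randomMST() for multistage designs). The function and argument names come from the package documentation; the particular settings below (bank size, starting rule, stopping rule) are illustrative assumptions, not the book's own scripts.

    # Minimal catR sketch: simulate one adaptive administration for a
    # test taker with true ability 0. Settings are illustrative only.
    library(catR)

    bank <- genDichoMatrix(items = 200, model = "2PL")  # simulated 2PL item bank

    res <- randomCAT(trueTheta = 0, itemBank = bank,
                     start = list(nrItems = 3, theta = 0),            # warm-up items
                     test  = list(method = "BM", itemSelect = "MFI"), # Bayes modal estimation,
                                                                      # maximum Fisher information
                     stop  = list(rule = "length", thr = 20),         # stop after 20 items
                     final = list(method = "BM"))

    res$thFinal  # final ability estimate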
This book offers a comprehensive introduction to the latest developments in the theory and practice of CAT. It can be used both as a basic reference and as a valuable resource on test theory. It covers such topics as item selection and ability estimation, item pool development and maintenance, item calibration and model fit, and testlet-based adaptive testing, as well as the operational aspects of existing large-scale CAT programs.
The arrival of the computer in educational and psychological testing has led to the current popularity of adaptive testing: a testing format in which the computer uses statistical information about the test items to automatically adapt their selection to a real-time update of the test taker's ability estimate. This book covers such key features of adaptive testing as item selection and ability estimation, adaptive testing with multidimensional abilities, sequencing adaptive test batteries, multistage adaptive testing, item-pool design and maintenance, estimation of item and item-family parameters, item and person fit, as well as adaptive mastery and classification testing. It also shows how these features are used in the daily operations of several large-scale adaptive testing programs.
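The core adaptive step just described can be written out in a few lines. The sketch below uses base R with made-up two-parameter logistic (2PL) item parameters (all values are illustrative assumptions): given the current ability estimate, it computes each unadministered item's Fisher information and selects the most informative one. Operational programs layer exposure control and content balancing on top of this idea.

    # Illustration of one adaptive step: pick the unseen item with maximum
    # Fisher information at the current ability estimate (2PL model).
    # All item parameters below are made up for the example.
    a <- c(1.2, 0.8, 1.5, 1.0)    # discriminations
    b <- c(-0.5, 0.0, 0.7, 1.2)   # difficulties

    info <- function(theta, a, b) {
      p <- 1 / (1 + exp(-a * (theta - b)))  # 2PL probability of a correct response
      a^2 * p * (1 - p)                     # Fisher information I(theta)
    }

    theta_hat <- 0.3                        # current ability estimate
    administered <- c(1)                    # items already given
    candidates <- setdiff(seq_along(a), administered)
    candidates[which.max(info(theta_hat, a[candidates], b[candidates]))]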
Integrating Timing Considerations to Improve Testing Practices synthesizes a wealth of theory and research on time issues in assessment into actionable advice for test development, administration, and scoring. One of the major advantages of computer-based testing is the capability to passively record test-taking metadata, including how examinees use time and how time affects testing outcomes. This capability has raised many questions for testing administrators. Is there a trade-off between speed and accuracy in test taking? What considerations should influence equitable decisions about extended-time accommodations? How can test administrators use timing data to balance the costs and resulting validity of tests administered at commercial testing centers? In this comprehensive volume, experts in the field discuss the impact of timing considerations, constraints, and policies on valid score interpretations; administrative accommodations, test construction, and examinees' experiences and behaviors; and how to put the findings into practice. These 12 chapters provide invaluable resources for testing professionals to better understand the inextricable links between effective time allocation and the purposes of high-stakes testing.
The general theme of this book is to present applications of artificial intelligence (AI) in test development. In particular, it includes research and successful examples of using AI technology in automated item generation, automated test assembly, automated scoring, and computerized adaptive testing. Utilizing AI can dramatically increase the efficiency of item development, test form construction, test delivery, and scoring. Chapters on automated item generation offer different perspectives on generating large numbers of items with controlled psychometric properties, including the latest developments in machine learning methods. Automated scoring is illustrated for different types of assessments, such as speaking and writing, from both methodological and practical perspectives. Further, automated test assembly is elaborated for conventional linear tests from both classical test theory and item response theory perspectives (a toy illustration of the underlying optimization follows below). Item pool design and assembly for linear-on-the-fly tests brings additional practical complications when test security is a major concern. Finally, several chapters focus on computerized adaptive testing (CAT) at either the item or module level, and CAT is further illustrated as an effective approach to increasing test takers' engagement. In summary, the book includes theoretical, methodological, and applied research and practices that serve as a foundation for future development. These chapters illustrate efforts to automate the process of test development. While some of these automation processes, such as automated test assembly, automated scoring, and computerized adaptive testing, have become common practice, others, such as automated item generation, call for more research and exploration. As new AI methods emerge and evolve, researchers are expected to expand and improve the methods for automating the different steps of test development, and practitioners to adopt quality automation procedures that improve assessment practice.
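To make the automated test assembly idea concrete, here is a hedged sketch (not taken from the book's chapters) that casts fixed-form assembly as 0/1 linear programming with the general-purpose lpSolve R package. On simulated 2PL item data, it selects exactly ten items that maximize Fisher information at theta = 0 while guaranteeing a minimum number of items from one content area.

    # A toy ATA sketch with lpSolve on simulated data: pick exactly 10
    # items maximizing 2PL information at theta = 0, with >= 3 algebra items.
    library(lpSolve)

    set.seed(1)
    n <- 50
    a <- runif(n, 0.8, 2.0)                             # discriminations
    b <- rnorm(n)                                       # difficulties
    area <- sample(c("algebra", "geometry"), n, replace = TRUE)

    p   <- 1 / (1 + exp(-a * (0 - b)))                  # P(correct) at theta = 0
    obj <- a^2 * p * (1 - p)                            # item information at theta = 0

    const <- rbind(rep(1, n),                           # test-length constraint
                   as.numeric(area == "algebra"))       # content-coverage constraint
    sol <- lp("max", obj, const, c("=", ">="), c(10, 3), all.bin = TRUE)
    which(sol$solution == 1)                            # items in the assembled form

Operational assembly models add many more constraints (enemy items, word counts, exposure), but each enters the same way, as an extra row of the constraint matrix.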
This book offers the first thorough treatment of multidimensional item response theory. Its description of methods is supported by numerous practical examples, and it describes procedures for multidimensional computerized adaptive testing.
The second edition of the Handbook of Test Development provides graduate students and professionals with an up-to-date, research-oriented guide to the latest developments in the field. Including thirty-two chapters by well-known scholars and practitioners, it is divided into five sections, covering the foundations of test development, content definition, item development, test design and form assembly, and the processes of test administration, documentation, and evaluation. Keenly aware of developments in the field since the publication of the first edition, including changes in technology, the evolution of psychometric theory, and the increased demand for effective tests driven by educational policy, the editors of this edition include new chapters on assessing noncognitive skills, measuring growth and learning progressions, automated item generation and test assembly, and computerized scoring of constructed responses. The volume also includes expanded coverage of performance testing, validity, fairness, and numerous other topics. Edited by Suzanne Lane, Mark R. Raymond, and Thomas M. Haladyna, The Handbook of Test Development, 2nd edition, is based on the revised Standards for Educational and Psychological Testing, and is appropriate for graduate courses and seminars that deal with test development and usage, professional testing services and credentialing agencies, state and local boards of education, and academic libraries serving these groups.
D-scoring Method of Measurement presents a unified framework of classical and latent measurement referred to as the D-scoring method of measurement (DSM). It provides detailed descriptions of DSM procedures and illustrative examples of how to apply the DSM in various measurement scenarios. The DSM is designed to combine the merits of traditional classical test theory (CTT) and item response theory (IRT) for the purposes of transparency, ease of interpretation, computational simplicity of test scoring and scaling, and practical efficiency, particularly in large-scale assessments. The book shows how practical applications of these procedures are facilitated by operationalized guidance for executing them in the computer program DELTA for DSM-based scoring, equating, and item analysis of test data. In doing so, it shows how DSM procedures can be readily translated into source code for other popular software packages such as R. D-scoring Method of Measurement equips researchers and practitioners in educational and psychological measurement with a comprehensive understanding of the DSM as a unified framework for classical and latent scoring, equating, and psychometric analysis.
This new text provides the most current coverage of measurement and psychometrics in a single volume. Authors W. Holmes Finch and Brian F. French first review the basics of psychometrics and measurement, before moving on to more complex topics such as equating and scaling, item response theory, standard setting, and computer adaptive testing. Also included are discussions of cutting-edge methods used by practitioners in the field, such as automated test development, game-based assessment, and automated test scoring. This book is ideal for use as a primary text for graduate-level psychometrics/measurement courses, as well as for researchers in need of a broad resource for understanding test theory. Features: "How it Works" and "Psychometrics in the Real World" boxes break down important concepts through worked examples, and show how theory can be applied to practice. End-of-chapter exercises allow students to test their comprehension of the material, while suggested readings and website links provide resources for further investigation. A collection of free online resources includes the full output from R, SPSS, and Excel for each of the analyses conducted in the book, as well as additional exercises, sample homework assignments, answer keys, and PowerPoint lecture slides.