A practical guide to the construction of thesauri for use in information retrieval, written by leading experts in the field. Includes: planning and design; vocabulary control; specificity and compound terms; structure and relationships; auxiliary retrieval devices; multilingual thesauri; AAT Compound Term Rules. The US ANSI/NISO Z39.19 thesaurus construction standard is also covered.
Many information professionals working in small units today, whether archivists, special librarians, information officers, or knowledge or content managers, cannot find published tools for subject-based organization that suit their local needs. Large established standards for document description and organization are often too unwieldy, unnecessarily detailed, or too expensive to install and maintain. In other cases the available systems are insufficient for a specialist environment, or do not bring related material together in a helpful way. A purpose-built, in-house system would seem to be the answer, but too often the skills needed to create one are lacking. This practical text examines the criteria relevant to selecting a subject management system, describes the characteristics of some common types of subject tool, and takes the novice step by step through the process of creating a system for a specialist environment. The methodology employed is a standard technique for building a thesaurus that incidentally creates a compatible classification or taxonomy, both of which may be used in a variety of ways for document or information management. Key areas covered are: what a thesaurus is; tools for subject access and retrieval; what a thesaurus is used for; why use a thesaurus; examples of thesauri; the structure of a thesaurus; thesaural relations; practical thesaurus construction; the vocabulary of the thesaurus; building the systematic structure; conversion to alphabetic format; forms of entry in the thesaurus; maintaining the thesaurus; thesaurus software; and the wider environment. Readership: although primarily aimed at the practising information professional, the book is also suitable for students of library and information science.
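One step named above, converting a systematic (hierarchical) structure into alphabetic format, can be pictured with a small sketch. It assumes each term has at most one broader term and handles only hierarchical BT/NT links; related terms, scope notes, and the book's actual procedure are not reproduced. The function name `alphabetic_display` and the sample hierarchy are hypothetical.

```python
def alphabetic_display(parent: dict) -> str:
    """Derive alphabetical thesaurus entries with BT/NT from a hierarchy.

    `parent` maps each term to its broader term (None for top terms).
    This is an illustrative sketch covering hierarchical links only.
    """
    # Invert the parent map so each broader term knows its narrower terms.
    children = {}
    for term, broader in parent.items():
        if broader is not None:
            children.setdefault(broader, []).append(term)

    lines = []
    for term in sorted(parent, key=str.lower):
        lines.append(term)
        if parent[term] is not None:
            lines.append(f"  BT {parent[term]}")
        for narrower in sorted(children.get(term, []), key=str.lower):
            lines.append(f"  NT {narrower}")
    return "\n".join(lines)


# Hypothetical fragment of a systematic schedule.
hierarchy = {
    "Documents": None,
    "Reports": "Documents",
    "Annual reports": "Reports",
    "Correspondence": "Documents",
}
print(alphabetic_display(hierarchy))
```

Each term appears once in alphabetical order, with its broader term and narrower terms derived from the single parent map, so the two displays cannot drift apart.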
This detailed book is a “how-to” guide to building controlled vocabulary tools, cataloging and indexing cultural materials with terms and names from controlled vocabularies, and using vocabularies in search engines and databases to enhance discovery and retrieval online. Also covered are the following: What are controlled vocabularies and why are they useful? Which vocabularies exist for cataloging art and cultural objects? How should they be integrated in a cataloging system? How should they be used for indexing and for retrieval? How should an institution construct a local authority file? The links in a controlled vocabulary ensure that relationships are defined and maintained for both cataloging and retrieval, clarifying whether a rose window and a Catherine wheel are the same thing, or how pot-metal glass is related to the more general term stained glass. The book provides organizations and individuals with a practical tool for creating and implementing vocabularies as reference tools, sources of documentation, and powerful enhancements for online searching.
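The equivalence and hierarchical links described above can be illustrated with a toy data structure. This is a minimal sketch, not the book's recommended model: it uses the conventional USE/UF (equivalence) and BT/NT (broader/narrower) relations, and treats "Catherine wheel" as a non-preferred equivalent of "rose window" purely as a hypothetical entry. The class `Thesaurus` and its methods are illustrative names.

```python
class Thesaurus:
    """A toy controlled vocabulary with equivalence and hierarchical links.

    Hypothetical structure: `use` maps non-preferred terms to a preferred
    term (USE/UF); `broader` records each term's broader term (BT);
    narrower terms (NT) are stored as the inverse of `broader`.
    """

    def __init__(self):
        self.use = {}       # non-preferred term -> preferred term
        self.broader = {}   # preferred term -> broader term
        self.narrower = {}  # preferred term -> set of narrower terms

    def add_equivalent(self, non_preferred, preferred):
        self.use[non_preferred] = preferred

    def add_broader(self, term, broader_term):
        self.broader[term] = broader_term
        self.narrower.setdefault(broader_term, set()).add(term)

    def preferred(self, term):
        # Resolve an entry term to its preferred form (or return it unchanged).
        return self.use.get(term, term)

    def expand(self, term):
        """Return the preferred term plus all terms below it in the hierarchy."""
        result, stack = set(), [self.preferred(term)]
        while stack:
            current = stack.pop()
            if current not in result:
                result.add(current)
                stack.extend(self.narrower.get(current, ()))
        return result


t = Thesaurus()
t.add_equivalent("Catherine wheel", "rose window")  # hypothetical equivalence
t.add_broader("pot-metal glass", "stained glass")
print(t.preferred("Catherine wheel"))     # rose window
print(sorted(t.expand("stained glass")))  # ['pot-metal glass', 'stained glass']
```

At cataloging time `preferred` normalizes entry terms; at retrieval time `expand` lets a search on a general term also match records indexed with its narrower terms.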
Explorations in Automatic Thesaurus Discovery presents an automated method for creating a first-draft thesaurus from raw text. It describes the natural language processing steps of tokenization, surface syntactic analysis, and syntactic attribute extraction. From these attributes, word and term similarity is calculated, and a thesaurus is created showing important common terms and their relations to each other, common verb-noun pairings, common expressions, and word family members. The techniques are tested on twenty different corpora, ranging from baseball newsgroups, assassination archives, medical X-ray reports, and abstracts on AIDS to encyclopedia articles on animals, and even the text of the book itself. The corpora range from 40,000 to 6 million characters of text, and results for each are presented in the Appendix. The methods described in the book have undergone extensive evaluation. Their time and space complexity are shown to be modest, and the results are shown to converge to a stable state as the corpus grows. The calculated similarities are compared to those produced by psychological testing, and a method of evaluation using artificial synonyms is tested. Gold-standard evaluations show that the techniques significantly outperform non-linguistic techniques for the most important words in the corpora. The book includes applications to information retrieval using established testbeds, to the enrichment of existing thesauri, and to semantic analysis. Also included are applications showing how to create, implement, and test a first-draft thesaurus.
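The similarity step described above (comparing words by the syntactic attributes extracted for them) can be sketched as follows. This assumes the attributes have already been extracted as (relation, word) pairs with counts, and uses a weighted Jaccard measure over raw counts; the book's own weighting scheme is not reproduced, and the function names and sample data are hypothetical.

```python
from collections import Counter


def weighted_jaccard(a: Counter, b: Counter) -> float:
    """Weighted Jaccard similarity between two attribute-count vectors."""
    keys = set(a) | set(b)
    num = sum(min(a[k], b[k]) for k in keys)  # shared attribute mass
    den = sum(max(a[k], b[k]) for k in keys)  # combined attribute mass
    return num / den if den else 0.0


def nearest_terms(attrs: dict, word: str, k: int = 3):
    """Rank the other words by attribute similarity to `word`."""
    scores = [
        (other, weighted_jaccard(attrs[word], attrs[other]))
        for other in attrs
        if other != word
    ]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


# Hypothetical attributes, as if produced by a shallow parser:
# (relation, head word) pairs with their observed counts.
attrs = {
    "pitcher": Counter({("subj", "throw"): 4, ("mod", "starting"): 2, ("obj", "replace"): 1}),
    "catcher": Counter({("subj", "throw"): 2, ("subj", "catch"): 3, ("mod", "starting"): 1}),
    "umpire": Counter({("subj", "call"): 3, ("obj", "replace"): 1}),
}

for other, score in nearest_terms(attrs, "pitcher"):
    print(f"{other}: {score:.2f}")
```

With this toy data, "catcher" shares more attribute mass with "pitcher" than "umpire" does, so it ranks first; a first-draft thesaurus would list each word's top-ranked neighbours.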
Relationships abound in the library and information science (LIS) world. Those relationships may be social in nature, as, for instance, when we deal with human relationships among library personnel or relationships (i.e., "public relations") between an information center and its clientele. The relationships may be educational, as, for example, when we examine the relationship between the curriculum of an accredited school and the needs of the work force it is preparing students to join. Or the relationships may be economic, as when we investigate the relationship between the cost of journals and the frequency with which they are cited. Many of the relationships of concern to us reflect phenomena entirely internal to the field: the relationship between manuscript collections, archives, and special collections; the relationship between end user search behavior and the effectiveness of searches; the relationship between access to and use of information resources; the relationship between recall and precision; the relationship between various bibliometric laws; etc. The list of such relationships could go on and on. The relationships addressed in this volume are restricted to those involved in the organization of recorded knowledge, which tend to have a conceptual or semantic basis, although statistical means are sometimes used in their discovery.
Covers planning and design of thesaurus systems, thesaurus construction standards, vocabulary control, specificity and compound terms, structure, and auxiliary retrieval devices. Discusses possible forms of thesaurus presentation and different types of thesauri, including multilingual thesauri, merged vocabularies, and searching thesauri, and looks at maintenance and updating, and computer aids.