Hidden Conditional Random Fields for Speech Recognition

Hidden Conditional Random Fields for Speech Recognition

Author: Yun-Hsuan Sung

Publisher: Stanford University

Published: 2010

Total Pages: 161

ISBN-13:

DOWNLOAD EBOOK

This thesis investigates using a new graphical model, hidden conditional random fields (HCRFs), for speech recognition. Conditional random fields (CRFs) are discriminative sequence models that have been successfully applied to several tasks in text processing, such as named entity recognition. Recently, there has been increasing interest in applying CRFs to speech recognition due to the similarity between speech and text processing. HCRFs are CRFs augmented with hidden variables that are capable of representing the dynamic changes and variations in speech signals. HCRFs also have the ability to incorporate correlated features from both speech signals and text without making strong independence assumptions among them. This thesis presents my current research on applying HCRFs to speech recognition and HCRFs' potential to replace the current hidden Markov model (HMM) for acoustic modeling. Experimental results of phone classification, phone recognition, and speaker adaptation are presented and discussed. Our monophone HCRFs outperform both maximum mutual information estimation (MMIE) and minimum phone error (MPE) trained HMMs and achieve the-start-of-the-art performance in TIMIT phone classification and recognition tasks. We also show how to jointly train acoustic models and language models in HCRFs, which shows improvement in the results. Maximum a posterior (MAP) and maximum conditional likelihood linear regression (MCLLR) successfully adapt speaker-independent models to speaker-dependent models with a small amount of adaptation data for HCRF speaker adaptation. Finally, we explore adding gender and dialect features for phone recognition, and experimental results are presented.


An Introduction to Conditional Random Fields

An Introduction to Conditional Random Fields

Author: Charles Sutton

Publisher: Now Pub

Published: 2012

Total Pages: 120

ISBN-13: 9781601985729

DOWNLOAD EBOOK

An Introduction to Conditional Random Fields provides a comprehensive tutorial aimed at application-oriented practitioners seeking to apply CRFs. The monograph does not assume previous knowledge of graphical modeling, and so is intended to be useful to practitioners in a wide variety of fields.


The Application of Hidden Markov Models in Speech Recognition

The Application of Hidden Markov Models in Speech Recognition

Author: Mark Gales

Publisher: Now Publishers Inc

Published: 2008

Total Pages: 125

ISBN-13: 1601981201

DOWNLOAD EBOOK

The Application of Hidden Markov Models in Speech Recognition presents the core architecture of a HMM-based LVCSR system and proceeds to describe the various refinements which are needed to achieve state-of-the-art performance.


Spoken Language Understanding

Spoken Language Understanding

Author: Gokhan Tur

Publisher: John Wiley & Sons

Published: 2011-05-03

Total Pages: 443

ISBN-13: 1119993946

DOWNLOAD EBOOK

Spoken language understanding (SLU) is an emerging field in between speech and language processing, investigating human/ machine and human/ human communication by leveraging technologies from signal processing, pattern recognition, machine learning and artificial intelligence. SLU systems are designed to extract the meaning from speech utterances and its applications are vast, from voice search in mobile devices to meeting summarization, attracting interest from both commercial and academic sectors. Both human/machine and human/human communications can benefit from the application of SLU, using differing tasks and approaches to better understand and utilize such communications. This book covers the state-of-the-art approaches for the most popular SLU tasks with chapters written by well-known researchers in the respective fields. Key features include: Presents a fully integrated view of the two distinct disciplines of speech processing and language processing for SLU tasks. Defines what is possible today for SLU as an enabling technology for enterprise (e.g., customer care centers or company meetings), and consumer (e.g., entertainment, mobile, car, robot, or smart environments) applications and outlines the key research areas. Provides a unique source of distilled information on methods for computer modeling of semantic information in human/machine and human/human conversations. This book can be successfully used for graduate courses in electronics engineering, computer science or computational linguistics. Moreover, technologists interested in processing spoken communications will find it a useful source of collated information of the topic drawn from the two distinct disciplines of speech processing and language processing under the new area of SLU.


Computational Linguistics and Intelligent Text Processing

Computational Linguistics and Intelligent Text Processing

Author: Alexander Gelbukh

Publisher: Springer Nature

Published: 2023-02-25

Total Pages: 683

ISBN-13: 3031243404

DOWNLOAD EBOOK

The two-volume set LNCS 13451 and 13452 constitutes revised selected papers from the CICLing 2019 conference which took place in La Rochelle, France, April 2019. The total of 95 papers presented in the two volumes was carefully reviewed and selected from 335 submissions. The book also contains 3 invited papers. The papers are organized in the following topical sections: General, Information extraction, Information retrieval, Language modeling, Lexical resources, Machine translation, Morphology, sintax, parsing, Name entity recognition, Semantics and text similarity, Sentiment analysis, Speech processing, Text categorization, Text generation, and Text mining.


Hybrid Random Fields

Hybrid Random Fields

Author: Antonino Freno

Publisher: Springer Science & Business Media

Published: 2011-04-11

Total Pages: 217

ISBN-13: 3642203086

DOWNLOAD EBOOK

This book presents an exciting new synthesis of directed and undirected, discrete and continuous graphical models. Combining elements of Bayesian networks and Markov random fields, the newly introduced hybrid random fields are an interesting approach to get the best of both these worlds, with an added promise of modularity and scalability. The authors have written an enjoyable book---rigorous in the treatment of the mathematical background, but also enlivened by interesting and original historical and philosophical perspectives. -- Manfred Jaeger, Aalborg Universitet The book not only marks an effective direction of investigation with significant experimental advances, but it is also---and perhaps primarily---a guide for the reader through an original trip in the space of probabilistic modeling. While digesting the book, one is enriched with a very open view of the field, with full of stimulating connections. [...] Everyone specifically interested in Bayesian networks and Markov random fields should not miss it. -- Marco Gori, Università degli Studi di Siena Graphical models are sometimes regarded---incorrectly---as an impractical approach to machine learning, assuming that they only work well for low-dimensional applications and discrete-valued domains. While guiding the reader through the major achievements of this research area in a technically detailed yet accessible way, the book is concerned with the presentation and thorough (mathematical and experimental) investigation of a novel paradigm for probabilistic graphical modeling, the hybrid random field. This model subsumes and extends both Bayesian networks and Markov random fields. Moreover, it comes with well-defined learning algorithms, both for discrete and continuous-valued domains, which fit the needs of real-world applications involving large-scale, high-dimensional data.


Hierarchical Neural Network Structures for Phoneme Recognition

Hierarchical Neural Network Structures for Phoneme Recognition

Author: Daniel Vasquez

Publisher: Springer Science & Business Media

Published: 2012-10-18

Total Pages: 146

ISBN-13: 3642344240

DOWNLOAD EBOOK

In this book, hierarchical structures based on neural networks are investigated for automatic speech recognition. These structures are mainly evaluated within the phoneme recognition task under the Hybrid Hidden Markov Model/Artificial Neural Network (HMM/ANN) paradigm. The baseline hierarchical scheme consists of two levels each which is based on a Multilayered Perceptron (MLP). Additionally, the output of the first level is used as an input for the second level. This system can be substantially speeded up by removing the redundant information contained at the output of the first level.


The Handbook of Multimodal-Multisensor Interfaces, Volume 2

The Handbook of Multimodal-Multisensor Interfaces, Volume 2

Author: Sharon Oviatt

Publisher: Morgan & Claypool

Published: 2018-10-08

Total Pages: 541

ISBN-13: 1970001690

DOWNLOAD EBOOK

The Handbook of Multimodal-Multisensor Interfaces provides the first authoritative resource on what has become the dominant paradigm for new computer interfaces: user input involving new media (speech, multi-touch, hand and body gestures, facial expressions, writing) embedded in multimodal-multisensor interfaces that often include biosignals. This edited collection is written by international experts and pioneers in the field. It provides a textbook, reference, and technology roadmap for professionals working in this and related areas. This second volume of the handbook begins with multimodal signal processing, architectures, and machine learning. It includes recent deep learning approaches for processing multisensorial and multimodal user data and interaction, as well as context-sensitivity. A further highlight is processing of information about users' states and traits, an exciting emerging capability in next-generation user interfaces. These chapters discuss real-time multimodal analysis of emotion and social signals from various modalities, and perception of affective expression by users. Further chapters discuss multimodal processing of cognitive state using behavioral and physiological signals to detect cognitive load, domain expertise, deception, and depression. This collection of chapters provides walk-through examples of system design and processing, information on tools and practical resources for developing and evaluating new systems, and terminology and tutorial support for mastering this rapidly expanding field. In the final section of this volume, experts exchange views on the timely and controversial challenge topic of multimodal deep learning. The discussion focuses on how multimodal-multisensor interfaces are most likely to advance human performance during the next decade.


Robust Automatic Speech Recognition

Robust Automatic Speech Recognition

Author: Jinyu Li

Publisher: Academic Press

Published: 2015-10-30

Total Pages: 308

ISBN-13: 0128026162

DOWNLOAD EBOOK

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise-and reverberation robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications.The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided.The reader will: - Gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition - Learn the links and relationship between alternative technologies for robust speech recognition - Be able to use the technology analysis and categorization detailed in the book to guide future technology development - Be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition - The first book that provides a comprehensive review on noise and reverberation robust speech recognition methods in the era of deep neural networks - Connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment - Provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques - Written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years