Robust Speech Recognition in Embedded Systems and PC Applications

Author: Jean-Claude Junqua

Publisher: Springer Science & Business Media

Published: 2006-04-18

Total Pages: 193

ISBN-13: 0306470276

Robust Speech Recognition in Embedded Systems and PC Applications provides a link between the technology and the application worlds. As speech recognition technology is now good enough for a number of applications and the core technology is well established around hidden Markov models, many of the differences between systems found in the field are related to implementation variants. We distinguish between embedded systems and PC-based applications. Embedded applications are usually cost sensitive and require very simple and optimized methods to be viable. Robust Speech Recognition in Embedded Systems and PC Applications reviews the problems of robust speech recognition, summarizes the current state of the art of robust speech recognition while providing some perspectives, and goes over the complementary technologies that are necessary to build an application, such as dialog and user interface technologies. Robust Speech Recognition in Embedded Systems and PC Applications is divided into five chapters. The first one reviews the main difficulties encountered in automatic speech recognition when the type of communication is unknown. The second chapter focuses on environment-independent/adaptive speech recognition approaches and on the mainstream methods applicable to noise robust speech recognition. The third chapter discusses several critical technologies that contribute to making an application usable. It also provides some recommendations on how to design prompts, generate user feedback and develop speech user interfaces. The fourth chapter reviews several techniques that are particularly useful for embedded systems or to decrease computational complexity. It also presents some case studies for embedded applications and PC-based systems. Finally, the fifth chapter provides a future outlook for robust speech recognition, emphasizing the areas that the author sees as the most promising.
Robust Speech Recognition in Embedded Systems and PC Applications serves as a valuable reference and, although not intended as a formal university textbook, contains some material that can be used for a course at the graduate or undergraduate level. It is a good complement to the book Robustness in Automatic Speech Recognition: Fundamentals and Applications, co-authored by the same author.


Dynamic Speech Models

Author: Li Deng

Publisher: Springer Nature

Published: 2022-05-31

Total Pages: 105

ISBN-13: 3031025555

Speech dynamics refer to the temporal characteristics in all stages of the human speech communication process. This speech “chain” starts with the formation of a linguistic message in a speaker's brain and ends with the arrival of the message in a listener's brain. Given the intricacy of the dynamic speech process and its fundamental importance in human communication, this monograph is intended to provide comprehensive material on mathematical models of speech dynamics and to address the following issues: How do we make sense of the complex speech process in terms of its functional role of speech communication? How do we quantify the special role of speech timing? How do the dynamics relate to the variability of speech that has often been said to seriously hamper automatic speech recognition? How do we put the dynamic process of speech into a quantitative form to enable detailed analyses? And finally, how can we incorporate the knowledge of speech dynamics into computerized speech analysis and recognition algorithms? The answers to all these questions require building and applying computational models for the dynamic speech process. What are the compelling reasons for carrying out dynamic speech modeling? We provide the answer in two related aspects. First, scientific inquiry into the human speech code has been relentlessly pursued for several decades. As an essential carrier of human intelligence and knowledge, speech is the most natural form of human communication. Embedded in the speech code are linguistic (as well as para-linguistic) messages, which are conveyed through four levels of the speech chain. Underlying the robust encoding and transmission of the linguistic messages are the speech dynamics at all four levels. Mathematical modeling of speech dynamics provides an effective tool in the scientific methods of studying the speech chain.
Such scientific studies help us understand why humans speak as they do and how humans exploit redundancy and variability by way of multitiered dynamic processes to enhance the efficiency and effectiveness of human speech communication. Second, advancement of human language technology, especially in automatic recognition of natural-style human speech, is also expected to benefit from comprehensive computational modeling of speech dynamics. The limitations of current speech recognition technology are serious and well known. A commonly acknowledged and frequently discussed weakness of the statistical model underlying current speech recognition technology is the lack of adequate dynamic modeling schemes to provide correlation structure across the temporal speech observation sequence. Unfortunately, for a variety of reasons, the majority of current research activities in this area favor only incremental modifications and improvements to the existing HMM-based state of the art. For example, while dynamic and correlation modeling is known to be an important topic, most systems nevertheless employ only an ultra-weak form of speech dynamics, e.g., differential or delta parameters. Strong-form dynamic speech modeling, which is the focus of this monograph, may serve as an ultimate solution to this problem. After the introduction chapter, the main body of this monograph consists of four chapters. They cover various aspects of theory, algorithms, and applications of dynamic speech models, and provide a comprehensive survey of the research work in this area spanning the past 20 years. This monograph is intended as advanced material on speech and signal processing for graduate-level teaching, for professionals and engineering practitioners, as well as for seasoned researchers and engineers specialized in speech processing.
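The "delta parameters" mentioned in the description above can be illustrated with a short sketch. It uses the regression formula commonly used for delta features, d_t = sum_n n (c_{t+n} - c_{t-n}) / (2 sum_n n^2), which is an assumption here and not taken from the monograph. The function name `delta_features`, the `window` parameter, and the edge handling (repeating the first and last frame) are likewise illustrative choices.

```python
def delta_features(frames, window=2):
    """Return delta (first-order difference) coefficients for a feature sequence.

    frames: list of equal-length lists, one feature vector per time frame.
    window: number of frames on each side used by the regression (assumed value).
    """
    if not frames:
        return []
    num_frames = len(frames)
    dim = len(frames[0])
    # Normalizer of the regression formula: 2 * sum of n^2 for n = 1..window.
    denom = 2 * sum(n * n for n in range(1, window + 1))
    deltas = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                # Clamp indices so the first and last frames are repeated at the edges.
                ahead = frames[min(t + n, num_frames - 1)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


if __name__ == "__main__":
    # A feature that rises by one unit per frame has a slope of 1 away from the edges.
    ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
    print(delta_features(ramp))
```

Because each delta looks only a few frames to either side, it captures local slope and nothing about longer-range trajectory structure, which is one way to read the description's term "ultra-weak".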


Computing PROSODY

Author: Yoshinori Sagisaka

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 405

ISBN-13: 1461222583

This book presents a collection of papers from the Spring 1995 Workshop on Computational Approaches to Processing the Prosody of Spontaneous Speech, hosted by the ATR Interpreting Telecommunications Research Laboratories in Kyoto, Japan. The workshop brought together leading researchers in the fields of speech and signal processing, electrical engineering, psychology, and linguistics, to discuss aspects of spontaneous speech prosody and to suggest approaches to its computational analysis and modelling. The book is divided into four sections. Part I gives an overview and theoretical background to the nature of spontaneous speech, differentiating it from the lab-speech that has been the focus of so many earlier analyses. Part II focuses on the prosodic features of discourse and the structure of the spoken message, Part III on the generation and modelling of prosody for computer speech synthesis. Part IV discusses how prosodic information can be used in the context of automatic speech recognition. Each section of the book starts with an invited overview paper to situate the chapters in the context of current research. We feel that this collection of papers offers interesting insights into the scope and nature of the problems concerned with the computational analysis and modelling of real spontaneous speech, and expect that these works will not only form the basis of further developments in each field but also merge to form an integrated computational model of prosody for a better understanding of human processing of the complex interactions of the speech chain.


Cross-Word Modeling for Arabic Speech Recognition

Author: Dia AbuZeina

Publisher: Springer Science & Business Media

Published: 2011-11-25

Total Pages: 82

ISBN-13: 1461412137

Cross-Word Modeling for Arabic Speech Recognition uses phonological rules to model the cross-word problem, the merging of adjacent words that occurs in continuous speech, in order to enhance the performance of continuous speech recognition systems. The author aims to provide an understanding of the cross-word problem and how it can be avoided, specifically focusing on Arabic phonology using an HMM-based classifier.


Cognitive Infocommunications, Theory and Applications

Author: Ryszard Klempous

Publisher: Springer

Published: 2018-08-25

Total Pages: 465

ISBN-13: 3319959964

The book gathers chapters of Cognitive InfoCommunication research relevant to a variety of application areas, including data visualization, emotion expression, brain-computer interfaces and speech technologies. It provides an overview of the kind of cognitive capabilities that are being analyzed and developed. Based on this common ground, it may become possible to see new opportunities for synergy among disciplines that were heretofore viewed as being separate. Cognitive InfoCommunication begins by modeling human cognitive states and aptitudes in order to better understand what the user of a system is capable of comprehending and doing. The patterns of exploration and the specific tools that are described can be of interest and of great relevance for all researchers who focus on modeling human states and aptitudes. This innovative research area provides answers to the latest challenges concerning the influence of cognitive states and aptitudes, in order to facilitate learning or generally improve performance in certain cognitive tasks such as decision making. Some capabilities are purely human, while others are purely artificial, but in general this distinction is rarely clear-cut. Therefore, when discussing new human cognitive capabilities, the technological background which makes them possible cannot be neglected, and indeed often plays a central role. This book highlights the synergy between various fields that fit under the umbrella of CogInfoCom and contribute to understanding and developing new, human-artificial intelligence hybrid capabilities. These merged capabilities are currently appearing, and the importance of the role they play in everyday life is unique to the cognitive entity generation that is currently growing up.


The Application of Hidden Markov Models in Speech Recognition

Author: Mark Gales

Publisher: Now Publishers Inc

Published: 2008

Total Pages: 125

ISBN-13: 1601981201

The Application of Hidden Markov Models in Speech Recognition presents the core architecture of an HMM-based LVCSR system and proceeds to describe the various refinements which are needed to achieve state-of-the-art performance.


Advances in Speech Recognition

Author: Noam Shabtai

Publisher: BoD – Books on Demand

Published: 2010-08-16

Total Pages: 177

ISBN-13: 9533070978

In the last decade, further applications of speech processing were developed, such as speaker recognition, human-machine interaction, non-English speech recognition, and non-native English speech recognition. This book addresses a few of these applications. Furthermore, major challenges that were typically ignored in previous speech recognition research, such as noise and reverberation, appear repeatedly in recent papers. I would like to sincerely thank the contributing authors for their efforts to bring their insights and perspectives on current open questions in speech recognition research.


Connectionist Speech Recognition

Author: Hervé A. Bourlard

Publisher: Springer Science & Business Media

Published: 2012-12-06

Total Pages: 329

ISBN-13: 1461532108

Connectionist Speech Recognition: A Hybrid Approach describes the theory and implementation of a method to incorporate neural network approaches into state-of-the-art continuous speech recognition systems based on hidden Markov models (HMMs) to improve their performance. In this framework, neural networks (and in particular, multilayer perceptrons or MLPs) have been restricted to well-defined subtasks of the whole system, i.e. HMM emission probability estimation and feature extraction. The book describes a successful five-year international collaboration between the authors. The lessons learned form a case study that demonstrates how hybrid systems can be developed to combine neural networks with more traditional statistical approaches. The book illustrates both the advantages and limitations of neural networks in the framework of a statistical system. Using standard databases and comparison with some conventional approaches, it is shown that MLP probability estimation can improve recognition performance. Other approaches are discussed, though there is no such unequivocal experimental result for these methods. Connectionist Speech Recognition is of use to anyone intending to use neural networks for speech recognition or within the framework provided by an existing successful statistical approach. This includes research and development groups working in the field of speech recognition, both with standard and neural network approaches, as well as other pattern recognition and/or neural network researchers. The book is also suitable as a text for advanced courses on neural networks or speech processing.
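As a rough illustration of "HMM emission probability estimation" with an MLP, hybrid systems commonly divide the network's state posteriors by the state priors to get scaled likelihoods that stand in for emission probabilities. The description above does not spell this out, so the sketch below is an assumption about the general technique and not a reproduction of the book's implementation. The function name `scaled_log_likelihoods` and the `floor` parameter are hypothetical.

```python
import math


def scaled_log_likelihoods(posteriors, priors, floor=1e-10):
    """Convert per-state posteriors P(q|x) into log scaled likelihoods.

    By Bayes' rule, p(x|q) / p(x) = P(q|x) / P(q), so dividing the MLP
    output by the state prior gives a quantity proportional to the
    emission likelihood for a fixed frame x.

    posteriors: MLP outputs for one frame, one value per HMM state.
    priors: relative frequency of each state in the training alignment.
    floor: small constant (assumed) that avoids taking log(0).
    """
    if len(posteriors) != len(priors):
        raise ValueError("posteriors and priors must have the same length")
    return [
        math.log(max(post, floor)) - math.log(max(prior, floor))
        for post, prior in zip(posteriors, priors)
    ]


if __name__ == "__main__":
    frame_posteriors = [0.5, 0.25, 0.25]   # made-up MLP outputs for one frame
    state_priors = [0.25, 0.25, 0.5]       # made-up state priors
    print(scaled_log_likelihoods(frame_posteriors, state_priors))
```

The resulting values can take the place of log emission scores in a standard Viterbi decoder, since the frame-dependent factor p(x) is the same for every state and does not change the best path.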


Berkshire Encyclopedia of Human-computer Interaction

Author: William Sims Bainbridge

Publisher: Berkshire Publishing Group LLC

Published: 2004

Total Pages: 900

ISBN-13: 0974309125

Presents a collection of articles on human-computer interaction, covering such topics as applications, methods, hardware, and computers and society.


Speech and Computer

Author: Andrey Ronzhin

Publisher: Springer

Published: 2016-08-15

Total Pages: 747

ISBN-13: 3319439588

This book constitutes the proceedings of the 18th International Conference on Speech and Computer, SPECOM 2016, held in Budapest, Hungary, in August 2016. The 85 papers presented in this volume were carefully reviewed and selected from 154 submissions.