Automatic Speech Recognition

Publisher: Springer
Automatic Speech Recognition Book Resume:

This book provides a comprehensive overview of recent advances in the field of automatic speech recognition, with a focus on deep learning models, including deep neural networks and many of their variants. This is the first automatic speech recognition book dedicated to the deep learning approach. In addition to a rigorous mathematical treatment of the subject, the book presents insights into, and the theoretical foundations of, a series of highly successful deep learning models.

Robust Automatic Speech Recognition

Author: Jinyu Li,Li Deng,Reinhold Haeb-Umbach,Yifan Gong
Publisher: Academic Press
Robust Automatic Speech Recognition by Jinyu Li,Li Deng,Reinhold Haeb-Umbach,Yifan Gong Book Resume:

Robust Automatic Speech Recognition: A Bridge to Practical Applications establishes a solid foundation for automatic speech recognition that is robust against acoustic environmental distortion. It provides a thorough overview of classical and modern noise- and reverberation-robust techniques that have been developed over the past thirty years, with an emphasis on practical methods that have been proven to be successful and which are likely to be further developed for future applications. The strengths and weaknesses of robustness-enhancing speech recognition techniques are carefully analyzed. The book covers noise-robust techniques designed for acoustic models which are based on both Gaussian mixture models and deep neural networks. In addition, a guide to selecting the best methods for practical applications is provided. The reader will: gain a unified, deep and systematic understanding of the state-of-the-art technologies for robust speech recognition; learn the links and relationships between alternative technologies for robust speech recognition; be able to use the technology analysis and categorization detailed in the book to guide future technology development; and be able to develop new noise-robust methods in the current era of deep learning for acoustic modeling in speech recognition. It is the first book that provides a comprehensive review of noise- and reverberation-robust speech recognition methods in the era of deep neural networks. It connects robust speech recognition techniques to machine learning paradigms with rigorous mathematical treatment, and provides elegant and structural ways to categorize and analyze noise-robust speech recognition techniques. It is written by leading researchers who have been actively working on the subject matter in both industrial and academic organizations for many years.

Automatic Speech Recognition on Mobile Devices and over Communication Networks

Author: Zheng-Hua Tan,Boerge Lindberg
Publisher: Springer Science & Business Media
Automatic Speech Recognition on Mobile Devices and over Communication Networks by Zheng-Hua Tan,Boerge Lindberg Book Resume:

The advances in computing and networking have sparked an enormous interest in deploying automatic speech recognition on mobile devices and over communication networks. This book brings together academic researchers and industrial practitioners to address the issues in this emerging realm and presents the reader with a comprehensive introduction to the subject of speech recognition in devices and networks. It covers network, distributed and embedded speech recognition systems.

Automatic Speech Recognition

Author: Kai-Fu Lee
Publisher: Springer Science & Business Media
Automatic Speech Recognition by Kai-Fu Lee Book Resume:

Speech Recognition has a long history of being one of the difficult problems in Artificial Intelligence and Computer Science. As one goes from problem solving tasks such as puzzles and chess to perceptual tasks such as speech and vision, the problem characteristics change dramatically: knowledge poor to knowledge rich; low data rates to high data rates; slow response time (minutes to hours) to instantaneous response time. These characteristics taken together increase the computational complexity of the problem by several orders of magnitude. Further, speech provides a challenging task domain which embodies many of the requirements of intelligent behavior: operate in real time; exploit vast amounts of knowledge; tolerate errorful, unexpected, unknown input; use symbols and abstractions; communicate in natural language; and learn from the environment. Voice input to computers offers a number of advantages. It provides a natural, fast, hands free, eyes free, location free input medium. However, there are many as yet unsolved problems that prevent routine use of speech as an input device by non-experts. These include cost, real time response, speaker independence, robustness to variations such as noise, microphone, speech rate and loudness, and the ability to handle non-grammatical speech. Satisfactory solutions to each of these problems can be expected within the next decade. Recognition of unrestricted spontaneous continuous speech appears unsolvable at present. However, by the addition of simple constraints, such as clarification dialog to resolve ambiguity, we believe it will be possible to develop systems capable of accepting very large vocabulary continuous speech dictation.

Techniques for Noise Robustness in Automatic Speech Recognition

Author: Tuomas Virtanen,Rita Singh,Bhiksha Raj
Publisher: John Wiley & Sons
Techniques for Noise Robustness in Automatic Speech Recognition by Tuomas Virtanen,Rita Singh,Bhiksha Raj Book Resume:

Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments where the systems are used are noisy, for example users calling up a voice search system from a busy cafeteria or a street. This can result in degraded speech recordings and adversely affect the performance of speech recognition systems. As the use of ASR systems increases, knowledge of the state-of-the-art in techniques to deal with such problems becomes critical to system and application engineers and researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state-of-the-art in techniques used to improve the robustness of speech recognition systems to these degrading external influences. Key features: Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech. Acts as a timely exposition of the topic in light of more widespread use in the future of ASR technology in challenging environments. Addresses robustness issues and signal degradation which are both key requirements for practitioners of ASR. Includes contributions from top ASR researchers from leading research units in the field.

Robustness in Automatic Speech Recognition

Author: Jean-Claude Junqua,Jean-Paul Haton
Publisher: Springer Science & Business Media
Robustness in Automatic Speech Recognition by Jean-Claude Junqua,Jean-Paul Haton Book Resume:

Foreword: Looking back over the past 30 years, we have seen steady progress made in the area of speech science and technology. I still remember the excitement in the late seventies when Texas Instruments came up with a toy named "Speak-and-Spell" which was based on a VLSI chip containing the state-of-the-art linear prediction synthesizer. This caused a speech technology fever among the electronics industry. Particularly, applications of automatic speech recognition were rigorously attempted by many companies, some of which were start-ups founded just for this purpose. Unfortunately, it did not take long before they realized that automatic speech recognition technology was not mature enough to satisfy the need of customers. The fever gradually faded away. In the meantime, constant efforts have been made by many researchers and engineers to improve the automatic speech recognition technology. Hardware capabilities have advanced impressively since that time. In the past few years, we have been witnessing and experiencing the advent of the "Information Revolution." What might be called the second surge of interest to commercialize speech technology as a natural interface for man-machine communication began in much better shape than the first one. With computers much more powerful and faster, many applications look realistic this time. However, there are still tremendous practical issues to be overcome in order for speech to be truly the most natural interface between humans and machines.

Automatic Speech and Speaker Recognition

Author: Chin-Hui Lee,Frank K. Soong,Kuldip K. Paliwal
Publisher: Springer Science & Business Media
Automatic Speech and Speaker Recognition by Chin-Hui Lee,Frank K. Soong,Kuldip K. Paliwal Book Resume:

Research in the field of automatic speech and speaker recognition has made a number of significant advances in the last two decades, influenced by advances in signal processing, algorithms, architectures, and hardware. These advances include: the adoption of a statistical pattern recognition paradigm; the use of the hidden Markov modeling framework to characterize both the spectral and the temporal variations in the speech signal; the use of a large set of speech utterance examples from a large population of speakers to train the hidden Markov models of some fundamental speech units; the organization of speech and language knowledge sources into a structural finite state network; and the use of dynamic-programming-based heuristic search methods to find the best word sequence in the lexical network corresponding to the spoken utterance. Automatic Speech and Speaker Recognition: Advanced Topics groups together in a single volume a number of important topics on speech and speaker recognition, topics which are of fundamental importance, but not yet covered in detail in existing textbooks. Although no explicit partition is given, the book is divided into five parts: Chapters 1-2 are devoted to technology overviews; Chapters 3-12 discuss acoustic modeling of fundamental speech units and lexical modeling of words and pronunciations; Chapters 13-15 address the issues related to flexibility and robustness; Chapters 16-18 concern the theoretical and practical issues of search; Chapters 19-20 give two examples of algorithm and implementational aspects for recognition system realization. Audience: A reference book for speech researchers and graduate students interested in pursuing potential research on the topic. May also be used as a text for advanced courses on the subject.

Automatic Speech Analysis and Recognition

Author: Jean-Paul Haton
Publisher: Springer Science & Business Media
Automatic Speech Analysis and Recognition by Jean-Paul Haton Book Resume:

This book is the result of the second NATO Advanced Study Institute on speech processing held at the Chateau de Bonas, France, from June 29th to July 10th, 1981. This Institute provided a high-level coverage of the fields of speech transmission, recognition and understanding, which constitute important areas where research activity has recently been associated with actual industrial developments. This book will therefore include both fundamental and applied topics. Ten survey papers by some of the best specialists in the field are included. They give an up-to-date presentation of several important problems in automatic speech processing. As a consequence the book can be considered as a reference manual on some important areas of automatic speech processing. The surveys are indicated by a * in the table of contents. This book also contains research papers corresponding to original works, which were presented during the panel sessions of the Institute. For the sake of clarity the book has been divided into five sections: 1. Speech Analysis and Transmission: An emphasis has been laid on the techniques of linear prediction (LPC), and the problems involved in the transmission of speech at various bit rates are addressed in detail. 2. Acoustics and Phonetics: One of the major bottlenecks in the development of speech recognition systems remains the transcription of the continuous speech wave into some discrete strings or lattices of phonetic symbols. Two survey papers discuss this problem from different points of view and several practical systems are also described.

New Systems and Architectures for Automatic Speech Recognition and Synthesis

Author: Renato DeMori,Ching Y. Suen
Publisher: Springer Science & Business Media
New Systems and Architectures for Automatic Speech Recognition and Synthesis by Renato DeMori,Ching Y. Suen Book Resume:

Proceedings of the NATO Advanced Study Institute on New Systems and Architecture for Automatic Speech Recognition and Synthesis, held at Bonas, Gers, France, 2-14 July 1984

Distant Speech Recognition

Author: Matthias Woelfel,John McDonough
Publisher: John Wiley & Sons
Distant Speech Recognition by Matthias Woelfel,John McDonough Book Resume:

A complete overview of distant automatic speech recognition. The performance of conventional Automatic Speech Recognition (ASR) systems degrades dramatically as soon as the microphone is moved away from the mouth of the speaker. This is due to a broad variety of effects such as background noise, overlapping speech from other speakers, and reverberation. While traditional ASR systems underperform for speech captured with far-field sensors, there are a number of novel techniques within the recognition system, as well as techniques developed in other areas of signal processing, that can mitigate the deleterious effects of noise and reverberation and separate speech from overlapping speakers. Distant Speech Recognition presents a contemporary and comprehensive description of both theoretic abstraction and practical issues inherent in the distant ASR problem. Key Features: Covers the entire topic of distant ASR and offers practical solutions to overcome the problems related to it. Provides documentation and sample scripts to enable readers to construct state-of-the-art distant speech recognition systems. Gives relevant background information in acoustics and filter techniques. Explains the extraction and enhancement of classification relevant speech features. Describes maximum likelihood as well as discriminative parameter estimation, and maximum likelihood normalization techniques. Discusses the use of multi-microphone configurations for speaker tracking and channel combination. Presents several applications of the methods and technologies described in this book. Accompanying website with open source software and tools to construct state-of-the-art distant speech recognition systems. This reference will be an invaluable resource for researchers, developers, engineers and other professionals, as well as advanced students in speech technology, signal processing, acoustics, statistics and artificial intelligence fields.

Language Modeling for Automatic Speech Recognition of Inflective Languages

Author: Gregor Donaj,Zdravko Kačič
Publisher: Springer
Language Modeling for Automatic Speech Recognition of Inflective Languages by Gregor Donaj,Zdravko Kačič Book Resume:

This book covers language modeling and automatic speech recognition for inflective languages (e.g. Slavic languages), which represent roughly half of the languages spoken in Europe. Speech recognition systems do not perform as well on these languages as on English, and it is therefore harder to develop an application with sufficient quality for the end user. The authors describe the most important language features for the development of a speech recognition system. This is then presented through the analysis of errors in the system and the development of language models and their inclusion in speech recognition systems, which specifically address the errors that are relevant for targeted applications. The error analysis is done with regard to morphological characteristics of the words in the recognized sentences. The book is oriented towards speech recognition with large vocabularies and continuous and even spontaneous speech. Today such applications work with a rather small number of languages compared to the number of spoken languages.

Acoustical and Environmental Robustness in Automatic Speech Recognition

Author: A. Acero
Publisher: Springer Science & Business Media
Acoustical and Environmental Robustness in Automatic Speech Recognition by A. Acero Book Resume:

The need for automatic speech recognition systems to be robust with respect to changes in their acoustical environment has become more widely appreciated in recent years, as more systems are finding their way into practical applications. Although the issue of environmental robustness has received only a small fraction of the attention devoted to speaker independence, even speech recognition systems that are designed to be speaker independent frequently perform very poorly when they are tested using a different type of microphone or acoustical environment from the one with which they were trained. The use of microphones other than a "close talking" headset also tends to severely degrade speech recognition performance. Even in relatively quiet office environments, speech is degraded by additive noise from fans, slamming doors, and other conversations, as well as by the effects of unknown linear filtering arising from reverberation from surface reflections in a room, or spectral shaping by microphones or the vocal tracts of individual speakers. Speech-recognition systems designed for long-distance telephone lines, or applications deployed in more adverse acoustical environments such as motor vehicles, factory floors, or outdoors demand far greater degrees of environmental robustness. There are several different ways of building acoustical robustness into speech recognition systems. Arrays of microphones can be used to develop a directionally-sensitive system that resists interference from competing talkers and other noise sources that are spatially separated from the source of the desired speech signal.

The Speech Processing Lexicon

Author: Aditi Lahiri,Sandra Kotzor
Publisher: Walter de Gruyter GmbH & Co KG
The Speech Processing Lexicon by Aditi Lahiri,Sandra Kotzor Book Resume:

In this book, some of today’s leading neurolinguists and psycholinguists provide insight into the nature of phonological processing using behavioural measures, computational modeling, EEG and fMRI. The essays cover a range of topics including categorization, acoustic variability and invariance, underspecification, talker-specificity and machine learning, focusing on the acoustics, perception, acquisition and neural representation of speech.

Robust Adaptation to Non-Native Accents in Automatic Speech Recognition

Author: Silke Goronzy
Publisher: Springer
Robust Adaptation to Non-Native Accents in Automatic Speech Recognition by Silke Goronzy Book Resume:

Speech recognition technology is being increasingly employed in human-machine interfaces. A remaining problem, however, is the robustness of this technology to non-native accents, which still cause considerable difficulties for current systems. In this book, methods to overcome this problem are described. A speaker adaptation algorithm, based on the MLLR principle and capable of adapting to the current speaker with just a few words of speaker-specific data, is developed and combined with confidence measures that focus on phone durations as well as on acoustic features. Furthermore, a specific pronunciation modelling technique that allows the automatic derivation of non-native pronunciations without using non-native data is described and combined with the previous techniques to produce a robust adaptation to non-native accents in an automatic speech recognition system.

Readings in Speech Recognition

Author: Alexander Waibel,Kai-Fu Lee
Publisher: Elsevier
Readings in Speech Recognition by Alexander Waibel,Kai-Fu Lee Book Resume:

After more than two decades of research activity, speech recognition has begun to live up to its promise as a practical technology and interest in the field is growing dramatically. Readings in Speech Recognition provides a collection of seminal papers that have influenced or redirected the field and that illustrate the central insights that have emerged over the years. The editors provide an introduction to the field, its concerns and research problems. Subsequent chapters are devoted to the main schools of thought and design philosophies that have motivated different approaches to speech recognition system design. Each chapter includes an introduction to the papers that highlights the major insights or needs that have motivated an approach to a problem and describes the commonalities and differences of that approach to others in the book.

Audio Processing and Speech Recognition

Author: Soumya Sen,Anjan Dutta,Nilanjan Dey
Publisher: Springer
Audio Processing and Speech Recognition by Soumya Sen,Anjan Dutta,Nilanjan Dey Book Resume:

This book offers an overview of audio processing, including the latest advances in the methodologies used in audio processing and speech recognition. First, it discusses the importance of audio indexing and the classical information retrieval problem and presents two major indexing techniques, namely Large Vocabulary Continuous Speech Recognition (LVCSR) and Phonetic Search. It then offers brief insights into the human speech production system and its modeling, which are required to produce artificial speech. It also discusses various components of an automatic speech recognition (ASR) system. Describing the chronological developments in ASR systems, and briefly examining the statistical models used in ASR as well as the related mathematical deductions, the book summarizes a number of state-of-the-art classification techniques and their application in audio/speech classification. By providing insights into various aspects of audio/speech processing and speech recognition, this book appeals to a wide audience, from researchers and postgraduate students to those new to the field.

Python Deep Learning Cookbook

Author: Indra den Bakker
Publisher: Packt Publishing Ltd
Python Deep Learning Cookbook by Indra den Bakker Book Resume:

Solve different problems in modelling deep neural networks using Python, TensorFlow, and Keras with this practical guide.

About This Book: Practical recipes on training different neural network models and tuning them for optimal performance. Use Python frameworks like TensorFlow, Caffe, Keras, Theano for Natural Language Processing, Computer Vision, and more. A hands-on guide covering the common as well as the not so common problems in deep learning using Python.

Who This Book Is For: This book is intended for machine learning professionals who are looking to use deep learning algorithms to create real-world applications using Python. Thorough understanding of the machine learning concepts and Python libraries such as NumPy, SciPy and scikit-learn is expected. Additionally, basic knowledge in linear algebra and calculus is desired.

What You Will Learn: Implement different neural network models in Python. Select the best Python framework for deep learning such as PyTorch, TensorFlow, MXNet and Keras. Apply tips and tricks related to neural networks internals, to boost learning performances. Consolidate machine learning principles and apply them in the deep learning field. Reuse and adapt Python code snippets to everyday problems. Evaluate the cost/benefits and performance implication of each discussed solution.

In Detail: Deep Learning is revolutionizing a wide range of industries. For many applications, deep learning has proven to outperform humans by making faster and more accurate predictions. This book provides a top-down and bottom-up approach to demonstrate deep learning solutions to real-world problems in different areas. These applications include Computer Vision, Natural Language Processing, Time Series, and Robotics. The Python Deep Learning Cookbook presents technical solutions to the issues presented, along with a detailed explanation of the solutions. Furthermore, a discussion on corresponding pros and cons of implementing the proposed solution using one of the popular frameworks like TensorFlow, PyTorch, Keras and CNTK is provided. The book includes recipes that are related to the basic concepts of neural networks, as well as classical network topologies. The main purpose of this book is to provide Python programmers a detailed list of recipes to apply deep learning to common and not-so-common scenarios.

Style and approach: Unique blend of independent recipes arranged in the most logical manner.

Automated Speaking Assessment

Author: Klaus Zechner,Keelan Evanini
Publisher: Routledge
Automated Speaking Assessment by Klaus Zechner,Keelan Evanini Book Resume:

Automated Speaking Assessment: Using Language Technologies to Score Spontaneous Speech provides a thorough overview of state-of-the-art automated speech scoring technology as it is currently used at Educational Testing Service (ETS). Its main focus is the automated scoring of spontaneous speech elicited by TOEFL iBT Speaking section items, but other applications of speech scoring, such as for more predictable spoken responses or responses provided in a dialogic setting, are also discussed. The book begins with an in-depth overview of the nascent field of automated speech scoring (its history, applications, and challenges), followed by a discussion of psychometric considerations for automated speech scoring. The second and third parts discuss the main components of an automated speech scoring system as well as the different types of automatically generated measures extracted by the system to evaluate the speaking construct of communicative competence as defined by the TOEFL iBT Speaking assessment. Finally, the last part of the book touches on more recent developments, such as providing more detailed feedback on test takers’ spoken responses using speech features and scoring of dialogic speech. It concludes with a discussion, summary, and outlook on future developments in this area. Written with minimal technical details for the benefit of non-experts, this book is an ideal resource for graduate students in courses on Language Testing and Assessment as well as teachers and researchers in applied linguistics.

Speech Recognition and Coding

Author: Antonio J. Rubio Ayuso,Juan M. Lopez Soler
Publisher: Springer Science & Business Media
Speech Recognition and Coding by Antonio J. Rubio Ayuso,Juan M. Lopez Soler Book Resume:

Based on a NATO Advanced Study Institute held in 1993, this book addresses recent advances in automatic speech recognition and speech coding. The book contains contributions by many of the most outstanding researchers from the best laboratories worldwide in the field. The contributions have been grouped into five parts: on acoustic modeling; language modeling; speech processing, analysis and synthesis; speech coding; and vector quantization and neural nets. For each of these topics, some of the best-known researchers were invited to give a lecture. In addition to these lectures, the topics were complemented with discussions and presentations of the work of those attending. Altogether, the reader is given a wide perspective on recent advances in the field and will be able to see the trends for future work.