ASR Brief History: Trends Followed at Different Points in Time
Work on speech recognition systems began before the mid-1970s. Progress was slow at first, however, and significant research was done only in the 1980s. Today there are a number of different approaches to speech recognition, but much research remains to be done in order to make these systems more robust, effective and reliable. Research is under way around the globe, and institutions such as CMU (Carnegie Mellon University), MIT and Stanford are prominent in the field. Dedicated researchers continue to develop this science.
Two models have been applied very frequently to automatic speech recognition (ASR): the frame-based and the segment-based approaches. The segment-based approach came first. In the past, there have been many segment-based ASR approaches which extracted feature vectors at specific temporal landmarks (Cole et al., 1983), including work during the early ARPA SUR project in the 1970s (Weinstein et al., 1975). Most of these efforts were hampered, however, by attempting to explicitly incorporate speech knowledge by heuristic means through intense knowledge engineering, and by the lack of a stochastic framework to deal with the present state of ignorance in our understanding of the human communication process and its inherent variabilities. In this approach the speech signal is segmented into small units such as phonemes or syllables; that is, boundaries (landmarks) are detected and a chain of smaller units is formed. Different methods can then be applied to recognize these units, and a probabilistic graph network may be used to determine the correct words. Acoustic or probabilistic landmarks form the basis for a phonetic segment network, or graph. Feature vectors are extracted both over hypothesized phonetic segments and at their boundaries for phonetic analysis. The resulting observation space (the set of all feature vectors) takes the form of an acoustic-phonetic network, or graph, whereby different paths through the graph are associated with different sets of feature vectors. This graph-based observation space is quite different from prevailing approaches which employ a temporal sequence of observations, which typically contain short-time spectral information (e.g., MFCCs). The segmental and feature-extraction characteristics of such a recognizer provide a framework within which knowledge of the speech signal can be incorporated.
Recent research has shown that a frame-based approach can give better results. In this approach the recognizer does not need to decode each phonetic unit explicitly; it relies on statistical analysis instead. It uses a grammar to select the next possible words and a dictionary to look up their pronunciations. A scorer uses a model such as the HMM (hidden Markov model) to calculate the acoustic probability for a particular unit of speech. The recognizer selects the next set of likely states, scores incoming features against these states, selects the state with the highest probability and prunes low-scoring states. Over the past two decades, first-order hidden Markov models (HMMs) have emerged as the dominant stochastic model for ASR (Rabiner, 1989). With a well-formed mathematical foundation and efficient, automated training procedures which can process the ever-increasing amounts of speech data, impressive HMM-based recognizers have been created for a wide variety of increasingly difficult ASR tasks. Several projects are being developed using these approaches. The SUMMIT speech recognizer developed at MIT has always used a segment-based framework for its acoustic-phonetic representation of the speech signal (Zue et al., 1989; Glass et al., 1996).
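The frame-based decoding steps described above (select the likely next states, score the incoming features against them, keep the best and prune low-scoring states) can be illustrated with a small sketch of Viterbi search with beam pruning over a first-order HMM. This is only a toy illustration and not the code of any recognizer named here: the state names, the probabilities, the discrete observation symbols "x" and "y" (real systems score continuous feature vectors such as MFCCs) and the function names `viterbi_beam` and `prune` are all assumptions made for the example.

```python
import math


def to_log(table):
    """Convert a (possibly nested) dict of non-zero probabilities to log probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


def prune(hyps, beam):
    """Keep only hypotheses whose log score is within `beam` of the best one."""
    best = max(score for score, _ in hyps.values())
    return {s: h for s, h in hyps.items() if h[0] >= best - beam}


def viterbi_beam(observations, log_start, log_trans, log_emit, beam=10.0):
    """Frame-by-frame Viterbi search with beam pruning.

    Returns (best state path, log probability of that path).
    """
    # Active hypotheses: state -> (log score, path so far).
    active = {s: (log_start[s] + log_emit[s][observations[0]], [s])
              for s in log_start}
    active = prune(active, beam)

    for obs in observations[1:]:
        candidates = {}
        for prev, (prev_score, path) in active.items():
            # Expand only the states reachable from surviving hypotheses.
            for nxt, log_t in log_trans[prev].items():
                score = prev_score + log_t + log_emit[nxt][obs]
                # Viterbi step: keep the best-scoring way into each state.
                if nxt not in candidates or score > candidates[nxt][0]:
                    candidates[nxt] = (score, path + [nxt])
        # Beam step: drop low-scoring states before the next frame.
        active = prune(candidates, beam)

    best_state = max(active, key=lambda s: active[s][0])
    return active[best_state][1], active[best_state][0]


# Hypothetical two-state model; all numbers are made up for illustration.
LOG_START = to_log({"s1": 0.8, "s2": 0.2})
LOG_TRANS = to_log({"s1": {"s1": 0.6, "s2": 0.4},
                    "s2": {"s1": 0.1, "s2": 0.9}})
LOG_EMIT = to_log({"s1": {"x": 0.9, "y": 0.1},
                   "s2": {"x": 0.2, "y": 0.8}})

path, log_prob = viterbi_beam(["x", "x", "y", "y"],
                              LOG_START, LOG_TRANS, LOG_EMIT, beam=10.0)
print(path, math.exp(log_prob))
```

Scores are kept in the log domain so that long utterances do not underflow. A smaller `beam` value prunes more aggressively, which reduces the work done per frame at the risk of discarding the path that would eventually have scored best.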
Similarly, another project, under development at Carnegie Mellon University, is Sphinx. It is based on the frame-based approach. Its knowledge base consists of a dictionary, a language model and an acoustic model. It uses HMMs to evaluate the acoustic probabilities for each unit of speech and generates the most likely state as the result. It is being implemented entirely on the Java platform. Over the past year and a half, a telephone-based weather information system called JUPITER [14] has been under development at MIT; it is available via a toll-free number for users to query a relational database of current weather conditions using natural, conversational speech. Using information obtained from several different Internet sites, JUPITER can provide weather forecasts for approximately 500 cities around the world for three to five days in advance, and can answer questions about a wide range of weather properties such as temperature, wind speed, humidity, precipitation and sunrise, as well as weather advisory information. The history of ASR is quite short and most of the work is still research-based. Much work remains to be done to reduce the errors involved in the process.