Overview
The primary objective of speech recognition is to give everyone easy access to the full range of computer services and communication systems without needing to type or to be near a keyboard. By combining a client/server approach with the latest recognition systems, distributed speech recognition (DSR) will deliver the price/performance levels and access flexibility that begin to make this practicable and affordable. As just one example of the possible new applications, you will be able to dictate your meeting notes directly into your enhanced cellular handset immediately after a meeting, and the draft text will already be in your personal computer, ready for editing, by the time you return to your office (or hotel room, or home).

The performance of speech recognition systems receiving speech that has been transmitted over mobile channels can be significantly degraded compared with using an unmodified signal. The degradation results from both low bit rate speech coding and channel transmission errors. A DSR system overcomes these problems by eliminating the speech channel and instead using an error-protected data channel to send a parameterised representation of the speech that is suitable for recognition.

The processing is distributed between the terminal and the network. The terminal performs the feature parameter extraction, that is, the front-end of the speech recognition system. These features are transmitted over a data channel to a remote "back-end" recogniser. As a result, the transmission channel does not affect recognition performance, and channel invariability is achieved.

Aurora is divided into two groups to facilitate its work: Applications and Protocols (A&P) and Advanced DSR Front End (AFE).
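To make the terminal-side front-end described in the overview more concrete, the following is a minimal sketch of mel-cepstrum feature extraction for a single speech frame. It is not the standardised algorithm: the function names, the 8 kHz sample rate, the 200-sample frame, the 23 filters and the 13 coefficients are all illustrative assumptions, and a naive DFT is used in place of an FFT to keep the example dependency-free.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum

def mel_filterbank(num_filters, num_bins, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    nyquist = sample_rate / 2.0
    top_mel = hz_to_mel(nyquist)
    edges_hz = [mel_to_hz(top_mel * i / (num_filters + 1))
                for i in range(num_filters + 2)]
    edges = [f / nyquist * (num_bins - 1) for f in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, num_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for k in range(num_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising slope
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mel_cepstrum(frame, sample_rate=8000, num_filters=23, num_ceps=13):
    """Illustrative parameter values only; not taken from the standard."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(spectrum), sample_rate)
    # Log filterbank energies, floored to avoid log(0).
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                    for row in bank]
    # DCT-II of the log energies gives the cepstral coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_ceps)]

if __name__ == "__main__":
    # A 25 ms synthetic tone stands in for one frame of speech.
    frame = [math.sin(2 * math.pi * 437.0 * i / 8000.0) for i in range(200)]
    features = mel_cepstrum(frame)
    print(len(features), "coefficients for this frame")
```

In a DSR arrangement, a vector like `features` would be computed on the terminal for each frame and sent over the data channel, with the remote back-end recogniser consuming the vectors instead of decoded audio.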
Achieved work
Two work items have been created by the AFE group to define part of the DSR area. The first proposes a DSR front-end algorithm based on the Mel-Cepstrum, intended for environments with a low level of background noise. ES 201 108 (published in April 2000) specifies this front-end algorithm to ensure compatibility between the terminal and the remote recogniser. The current target bit rate is 4,8 kbit/s, but other rates such as 9,6 kbit/s can be envisaged.

The second work item (DES/STQ-00008) will present another algorithm intended to match the performance of the Mel-Cepstrum algorithm in more demanding environments such as cars and airports; it aims to provide substantially improved performance in background noise. Performance has been measured as the reduction in error rate when evaluated on noisy speech databases covering a range of tasks and languages. The overall reduction in error rate is 53%, i.e. less than half the error rate of the previous standard. This work will be made publicly available during 2002.

Future developments
A new work item (DES/STQ-00030) has recently been created to study a front-end extension for tonal language recognition and speech reconstruction. Its purpose is to enable improved performance for tonal language recognition and to provide the ability to reconstruct a speech waveform from the DSR parameters.

Liaisons with other bodies
An active liaison with 3GPP™ has been initiated. The delivery of future multi-modal services is expected to be supported by a combination of technologies standardised in different bodies (e.g. 3GPP™, IETF and W3C), together with the integration of proprietary implementations of component technologies and content. A proposal has been made to the IETF AVT group to define an RTP payload for DSR. A work item has also been created.
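As a rough check on the bit rate and error-rate figures quoted under Achieved work, the arithmetic can be sketched as follows. The frame rate of 100 frames per second and the 20% baseline error rate are assumptions chosen purely for illustration; neither value comes from this page, and the per-frame figure is a gross budget that says nothing about how bits are split between features and error protection.

```python
def bits_per_frame(bit_rate_bps, frames_per_second):
    """Gross bit budget available for each feature frame at a given channel rate."""
    return bit_rate_bps / frames_per_second

def error_rate_after(baseline_error, relative_reduction):
    """Error rate remaining after a relative reduction (0.53 means 53%)."""
    return baseline_error * (1.0 - relative_reduction)

FRAMES_PER_SECOND = 100  # assumption: one feature frame every 10 ms

for rate in (4800, 9600):  # the 4,8 kbit/s and 9,6 kbit/s rates mentioned in the text
    budget = bits_per_frame(rate, FRAMES_PER_SECOND)
    print(f"{rate} bit/s gives {budget:.0f} bits per frame")

baseline = 20.0  # hypothetical baseline error rate, in percent
improved = error_rate_after(baseline, 0.53)
print(f"baseline {baseline:.1f}% becomes {improved:.1f}% after a 53% reduction")
```

A 53% relative reduction leaves 47% of the original errors, which is why the text describes the result as less than half the previous error rate.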
More information
Avios DSR paper (DSR Overview)

Presentations
DSR Front-end Extension for Tonal-language Recognition and Speech Reconstruction
DSR reconstruction in many different languages
Presentations and speech files
A&P | Applications and Protocols |
AFE | Advanced DSR Front End |
AVT | Audio/Video Transport |
DSR | Distributed Speech Recognition |
RTP | Real-time Transport Protocol |
Last updated: 2008-07-07 13:33:33