Overview

The primary objective of speech recognition is to give everyone easy access to the full range of computer services and communication systems, without having to type or be near a keyboard. By combining a client/server approach with the latest recognition systems, distributed speech recognition (DSR) will deliver the price/performance levels and access flexibility needed to make this practicable and affordable. As just one example from a spectrum of possible new applications, you will be able to dictate your meeting notes directly into your enhanced cellular handset immediately after a meeting, and the draft text will already be on your personal computer, ready for editing, by the time you return to your office (or hotel room, or home).

The performance of speech recognition systems receiving speech that has been transmitted over mobile channels can be significantly degraded compared with recognition of an unmodified signal. The degradation results from both low bit rate speech coding and channel transmission errors. A Distributed Speech Recognition (DSR) system overcomes these problems by eliminating the speech channel and instead using an error-protected data channel to send a parameterised representation of the speech that is suitable for recognition. The processing is distributed between the terminal and the network. The terminal performs the feature parameter extraction, i.e. the front-end of the speech recognition system. These features are transmitted over a data channel to a remote "back-end" recogniser. The end result is that the transmission channel does not affect recognition performance, so channel invariability is achieved.
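The split between terminal and network can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names `extract_features`, `pack_features` and `unpack_features` are hypothetical, the 8 kHz sampling rate and 10 ms frame length are chosen for convenience, and a single log-energy value per frame stands in for the real feature parameters, which are considerably more involved. Error protection of the data channel is not shown.

```python
import math
import struct

FRAME_LEN = 80  # samples per frame; assumes 8 kHz sampling and 10 ms frames


def extract_features(samples):
    """Terminal side: reduce each frame to one parameter (log energy).

    A real DSR front-end computes a vector of parameters per frame;
    log energy is used here only to keep the sketch short.
    """
    features = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        energy = sum(s * s for s in frame)
        features.append(math.log(energy + 1e-10))  # small floor avoids log(0)
    return features


def pack_features(features):
    """Terminal side: serialise a frame count and the features for the data channel."""
    return struct.pack(f"!H{len(features)}f", len(features), *features)


def unpack_features(payload):
    """Network side: recover the features for the back-end recogniser."""
    (count,) = struct.unpack_from("!H", payload)
    return list(struct.unpack_from(f"!{count}f", payload, 2))


# One second of a 440 Hz tone at 8 kHz stands in for captured speech.
speech = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
sent = pack_features(extract_features(speech))
received = unpack_features(sent)
```

Only the packed features cross the channel; the waveform never leaves the terminal, which is why speech coding artefacts do not reach the recogniser.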

To facilitate its work, Aurora is divided into two groups:

the "Advanced Front End (AFE)" group, which defines the front-end and speech processing related matters;

the "Applications and Protocols (A&P)" group, which has been created to consider standards for Distributed Speech Recognition (DSR) client-server protocols. This group will study:

Application requirements;

Architecture;

Transport protocol;

Multi-modality;

Speech output;

Speech reconstruction.

Achieved work

Two work items have been created by the AFE group to define part of this DSR area. The first proposes a front-end algorithm for DSR, based on the Mel-Cepstrum algorithm, intended for environments with a low level of background noise. So far, ES 201 108 (published in April 2000) specifies this front-end algorithm to ensure compatibility between the terminal and the remote recogniser. The current target bit rate is 4,8 kbit/s, but other rates such as 9,6 kbit/s can be envisaged.
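As a rough illustration of what these bit rates mean for each feature frame, the sketch below converts a channel bit rate into a per-frame bit budget. The 10 ms frame interval is an assumption made for the example and is not stated in the text above; the function name `bits_per_frame` is hypothetical.

```python
def bits_per_frame(bit_rate_bps, frame_interval_ms=10):
    """Bits available per feature frame, including any framing and error protection.

    frame_interval_ms defaults to 10 ms, which is an assumption of this sketch.
    """
    return bit_rate_bps * frame_interval_ms // 1000


budget_4k8 = bits_per_frame(4800)  # 48 bits per frame at 4,8 kbit/s
budget_9k6 = bits_per_frame(9600)  # 96 bits per frame at 9,6 kbit/s
```

Under that assumption, all quantised feature parameters for one frame, plus their share of framing and error-protection overhead, must fit in 48 bits at the target rate.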

The second work item (DES/STQ-00008) will present another algorithm, which should be able to match the performance of the Mel-Cepstrum algorithm in more demanding environments such as cars and airports; it aims to provide substantially improved performance in background noise. The improvement has been measured as the reduction in error rate on noisy speech databases covering a range of tasks and languages. The overall reduction in error rate is 53%, i.e. less than half the error rate of the previous standard. This work will be made publicly available during 2002.
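The 53% figure is a relative reduction, so the new error rate is 47% of the old one. A short worked example makes the arithmetic explicit; the baseline value of 20% is purely hypothetical and is not a measured result, and the function name `error_rate_after` is introduced only for illustration.

```python
def error_rate_after(baseline_error_rate, relative_reduction):
    """Error rate remaining after a relative reduction (0.53 means 53%)."""
    return baseline_error_rate * (1.0 - relative_reduction)


baseline = 20.0  # hypothetical baseline error rate, in percent
improved = error_rate_after(baseline, 0.53)  # about 9.4 percent
```

Whatever the baseline, the result is always below half of it, which is what "less than half the error rate" refers to.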

Future developments

A new work item (DES/STQ-00030) has recently been created to study the front-end extension for tonal language recognition and speech reconstruction. The purpose of this work item is to enable improved performance for tonal language recognition and to provide the ability to reconstruct a speech waveform from the DSR parameters. More information (PDF)

Liaisons with other bodies

An active liaison with 3GPP™ has been initiated.

The delivery of future multi-modal services is expected to be supported by a combination of technologies standardised in different bodies, e.g. 3GPP™, IETF and W3C, as well as by the integration of proprietary implementations of component technologies and content.

A proposal has been made to the IETF AVT group to define an RTP payload for DSR. A work item has also been created.
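To show where such a payload would sit, the sketch below wraps an opaque block of DSR data in the generic 12-byte RTP fixed header. It does not reproduce the proposed DSR payload format: the payload bytes are a placeholder, the payload type 96 is an assumed value from the dynamic range, and the function name `build_rtp_packet` is hypothetical.

```python
import struct

RTP_VERSION = 2
DSR_PAYLOAD_TYPE = 96  # assumption: a value from the dynamic range (96-127)


def build_rtp_packet(payload, sequence_number, timestamp, ssrc, marker=False):
    """Prefix payload bytes with an RTP fixed header (no CSRC list, no extension)."""
    first_byte = RTP_VERSION << 6  # version 2, padding = 0, extension = 0, CSRC count = 0
    second_byte = (int(marker) << 7) | DSR_PAYLOAD_TYPE
    header = struct.pack(
        "!BBHII",
        first_byte,
        second_byte,
        sequence_number & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


# Twelve zero bytes stand in for encoded DSR feature frames.
packet = build_rtp_packet(b"\x00" * 12, sequence_number=1, timestamp=160, ssrc=0x1234ABCD)
```

The sequence number and timestamp in the header let the receiver detect lost or reordered packets before the features reach the recogniser.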

More information

STQ work programme

Avios DSR paper (DSR Overview)

Presentations

DSR Front-end Extension for Tonal-language Recognition and Speech Reconstruction

DSR reconstruction in many different languages

Presentations and speech files

Available here

Glossary

A&P
Applications and Protocols

AFE
Advanced DSR Front End

AVT
Audio/Video Transport

DSR
Distributed Speech Recognition

RTP
Real-time Transport Protocol

Last updated: 2008-07-07 13:33:33