General Information

Speech Recognition Engine 2.0 is a Windows 9x program that lets the User work with a multi-sensor setup consisting of several acoustic microphones, a breath sensor and a lip-movement detection sensor. It is not a demo program, and therefore its interface is rather complicated. The Engine allows the User to experiment with different spectral parameters derived from speech and from the output of the other sensors.

Speech Recognition Engine 2.0 is based on the Intel Recognition Primitives Library (release 4.018) and the Intel Signal Processing Library 4.0.

The User can define new spectral parameters, which are calculated dynamically and saved in special cyclic buffers called "channels". Some pre-defined spectral channels (e.g. the linear prediction (LP) spectrum of an acoustic microphone's output) serve as the basis for this operation. Various unary and binary operations can then be applied to spectral channels. For example, to produce the time-smoothed log-LP spectrum of a signal from a microphone, the User should first take an LP spectrum as the parent channel, then create a time-smoothed child spectral channel using a unary time-smoothing operation, and finally produce the required channel using the log() unary spectral operation.
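The parent/child chaining described above can be sketched in a few lines of Python. This is only an illustration and not the Engine's actual interface: the Channel class, the derive() and push() methods, the buffer capacity and the input numbers are all hypothetical, and exponential smoothing is assumed here because the kind of time smoothing the Engine applies is not stated.

```python
import math
from collections import deque


class Channel:
    """Hypothetical cyclic buffer of spectral frames (not the Engine's API)."""

    def __init__(self, capacity=8):
        # A deque with maxlen drops the oldest frame once the buffer is full.
        self.frames = deque(maxlen=capacity)
        self.children = []

    def derive(self, op, capacity=8):
        """Create a child channel fed by applying the unary operation op."""
        child = Channel(capacity)
        self.children.append((op, child))
        return child

    def push(self, frame):
        """Store a frame and propagate it through every child channel."""
        self.frames.append(list(frame))
        for op, child in self.children:
            child.push(op(frame))


def time_smoothing(alpha=0.5):
    """Unary operation: exponential smoothing over time (an assumption)."""
    state = {"prev": None}

    def op(frame):
        if state["prev"] is None:
            state["prev"] = list(frame)
        else:
            state["prev"] = [alpha * x + (1 - alpha) * p
                             for x, p in zip(frame, state["prev"])]
        return list(state["prev"])

    return op


def log_op(frame):
    """Unary operation: natural logarithm of every spectral bin."""
    return [math.log(x) for x in frame]


# Parent channel -> time-smoothed child -> log of the smoothed child.
lp_spectrum = Channel()
smoothed = lp_spectrum.derive(time_smoothing(alpha=0.5))
log_smoothed = smoothed.derive(log_op)

# Made-up two-bin "LP spectrum" frames, used only to exercise the chain.
lp_spectrum.push([1.0, 4.0])
lp_spectrum.push([3.0, 8.0])
```

In this sketch each new frame pushed into the parent channel is recalculated in every derived channel, which mirrors the "dynamically calculated" behaviour described above.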

Moreover, the Engine allows the User to control the speech-detection process. Special "speech detection" spectral channels can be produced using the "speech detection" unary operation. This is done in three steps: creating a noise-resistant spectral channel in which silence and speech can be separated visually; taking this channel as the parent of a speech-detection channel; and then training the speech-detection parameters.
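The training step can be pictured with a deliberately simple stand-in. The parameters the Engine actually trains are not described here, so the sketch below assumes a single threshold on the mean level of a frame, placed midway between the levels observed in example silence and example speech. All function names and numbers are hypothetical.

```python
def frame_level(frame):
    """Mean value of one spectral frame (a hypothetical per-frame feature)."""
    return sum(frame) / len(frame)


def train_threshold(silence_frames, speech_frames):
    """Place the threshold midway between mean silence and mean speech levels."""
    silence = sum(frame_level(f) for f in silence_frames) / len(silence_frames)
    speech = sum(frame_level(f) for f in speech_frames) / len(speech_frames)
    return (silence + speech) / 2.0


def detect(frame, threshold):
    """Report speech when the frame level exceeds the trained threshold."""
    return frame_level(frame) > threshold


# Made-up frames from the parent (noise-resistant) channel.
silence_frames = [[1.0, 1.0], [1.0, 3.0]]
speech_frames = [[8.0, 10.0], [9.0, 11.0]]

threshold = train_threshold(silence_frames, speech_frames)
```

A threshold of this kind only works if silence and speech are already well separated in the parent channel, which is why the first step asks for a channel in which the two can be told apart visually.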

Several speech detectors may work simultaneously. In this case, speech is considered to be detected if and only if all the detectors detect speech.
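The "if and only if all" rule is a logical AND over the detectors' decisions. A minimal sketch, assuming each detector yields one true/false decision per frame (the function name, detector names and values are hypothetical):

```python
def combine_detectors(per_detector_decisions):
    """AND the frame-by-frame decisions of several speech detectors."""
    return [all(frame) for frame in zip(*per_detector_decisions)]


# Made-up decisions from two detectors over four frames.
acoustic = [False, True, True, False]
breath = [False, False, True, True]

combined = combine_detectors([acoustic, breath])
```

Here speech is reported only for the third frame, the only one in which both detectors agree.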