Digital Signal Processing and System Theory

Talk Bastian Sauert

Near-End Listening Enhancement: Theory and Application

Date: 20.01.2013, 17:15 h - 18:15 h
Room: Aquarium

Dr.-Ing. Bastian Sauert
RWTH Aachen, Germany


Mobile communication often takes place in the presence of acoustic background noise. Because the near-end listener is located in the noisy environment and perceives a mixture of the clean far-end (downlink) speech and the acoustic background noise, listening effort increases and speech intelligibility may be reduced.

The noisy environment at the near-end side usually cannot be influenced easily; examples are car noise on a busy street or speech babble noise in a cafeteria. In handset mode, one ear of the near-end listener is “covered” to some extent by the mobile phone. Nevertheless, the noise is still perceived by both ears and cannot be intercepted. Therefore, modifying the far-end signal is the only way to effectively improve speech intelligibility for the near-end listener by digital signal processing, particularly if the processing adapts to the characteristics of the ambient noise. We call this approach near-end listening enhancement (NELE).

A number of speech modification algorithms have been presented in the literature to tackle the problem of NELE. To date, most of the proposed algorithms are noise-independent, i.e., the same processing with the same setup is performed regardless of the SNR and other noise characteristics. As a consequence, the speech signal is modified even in quiet environments. These noise-independent methods include boosting of the consonant-vowel ratio, formant enhancement, manipulation of duration and prosody, and more advanced manipulations of the temporal structure. Recently, some techniques have been studied that utilize prior knowledge or estimates of the noise context. These approaches include formant enhancement, modification of the local SNR, spectral shaping and dynamic range compression, and optimization with respect to an objective criterion.
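The difference between noise-independent and noise-dependent processing can be illustrated with a toy sketch. It is not one of the published algorithms: the function names, the 6 dB fixed boost, the 15 dB target SNR, and the 20 dB gain limit are assumptions chosen purely for illustration, and all levels are per-band values in dB.

```python
def fixed_boost_db(num_bands, boost_db=6.0):
    """Noise-independent processing: the same gain in every band,
    regardless of the noise (hypothetical 6 dB boost)."""
    return [boost_db] * num_bands


def snr_adaptive_gain_db(speech_db, noise_db, target_snr_db=15.0, max_gain_db=20.0):
    """Noise-dependent processing: raise each band only as far as needed
    to reach an assumed target SNR, limited to an assumed maximum gain."""
    gains = []
    for s, n in zip(speech_db, noise_db):
        deficit = target_snr_db - (s - n)  # dB missing to reach the target SNR
        gains.append(min(max_gain_db, max(0.0, deficit)))
    return gains


if __name__ == "__main__":
    speech = [65.0, 60.0, 55.0]   # assumed band levels of the far-end speech
    quiet = [20.0, 20.0, 20.0]    # assumed band levels of a quiet room
    noisy = [60.0, 60.0, 60.0]    # assumed band levels of a noisy environment
    print("fixed boost:       ", fixed_boost_db(len(speech)))
    print("adaptive, quiet:   ", snr_adaptive_gain_db(speech, quiet))
    print("adaptive, noisy:   ", snr_adaptive_gain_db(speech, noisy))
```

In the quiet case the adaptive rule leaves the speech untouched, whereas the fixed boost modifies it in every condition.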

We derived a NELE algorithm that maximizes the Speech Intelligibility Index (SII), and thus speech intelligibility, by a frequency-selective increase of the speech signal power. This represents an upper performance bound, which can only be reached with high-end loudspeakers. In mobile phones, however, the limitations of the commonly used micro-loudspeakers need to be considered. Especially in hands-free operation, the maximum thermal load of the micro-loudspeaker constitutes a major limitation. The overall audio power is therefore restricted to a maximum value, which leads to a constrained optimization of the SII.
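To give a feel for what a power-constrained SII optimization involves, the following is a minimal sketch under strong simplifying assumptions, not the algorithm presented in the talk. The SII is reduced to a weighted sum of band audibilities, where each band's audibility rises linearly from 0 at -15 dB SNR to 1 at +15 dB SNR; masking spread, hearing threshold, and level distortion are ignored. Under this simplification, and as long as no band falls below -15 dB SNR, the best allocation for a fixed total power gives each band a power proportional to its importance weight, capped at the power that yields +15 dB SNR. All function names and the example band weights are hypothetical.

```python
import math


def band_audibility(speech_db, noise_db):
    """Simplified band audibility: linear in SNR between -15 dB and +15 dB."""
    return min(1.0, max(0.0, (speech_db - noise_db + 15.0) / 30.0))


def simplified_sii(speech_db, noise_db, importance):
    """Weighted sum of band audibilities (importance weights sum to 1)."""
    return sum(w * band_audibility(s, n)
               for s, n, w in zip(speech_db, noise_db, importance))


def allocate_band_powers(noise_db, importance, total_power):
    """Distribute total_power (linear units) over the bands.

    Assumes every importance weight is > 0 and that the resulting band
    SNRs stay above -15 dB; the lower clamp is not handled here.
    """
    # Power at which a band reaches +15 dB SNR; more power brings no benefit.
    caps = [10.0 ** ((n + 15.0) / 10.0) for n in noise_db]
    if sum(caps) <= total_power:
        return caps
    # Bisection on the scale factor c so that sum(min(c * w, cap)) == total_power.
    lo, hi = 0.0, max(cap / w for cap, w in zip(caps, importance))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(min(mid * w, cap) for w, cap in zip(importance, caps)) > total_power:
            hi = mid
        else:
            lo = mid
    return [min(lo * w, cap) for w, cap in zip(importance, caps)]


if __name__ == "__main__":
    noise = [60.0, 60.0, 60.0, 60.0]        # assumed band noise levels in dB
    weights = [0.1, 0.2, 0.3, 0.4]          # made-up band importance weights
    flat_speech = [60.0, 60.0, 60.0, 60.0]  # unprocessed speech, 0 dB SNR per band
    budget = sum(10.0 ** (s / 10.0) for s in flat_speech)  # keep total power fixed
    powers = allocate_band_powers(noise, weights, budget)
    shaped_speech = [10.0 * math.log10(p) for p in powers]
    print("SII before:", simplified_sii(flat_speech, noise, weights))
    print("SII after: ", simplified_sii(shaped_speech, noise, weights))
```

The example keeps the total power unchanged and only redistributes it across the bands, which mirrors the situation where a micro-loudspeaker cannot deliver more overall power.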

Besides mobile telephony in handset and hands-free mode, near-end listening enhancement can also be applied in headphones, hands-free conference terminals, car multimedia systems, public address systems, and digital hearing aids.