An advanced neural network-based system for enhancing single-channel speech by suppressing noise and reverberation, featuring dynamic control of suppression levels and improved performance for automated speech systems.
A sample spectrogram

Speech enhancement is a crucial technology in broadcasting, telecommunications, and automated speech recognition. As demand for digital communications grows, noise and reverberation frequently degrade single-channel speech, making efficient suppression techniques necessary. The difficulty lies in suppressing noise without harming the speech itself: existing noise and reverberation suppression algorithms often fail to preserve the quality of the original speech, and this trade-off between noise suppression and speech distortion is a fundamental limitation of many current technologies. Moreover, automated speech systems struggle to identify speakers and languages against a noisy background and often require a pre-processing step that effectively reduces environmental noise.

Technology Description

The technology is a neural network-based, end-to-end, single-channel speech enhancement system. It is designed for joint suppression of noise and reverberation and is built around an approach known as attention masking. The system features both an autoencoder path and an enhancement path; switching off the masking mechanism lets the system reconstruct the original speech signal. A novel loss function, which incorporates a perceptually motivated waveform distance measure, trains the enhancement and autoencoder paths simultaneously. Together, these features maintain high speech quality and allow dynamic control of suppression levels. What sets this technology apart is its capacity to suppress noise significantly while preserving speech quality, a balance that is highly sought after in speech enhancement. The system can also serve as a pre-processing step that improves the performance of automated speech systems such as speaker and language recognition.
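
The relationship between the two paths and the suppression control can be illustrated with a minimal sketch. It is not the system's actual implementation: the function name apply_mask, the strength parameter, and the linear blending rule are assumptions made for illustration, and random arrays stand in for the encoder output and the learned mask.

```python
import numpy as np

def apply_mask(features, mask, strength=1.0):
    """Scale encoded features by a mask whose effect can be dialed up or down.

    strength=0.0 switches the mask off (autoencoder path: features pass
    through unchanged); strength=1.0 applies the full mask (enhancement
    path). The linear blend in between is a hypothetical control rule.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    effective_mask = (1.0 - strength) + strength * mask
    return features * effective_mask

rng = np.random.default_rng(0)
# Stand-in for encoder output, shaped (frames, channels).
features = rng.standard_normal((4, 8))
# Stand-in for a learned mask: a sigmoid keeps every value in (0, 1).
mask = 1.0 / (1.0 + np.exp(-rng.standard_normal((4, 8))))

autoencoder_out = apply_mask(features, mask, strength=0.0)  # mask off
enhanced_out = apply_mask(features, mask, strength=1.0)     # full suppression
partial_out = apply_mask(features, mask, strength=0.5)      # reduced suppression
```

In this sketch, lowering strength moves the output back toward the unmasked features, which is one simple way to avoid over-suppression when the full mask removes too much.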

Benefits

  • Simultaneous suppression of noise and reverberation.
  • Maintains high speech quality.
  • Improves performance of automated speech systems.
  • Dynamic control of suppression levels to avoid over-suppression.
  • Unique loss function for effective simultaneous training.
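
The idea of one loss that trains both paths at once can be sketched as a weighted sum of two distance terms. The sketch below rests on stated assumptions: the actual loss and its perceptually motivated distance measure are not specified here, so a log-magnitude spectral distance is swapped in as a stand-in, the autoencoder path is assumed to target the unprocessed input, and the names waveform_distance, joint_loss, and alpha are hypothetical.

```python
import numpy as np

def waveform_distance(estimate, reference, frame=64, eps=1e-8):
    """Stand-in distance: mean absolute difference of log-magnitude spectra.

    This is NOT the system's perceptual measure, only a placeholder.
    Both signals are assumed to have the same length.
    """
    n = (len(reference) // frame) * frame  # drop the incomplete last frame
    est = np.abs(np.fft.rfft(estimate[:n].reshape(-1, frame), axis=1))
    ref = np.abs(np.fft.rfft(reference[:n].reshape(-1, frame), axis=1))
    return float(np.mean(np.abs(np.log(est + eps) - np.log(ref + eps))))

def joint_loss(enhanced, clean, reconstructed, noisy, alpha=0.5):
    """Weighted sum of the enhancement-path and autoencoder-path terms."""
    enhancement_term = waveform_distance(enhanced, clean)
    # Assumption: with the mask off, the target is the unprocessed input.
    autoencoder_term = waveform_distance(reconstructed, noisy)
    return alpha * enhancement_term + (1.0 - alpha) * autoencoder_term

rng = np.random.default_rng(0)
t = np.arange(256)
clean = np.sin(2 * np.pi * 5 * t / 256)
noisy = clean + 0.3 * rng.standard_normal(256)

# Ideal outputs on both paths versus an enhancement path that removes nothing.
ideal_loss = joint_loss(clean, clean, noisy, noisy)
no_suppression_loss = joint_loss(noisy, clean, noisy, noisy)
```

Because both terms enter a single scalar, one backward pass in a training framework would update the shared parameters for both paths; alpha sets how much weight the enhancement term receives relative to the reconstruction term.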

Potential Use Cases

  • Telecommunications: Enhancing call quality by reducing noise and reverberation.
  • Audio Broadcasting: Ensuring clear sound for both live and recorded sessions.
  • Automated Speech Systems: Serving as a pre-processing step to improve language and speaker recognition.
  • Audio Forensics: Improving the clarity of noisy or distorted audio recordings.
  • Personal Assistants and Smart Speakers: Improving user interaction with noise-free, clear voice commands.